[TEST] GPU Computing – GeForce and Radeon OpenCL Test (Part 3)


OpenCL Post FX demo


Second OpenCL Test: PostFX

This demo is a direct implementation of NVIDIA's oclPostprocessGL demo, which you can find in NVIDIA's GPU Computing SDK.

The PostFX demo applies a blur to the final scene.
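To give an idea of what such a post-processing pass computes, here is a minimal CPU sketch of a box blur in Python. It is an illustration only: the function name `box_blur`, the grayscale list-of-rows image format and the clamp-to-edge border handling are my assumptions, and the actual kernel from the SDK may weight or threshold pixels differently.

```python
def box_blur(image, radius):
    """Blur a 2D grayscale image (list of rows) with a square box filter.

    Each output pixel is the average of the (2*radius+1)^2 pixels around it.
    Coordinates outside the image are clamped to the nearest edge pixel.
    (Hypothetical helper, not the kernel used by the demo.)
    """
    height = len(image)
    width = len(image[0])
    area = (2 * radius + 1) ** 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sy = min(max(y + dy, 0), height - 1)  # clamp row
                    sx = min(max(x + dx, 0), width - 1)   # clamp column
                    total += image[sy][sx]
            out[y][x] = total / area
    return out


if __name__ == "__main__":
    # A single bright pixel gets spread over its neighbors.
    img = [[0.0, 0.0, 0.0],
           [0.0, 9.0, 0.0],
           [0.0, 0.0, 0.0]]
    for row in box_blur(img, 1):
        print(row)
```

On a GPU, the two outer loops disappear: each work-item computes one output pixel, which is why the number of memory reads per pixel matters so much.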

What is new in this demo is the choice between local and global memory. GPU Caps has a command line option that enables the use of local memory (/cl_demo_use_local_mem). Local memory is disabled by default because it crashes on Radeon cards.

Local memory is a very fast on-chip memory (also called scratchpad memory) and is roughly an order of magnitude faster than global memory (which resides in graphics memory, outside the GPU). NVIDIA's original implementation uses only local memory with an explicit work group size. I added another codepath that can disable the use of local memory and enable or disable the explicit work group size.
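The difference between the two codepaths can be sketched on the CPU. The Python emulation below is a rough illustration, not the demo's code: the function names, the tile size and the box-blur filter are assumptions. It counts how many reads hit "global" memory when every pixel fetches its own neighborhood, versus when each work group first copies a tile (plus a border of `radius` pixels) into a small "local" buffer and then works from that copy.

```python
def clamp(v, lo, hi):
    return min(max(v, lo), hi)


def blur_global(image, radius):
    """Global-memory path: every pixel reads its whole neighborhood directly."""
    h, w = len(image), len(image[0])
    area = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    global_reads = 0
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    total += image[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
                    global_reads += 1
            out[y][x] = total / area
    return out, global_reads


def blur_tiled(image, radius, tile):
    """Local-memory path: one work group per tile (tile size is an assumption)."""
    h, w = len(image), len(image[0])
    area = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    global_reads = 0
    side = tile + 2 * radius  # tile plus border on each side
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            # Cooperative load: each global pixel of the tile is read once.
            local = [[0.0] * side for _ in range(side)]
            for ly in range(side):
                for lx in range(side):
                    gy = clamp(ty + ly - radius, 0, h - 1)
                    gx = clamp(tx + lx - radius, 0, w - 1)
                    local[ly][lx] = image[gy][gx]
                    global_reads += 1
            # An OpenCL kernel would synchronize the work group here
            # with barrier(CLK_LOCAL_MEM_FENCE) before reading the tile.
            for ly in range(tile):
                for lx in range(tile):
                    y, x = ty + ly, tx + lx
                    if y >= h or x >= w:
                        continue  # work-item falls outside the image
                    total = 0.0
                    for dy in range(-radius, radius + 1):
                        for dx in range(-radius, radius + 1):
                            total += local[ly + radius + dy][lx + radius + dx]
                    out[y][x] = total / area
    return out, global_reads


if __name__ == "__main__":
    img = [[float(x + y) for x in range(8)] for y in range(8)]
    a, reads_global = blur_global(img, 1)
    b, reads_tiled = blur_tiled(img, 1, 4)
    print(a == b, reads_global, reads_tiled)
```

Both functions return the same image. Only the number of global reads changes, and the gap widens as the blur radius grows. How much that helps in practice depends on the hardware, for example on how well it caches global reads.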

All tests were run with GPU Caps 1.8.2 PRO (GPU Caps Viewer 1.8.2 also works, but has no benchmarking support) on the following system:
– Windows Vista SP2 32-bit
– system memory: 2GB 1333 DDR3
– CPU: Intel Core 2 Extreme CPU X9650 @ 3.00GHz
– NVIDIA driver: R195.62
– AMD driver: Catalyst 9.12 hotfix

Here are the results for the post processing of a 600×600 window:

OpenCL Post FX demo - global memory

OpenCL Post FX demo - local memory

As with the surface deformer demo, NVIDIA's first OpenCL drivers showed a huge performance gain when an explicit work group size was used. This is no longer the case.
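For readers unfamiliar with the term: with an explicit work group size, the application passes a local work size to the kernel launch. In OpenCL 1.x the global work size must then be a multiple of it. Without one, the driver picks the grouping itself. The small Python sketch below shows the usual rounding. The 16×16 group size is only an example I chose, not necessarily the one used by the demo, and `launch_dims` is a hypothetical helper.

```python
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple


def launch_dims(width, height, local_size=None):
    """Return (global_size, local_size) for a 2D kernel launch.

    local_size=None models the implicit case, where the driver chooses
    the work group size and the global size is just the image size.
    """
    if local_size is None:
        return (width, height), None
    lx, ly = local_size
    return (round_up(width, lx), round_up(height, ly)), (lx, ly)


if __name__ == "__main__":
    print(launch_dims(600, 600))            # implicit work group size
    print(launch_dims(600, 600, (16, 16)))  # explicit, hypothetical 16x16
```

For a 600×600 window and 16×16 groups, the global size is padded to 608×608, so the kernel has to skip the work-items that fall outside the image.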

These graphs show an interesting fact: local memory has a huge impact on GeForce GTS 250, only a small one on GTX 280 and no visible effect on Radeon.

This post processing kernel is quite demanding and we clearly see the difference between a HD 5870 and a HD 5770. We also see that the GTX 280 dominates the test. The kernel comes from NVIDIA’s OpenCL SDK and I imagine that it has been optimized for NVIDIA hardware.

WARNING for Radeon users under Windows XP and Windows 7: using local memory crashes the demo, and VPU Recover resets the graphics driver. Vista users are not affected.

OpenCL - VPU recover on Radeon

5 thoughts on “[TEST] GPU Computing – GeForce and Radeon OpenCL Test (Part 3)”

  1. Pingback: [TEST] GPU Computing – GeForce and Radeon OpenCL Test (Part 2) - 3D Tech News, Pixel Hacking, Data Visualization and 3D Programming - Geeks3D.com

  2. RevEn

    I wonder can visualization part of the demo affect performance? Have you tried to measure only computational power of OpenCL kernels on different hardware, without any draw calls?

  3. Clay

    Catalyst 10.11 does not crash with local memory enabled on an ATI Radeon 5870.
