OpenCL Post FX demo
Related articles:
- GPU Computing: GeForce and Radeon OpenCL Test (Part 1)
- GPU Computing: GeForce and Radeon OpenCL Test (Part 2)
- GPU Computing: GeForce and Radeon OpenCL Test (Part 4 and conclusion)
Second OpenCL Test: PostFX
This demo is a direct implementation of NVIDIA’s oclPostprocessGL demo, which you can find in NVIDIA’s GPU Computing SDK.
The PostFX demo applies a blur to the final scene.
What is new in this demo is the choice between local and global memory. GPU Caps has a command line option to enable the use of local memory (/cl_demo_use_local_mem). By default local memory is disabled because using it crashes on Radeon cards.
Local memory is a very fast on-chip memory (also called scratchpad memory) and is an order of magnitude faster than global memory (which is located in graphics memory, outside the GPU). The original NVIDIA implementation uses only local memory with an explicit workgroup size. I added another codepath to disable the use of local memory and to enable or disable the explicit workgroup size.
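To make the two codepaths more concrete, here is a minimal CPU-side sketch in C of the idea, not the demo's actual kernel. It assumes a plain box blur as a stand-in for the real filter, and the names TILE, RADIUS, blur_global and blur_local are hypothetical. The "global" version reads every neighbour straight from the source image, while the "local" version first stages one tile plus its border into a small array, the way a workgroup would fill local memory, and then reads only from that array.

```c
#include <assert.h>
#include <stddef.h>

#define TILE   16                   /* hypothetical workgroup edge (16x16 work-items) */
#define RADIUS 2                    /* hypothetical blur radius in pixels */
#define LDIM   (TILE + 2 * RADIUS)  /* tile edge including the border (halo) */
#define TAPS   ((2 * RADIUS + 1) * (2 * RADIUS + 1))

/* Clamp a coordinate to [0, n-1], like clamp-to-edge addressing. */
static int clamp_i(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

/* "Global memory" path: each output pixel reads all its neighbours
   directly from the source image. */
void blur_global(const float *src, float *dst, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = -RADIUS; dy <= RADIUS; ++dy)
                for (int dx = -RADIUS; dx <= RADIUS; ++dx)
                    sum += src[clamp_i(y + dy, h) * w + clamp_i(x + dx, w)];
            dst[y * w + x] = sum / TAPS;
        }
}

/* "Local memory" path: each tile (one workgroup on a GPU) copies its pixels
   plus a halo into a small array, then every read hits that array. */
void blur_local(const float *src, float *dst, int w, int h)
{
    assert(w > 0 && h > 0);
    float tile[LDIM][LDIM];  /* stands in for __local memory */
    for (int ty = 0; ty < h; ty += TILE)
        for (int tx = 0; tx < w; tx += TILE) {
            /* Stage the tile and its halo from "global" memory. */
            for (int ly = 0; ly < LDIM; ++ly)
                for (int lx = 0; lx < LDIM; ++lx)
                    tile[ly][lx] = src[clamp_i(ty + ly - RADIUS, h) * w
                                       + clamp_i(tx + lx - RADIUS, w)];
            /* In an OpenCL kernel, barrier(CLK_LOCAL_MEM_FENCE) would go here. */
            for (int ly = 0; ly < TILE && ty + ly < h; ++ly)
                for (int lx = 0; lx < TILE && tx + lx < w; ++lx) {
                    float sum = 0.0f;
                    for (int dy = -RADIUS; dy <= RADIUS; ++dy)
                        for (int dx = -RADIUS; dx <= RADIUS; ++dx)
                            sum += tile[ly + RADIUS + dy][lx + RADIUS + dx];
                    dst[(ty + ly) * w + (tx + lx)] = sum / TAPS;
                }
        }
}
```

Both functions produce the same image. The difference is only in where the reads go: with these assumed sizes, the tiled version fetches each source pixel once per tile instead of up to 25 times, which is where the benefit of local memory comes from on hardware where global reads are slow.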
All tests have been done with GPU Caps 1.8.2 PRO (GPU Caps Viewer 1.8.2 is also fine but there is no benchmarking support) with the following system:
– Windows Vista SP2 32-bit
– system memory: 2GB 1333 DDR3
– CPU: Intel Core 2 Extreme CPU X9650 @ 3.00GHz
– NVIDIA driver: R195.62
– AMD driver: Catalyst 9.12 hotfix
Here are the results for the post processing of a 600×600 window:
As with the surface deformer demo, NVIDIA’s first OpenCL drivers showed a huge gain in performance when an explicit work group size was used. This is no longer the case.
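For readers unfamiliar with the term: an explicit work group size means passing a non-NULL local_work_size to clEnqueueNDRangeKernel instead of letting the driver pick one. In OpenCL 1.0 the global size must then be evenly divisible by the work group size, so the host usually rounds it up. Here is a small sketch of that rounding in C. The helper name round_up and the 16-wide work group are assumptions for illustration, not values taken from GPU Caps.

```c
#include <assert.h>
#include <stddef.h>

/* Round `global` up to the next multiple of `local`, so that it can be
   used as a global work size together with an explicit work group size. */
size_t round_up(size_t global, size_t local)
{
    assert(local > 0);
    size_t rem = global % local;
    return rem == 0 ? global : global + (local - rem);
}
```

With a hypothetical 16×16 work group, a 600×600 window would be launched as round_up(600, 16) = 608 work-items per dimension, and the kernel has to skip the work-items that fall outside the image.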
These graphs show an interesting fact: local memory has a huge impact on GeForce GTS 250, only a small one on GTX 280 and no visible effect on Radeon.
This post processing kernel is quite demanding and we clearly see the difference between a HD 5870 and a HD 5770. We also see that the GTX 280 dominates the test. The kernel comes from NVIDIA’s OpenCL SDK and I imagine that it has been optimized for NVIDIA hardware.
WARNING for Radeon users under Windows XP and Seven: the use of local memory crashes the demo and the VPU Recover resets the graphics driver. Vista users are not affected.
Interesting series. Keep it up!
Thanks!
The last part will be posted on Monday…
I wonder whether the visualization part of the demo can affect performance. Have you tried to measure only the computational power of the OpenCL kernels on different hardware, without any draw calls?
Catalyst 10.11 does not crash with local memory enabled on an ATI Radeon 5870.