
Particle billboarding is one of the important steps of particle rendering. Billboarding can be achieved by using point sprites (the billboarding is automatically done by the GPU) or geometry shaders as explained in this article. Both methods produce similar visual results. But which method is faster when we have to render a lot of particles (1 million or more)?
To help answer that question, I prepared a demo that renders 1’000’000 particles. Each particle is animated in a simple way in the vertex shader. The demo is available in two versions: the first one uses point sprites, the second one uses geometry shaders.
The GLSL Hacker demo is available in the host_api/Particle_PointSprite_vs_GS_Billboarded_Quads/ folder of the code sample pack. It’s recommended to use the latest GLSL Hacker 0.7.0.3 (I fixed a small bug related to fullscreen in this version).
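For readers unfamiliar with the geometry shader path, here is a minimal CPU-side sketch in Python of the 1:4 expansion that a billboarding geometry shader typically performs: one point in view space becomes the four corners of a screen-parallel quad. This is not the demo's shader code. The function name `billboard_quad`, the corner ordering and the texture coordinate convention are all assumptions made for illustration.

```python
# Illustrative sketch only: names, corner order and UV convention are
# assumptions, not taken from the GLSL Hacker demo.

# Corner offsets in triangle-strip order, paired with texture coordinates.
CORNERS = [
    ((-1.0, -1.0), (0.0, 0.0)),  # bottom-left
    ((1.0, -1.0), (1.0, 0.0)),   # bottom-right
    ((-1.0, 1.0), (0.0, 1.0)),   # top-left
    ((1.0, 1.0), (1.0, 1.0)),    # top-right
]


def billboard_quad(center_view, half_size):
    """Expand one view-space point into four camera-facing quad vertices.

    Offsets are applied along the view-space X and Y axes only, so the quad
    stays parallel to the screen whatever the camera orientation is.
    """
    cx, cy, cz = center_view
    return [
        ((cx + ox * half_size, cy + oy * half_size, cz), uv)
        for (ox, oy), uv in CORNERS
    ]


quad = billboard_quad((2.0, 1.0, -5.0), 0.5)
for position, uv in quad:
    print(position, uv)
```

With point sprites, this expansion and the texture coordinate generation are handled by the GPU's point rasterization, so only one vertex per particle goes through the pipeline.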
Testbed:
– CPU: Intel Core i5-4670K @ 3.4GHz
– Mobo: GIGABYTE G1.Sniper M5
– Memory: G-Skill 16GB DDR3 1600MHz
– Windows 8 64-bit
– Catalyst 14.4 (for Radeon cards)
– R340.65 (for GeForce cards)
– GLSL Hacker 0.7.0.3
– FRAPS for displaying the framerate in fullscreen mode.
Settings: particles: 1’000’000, resolution: 1920×1080 fullscreen.
GPU | Point Sprite | Geometry Shader | Difference |
Radeon HD 7970 | 147 FPS | 105 FPS | -28% |
Radeon HD 6970 | 98 FPS | 65 FPS | -33% |
Radeon HD 5870 | 85 FPS | 59 FPS | -30% |
GeForce GTX 780 | 295 FPS | 279 FPS | -5% |
GeForce GTX 750 | 203 FPS | 122 FPS | -39% |
GeForce GTX 680 | 72 FPS | 71 FPS | -1% |
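The Difference column can be reproduced from the two FPS columns. The short Python sketch below computes the relative FPS change of the geometry shader path versus point sprites. The truncation toward zero is an inference from the published figures (for example, 203 to 122 FPS is about -39.9% and is listed as -39%), not something stated in the article, and the names `RESULTS` and `difference_percent` are hypothetical.

```python
# FPS figures copied from the results table: (point sprite, geometry shader).
RESULTS = {
    "Radeon HD 7970": (147, 105),
    "Radeon HD 6970": (98, 65),
    "Radeon HD 5870": (85, 59),
    "GeForce GTX 780": (295, 279),
    "GeForce GTX 750": (203, 122),
    "GeForce GTX 680": (72, 71),
}


def difference_percent(ps_fps, gs_fps):
    """Relative FPS change of the GS path versus point sprites, in percent.

    int() truncates toward zero, which is assumed here because it matches
    the values shown in the table.
    """
    return int((gs_fps - ps_fps) * 100 / ps_fps)


for gpu, (ps_fps, gs_fps) in RESULTS.items():
    print(f"{gpu}: {difference_percent(ps_fps, gs_fps)}%")
```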
Quick analysis: particle rendering with a geometry shader is around 30% slower than with point sprites on Radeon GPUs. I was a bit surprised by this result, because Radeon GPUs are supposed to have special hardware support in the geometry shader stage that makes the expansion of one vertex into four vertices more efficient (see details in the Radeon HD 2000 Programming Guide, page 9). I asked an AMD OpenGL guru about this special support, and it turns out that no internal documentation mentions special hardware support for the 1:4 amplification case in the geometry shader. This drop in performance is absolutely normal when comparing point sprites with GS billboarded quads.
With NVIDIA hardware, we can distinguish two types of GPUs: a first type where the difference between point sprites and geometry shaders is very small (GTX 780, GTX 680), and a second type (GTX 750) where the difference is similar to the one observed on Radeon GPUs.
Conclusion: unsurprisingly, point sprites are the fastest way to render a lot of particles on both AMD and NVIDIA GPUs. And since OpenGL 3, point sprite is the default point rendering mode.
A French version of this article is available here.

Here are some numbers from a stock GTX 580:
Sprites — 218 FPS;
Geometry Shader — 212 FPS;
2.7% difference.
By the way, isn’t the primary limitation on GS performance the output buffering? This should have been a solved problem a while ago. AMD had relatively good GS performance even back then with R600, while NV’s hardware was struggling until Fermi introduced a totally reworked memory pipeline.
^^^
Corrected results for 1920×1080 mode:
Sprites — 134 FPS;
Geometry Shader — 132 FPS;
Why didn’t you test fixed instancing? I have read in the GCN performance tweets that on Radeon cards instancing is preferable for fixed geometry expansion. So, why not test it 😀 ?
@fellix: Is your GTX 580 a reference or an overclocked edition? What are your graphics driver, OS and CPU?
At 1080p fullscreen, running Win 7 x64 SP1 with an i5-2500K @ 4.5GHz and a GTX 580 at reference stock clocks (340.52 WHQL, HQ), I got:
Point Sprites — 110 FPS
Geometry Shader — 108 FPS
nuninho1980,
Mine is actually an EVGA OC model.
The system is Win8 x64, Core i7-920, v337.88WHQL driver.
@Ziple: I’ll try to do an instancing test as soon as possible.
I have a new EVGA GTX 780 Ti Classified, after my GTX 580 3GB (freeze – RMA to GTX 770SC)! 🙂
At 1080p running Win 8.1 x64 with 347.71beta (HQ)
-Point Sprite (PS): 301 fps
-Geometry Shader (GS): 296 fps
-Geometry Instancing (GI): 110 fps