(Test) AMD Catalyst 8.921.2 RC11 for Radeon HD 7900, Big Performance Boost in OpenGL Tessellation (*** Updated ***)

TessMark, OpenGL Tessellation Benchmark

AMD has published a new release candidate of the upcoming WHQL driver (Cat 12.01). RC11 brings significant performance gains in OpenGL tessellation, especially at high tessellation levels. According to the release notes, TessMark 0.3.0 gets a big boost at the insane level (tessellation factor of 64X):

Performance highlights of the 8.921.2 RC11 AMD Radeon™ HD 7900 driver
8% (up to) performance improvement in Aliens vs. Predator
15% (up to) performance improvement in Battleforge with Anti-Aliasing enabled
3% (up to) performance improvement in Battlefield 3
3% (up to) performance improvement in Crysis 2
6% (up to) performance improvement in Crysis Warhead
10% (up to) performance improvement in F1 2010
5% (up to) performance improvement in Unigine with Anti-Aliasing enabled
250% (up to) performance improvement in TessMark (OpenGL) when set to “insane” levels

I don’t have an HD 7970 yet, only an HD 6970, so let’s see whether the performance boost also shows up on an HD 6900. Here is the comparison between Catalyst 11.6 (the last Catalyst that brought significant gains in OpenGL tessellation, see HERE) and Catalyst 8.921.2 RC11:

TessMark settings: map set 1, 1920×1080 fullscreen, 60 seconds, no AA, no postfx:

– tess level: moderate (X8) – Gain: -1.6%

Cat 11.6: 44090 points, 735 FPS – SAPPHIRE Radeon HD 6970
Cat 8.921.2 RC11: 43364 points, 723 FPS – SAPPHIRE Radeon HD 6970

– tess level: normal (X16) – Gain: +0.4%

Cat 11.6: 19398 points, 323 FPS – SAPPHIRE Radeon HD 6970
Cat 8.921.2 RC11: 19480 points, 325 FPS – SAPPHIRE Radeon HD 6970

– tess level: Extreme (X32) – Gain: +27.5%

Cat 11.6: 3397 points, 57 FPS – SAPPHIRE Radeon HD 6970
Cat 8.921.2 RC11: 4334 points, 72 FPS – SAPPHIRE Radeon HD 6970

– tess level: Insane (X64) – Gain: +80.6%

Cat 11.6: 594 points, 10 FPS – SAPPHIRE Radeon HD 6970
Cat 8.921.2 RC11: 1073 points, 18 FPS – SAPPHIRE Radeon HD 6970
Cat 8.921.2 RC11: 560 points, 10 FPS – SAPPHIRE Radeon HD 6970, TessMark renamed to toto.exe

Indeed, there is a huge performance boost at high tessellation levels (X32 and X64), while the scores at regular levels (X8 and X16) stay the same. I hope to test an HD 7970 shortly…


UPDATE (2012.01.24): AMD sent me some explanations: the tessellation improvement is generic for applications that use large tessellation factors (i.e., greater than 16, the maximum being 64), but it is not necessarily optimal for more reasonable settings (like X8). That’s why AMD decided to enable it on a per-application basis.

I added a score with TessMark renamed to toto.exe to show the difference between the optimization enabled and disabled.


You can download Catalyst 8.921.2 RC11 for Radeon HD 7900 HERE.

Catalyst 8.921.2 RC11 is an OpenGL 4.2 driver exposing 233 OpenGL extensions:

– Driver version: 8.921.2.0 – Catalyst 11.12 (1-19-2012)
– ATI Catalyst Release Version String: 8.921.2-120119a-132101E-ATI
– OpenGL version: 4.2.11338 Compatibility Profile/Debug Context
– OpenGL extensions: 233 extensions (GL=212 and WGL=21)

GPU Caps Viewer 1.14.6


Compared to Catalyst 11.10 preview 3, there is one new extension:




Here is the complete list of all OpenGL extensions exposed for a Radeon HD 6970 (Win7 64-bit):

  • GL_AMDX_debug_output
  • GL_AMDX_vertex_shader_tessellator
  • GL_AMD_blend_minmax_factor
  • GL_AMD_conservative_depth
  • GL_AMD_debug_output
  • GL_AMD_depth_clamp_separate
  • GL_AMD_draw_buffers_blend
  • GL_AMD_multi_draw_indirect
  • GL_AMD_name_gen_delete
  • GL_AMD_performance_monitor
  • GL_AMD_pinned_memory
  • GL_AMD_sample_positions
  • GL_AMD_seamless_cubemap_per_texture
  • GL_AMD_shader_stencil_export
  • GL_AMD_shader_trace
  • GL_AMD_texture_cube_map_array
  • GL_AMD_texture_texture4
  • GL_AMD_transform_feedback3_lines_triangles
  • GL_AMD_vertex_shader_tessellator
  • GL_ARB_ES2_compatibility
  • GL_ARB_base_instance
  • GL_ARB_blend_func_extended
  • GL_ARB_color_buffer_float
  • GL_ARB_compressed_texture_pixel_storage
  • GL_ARB_conservative_depth
  • GL_ARB_copy_buffer
  • GL_ARB_debug_output
  • GL_ARB_depth_buffer_float
  • GL_ARB_depth_clamp
  • GL_ARB_depth_texture
  • GL_ARB_draw_buffers
  • GL_ARB_draw_buffers_blend
  • GL_ARB_draw_elements_base_vertex
  • GL_ARB_draw_indirect
  • GL_ARB_draw_instanced
  • GL_ARB_explicit_attrib_location
  • GL_ARB_fragment_coord_conventions
  • GL_ARB_fragment_program
  • GL_ARB_fragment_program_shadow
  • GL_ARB_fragment_shader
  • GL_ARB_framebuffer_object
  • GL_ARB_framebuffer_sRGB
  • GL_ARB_geometry_shader4
  • GL_ARB_get_program_binary
  • GL_ARB_gpu_shader5
  • GL_ARB_gpu_shader_fp64
  • GL_ARB_half_float_pixel
  • GL_ARB_half_float_vertex
  • GL_ARB_imaging
  • GL_ARB_instanced_arrays
  • GL_ARB_internalformat_query
  • GL_ARB_map_buffer_alignment
  • GL_ARB_map_buffer_range
  • GL_ARB_multisample
  • GL_ARB_multitexture
  • GL_ARB_occlusion_query
  • GL_ARB_occlusion_query2
  • GL_ARB_pixel_buffer_object
  • GL_ARB_point_parameters
  • GL_ARB_point_sprite
  • GL_ARB_provoking_vertex
  • GL_ARB_sample_shading
  • GL_ARB_sampler_objects
  • GL_ARB_seamless_cube_map
  • GL_ARB_separate_shader_objects
  • GL_ARB_shader_atomic_counters
  • GL_ARB_shader_bit_encoding
  • GL_ARB_shader_image_load_store
  • GL_ARB_shader_objects
  • GL_ARB_shader_precision
  • GL_ARB_shader_stencil_export
  • GL_ARB_shader_subroutine
  • GL_ARB_shader_texture_lod
  • GL_ARB_shading_language_100
  • GL_ARB_shading_language_420pack
  • GL_ARB_shading_language_packing
  • GL_ARB_shadow
  • GL_ARB_shadow_ambient
  • GL_ARB_sync
  • GL_ARB_tessellation_shader
  • GL_ARB_texture_border_clamp
  • GL_ARB_texture_buffer_object
  • GL_ARB_texture_buffer_object_rgb32
  • GL_ARB_texture_compression
  • GL_ARB_texture_compression_bptc
  • GL_ARB_texture_compression_rgtc
  • GL_ARB_texture_cube_map
  • GL_ARB_texture_cube_map_array
  • GL_ARB_texture_env_add
  • GL_ARB_texture_env_combine
  • GL_ARB_texture_env_crossbar
  • GL_ARB_texture_env_dot3
  • GL_ARB_texture_float
  • GL_ARB_texture_gather
  • GL_ARB_texture_mirrored_repeat
  • GL_ARB_texture_multisample
  • GL_ARB_texture_non_power_of_two
  • GL_ARB_texture_query_lod
  • GL_ARB_texture_rectangle
  • GL_ARB_texture_rg
  • GL_ARB_texture_rgb10_a2ui
  • GL_ARB_texture_snorm
  • GL_ARB_texture_storage
  • GL_ARB_timer_query
  • GL_ARB_transform_feedback2
  • GL_ARB_transform_feedback3
  • GL_ARB_transform_feedback_instanced
  • GL_ARB_transpose_matrix
  • GL_ARB_uniform_buffer_object
  • GL_ARB_vertex_array_bgra
  • GL_ARB_vertex_array_object
  • GL_ARB_vertex_attrib_64bit
  • GL_ARB_vertex_buffer_object
  • GL_ARB_vertex_program
  • GL_ARB_vertex_shader
  • GL_ARB_vertex_type_2_10_10_10_rev
  • GL_ARB_viewport_array
  • GL_ARB_window_pos
  • GL_ATI_draw_buffers
  • GL_ATI_envmap_bumpmap
  • GL_ATI_fragment_shader
  • GL_ATI_meminfo
  • GL_ATI_separate_stencil
  • GL_ATI_texture_compression_3dc
  • GL_ATI_texture_env_combine3
  • GL_ATI_texture_float
  • GL_ATI_texture_mirror_once
  • GL_EXT_abgr
  • GL_EXT_bgra
  • GL_EXT_bindable_uniform
  • GL_EXT_blend_color
  • GL_EXT_blend_equation_separate
  • GL_EXT_blend_func_separate
  • GL_EXT_blend_minmax
  • GL_EXT_blend_subtract
  • GL_EXT_compiled_vertex_array
  • GL_EXT_copy_buffer
  • GL_EXT_copy_texture
  • GL_EXT_direct_state_access
  • GL_EXT_draw_buffers2
  • GL_EXT_draw_instanced
  • GL_EXT_draw_range_elements
  • GL_EXT_fog_coord
  • GL_EXT_framebuffer_blit
  • GL_EXT_framebuffer_multisample
  • GL_EXT_framebuffer_object
  • GL_EXT_framebuffer_sRGB
  • GL_EXT_geometry_shader4
  • GL_EXT_gpu_program_parameters
  • GL_EXT_gpu_shader4
  • GL_EXT_histogram
  • GL_EXT_multi_draw_arrays
  • GL_EXT_packed_depth_stencil
  • GL_EXT_packed_float
  • GL_EXT_packed_pixels
  • GL_EXT_pixel_buffer_object
  • GL_EXT_point_parameters
  • GL_EXT_provoking_vertex
  • GL_EXT_rescale_normal
  • GL_EXT_secondary_color
  • GL_EXT_separate_specular_color
  • GL_EXT_shader_image_load_store
  • GL_EXT_shadow_funcs
  • GL_EXT_stencil_wrap
  • GL_EXT_subtexture
  • GL_EXT_texgen_reflection
  • GL_EXT_texture3D
  • GL_EXT_texture_array
  • GL_EXT_texture_buffer_object
  • GL_EXT_texture_compression_bptc
  • GL_EXT_texture_compression_latc
  • GL_EXT_texture_compression_rgtc
  • GL_EXT_texture_compression_s3tc
  • GL_EXT_texture_cube_map
  • GL_EXT_texture_edge_clamp
  • GL_EXT_texture_env_add
  • GL_EXT_texture_env_combine
  • GL_EXT_texture_env_dot3
  • GL_EXT_texture_filter_anisotropic
  • GL_EXT_texture_integer
  • GL_EXT_texture_lod
  • GL_EXT_texture_lod_bias
  • GL_EXT_texture_mirror_clamp
  • GL_EXT_texture_object
  • GL_EXT_texture_rectangle
  • GL_EXT_texture_sRGB
  • GL_EXT_texture_shared_exponent
  • GL_EXT_texture_snorm
  • GL_EXT_texture_storage
  • GL_EXT_texture_swizzle
  • GL_EXT_timer_query
  • GL_EXT_transform_feedback
  • GL_EXT_vertex_array
  • GL_EXT_vertex_array_bgra
  • GL_EXT_vertex_attrib_64bit
  • GL_IBM_texture_mirrored_repeat
  • GL_KTX_buffer_region
  • GL_NV_blend_square
  • GL_NV_conditional_render
  • GL_NV_copy_depth_to_color
  • GL_NV_copy_image
  • GL_NV_explicit_multisample
  • GL_NV_float_buffer
  • GL_NV_half_float
  • GL_NV_primitive_restart
  • GL_NV_texgen_reflection
  • GL_NV_texture_barrier
  • GL_SGIS_generate_mipmap
  • GL_SGIS_texture_edge_clamp
  • GL_SGIS_texture_lod
  • GL_SUN_multi_draw_arrays
  • GL_WIN_swap_hint
  • WGL_EXT_swap_control
  • WGL_ARB_extensions_string
  • WGL_ARB_pixel_format
  • WGL_ATI_pixel_format_float
  • WGL_ARB_pixel_format_float
  • WGL_ARB_multisample
  • WGL_EXT_swap_control_tear
  • WGL_ARB_pbuffer
  • WGL_ARB_render_texture
  • WGL_ARB_make_current_read
  • WGL_EXT_extensions_string
  • WGL_ARB_buffer_region
  • WGL_EXT_framebuffer_sRGB
  • WGL_ATI_render_texture_rectangle
  • WGL_EXT_pixel_format_packed_float
  • WGL_I3D_genlock
  • WGL_NV_swap_group
  • WGL_ARB_create_context
  • WGL_AMD_gpu_association
  • WGL_AMDX_gpu_association
  • WGL_ARB_create_context_profile

Source: Geeks3D forum

10 thoughts on “(Test) AMD Catalyst 8.921.2 RC11 for Radeon HD 7900, Big Performance Boost in OpenGL Tessellation (*** Updated ***)”

  1. Promilus

    The real question is… does it provide exactly the same quality as nv & radeons on older drivers, or is it just another buggy driver that gets its speedup from rendering artifacts or some tess “optimizations”?

  2. fellix

    Rename TessMark EXE and see what happens. 😉

    The optimization is application-specific profiling in the driver.

  3. Hippo

    renamed tessmark and got:
    (32x): 2516 (avg fps 42)
    (64x): 639 (avg fps 10)

    tessmark.exe:
    (32x): 4100 (avg fps 68)
    (64x): 739 (avg fps 13)

    Radeon 5850 @936mhz core using 8.95 drivers and CAP 11.12 #3

  4. Leith Bade

    I wonder what they did?

    Possibly they removed some dead code/unneeded variables?

    JeGX, what happens if you tweak the shader code slightly so the hash changes? Do the performance boosts remain?

    It is a bit unfortunate if these are application-specific tweaks; it just means that they are expecting NVIDIA’s new hardware to be a lot faster.

  5. Lokavidu

    Isn’t it kinda weak to optimize drivers for an app according to its name? Why don’t they use at least some sort of checksum? Anyway, why don’t they optimize the drivers for all OpenGL tessellation apps? A bit more effort would probably pay off with a real overall performance impact for all Radeons.

    Strange, strange… I’m quite curious what’s behind all of this.

  6. jK

    “the performance improvement in tessellation is generic for applications with large tessellation factors (read greater than 16, the max being 64) but is not necessarily optimal for more reasonable settings (like X8)”
    Shouldn’t they enable it on a tessellation-factor basis, then?
    And if the cost of switching for each draw command is too high, they could perhaps accumulate the average tess. factor of the running app and use that instead. Binary name hacks are so ’90s.

  7. Michal

    @jK
    Tessellation factor can be computed dynamically in the shader. Every patch can have different tessellation factor. What you suggest would kill any performance benefit.

  8. jK

    @Michal:
    glPatchParameterfv(GL_PATCH_DEFAULT_INNER_LEVEL, foo);
    glPatchParameterfv(GL_PATCH_DEFAULT_OUTER_LEVEL, bar);

    Sure, the shader can override it, but anything is better than binary name detection.

  9. Michal

    @jK
    Yes, I know about those commands, but your solution just wouldn’t work. In many cases it would decrease performance, because the default tessellation factor would be completely different from the real one set by the shader. I agree that binary name detection is an ugly hack, but at least it works.

  10. jK

    Yeah, I expected a C-interface command to define the max tessellation level a shader can set, but only found that :/

    Still, the GPU (and its driver) should know the size of the current tessellation buffer and how often that buffer is too small to hold additional vertices (I assume this is what they optimized), because it needs to handle this case either by resizing the buffer or by pushing the current content further down the shader queue to free space for the remaining vertices. In both cases I would assume that somewhere in the silicon this either touches software/microcode or sets performance counters. So the graphics driver should be able to detect at runtime whether it’s advisable to increase the default tessellation buffer size for the current program.

    Still, those are just assumptions. Nevertheless, it’s already hard enough to find the `fastpath` with current graphics drivers, and things like this don’t make it easier. Nor do I think they make maintaining the driver’s codebase easier. Or do they run an artificial neural network or a genetic algorithm on their settings to find the optimum for each new app?

    Don’t get me wrong, I don’t blame anyone; it’s just a small piece of a (non-end-)user’s perspective: I want the driver to work without help from the vendor, nothing less.
