NVIDIA Kepler GK110 Architecture Whitepaper: 2880 CUDA Cores and Compute Capability 3.5

NVIDIA Kepler GK110 die

At GTC 2012, NVIDIA published a whitepaper that details some compute aspects of the upcoming high-end Kepler GPU, the GK110. With 7.1 billion transistors, 15 SMX units, 2880 CUDA cores (192 CUDA cores per SMX) and 240 texture units (16 TUs per SMX), this GPU is clearly focused on compute.
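The totals follow directly from the per-SMX figures. As a quick sanity check, here is a minimal Python sketch; the constant names and the `chip_totals` helper are made up for illustration and are not taken from the whitepaper:

```python
# Per-SMX figures as quoted in the post above.
# Names below are illustrative assumptions, not NVIDIA terminology.
SMX_COUNT = 15
CUDA_CORES_PER_SMX = 192
TEXTURE_UNITS_PER_SMX = 16


def chip_totals(smx_count, cores_per_smx, tus_per_smx):
    """Return (total CUDA cores, total texture units) for a given SMX count."""
    return smx_count * cores_per_smx, smx_count * tus_per_smx


cores, texture_units = chip_totals(
    SMX_COUNT, CUDA_CORES_PER_SMX, TEXTURE_UNITS_PER_SMX
)
print(cores, texture_units)  # 2880 240
```

The same helper can be reused for a hypothetical configuration with fewer active SMX units: `chip_totals(14, 192, 16)` gives 2688 cores and 224 texture units.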

The GK110 GPU supports the new CUDA Compute Capability 3.5.

Among the new features available with the GK110, GPUDirect sounds really interesting:

When working with a large amount of data, increasing the data throughput and reducing latency is vital to increasing compute performance. Kepler GK110 supports the RDMA feature in NVIDIA GPUDirect, which is designed to improve performance by allowing direct access to GPU memory by third‐party devices such as IB adapters, NICs, and SSDs. When using CUDA 5.0, GPUDirect provides the following important features:

  • Direct memory access (DMA) between NIC and GPU without the need for CPU side data buffering.
  • Significantly improved MPISend/MPIRecv efficiency between GPU and other nodes in a network.
  • Eliminates CPU bandwidth and latency bottlenecks
  • Works with variety of 3rd party network, capture, and storage devices

NVIDIA Kepler GK110 GPUDirect technology

You can download the GK110 whitepaper HERE.

15 thoughts on “NVIDIA Kepler GK110 Architecture Whitepaper: 2880 CUDA Cores and Compute Capability 3.5”

  1. Promilus

    Nice. Yet another possible HPC bottleneck eliminated. More and more GPU clusters in the TOP500, more and more of them Tesla-based 😉

  2. sqrt[-1]

    I think the ability to launch a GPU kernel from a GPU kernel is the most interesting feature.

  3. Kepler

    Ra, you obviously can’t read, they’re talking about GK104 in that article.

    GK110 is the real compute GPU.

  4. Promilus

    No, actually it is based on the very same idea of SMX (so 3x the number of ALUs, but at half speed and with more software scheduling & dispatch than hardware). It should beat Tahiti on raw power, but it’s not as efficient as Fermi or Tahiti (so it needs a lot more transistors (SPs) to achieve the same performance). Still – AMD hasn’t shown us a SINGLE compute-specific card, while NV has already paved the way with their Teslas.

  5. samsi

    This is the craziest thing I’ve ever seen in my life. 7.1 billion transistors!

    r.i.p. AMD

  6. Promilus

    LuxRender is GPGPU too, and Tahiti doesn’t suck there. It doesn’t suck in AES encrypt/decrypt tools either. It doesn’t in WinZip 16.5, and it doesn’t in SL3 brute force. It doesn’t in MilkyWay, POEM, etc.

  7. DrBalthar

    Unmanufacturable halo product (nVidia already has problems with GK104, which is smaller than the AMD 7xxx series) that will only ship to a few halo customers and then be forgotten. Don’t expect to be able to buy it anywhere.
