Nvidia unveils H100 Hopper compute GPU and Grace superchip architectures
Nvidia’s Hopper H100 AI and HPC GPUs have just been unveiled at GTC 2022 alongside the Grace superchips. As always, the compute GPU models are highly scalable, and Nvidia offers various multi-GPU solutions to suit different data center needs. A single H100 Tensor Core GPU, meanwhile, brings significant improvements over the 2020 Ampere A100 models, especially in floating-point throughput.
First of all, the H100 GPU is fabricated on TSMC’s 4 nm node and has an 814 mm² die (slightly smaller than the A100’s 826 mm²). This is Nvidia’s first model to support PCIe 5.0, and a faster SXM form factor is also available. The GPU comprises no fewer than 80 billion transistors, an increase of almost 50% over Ampere. It also features 132 streaming multiprocessors (SMs) with 16,896 CUDA cores in the SXM version, while the PCIe 5.0 version gets 14,592 CUDA cores; either way, the count is more than double that of the previous generation.
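The CUDA core counts follow from simple per-SM arithmetic. A quick sanity check, assuming 128 FP32 CUDA cores per Hopper SM (the per-SM figure is our assumption, not something stated in the announcement):

```python
# Sanity-check the quoted CUDA core counts, assuming 128 FP32 CUDA
# cores per Hopper SM (assumption; not part of the announcement).
CORES_PER_SM = 128

sxm_cores = 16896   # SXM variant, per the announcement
pcie_cores = 14592  # PCIe 5.0 variant, per the announcement

print(sxm_cores // CORES_PER_SM)   # -> 132, matching the quoted SM count
print(pcie_cores // CORES_PER_SM)  # -> 114 SMs implied for the PCIe card
```

Note that the PCIe card’s lower core count implies some SMs are disabled relative to the SXM part.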
The L2 cache is upped from 40 MB to 50 MB, the memory bus remains the same at 5120 bits wide, and memory capacity is set at 80 GB, with 3 TB/s (SXM) or 2 TB/s (PCIe) of bandwidth depending on form factor. The SXM version features 528 Tensor cores and requires 700 W, while the PCIe version has only 456 Tensor cores and is limited to a 350 W TGP. Nvidia claims that 20 H100 GPUs can sustain the equivalent of the entire world’s Internet traffic, and the new architecture can be scaled across hundreds or even thousands of DGX systems destined for future supercomputers.
As far as FP performance is concerned, the H100 GPU can deliver 4 PFLOPS of FP8 (6x over A100), 2 PFLOPS of FP16 (3x over A100), 1 PFLOPS of TF32 (3x over A100), and 60 TFLOPS of FP64/FP32 (3x over A100). Similar gains apply to Tensor Core FP calculations.
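As a back-of-the-envelope check, the quoted multipliers imply the A100 baselines Nvidia is measuring against. A minimal sketch, using only the figures claimed above (in TFLOPS):

```python
# Derive the implied A100 baselines from the quoted H100 peaks and
# the claimed speedup factors (all figures in TFLOPS, per the claims above).
h100_peaks = {
    # precision: (H100 peak TFLOPS, claimed gain over A100)
    "FP8":       (4000, 6),
    "FP16":      (2000, 3),
    "TF32":      (1000, 3),
    "FP64/FP32": (60,   3),
}

for precision, (tflops, gain) in h100_peaks.items():
    implied_a100 = tflops / gain
    print(f"{precision}: H100 {tflops} TFLOPS -> implied A100 ~{implied_a100:.0f} TFLOPS")
```

The implied baselines are approximate, since Nvidia’s multipliers are rounded marketing figures rather than exact ratios.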
Nvidia is also planning to release a Grace Hopper superchip module that combines an H100 GPU with a Grace CPU over a 900 GB/s NVLink interconnect. Similarly, there will be a Grace superchip that pairs two Grace CPUs for a total of 144 Arm cores, 1 TB/s of LPDDR5X memory bandwidth, and 396 MB of on-chip cache. The Grace superchip can be paired with up to eight Hopper GPUs, but such configurations will not be available before Q3 2023. Nvidia’s Hopper GPUs, however, will start shipping in Q3 2022.