At the recent GTC conference, NVIDIA announced their next-generation processors for AI computing: the H100 GPU and the Grace CPU Superchip. Based on NVIDIA's Hopper architecture, the H100 includes a Transformer Engine for faster training of AI models. The Grace CPU Superchip features 144 Arm cores and outperforms NVIDIA's current dual-CPU offering on the SPECrate 2017_int_base benchmark.
NVIDIA founder and CEO Jensen Huang made the announcement during his keynote presentation. The Hopper architecture includes several innovations for accelerating AI training, among them faster Tensor Cores with increased floating-point operations per second (FLOPS) and NVIDIA's Confidential Computing technology for improved security and privacy. The H100 GPU, built on this architecture, is the first GPU to support PCI Express Gen 5 (PCIe 5) and HBM3 memory. The Grace CPU Superchip is a single-socket package containing two CPU chips connected via NVIDIA's high-speed NVLink-C2C technology. Huang's keynote positioned NVIDIA's new chips as "the engine of the world's AI infrastructure that enterprises use to accelerate their AI-driven businesses."
The Transformer deep-learning model is a common choice for many AI tasks, especially large language models such as GPT-3. Training these models requires massive datasets and many days, if not weeks, of computation. The H100 GPU includes a Transformer Engine, which can dynamically mix 8-bit (FP8) and 16-bit (FP16) floating-point arithmetic. By operating at lower precision and supporting increased overall FLOPS, the H100 can achieve an order-of-magnitude speedup over previous-generation Ampere GPUs. Overall, NVIDIA claims that training a 175B-parameter GPT-3 model could be sped up 6x, and training a 395B-parameter mixture-of-experts model up to 9x, reducing its training time from 7 days to 20 hours.
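To illustrate the mixed-precision idea, the sketch below uses PyTorch's automatic mixed precision, which alternates between FP16 and FP32 in the same spirit; the toy model and training data are hypothetical, and the Transformer Engine itself is not programmable through this API. The H100 applies the same pattern per layer, with hardware dynamically choosing FP8 where accuracy permits.

    import torch

    # Hypothetical toy model and data, for illustration only.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.ReLU(),
        torch.nn.Linear(2048, 512),
    ).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid FP16 underflow

    inputs = torch.randn(64, 512, device="cuda")
    targets = torch.randn(64, 512, device="cuda")

    for _ in range(10):
        optimizer.zero_grad()
        # The forward pass runs in FP16 where it is numerically safe;
        # sensitive operations are kept in FP32 automatically.
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
        scaler.scale(loss).backward()  # backpropagate through the scaled loss
        scaler.step(optimizer)         # unscale gradients, then update weights
        scaler.update()

The gradient scaler compensates for FP16's narrow dynamic range, which is the same accuracy concern the Transformer Engine must manage in hardware when dropping to FP8.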
The chip also contains new DPX instructions, which can speed up dynamic-programming algorithms by up to 7x compared with Ampere, improving performance in applications such as route optimization and protein folding. To support multi-tenant operation in a cloud environment, the H100 includes secure Multi-Instance GPU (MIG) and Confidential Computing technologies, which allow it to be partitioned into as many as seven virtual GPUs while keeping each tenant's data private.
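To make the target workload concrete, here is a plain-Python sketch of the Floyd-Warshall all-pairs shortest-path algorithm, a classic dynamic-programming routine used in route optimization (the example graph is hypothetical). The min/add recurrence in the inner loop is the kind of operation DPX is designed to accelerate in hardware.

    import math

    def floyd_warshall(weights):
        """All-pairs shortest paths over an adjacency matrix.

        weights[i][j] is the cost of the edge from i to j (math.inf if absent).
        """
        n = len(weights)
        dist = [row[:] for row in weights]  # start from the direct edge costs
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    # Relax: is the route through intermediate node k shorter?
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        return dist

    # Hypothetical 4-node road network.
    INF = math.inf
    graph = [
        [0,   3,   INF, 7],
        [8,   0,   2,   INF],
        [5,   INF, 0,   1],
        [2,   INF, INF, 0],
    ]
    print(floyd_warshall(graph))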
The Grace CPU Superchip is a follow-on to the Grace Hopper Superchip announced last year, which combines a Grace CPU with a Hopper-based GPU in a single package. The new chip instead combines two Grace CPUs, connected using NVIDIA's NVLink-C2C interconnect. Each CPU is based on the Arm v9 architecture; the complete package provides 1TB/s of memory bandwidth while consuming only 500W of power. The chip supports all NVIDIA software stacks, including Omniverse, NVIDIA AI, and NVIDIA HPC. Using NVIDIA's ConnectX-7 NICs, the chip can support up to eight external Hopper-based GPUs.
Several users commented on the announcement in a thread on Hacker News. One noted:
NVIDIA continues to vertically integrate their datacenter offerings. They bought Mellanox to get InfiniBand. They tried to buy ARM - that didn't work. But they're building & bundling CPUs anyway. I guess when you're so far ahead on the compute side, it's all the peripherals that hold you back, so they're putting together a complete solution.
NVIDIA's GPUs are a popular choice for accelerating AI workloads. Earlier this year, InfoQ reported on the latest MLPerf benchmarks, where NVIDIA posted the best results on seven of eight tasks.