NVIDIA recently announced its next-generation GPU architecture, Blackwell. Blackwell is the largest GPU ever built, containing over 200 billion transistors, and can train large language models (LLMs) up to 4x faster than the previous generation of hardware.
Jensen Huang, NVIDIA's founder and CEO, made the announcement at the company's GTC AI conference. The Blackwell architecture starts with two GPU dies that work together as a single unit capable of 20 petaFLOPS, "the highest compute ever on a single chip." The chip includes a new generation of NVIDIA's Transformer Engine and new numerical precisions, both of which contribute to improved LLM performance. Blackwell is also the first GPU to support a Trusted Execution Environment (TEE), which provides protection for sensitive data. NVIDIA also announced several new compute systems based on Blackwell, including the GB200 Grace Blackwell Superchip, which combines two Blackwell GPUs with a Grace CPU, and the GB200 NVL72 compute cluster, which contains 36 GB200 superchips and delivers 1.4 exaFLOPS of compute. Huang pointed out that large models and datasets need more compute acceleration for training:
We need even larger models. We're going to train [them] with multimodality data, not just text on the internet: we're going to train it on texts and images, graphs and charts... All of that is going to increase the size of our models, it's going to increase the amount of [training data], and we're going to have to build even bigger GPUs.
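The cluster-level figure lines up with the per-chip numbers from the announcement; a quick back-of-the-envelope check:

```python
# Rough sanity check of the announced figures (not an official NVIDIA calculation):
# 36 GB200 superchips, each pairing one Grace CPU with two Blackwell GPUs,
# where each Blackwell GPU delivers 20 petaFLOPS.
superchips = 36
gpus_per_superchip = 2
petaflops_per_gpu = 20

total_petaflops = superchips * gpus_per_superchip * petaflops_per_gpu
print(f"{total_petaflops} petaFLOPS = {total_petaflops / 1000:.2f} exaFLOPS")
# 1440 petaFLOPS = 1.44 exaFLOPS, consistent with the ~1.4 exaFLOPS announced
```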
As with earlier generations of its hardware, NVIDIA named the Blackwell architecture after a pioneer in science and technology; in this case, David Harold Blackwell, a mathematician who made contributions to game theory, statistics, and probability theory. In 2022, InfoQ covered the announcement of NVIDIA's previous architecture, Hopper, and the Grace superchip that combined a Hopper-based GPU with a CPU in a single unit.
The Blackwell architecture includes several "revolutionary" features. Besides the updated Transformer Engine and TEE support, these include fifth-generation NVLink, a RAS Engine, and a Decompression Engine. NVLink is NVIDIA's interchip communication protocol; the latest generation supports 1.8 TB/s of bidirectional throughput per GPU, scaling to as many as 576 GPUs. The RAS (Reliability, Availability, and Serviceability) Engine improves diagnostic capabilities to help identify hardware faults. The Decompression Engine supports a variety of compression formats, including Snappy, Deflate, and LZ4, which speeds up the performance of several database technologies, including Apache Spark.
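For readers unfamiliar with these formats, the following CPU-side sketch exercises all three using standard Python libraries; it is illustrative only and does not use NVIDIA's API, since the Decompression Engine performs this work in hardware on the GPU:

```python
# Round-trip the three formats named for Blackwell's Decompression Engine.
import zlib            # Deflate (RFC 1951), via Python's built-in zlib wrapper
import snappy          # pip install python-snappy
import lz4.frame       # pip install lz4

payload = b"example columnar data block " * 1000

for name, compress, decompress in [
    ("Snappy",  snappy.compress,    snappy.uncompress),
    ("Deflate", zlib.compress,      zlib.decompress),
    ("LZ4",     lz4.frame.compress, lz4.frame.decompress),
]:
    packed = compress(payload)
    assert decompress(packed) == payload
    print(f"{name}: {len(payload)} -> {len(packed)} bytes")
```

All three are general-purpose formats widely used in analytics file formats such as Parquet and ORC, which is why hardware decompression benefits engines like Apache Spark.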
In a discussion on Hacker News about Blackwell, one user wrote:
My take from being at the keynote and the content I've seen so far at the conference is that Nvidia [is] moving up the stack (like all good hardware vendors are prone to do). Obviously they are going to keep doing bigger [things]. But the takeaway for me is that they are building "docker for llms" - NIM. They are building a container system where you can download/buy(?) NIMs and easily deploy them on their hardware. Going to be fun to watch what this does to all the AI startups.
NVIDIA also announced that the Blackwell architecture will be used in several applications, including its DRIVE Thor compute platform for autonomous vehicles and Jetson Thor, a system-on-a-chip designed for robotics that will power AI models developed by GR00T, the company's humanoid robotics project.