Nvidia has officially introduced its latest GPU architecture, Blackwell, during the GTC keynote by CEO Jensen Huang. The new architecture, which succeeds the Hopper generation, is designed to deliver significantly higher performance and better energy efficiency for AI infrastructure. Nvidia’s top-tier Blackwell chips, according to the company, offer roughly a 5x increase in raw FLOPS over their predecessors, marking a significant leap forward in GPU technology.
Performance and Specifications
The Blackwell lineup consists of three primary products: the B100, the B200, and the Grace-Blackwell Superchip (GB200). These chips share the same silicon and feature two reticle-limited compute dies connected via a 10 TB/s chip-to-chip interconnect, the NV-High Bandwidth Interface (NV-HBI). This design allows the two dies to operate as a single, more powerful accelerator. Each GPU is paired with eight HBM3e memory stacks, providing up to 192GB of capacity and 8 TB/s of bandwidth. Nvidia claims that under certain conditions, including the use of a new 4-bit floating-point data type and liquid-cooled servers, its new chip can achieve 20 petaFLOPS.
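To put those headline figures in perspective, a quick back-of-the-envelope calculation can be derived from the numbers quoted above. The per-stack split assumes capacity and bandwidth are distributed evenly across the eight HBM3e stacks, and the arithmetic-intensity figure simply divides the claimed FP4 compute rate by the claimed memory bandwidth; these are illustrative derivations, not official Nvidia breakdowns.

```python
# Sanity-check arithmetic on Nvidia's quoted Blackwell figures.
# Assumption: totals are split evenly across the eight HBM3e stacks.

HBM_STACKS = 8
TOTAL_CAPACITY_GB = 192          # quoted total HBM3e capacity
TOTAL_BANDWIDTH_TBPS = 8         # quoted total memory bandwidth, TB/s
PEAK_FP4_PFLOPS = 20             # quoted peak with FP4 + liquid cooling

capacity_per_stack_gb = TOTAL_CAPACITY_GB / HBM_STACKS       # GB per stack
bandwidth_per_stack_tbps = TOTAL_BANDWIDTH_TBPS / HBM_STACKS # TB/s per stack

# FLOPs available per byte of memory traffic at the claimed peaks
# (both sides in units of 10^12 per second, so the ratio is unitless).
flops_per_byte = (PEAK_FP4_PFLOPS * 1000) / (TOTAL_BANDWIDTH_TBPS * 1000)

print(f"Per stack: {capacity_per_stack_gb:.0f} GB, "
      f"{bandwidth_per_stack_tbps:.0f} TB/s")
print(f"Arithmetic intensity at peak: {flops_per_byte:.1f} FLOPs/byte")
```

The high FLOPs-per-byte ratio implied by these claims is one reason low-precision formats like FP4 matter: they shrink the bytes moved per operation, helping keep the compute dies fed.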
Challenges and Competition
Despite Nvidia’s advancements, the company faces stiff competition from AMD and Intel, which have also been making strides in AI accelerator technology. AMD’s MI300-series accelerators and Intel’s Data Center GPU Max parts use similarly complex chiplet designs, underscoring the intensity of competition in the sector. Moreover, the growing power and thermal demands of AI datacenters pose their own challenges: Nvidia’s latest GPUs may draw between 700W and 1,200W each, depending on the model and cooling method used.
Nvidia’s Grace-Blackwell Superchip combines a 72-core Arm CPU with two Blackwell GPUs, targeting peak performance on large AI workloads. The company’s rack-scale systems, designed for large-scale AI deployments, point to significant advances in both AI model training and inference capabilities.
Last Updated on November 7, 2024 9:39 pm CET