NVIDIA has introduced its latest advancements in artificial intelligence (AI) hardware at GTC 2025, unveiling the Blackwell Ultra GB300 and Vera Rubin superchips. These developments aim to enhance AI capabilities across various sectors.
Blackwell Ultra GB300: Enhanced Performance
The Blackwell Ultra GB300, set to ship in the latter half of 2025, represents a significant upgrade over its predecessors, offering improved computing power and memory bandwidth to address the increasing demands of AI applications.
Each GB300 system integrates 72 NVIDIA Blackwell Ultra GPUs and 36 Arm-based NVIDIA Grace CPUs, collectively offering 1,400 petaFLOPS of FP4 AI performance. This configuration represents a 1.5× increase in dense FP4 compute compared to its predecessor, the Blackwell B200.
A notable enhancement in the GB300 is its memory capacity. Each GPU is equipped with 288GB of HBM3e memory, totaling over 20TB of GPU memory per system. This substantial memory boost enables the handling of larger AI models and datasets, facilitating more complex computations and faster processing times.
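These rack-scale totals can be sanity-checked directly from the per-GPU numbers, using the 20 petaFLOPS of dense FP4 compute that NVIDIA quotes per Ultra chip and the 288GB of HBM3e per GPU:

```python
# Sanity-check the GB300 system-level figures from per-GPU specs.
# Per-GPU numbers are those quoted in the article: ~20 petaFLOPS
# dense FP4 per Blackwell Ultra chip, 288 GB HBM3e per GPU.
gpus_per_system = 72
fp4_per_gpu_pflops = 20        # petaFLOPS, dense FP4
hbm_per_gpu_gb = 288           # GB of HBM3e

total_fp4_pflops = gpus_per_system * fp4_per_gpu_pflops
total_hbm_tb = gpus_per_system * hbm_per_gpu_gb / 1000

print(total_fp4_pflops)        # 1440 -- in line with the ~1,400 petaFLOPS quoted
print(round(total_hbm_tb, 1))  # 20.7 -- "over 20TB of GPU memory"
```

The small gap between 1,440 and the quoted 1,400 petaFLOPS likely reflects rounding in the per-chip figure.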
NVIDIA’s positioning of the Blackwell Ultra AI Factory Platform suggests incremental, rather than transformative, performance gains over the standard Blackwell chips. A single Ultra chip maintains the same 20 petaflops of AI compute as Blackwell but benefits from a 50% increase in high-bandwidth memory (HBM3e), jumping from 192GB to 288GB.
Similarly, a full-scale DGX GB300 “SuperPOD” still houses 288 CPUs and 576 GPUs, delivering 11.5 exaflops of FP4 computing—identical to the original Blackwell-based SuperPOD—though with a 25% increase in total memory, now reaching 300TB. These memory upgrades indicate NVIDIA is prioritizing handling larger models and improving AI reasoning efficiency rather than raw compute power.
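The pod-level compute figure also follows from the GPU count and the same 20 petaFLOPS-per-chip figure:

```python
# Check the DGX GB300 pod-scale compute figure from the GPU count,
# again assuming ~20 petaFLOPS dense FP4 per GPU as quoted above.
gpus = 576
pflops_per_gpu = 20

total_exaflops = gpus * pflops_per_gpu / 1000
print(total_exaflops)  # 11.52 -- matches the quoted 11.5 exaflops
```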
Instead of focusing on direct Blackwell-to-Blackwell Ultra comparisons, NVIDIA is emphasizing how its latest platform stacks up against its 2022-era H100 chips, which still power a significant share of AI workloads. The company claims Blackwell Ultra delivers 1.5x the FP4 inference performance of the H100, but the most striking advantage is its ability to speed up AI reasoning.
For instance, an NVL72 cluster running DeepSeek-R1 671B—a massive large language model—can now generate responses in just ten seconds, down from 90 seconds on the H100.
NVIDIA attributes this improvement to a tenfold increase in token processing speed, with Blackwell Ultra handling 1,000 tokens per second compared to the H100’s 100 tokens per second. These figures suggest that while Blackwell Ultra doesn’t dramatically outperform its immediate predecessor, it offers compelling efficiency gains for companies still transitioning from previous-generation architectures.
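The two claims are mutually consistent: if the response length is held fixed (an assumption in this sketch), a tenfold jump in token throughput implies roughly a tenfold drop in response time:

```python
# Cross-check the DeepSeek-R1 response-time claim against the quoted
# token throughput, assuming the same response length on both chips.
h100_tps = 100         # tokens per second on H100 (per NVIDIA's comparison)
ultra_tps = 1000       # tokens per second on Blackwell Ultra
h100_response_s = 90   # quoted H100 response time in seconds

tokens_generated = h100_tps * h100_response_s     # 9,000 tokens
ultra_response_s = tokens_generated / ultra_tps
print(ultra_response_s)  # 9.0 -- consistent with "just ten seconds"
```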
Vera Rubin Superchip: Next-Generation AI Processing
Following the Blackwell Ultra, NVIDIA plans to release the Vera Rubin superchip in late 2026. Named after the renowned astronomer Vera Rubin, this chip integrates a custom-designed CPU (Vera) and GPU (Rubin).
The Vera CPU, based on NVIDIA’s Olympus architecture, is expected to deliver twice the performance of the current Grace CPUs. The Rubin GPU will support up to 288GB of high-bandwidth memory, significantly enhancing data processing capabilities for complex AI tasks.
The Vera Rubin architecture features a dual-GPU design on a single die, delivering 50 petaFLOPS of FP4 inference performance per chip. This design allows for more efficient processing and reduced latency in AI applications.
Additionally, the Vera CPU, succeeding the Grace CPU, comprises 88 custom Arm cores with simultaneous multithreading, resulting in 176 threads per socket. It also boasts a 1.8TB/s NVLink core-to-core interface, enhancing data transfer speeds between the CPU and GPU components.
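The thread count follows directly from the core count and simultaneous multithreading:

```python
# Vera CPU: 88 custom Arm cores, each running two hardware threads (SMT).
cores_per_socket = 88
threads_per_core = 2   # simultaneous multithreading

threads_per_socket = cores_per_socket * threads_per_core
print(threads_per_socket)  # 176 threads per socket, as quoted
```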
The Blackwell Ultra GB300 and Vera Rubin superchip each mark clear generational gains over NVIDIA’s earlier architectures. The GB300’s 1.5× increase in dense FP4 compute over the B200 translates to more efficient processing of AI workloads, enabling faster training and inference times.
Similarly, the Vera Rubin’s 50 petaFLOPS of FP4 performance per chip signifies a considerable advancement, allowing for the deployment of more sophisticated AI models and applications.
NVIDIA’s aggressive development timeline, with plans for annual releases of new AI chip generations, reflects its commitment to maintaining a leadership position in the AI hardware market.