
Nvidia B200 Blackwell Outshines Google Trillium TPU in Latest MLPerf Training Benchmark

Nvidia’s Blackwell B200 outpaces rivals in AI benchmarks, while Google’s Trillium narrows the gap in training performance.


Nvidia’s B200 GPU, built on the new Blackwell architecture, and Google’s sixth-generation TPU, Trillium, have taken the stage in the latest MLPerf Training v4.1 benchmarks, setting new standards for AI training speed and efficiency. The new entries reflect the rapid technological progress in AI, with Nvidia’s B200 showing a significant leap over its predecessor, the H100, and Google’s competing Trillium narrowing the gap.

MLPerf benchmarks from MLCommons are a set of standardized tests designed to measure how well different hardware and software systems perform at machine learning tasks. They help companies compare the speed, efficiency, and power usage of their AI systems in a fair and consistent way.

The latest version, MLPerf Training v4.1, includes new tests like the Mixture of Experts (MoE) model, which is important for advanced AI tasks, and also focuses on measuring how much energy these systems use, helping organizations optimize both performance and sustainability.

MLPerf v4.1 AI Training Benchmarks Reflect Industry Shift

The MLPerf Training v4.1 benchmarks showcased results from 17 major tech players, including Microsoft, Oracle, and Dell, testing capabilities across six tasks like GPT-3 pre-training, fine-tuning Llama 2 70B, and Stable Diffusion image generation. The benchmark suite represents a complete shift from older tests, now centered around training large language models (LLMs) and generative AI, reflecting how the industry’s needs have evolved.

All of the original benchmarks have been retired, underlining the consortium’s commitment to keep testing relevant to current AI applications.

Nvidia’s B200 Dominates with Blackwell Technology

Nvidia’s B200 GPU emerged as a standout performer in these benchmarks. Doubling the training speed for GPT-3 compared to the H100, it also showed gains of 64% in recommendation systems and 62% in image generation tasks.

The B200’s architecture incorporates 4-bit floating-point precision, which speeds up processing for demanding AI applications like ChatGPT and Stable Diffusion, demonstrating Nvidia’s approach of using lower numerical precision to enhance performance.
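To make the lower-precision idea concrete, here is a minimal NumPy sketch that quantizes a weight tensor onto the tiny value grid of a 4-bit floating-point (E2M1) format using a single per-tensor scale. The grid, function name, and scaling scheme are illustrative assumptions for this article, not Nvidia’s actual FP4 training pipeline.

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 floating-point format
# (sign bit handled separately). Illustrative assumption only; real FP4
# training kernels use block scaling, mixed precision, and more.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Simulate FP4 quantization of a tensor with one per-tensor scale."""
    scale = np.max(np.abs(x)) / FP4_GRID[-1]  # map the largest magnitude to 6.0
    if scale == 0:
        scale = 1.0
    magnitudes = np.abs(x) / scale
    # Snap every magnitude to the nearest representable FP4 value.
    nearest = FP4_GRID[np.abs(magnitudes[..., None] - FP4_GRID).argmin(axis=-1)]
    return np.sign(x) * nearest * scale

weights = np.random.randn(4, 4).astype(np.float32)
quantized = quantize_fp4(weights)
print("max absolute rounding error:", np.abs(weights - quantized).max())
```

Each value then occupies only 4 bits instead of 16 or 32, cutting memory traffic and letting the tensor cores process more operands per cycle, which is the basic trade-off behind trading numerical precision for throughput.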

Nvidia had initially aimed for a mid-2024 release of the B200, but complications during development pushed production timelines back. A design flaw discovered late in the process required mask changes and additional validation work with Taiwan Semiconductor Manufacturing Company (TSMC), delaying full-scale production to early 2025. This affected Nvidia’s major clients, including Microsoft and OpenAI, who had anticipated earlier access to the new chips.

Microsoft’s Strategic Integration of GB200

Despite these delays, Microsoft quickly capitalized on Nvidia’s GB200 variant, becoming the first customer to integrate it into its Azure cloud services. The GB200, a so-called Grace Blackwell Superchip, combines two B200 Tensor Core GPUs with a 72-core Grace CPU. This configuration delivers the highest performance in the Blackwell lineup, capable of up to 40 petaflops of sparse FP4 compute across its two GPUs.

This move positions Microsoft to deliver more efficient AI training, with the potential to lower operational costs and improve service delivery. Advanced infrastructure such as closed-loop liquid cooling and high-speed InfiniBand networking supports the performance demands of these powerful GPUs.

Satya Nadella is betting on Microsoft’s close partnership with Nvidia, contrasting its heavy reliance on Nvidia hardware with competitors such as Google and AWS, which have invested in their own proprietary AI hardware. While AWS develops custom chip solutions and Google advances its own TPU technology, Microsoft’s commitment to Nvidia gives it a distinct approach that reinforces its position in the cloud services market.

Google’s Trillium Shows Impressive Progress

Google’s Trillium, its latest TPU iteration, showcased a 3.8x improvement over the 2023 TPU v5e in GPT-3 training tasks. While a significant enhancement, Trillium still lagged behind Nvidia’s top-tier H100 system in direct comparisons. A 6,144-TPU setup from Google completed a GPT-3 training task in 11.77 minutes, whereas Nvidia’s 11,616-H100 system did the same in 3.44 minutes, showcasing Nvidia’s continued leadership.
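Because the two submissions used very different cluster sizes, one rough way to compare them is to normalize by accelerator count. The sketch below computes total chip-minutes from the figures quoted above; it is a back-of-the-envelope illustration only, since training time does not scale linearly with chip count and the systems differ in interconnect and software.

```python
# Rough normalization of the GPT-3 pre-training results quoted above.
# Chip-minutes is only a crude proxy: scaling efficiency, interconnect,
# and software stacks differ, and it is not an official MLPerf metric.
results = {
    "Google Trillium": {"chips": 6_144, "minutes": 11.77},
    "Nvidia H100":     {"chips": 11_616, "minutes": 3.44},
}

for name, run in results.items():
    chip_minutes = run["chips"] * run["minutes"]
    print(f"{name}: {chip_minutes:,.0f} chip-minutes to train")
# Google Trillium: ~72,315 chip-minutes; Nvidia H100: ~39,959 chip-minutes
```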

The Trillium TPU features upgrades such as expanded matrix multiply units and higher clock speeds, reaching up to 926 teraFLOPS at BF16 precision. The switch to AMD EPYC host CPUs, instead of the Intel Xeons used with prior models, marks a strategic change aimed at boosting performance for large-scale AI operations. In image generation tests with Stable Diffusion, Trillium’s 1,024-TPU system completed training in 2 minutes and 26 seconds, about a minute slower than Nvidia’s fastest result.

Energy Efficiency: The Ongoing Challenge

Energy consumption remains a focal point as AI training scales. Dell Technologies was the only company to report energy measurements in MLPerf Training v4.1, revealing that its configuration of 64 H100 GPUs and 16 Intel Xeon Platinum CPUs used 16.4 megajoules in a roughly five-minute Llama 2 70B fine-tuning run, equating to an average draw of about 55 kilowatts. This insight provides a snapshot of the power required for large-scale AI training.
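The average power figure follows directly from the reported energy and run length; here is the quick conversion, assuming the run lasted roughly the full five minutes:

```python
# Convert Dell's reported energy figure into an average power draw.
energy_joules = 16.4e6      # 16.4 megajoules, as reported
run_seconds = 5 * 60        # assumed ~5-minute fine-tuning run

avg_power_kw = energy_joules / run_seconds / 1_000
energy_kwh = energy_joules / 3.6e6
print(f"average draw ≈ {avg_power_kw:.1f} kW, total energy ≈ {energy_kwh:.2f} kWh")
# average draw ≈ 54.7 kW, total energy ≈ 4.56 kWh
```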

The issue of energy use continues to challenge AI hardware makers, pushing them to develop technologies that balance performance with sustainability. Nvidia and other industry leaders must navigate these demands while meeting the market’s expectations for ever-faster training capabilities.

The Competitive Landscape in AI Hardware

The release of Nvidia’s B200 amid production delays adds complexity to an already competitive field, where AMD and other firms are introducing their own AI-centric solutions. These challenges come at a time when Nvidia aims to stick to an ambitious rollout schedule for new AI chips, with setbacks like these potentially impacting market dynamics.

On the other hand, Google’s advancement with Trillium signals its intent to close the gap with Nvidia. As seen with its strategic architectural changes, Google is committed to enhancing the capabilities of its TPUs for competitive AI training performance.

Last Updated on December 7, 2024 5:38 pm CET

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.
