Nvidia Breaks AI Performance Records in Latest MLPerf Benchmarks

In the latest MLCommons benchmarks, Nvidia used 11,616 H100 GPUs, its largest deployment yet, to set records in five of nine categories.


Nvidia has achieved unprecedented results in the MLPerf AI benchmarks, underscoring its leadership in machine learning capabilities. The company's systems excelled, especially in new training tests centered on large language models and graph neural networks (GNNs). In total, 17 organizations submitted more than 205 results.

Powered by its Hopper architecture, Nvidia's systems came out on top in new training tests that involved fine-tuning. These benchmarks are essential for applications ranging from literature databases to fraud detection and analytics.

The MLPerf Training benchmark suite consists of comprehensive system tests that challenge machine learning (ML) models, software, and hardware across a wide array of applications. This open-source, peer-reviewed suite establishes an equitable competitive environment that fosters innovation, enhances performance, and promotes energy efficiency within the industry.

MLPerf Training v4.0 features over 205 performance results from 17 contributing organizations: ASUSTeK, Dell, Fujitsu, Giga Computing, Google, HPE, Intel (Habana Labs), Juniper Networks, Lenovo, NVIDIA, NVIDIA + CoreWeave, Oracle, Quanta Cloud Technology, Red Hat + Supermicro, Supermicro, Sustainable Metal Cloud (SMC), and tiny corp.

Record-Breaking Performance

In the most recent MLCommons benchmarks, Nvidia utilized 11,616 H100 GPUs, marking its largest deployment yet, to set new records in five out of nine categories. This included the fine-tuning of the Llama-2-70B model on a government documents dataset to improve summary accuracy and the evaluation of GNNs.

The company achieved nearly linear scaling, meaning performance rose almost in proportion to the number of GPUs, which is a key factor in efficiency. Training times also fell notably thanks to software optimizations made after the architecture's release. Enhancements included the use of 8-bit floating point operations and improved GPU communication, which alone improved GPT-3 training times by 27%.
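To make "nearly linear scaling" concrete, here is a minimal Python sketch of how scaling efficiency could be computed from two training runs. The function name and the example numbers are hypothetical and chosen only for illustration; they are not MLPerf results.

```python
def scaling_efficiency(base_gpus, base_minutes, scaled_gpus, scaled_minutes):
    """Return the fraction of ideal (linear) speedup that was achieved.

    1.0 means perfectly linear scaling; lower values mean that adding
    GPUs returned less than a proportional reduction in training time.
    """
    if min(base_gpus, scaled_gpus) <= 0 or min(base_minutes, scaled_minutes) <= 0:
        raise ValueError("GPU counts and times must be positive")
    ideal_speedup = scaled_gpus / base_gpus        # what linear scaling would give
    actual_speedup = base_minutes / scaled_minutes  # what was measured
    return actual_speedup / ideal_speedup


# Hypothetical numbers, for illustration only (not MLPerf results):
# tripling the GPU count cuts a 12-minute run to 4.2 minutes.
eff = scaling_efficiency(1000, 12.0, 3000, 4.2)
print(f"scaling efficiency: {eff:.1%}")
```

A value close to 100% is what "nearly linear" describes: tripling the hardware delivers close to a threefold speedup.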

The MLPerf 4.0 update, the first since November 2023, also included benchmarks for image generation with Stable Diffusion and further LLM training for GPT-3, showcasing significant improvements such as a 1.8x faster training time for Stable Diffusion and a 1.2x speed increase for GPT-3.
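Speedup factors and percentage reductions in training time are easy to confuse. As a rough illustration, the Python sketch below converts one into the other, assuming "1.8x faster" means the same work finishes in 1/1.8 of the earlier time. That reading is an assumption, and the helper name is hypothetical.

```python
def time_saved_fraction(speedup):
    """Convert a speedup factor into the fraction of training time saved.

    Assumes a speedup of s means the job takes 1/s of the original time.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


for factor in (1.8, 1.2):
    saved = time_saved_fraction(factor)
    print(f"{factor}x faster -> about {saved:.0%} less training time")
```

Under this assumption, a 1.8x speedup corresponds to roughly 44% less training time, and a 1.2x speedup to roughly 17% less.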

Comprehensive Optimizations Across the Stack

David Kanter, the founder and executive director of MLCommons, highlighted the critical role of software and network efficiencies in complementing hardware advancements. Nvidia's results from the MLPerf 4.0 benchmarks demonstrated comprehensive optimizations across the stack, including highly tuned FP8 kernels, an FP8-aware distributed optimizer, and optimized cuDNN FlashAttention.

cuDNN FlashAttention is an optimized implementation designed to speed up the attention mechanism used in neural networks, specifically in Transformer models. It leverages cuDNN (the CUDA Deep Neural Network library) to improve the efficiency of attention computations on NVIDIA GPUs. FlashAttention reduces memory usage and increases processing speed by working through the attention computation in small tiles that fit in fast on-chip memory, so the full attention matrix never has to be written to and read back from slower GPU memory.
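The core idea can be sketched in a few lines of NumPy. This is not Nvidia's cuDNN kernel, and running it on a CPU brings none of the speed benefits; it only illustrates the arithmetic trick (an "online" softmax over blocks of keys) that lets attention be computed without ever holding the full score matrix. The function names and the block size are assumptions made for this example.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference attention: builds the full (n_q x n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q, k, v, block_size=64):
    """Same result, but keys/values are processed one block at a time.

    Only an (n_q x block_size) slice of scores exists at any moment.
    A running row maximum and running row sum keep the softmax exact.
    """
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n_q, v.shape[-1]))
    row_max = np.full((n_q, 1), -np.inf)  # running max of scores per query
    row_sum = np.zeros((n_q, 1))          # running softmax denominator

    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        scores = (q @ k_blk.T) * scale

        new_max = np.maximum(row_max, scores.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)  # rescales earlier partial results
        p = np.exp(scores - new_max)

        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    return out / row_sum


rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
k = rng.standard_normal((200, 16))
v = rng.standard_normal((200, 32))
print(np.allclose(tiled_attention(q, k, v), naive_attention(q, k, v)))
```

Real GPU implementations also tile the queries and keep each tile in on-chip memory, which is where the memory and speed gains come from. The sketch only shows that the block-by-block result matches the standard computation.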

These results not only underline Nvidia's leadership in deploying advanced GPU architectures but also emphasize the strategic importance of continuous software enhancements. The reported advancements matter for organizations planning new data centers, one of which is expected to begin operations this year, with another set to incorporate Nvidia's next-gen Blackwell architecture by 2025. They represent significant returns on investment for the industry, making Nvidia's efforts particularly relevant.

Last Updated on June 14, 2024 2:29 pm CEST

Markus Kasanmascheff
Markus is the founder of WinBuzzer and has been playing with Windows and technology for more than 25 years. He holds a Master's degree in International Economics and previously worked as Lead Windows Expert for Softonic.com.