AMD has announced its newest AI accelerator, the Instinct MI325X, which brings increased memory capacity and bandwidth. The announcement fits the annual update cadence AMD has adopted for its “Instinct” series as it chases higher AI performance.
Enhanced Memory and Bandwidth
Teased at AMD's Advancing AI event in December 2023, the MI325X builds on the MI300X. It combines eight compute, four I/O, and eight memory chiplets using advanced 2.5D and 3D packaging. With 288GB of HBM3e memory, it surpasses Nvidia's H200 and even the anticipated Blackwell chips in capacity. Memory bandwidth reaches 6TB/sec, up from the MI300X's 5.3TB/sec and well ahead of the H200's 4.8TB/sec.
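To see why those bandwidth figures matter, consider a rough, memory-bound inference estimate. This is a first-order sketch, not a benchmark: batch-1 token generation typically streams the full weight set from HBM once per token, and the 70B-parameter model size here is purely illustrative.

```python
# First-order estimate: batch-1 token generation is usually
# memory-bandwidth-bound, so tokens/sec ~ bandwidth / weight bytes.
TB = 1e12
GB = 1e9

weights_gb = 140  # illustrative: a 70B-parameter model in FP16 (2 bytes/param)

bandwidth = {"MI325X": 6.0 * TB, "MI300X": 5.3 * TB, "H200": 4.8 * TB}

for name, bw in bandwidth.items():
    tokens_per_sec = bw / (weights_gb * GB)
    print(f"{name}: ~{tokens_per_sec:.0f} tokens/sec upper bound")
```

On those numbers the MI325X tops out around 43 tokens/sec against the H200's 34, a gap that tracks the bandwidth ratio almost exactly.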
AMD partner Microsoft has already announced that it is deploying MI300X accelerators in its Azure cloud alongside its custom Cobalt 100 chips.
Performance and Precision Metrics
The MI325X retains the MI300X's 8 XCD + 4 IOD configuration, offering 1,216 matrix cores and a peak throughput of 2,614 TOPS at INT8. Looking further out, AMD anticipates a 35x increase in AI inference performance for the MI350 over the MI300X, a projection based on an 8-way MI350 node running a 1.8 trillion parameter GPT MoE model. There is currently no plan for a PCIe version of the MI300 series, as demand for OAM modules remains strong.
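A quick capacity check shows why AMD frames that claim around an 8-way node. The sketch below assumes 288GB of HBM per accelerator (the MI325X figure; AMD hasn't broken out per-device capacity in this comparison) and counts weights only.

```python
# Sanity check on the 8-way node claim: do 1.8T parameters fit?
# Weights only -- ignores KV cache, activations, and overhead.
GB = 1e9  # decimal gigabytes, as in marketing specs

node_memory = 8 * 288 * GB   # 2,304 GB across the node
params = 1.8e12              # 1.8 trillion parameters

for fmt, bytes_per_param in {"FP16": 2, "FP8": 1}.items():
    weights = params * bytes_per_param
    verdict = "fits" if weights <= node_memory else "does not fit"
    print(f"{fmt}: {weights / GB:,.0f} GB of weights -> {verdict}")
```

At FP16 the weights alone run to 3,600GB and overflow the node; at FP8 they drop to 1,800GB and fit with headroom, which is presumably how such a model would actually be deployed.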
Despite the memory upgrades, the MI325X's CDNA 3 GPU tiles deliver no meaningful uplift in floating-point throughput: the chip is rated at 1.3 petaFLOPS of dense BF16/FP16 performance and 2.6 petaFLOPS at FP8. AMD has emphasized FP16 performance, citing the vLLM inference library's limited FP8 support; the flip side is that FP8 models on Nvidia's H200 occupy half the memory of their FP16 counterparts, which can erode part of the MI325X's capacity advantage.
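The arithmetic behind that trade-off is simple. A minimal sketch, using a hypothetical 140B-parameter model:

```python
# Moving the same weights from FP16 to FP8 halves their footprint,
# which is how a 141GB H200 can host a model that would otherwise
# demand MI325X-class capacity. The 140B model is hypothetical.
GB = 1e9

def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Weight memory in GB; weights only, no KV cache or activations."""
    return params * bytes_per_param / GB

params = 140e9
fp16 = weight_footprint_gb(params, 2)  # 280 GB: beyond a single H200
fp8  = weight_footprint_gb(params, 1)  # 140 GB: squeezes onto a 141GB H200

print(f"140B params: {fp16:.0f} GB in FP16 vs {fp8:.0f} GB in FP8")
```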
The AI accelerator landscape is growing more competitive. Intel's Habana Labs, with its Gaudi3 accelerator, and Nvidia's upcoming Blackwell chips are the notable contenders. Gaudi3 delivers 1.8 petaFLOPS of dense FP8 and FP16 compute, though unlike AMD's and Nvidia's parts it does not support sparsity. Nvidia's Blackwell B200, meanwhile, promises up to 4.5 petaFLOPS of dense FP8 and 8TB/sec of bandwidth, posing a significant challenge to AMD.
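Normalizing the headline figures quoted above to the MI325X makes the gap concrete (only numbers cited in this piece; no bandwidth figure is quoted here for Gaudi3):

```python
# (dense FP8 petaFLOPS, memory bandwidth in TB/s; None where this
# piece quotes no figure)
chips = {
    "MI325X": (2.6, 6.0),
    "Gaudi3": (1.8, None),
    "B200":   (4.5, 8.0),
}

mi_flops, mi_bw = chips["MI325X"]
for name, (flops, bw) in chips.items():
    bw_note = f", {bw / mi_bw:.2f}x bandwidth" if bw else ""
    print(f"{name}: {flops / mi_flops:.2f}x dense FP8{bw_note}")
```

On paper, the B200 lands at roughly 1.73x the MI325X's dense FP8 throughput and 1.33x its bandwidth.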
AMD is following an annual release schedule for its Instinct accelerators, similar to Nvidia's approach. The next iteration, based on the CDNA 4 architecture expected in 2025, will utilize a 3nm process for compute tiles and introduce support for FP4 and FP6 formats. AMD's roadmap through 2026 includes new architectures and products designed to maintain its position in the AI accelerator market.
Market Insights from AMD CEO Lisa Su
During Computex, AMD CEO Dr. Lisa Su addressed the rapid growth of the Instinct line, noting that the MI300 series has consistently exceeded sales and growth expectations. The MI325X, built on the same computational silicon as the MI300X but paired with faster, more efficient HBM3e memory, is slated for release in Q4 of this year, putting it up against the launch of Nvidia's next-generation B200 Blackwell accelerator.
The CDNA 4 architecture, with its 3nm process and FP4/FP6 support, is intended to raise compute throughput while easing memory pressure. It will underpin the MI400 series, and AMD's roadmap follows it with a CDNA “Next” architecture in 2026.
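The memory-pressure argument for the new formats comes down to bytes per parameter. A short sketch, applied to the 1.8T-parameter model AMD references (FP6 packing varies by implementation; 0.75 bytes/param assumes tight 6-bit packing):

```python
# Bytes of weight storage per parameter across data formats.
FORMATS = {"FP16": 2.0, "FP8": 1.0, "FP6": 0.75, "FP4": 0.5}

params = 1.8e12  # the 1.8T-parameter GPT MoE model AMD references

for fmt, bytes_per_param in FORMATS.items():
    print(f"{fmt}: {params * bytes_per_param / 1e12:.2f} TB of weights")
```

Going from FP16 to FP4 cuts the same model's weight footprint from 3.6TB to 0.9TB, which is the kind of reduction that lets ever-larger models stay resident in HBM.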