Huawei has introduced its AI CloudMatrix 384 system, a large-scale cluster designed to compete directly with Nvidia’s leading GB200 NVL72 architecture by deploying a substantial number of its Ascend 910C processors. Announced shortly after the US government effectively restricted exports of Nvidia’s H20 AI chip to China around April 15, the CloudMatrix 384 represents a domestic alternative aiming to fill the void.
It achieves performance metrics that, on paper, surpass Nvidia’s current flagship GB200 NVL72 system in several areas, but does so through a strategy favoring scale over silicon sophistication, resulting in a stark power consumption penalty.
The system’s foundation is the Ascend 910C, a dual-chiplet processor delivering 780 TFLOPS in the BF16 numerical format common in AI workloads. The full CloudMatrix 384 cluster integrates 384 of these accelerators and, according to SemiAnalysis, reaches an estimated total of roughly 300 PFLOPS of dense BF16 performance.
This figure exceeds the roughly 180 PFLOPS attributed to Nvidia’s 72-GPU GB200 NVL72 setup. Huawei’s design also packs considerably more memory, with 49.2 TB of total HBM (High Bandwidth Memory, a type of stacked memory providing fast data access for processors) capacity compared to Nvidia’s 13.8 TB, and 1,229 TB/s of total HBM bandwidth versus 576 TB/s.
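The 300 PFLOPS headline follows directly from the per-chip rate. A quick back-of-envelope check, using only the numbers quoted above:

```python
# Sanity check: total dense BF16 compute implied by the per-chip figure.
chips = 384                  # Ascend 910C accelerators in one CloudMatrix 384
tflops_per_chip = 780        # dense BF16 TFLOPS per dual-chiplet 910C

total_pflops = chips * tflops_per_chip / 1_000   # 1 PFLOPS = 1,000 TFLOPS
print(f"{total_pflops:.1f} PFLOPS")              # 299.5 PFLOPS, i.e. ~300 as cited
```

That is roughly 1.7 times the ~180 PFLOPS attributed to the GB200 NVL72, but achieved with more than five times the accelerator count.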
Performance Through Scale, Not Efficiency
This performance advantage, achieved by deploying over five times as many accelerators as the Nvidia comparison system, comes at a steep energy cost. The CloudMatrix 384’s total system power requirement is estimated at 559 kW, nearly four times the 145 kW consumed by the GB200 NVL72 configuration.
Calculations based on these figures indicate the Huawei system is 2.3 times less power-efficient per TFLOP of BF16 compute and 1.8 times less efficient per TB/s of memory bandwidth. Efficiency per terabyte of HBM capacity is closer, with Huawei’s system using about 1.1 times more power.
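The efficiency ratios above can be reproduced from the cited figures alone. A sketch of the arithmetic (all inputs are numbers quoted in this article; small differences from the stated 2.3x/1.8x/1.1x are rounding):

```python
# Power efficiency: watts spent per unit of resource, Huawei vs. Nvidia.
# Tuples: (system power kW, dense BF16 PFLOPS, HBM capacity TB, HBM bandwidth TB/s)
cloudmatrix = (559.0, 300.0, 49.2, 1229.0)
gb200_nvl72 = (145.0, 180.0, 13.8, 576.0)

hw_p, hw_c, hw_m, hw_b = cloudmatrix
nv_p, nv_c, nv_m, nv_b = gb200_nvl72

# Ratio > 1 means the Huawei system burns more power per unit delivered.
per_compute   = (hw_p / hw_c) / (nv_p / nv_c)   # per PFLOP of BF16 compute
per_bandwidth = (hw_p / hw_b) / (nv_p / nv_b)   # per TB/s of HBM bandwidth
per_capacity  = (hw_p / hw_m) / (nv_p / nv_m)   # per TB of HBM capacity

print(f"compute:   {per_compute:.2f}x")    # 2.31x less efficient
print(f"bandwidth: {per_bandwidth:.2f}x")  # 1.81x less efficient
print(f"capacity:  {per_capacity:.2f}x")   # 1.08x, i.e. roughly at parity
```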
This disparity reflects a strategic adaptation to China’s circumstances: restricted access to the most advanced, power-efficient chip manufacturing, offset by ample and relatively affordable energy infrastructure. Electricity prices in parts of China have fallen notably, reportedly to around $56/MWh in early 2025 from roughly $91/MWh in 2022, making power-hungry systems more economically viable than they might be elsewhere.
Optics Over Copper: The Network Backbone
Key to enabling this large-scale cluster is the CloudMatrix 384’s networking architecture. Huawei has opted for an all-optical approach for both inter-rack and intra-rack communication, connecting the 384 Ascend 910C processors in an all-to-all mesh. This involves deploying a massive 6,912 Linear Pluggable Optics (LPO) transceivers, each operating at 800 Gbps.
LPO technology, often discussed in industry reports such as those from LightCounting, is seen as a lower-power alternative to traditional DSP-based transceivers over the shorter reaches found inside data centers, offering potential power savings within the network fabric itself. Managing signal integrity across such a large, complex optical network, however, presents its own challenges.
The resulting aggregate internal bandwidth surpasses 5.5 Pbps. SemiAnalysis calculates the system offers 2.1 times the scale-up bandwidth (within the 384-node cluster) and 5.3 times the scale-out bandwidth (for connecting multiple clusters) compared to the GB200 NVL72 baseline.
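The aggregate bandwidth figure follows from the transceiver count and line rate quoted above:

```python
# Aggregate internal optical bandwidth of the CloudMatrix 384 fabric.
transceivers = 6_912     # LPO modules deployed across the cluster
gbps_each    = 800       # line rate per transceiver

total_gbps = transceivers * gbps_each
total_pbps = total_gbps / 1_000_000      # 1 Pbps = 1,000,000 Gbps (decimal units)

print(f"{total_pbps:.2f} Pbps")          # 5.53 Pbps, matching "surpasses 5.5 Pbps"
```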
The overall 16-rack system design, with 12 compute racks and 4 dedicated network switching racks, bears resemblance to Nvidia’s unreleased DGX H100 NVL256 “Ranger” platform, which also featured a large, optically connected multi-rack design deemed too complex and costly for production at the time.
Navigating the Sanctions Maze
Executing this strategy depends on securing advanced components despite stringent US export controls. While China’s SMIC can produce 7nm-class chips suitable for the Ascend 910C’s compute chiplets, analysis suggests the processors deployed so far primarily utilize chiplets fabricated by TSMC.
Huawei allegedly secured these restricted wafers, potentially enough for over a million Ascend 910C processors through 2025, via intermediaries such as Sophgo, circumventing direct sanctions against the company. This activity reportedly led to US scrutiny, with TSMC potentially facing a significant fine as reported in early April.
Accessing essential HBM2E memory reportedly involves a similar workaround, channeling Samsung components through distributor CoAsia Electronics, whose revenue noticeably increased following HBM export controls.
This involves design firm Faraday Technology and assembler SPIL creating technically compliant intermediate packages containing the HBM, which are then shipped to China where the memory is purportedly extracted for use in Huawei’s final Ascend 910C modules. These maneuvers underscore the ongoing challenge of enforcing technology export controls.
A Calculated Gamble in China’s AI Race
The CloudMatrix 384 launch is strategically timed. The US action halting Nvidia H20 exports removed a key competitor specifically tailored for the Chinese market under previous restrictions. The H20, although lower-performing than unrestricted Nvidia GPUs and potentially Huawei’s own prior 910B chip, was Nvidia’s main compliant offering for China, and the ban forced the company to take a $5.5 billion charge for related inventory.
This regulatory shift created a significant market opening, which Huawei is moving to fill not only with the CloudMatrix system but also its simultaneously announced next-gen Ascend 920 chip.
The situation prompted Nvidia CEO Jensen Huang to visit Beijing shortly after the ban, where he reportedly stated Nvidia hoped “to continue to cooperate with China.”
Analyst Patrick Moorhead predicted the outcome bluntly: “Chinese companies are just going to switch to Huawei.” This aligns with broader Chinese technology goals, seen in initiatives like the phase-out of foreign telecom chips and the substantial “Big Fund” supporting domestic semiconductor development. Huawei’s CloudMatrix 384 shows a pathway to competitive AI system performance now, accepting higher power use while navigating a complex global supply chain under geopolitical constraints.