At this year´s OCP Global Summit, a premier event in the open IT ecosystem development industry, Microsoft last weekend showed off new systems designed to deal with AI’s increasing demands. A key point of discussion was quantum security and cooling, but power solutions weren’t far behind. The tech company introduced modular systems intended to manage the heat and power needs of modern AI servers, aiming to help data centers keep up with the pace of change.
Securing AI Against Future Quantum Threats
One of the more futuristic elements of Microsoft’s announcements is the focus on quantum security. With quantum computers on the horizon, current cryptography systems may not be enough to protect data in the coming years. Microsoft introduced the Adams Bridge quantum-resistant accelerator as part of their plan to make AI systems more secure against future quantum attacks.
The Adams Bridge project is open-source and tied into Microsoft’s work on Caliptra, a so called open-source silicon root of trust (RoT). Caliptra provides a set of security properties that anchor the security of a system-on-a-chip (SOC), including CPUs, GPUs and SSDs, into the hardware. The RoT provides verifiable cryptographic assurances of the security configuration and workload protection mechanisms of an SOC, ensuring only trusted firmware can execute on the SOC.
The goal here is to protect against quantum-based hacking, which could break through traditional encryption methods easily. With the National Institute of Standards and Technology (NIST) already setting new standards for quantum-resistant cryptography, Microsoft wants to ensure its infrastructure is ready for whatever the future throws at it. On August 13, 2024, NIST officially released the first three finalized post-quantum cryptography (PQC) standards, which are designed to withstand cyberattacks from both classical and quantum computers, addressing the potential threat that quantum computing poses to current encryption methods.
Here is a video from a presentation earlier this year with more details about the Adams Bridge project.
Hardware Security Gets a Boost with New Auditing Tools
Microsoft’s security efforts also extend to the hardware side. The company co-founded the OCP-SAFE initiative, which aims to make hardware and firmware security audits more transparent and consistent. The project aligns with Microsoft’s other efforts to ensure supply chain security for the components used in its AI infrastructure.
The goal is to create a system where hardware can be more easily trusted, from the manufacturing stage all the way to deployment. OCP-SAFE, combined with Caliptra, is a move towards creating a secure environment for future AI systems.
Microsoft’s Contributions to AI Supercomputing Scale
Microsoft has also been ramping up its AI supercomputing capabilities over the last few years. These supercomputers, which are spread across its global datacenter network, are designed to handle increasingly complex AI workloads. Microsoft has been making performance improvements through hardware and software optimization, and the company is now sharing those advancements with the open-source community.
For example, Microsoft worked with major tech companies like AMD and Intel to develop new low-precision computing techniques. This effort was part of its Microscaling (MX) Alliance, which aims to make AI computations more efficient while keeping power usage low. Microsoft’s developments in custom silicon and system-level optimization have helped push AI forward, while also improving energy efficiency across its datacenters.
Global Cooling Solutions for AI Powerhouses
Microsoft has been working on some advanced cooling tech for AI systems. During the OCP Global Summit, they revealed a liquid cooling heat exchanger that’s made to be easily deployed in datacenters around the world. These devices are meant to keep up with the ever-increasing size of AI models, as server racks are getting denser and need more efficient heat management. One key detail? The system is modular, so it can be set up wherever it’s needed without a lot of customization.
For a bit of context, Microsoft’s latest move builds on last year’s Azure Maia 100 system. The company developed a closed-loop liquid cooling system back then, and now they’re expanding on it to handle larger workloads with a global reach. This new design helps deal with rising temperatures without needing massive hardware overhauls.
Modular Power: Scaling AI to New Heights
When it comes to power, AI is pushing things to the limit. To help datacenters keep up, Microsoft and Meta have joined forces to develop a modular power system. This setup allows racks to scale from hundreds of kilowatts to a full megawatt, making room for more AI accelerators in each server unit. This means datacenters can handle more intensive AI computations without needing major electrical upgrades.
The solution, known as a disaggregated 400 High Voltage Direct Current (VDC) rack design, offers flexibility. It allows power to be adjusted to fit the changing needs of AI systems—whether for training or inferencing. Microsoft says the system could increase the number of AI accelerators in a rack by as much as 35%.
Last Updated on November 7, 2024 2:30 pm CET