- Foundry GA: Microsoft made its Foundry Agent Service generally available, built on the OpenAI Responses API with multi-model support and enterprise-grade private networking.
- Hardware First: Microsoft became the first hyperscale cloud provider to power on NVIDIA Vera Rubin NVL72 systems, which deliver five times the rack-level performance of Blackwell-based GB200 NVL72 systems.
- Infrastructure Rollout: Vera Rubin NVL72 racks will deploy to liquid-cooled Azure Fairwater datacenters in Wisconsin and Atlanta over the coming months.
- Physical AI: Microsoft open-sourced an Azure Physical AI Toolchain on GitHub and deepened its integration with NVIDIA Omniverse for digital twin workflows.
Microsoft made its Foundry Agent Service generally available and powered on NVIDIA’s Vera Rubin NVL72 GPU systems in its labs, the company announced at NVIDIA GTC 2026 on March 16. With these two announcements, Microsoft becomes the first hyperscale cloud provider to activate Vera Rubin NVL72 hardware while expanding the platform enterprises use to build and deploy AI agents at production scale.
Foundry Platform Expansion
On the software side, Foundry Agent Service is built on the OpenAI Responses API, making it wire-compatible with OpenAI agents. Microsoft designed the platform with an intentionally open architecture that supports models from Meta Llama, Mistral, DeepSeek, and xAI alongside frameworks like LangChain and LangGraph. Developers can mix and match foundation models within a single agent workflow, choosing the right model for each subtask rather than committing to a single provider.
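To make subtask-level model selection concrete, here is a minimal Python sketch of a routing table inside a single agent workflow. The model names, the `ROUTES` table, `Step`, and `fake_call` are all hypothetical; a real implementation would send requests to Foundry-deployed models (for example through the Responses API) instead of the stub used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical routing table: subtask type -> model deployment name.
# These names are placeholders, not real Foundry deployment identifiers.
ROUTES: Dict[str, str] = {
    "classify": "small-fast-model",
    "reason": "large-reasoning-model",
    "summarize": "open-weight-model",
}

@dataclass
class Step:
    task: str    # subtask type, used as the routing key
    prompt: str

def run_workflow(steps: List[Step], call_model: Callable[[str, str], str],
                 default_model: str = "general-model") -> List[Tuple[str, str]]:
    """Send each subtask to the model mapped to its type; fall back to a default."""
    results = []
    for step in steps:
        model = ROUTES.get(step.task, default_model)
        results.append((model, call_model(model, step.prompt)))
    return results

def fake_call(model: str, prompt: str) -> str:
    """Stub in place of a real inference request; echoes model and prompt."""
    return f"[{model}] {prompt}"

steps = [
    Step("classify", "Is this ticket about billing?"),
    Step("reason", "Plan the refund workflow."),
    Step("translate", "Hola"),  # no route defined, so the default model is used
]
for model, output in run_workflow(steps, fake_call):
    print(model, "->", output)
```

Keeping the routing table separate from the workflow logic turns a model swap into a one-line change, which is the kind of flexibility that mixing providers within one workflow relies on.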
NVIDIA Nemotron models join the platform’s model catalog as part of the GTC announcements, broadening the range of inference options available to enterprise customers. Moreover, a partnership with Fireworks AI brings high-performance open model inference to Foundry, enabling customers to fine-tune open-weight models such as Nemotron and distribute the resulting low-latency models to the edge. With these additions alongside existing support for OpenAI, Anthropic, and Mistral models, Microsoft now claims the widest selection of models on any cloud platform.
By letting developers swap models at the subtask level, Microsoft shifts the competitive axis from model exclusivity to orchestration quality. Backed by a deepening NVIDIA partnership that now spans hardware, software, and model distribution, Foundry is positioned as a hedge against vendor lock-in at a time when enterprises remain reluctant to commit to a single AI provider.
For enterprises concerned about data security, Foundry Agent Service provides end-to-end private networking with bring-your-own VNet and no public egress. Private connectivity extends to tool integrations including MCP servers, Azure AI Search, and Fabric data agents. Furthermore, MCP authentication now supports key-based, Entra Agent Identity, Managed Identity, and OAuth Identity Passthrough methods, covering the full spectrum of enterprise identity patterns without requiring public internet exposure.
Yina Arenas, CVP of Microsoft Foundry, said Microsoft combines “accelerated computing with cloud scale engineering” to deliver AI capabilities to customers. Foundry Evaluations also reached general availability alongside the Agent Service, providing out-of-the-box evaluators, custom evaluators, and continuous production monitoring via Azure Monitor. As a result, organizations can track agent behavior in production and catch regressions before they affect end users.
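Catching regressions ultimately means comparing evaluator scores over time. The Python sketch below shows one simple way to do that: flag any evaluator whose mean score drops by more than a tolerance versus a baseline window. The evaluator names, scores, and the 0.05 tolerance are hypothetical, and this does not reproduce how Foundry Evaluations or Azure Monitor compute or alert on scores.

```python
from statistics import mean
from typing import Dict, List

def detect_regressions(baseline: Dict[str, List[float]],
                       current: Dict[str, List[float]],
                       tolerance: float = 0.05) -> List[str]:
    """Return evaluator names whose mean score dropped by more than `tolerance`."""
    flagged = []
    for name, base_scores in baseline.items():
        cur_scores = current.get(name)
        if not cur_scores:
            continue  # evaluator produced no scores in the current window
        if mean(base_scores) - mean(cur_scores) > tolerance:
            flagged.append(name)
    return flagged

# Hypothetical evaluator names and scores (0 to 1, higher is better).
baseline = {"groundedness": [0.90, 0.92, 0.88], "task_adherence": [0.85, 0.87]}
current = {"groundedness": [0.80, 0.78, 0.82], "task_adherence": [0.86, 0.84]}
print(detect_regressions(baseline, current))  # → ['groundedness']
```

A mean-difference check is deliberately crude; a production monitor would more likely account for sample size and noise before raising an alert.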
Voice Live and Developer Tools
Separately, Voice Live API integration with Foundry Agent Service entered public preview. Voice Live provides fully managed real-time speech-to-speech capabilities with semantic voice activity detection and barge-in support, allowing agents to handle turn-taking naturally rather than relying on silence thresholds. By taking audio processing off development teams’ hands, Voice Live targets developers building voice-first, multimodal agentic experiences.
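For contrast, the conventional approach that semantic voice activity detection replaces can be sketched as a fixed silence threshold over per-frame audio energy. This is a generic illustration, not Voice Live’s algorithm; the function name, the energy threshold, and the 25-frame window (about 500 ms, assuming 20 ms frames) are hypothetical.

```python
from typing import List, Optional

def silence_endpoint(frame_energies: List[float], threshold: float = 0.01,
                     min_silent_frames: int = 25) -> Optional[int]:
    """Return the frame index where the end-of-turn silence begins, or None.

    A turn ends once `min_silent_frames` consecutive frames fall below
    `threshold`; any louder frame resets the count.
    """
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy < threshold:
            silent_run += 1
            if silent_run >= min_silent_frames:
                return i - min_silent_frames + 1
        else:
            silent_run = 0
    return None

# Speech, a short 5-frame pause, more speech, then a long 30-frame silence.
frames = [0.2] * 10 + [0.001] * 5 + [0.2] * 5 + [0.001] * 30
print(silence_endpoint(frames))  # → 20
```

The weakness is built into the rule: a speaker who pauses longer than the window mid-thought gets cut off, while background noise above the threshold can keep a turn open indefinitely. A semantic detector instead judges whether the utterance itself sounds complete.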
In addition, Microsoft refreshed its Foundry portal with expanded integrations for Palo Alto Networks Prisma AIRS and Zenity, adding third-party security and governance layers to agent development workflows. Hosted agents are now available in preview in additional Azure regions, including East US, North Central US, Sweden Central, Southeast Asia, and Japan East. The regional expansion addresses data residency requirements that had previously limited adoption in regulated industries.
Corvus Energy offers an early example of real-world deployment, using Foundry to replace manual inspection workflows with agent-driven operational intelligence across its global fleet of marine battery systems. Rather than sending technicians to inspect each vessel individually, Corvus uses AI agents to monitor battery health, flag anomalies, and schedule maintenance proactively.
Earlier, on February 24, 2026, Microsoft announced Foundry Local support for modern infrastructure and large AI models, signaling a push to bring agent capabilities beyond the cloud into on-premises environments.
Vera Rubin NVL72 and GPU Deployment
While Foundry handles the orchestration layer, the hardware announcements at GTC proved equally notable. Microsoft says it is the first hyperscale cloud to power on Vera Rubin NVL72 systems in its labs. The company had already deployed hundreds of thousands of Grace Blackwell GPUs across its global datacenter footprint in less than a year, giving it an infrastructure base for Vera Rubin adoption.
Operating some of the largest commercial InfiniBand deployments across multiple GPU generations has also given Microsoft operational experience that few competitors can match at this scale.
According to Microsoft’s infrastructure planning published in January 2026, Vera Rubin will deliver 50 PF of NVFP4 inference performance per Rubin GPU and 3.6 EF of NVFP4 per NVL72 rack. That represents a fivefold performance jump over GB200 NVL72 rack systems, supported by a sixth-generation NVLink fabric reaching approximately 260 TB/s of scale-up bandwidth.
Rubin infrastructure also introduces ConnectX-9 networking at 1,600 Gb/s alongside HBM4/HBM4e memory with the SOCAMM2 memory expansion architecture, addressing both the bandwidth and memory bottlenecks that constrain current inference workloads.
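The rack-level figures follow from per-GPU numbers by simple multiplication. The Python sketch below reproduces them, treating the 50 PF figure as per Rubin GPU. Two inputs are assumptions not stated in the article and are labeled in the code: roughly 10 PF of dense FP4 per Blackwell GPU in GB200 NVL72, and 3.6 TB/s of NVLink 6 bandwidth per Rubin GPU.

```python
# Figures taken from the article: 72 GPUs per NVL72 rack, 50 PF of NVFP4
# inference per Rubin GPU, and 1,600 Gb/s per ConnectX-9 link.
GPUS_PER_RACK = 72
RUBIN_PF_PER_GPU = 50
CONNECTX9_GBITS = 1_600

# Assumptions NOT stated in the article: about 10 PF of dense FP4 per Blackwell
# GPU in GB200 NVL72, and 3,600 GB/s of NVLink 6 bandwidth per Rubin GPU.
BLACKWELL_PF_PER_GPU = 10
NVLINK6_GBYTES_PER_GPU = 3_600

rubin_rack_pf = GPUS_PER_RACK * RUBIN_PF_PER_GPU        # 3,600 PF = 3.6 EF
gb200_rack_pf = GPUS_PER_RACK * BLACKWELL_PF_PER_GPU    # 720 PF
speedup = rubin_rack_pf / gb200_rack_pf                 # rack-level ratio
nvlink_rack_tbytes = GPUS_PER_RACK * NVLINK6_GBYTES_PER_GPU / 1_000  # GB/s -> TB/s
connectx9_gbytes = CONNECTX9_GBITS / 8                  # bits -> bytes

print(f"Rubin rack: {rubin_rack_pf / 1_000} EF NVFP4")
print(f"Speedup over GB200 NVL72: {speedup}x")
print(f"NVLink scale-up: {nvlink_rack_tbytes} TB/s")
print(f"ConnectX-9 link: {connectx9_gbytes} GB/s")
```

Under these assumptions the numbers line up: 72 × 50 PF gives the 3.6 EF rack figure, dividing by 720 PF gives the fivefold claim, 72 × 3.6 TB/s gives 259.2 TB/s (the "approximately 260 TB/s" NVLink figure), and a 1,600 Gb/s ConnectX-9 link carries 200 GB/s.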
As the industry pivots from training-dominated workloads to inference-heavy agentic AI, the fivefold performance increase over Blackwell carries significant practical implications. For enterprise customers running large-context reasoning models, higher throughput combined with expanded HBM4 memory can lower the cost per inference query, potentially making production-scale agent deployments economical where prior GPU generations were not.
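The cost argument reduces to simple arithmetic: per-token cost falls only if throughput rises faster than the hourly price of the hardware. In the Python sketch below, the prices and throughputs are invented placeholders, not Azure pricing or measured Vera Rubin performance, and full utilization is assumed.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Serving cost per one million output tokens, assuming full utilization."""
    tokens_per_hour = tokens_per_second * 3_600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Placeholder numbers only: the newer system costs 2x per hour but delivers 5x throughput.
older = cost_per_million_tokens(hourly_cost_usd=100.0, tokens_per_second=10_000)
newer = cost_per_million_tokens(hourly_cost_usd=200.0, tokens_per_second=50_000)
print(f"older: ${older:.2f}/M tokens, newer: ${newer:.2f}/M tokens")
```

With these placeholder inputs, cost per token drops by 2.5x rather than 5x, because part of the throughput gain is offset by the higher hourly price. Real savings depend on pricing and utilization that the announcement does not specify.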
Vera Rubin NVL72 racks will move into Microsoft’s liquid-cooled Azure Fairwater sites in Wisconsin and Atlanta over the coming months. Additionally, initial support for Vera Rubin on Azure Local was announced, extending accelerated AI capabilities to customer-controlled environments where data cannot leave a specific facility or jurisdiction.
Rather than building single megasites, Azure favors regional supercomputers distributed across locations, differentiating its datacenter strategy from those of other hyperscalers. Distributing compute across regions reduces latency for inference workloads and provides redundancy that a single-site approach cannot offer.
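Both benefits can be illustrated with a generic client-side routing rule: send each request to the lowest-latency region that is currently healthy, and fall back to the next one when a region goes down. This Python sketch is an illustration, not how Azure actually routes traffic. The latency values are invented, and the region strings are used only as example labels.

```python
from typing import Dict, Optional, Set

def pick_region(latency_ms: Dict[str, float], healthy: Set[str]) -> Optional[str]:
    """Pick the lowest-latency healthy region; None if every region is down."""
    candidates = [(ms, region) for region, ms in latency_ms.items() if region in healthy]
    return min(candidates)[1] if candidates else None

# Hypothetical client-side latencies to three regions.
latency = {"eastus": 12.0, "swedencentral": 95.0, "japaneast": 160.0}
print(pick_region(latency, healthy={"eastus", "swedencentral", "japaneast"}))  # → eastus
print(pick_region(latency, healthy={"swedencentral", "japaneast"}))            # → swedencentral
```

With a single site, the second call would have no fallback. Spreading capacity across regions is what gives the router a healthy alternative.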
Physical AI and Digital Twins
Beyond cloud infrastructure, Microsoft introduced a public Azure Physical AI Toolchain GitHub repository integrated with NVIDIA Physical AI Data Factory and core Azure services. Integration between Microsoft Fabric and NVIDIA Omniverse libraries is also deepening, enabling physically accurate digital twins and simulation workflows for industrial customers.
Notably, Omniverse integration allows engineers to validate robot and autonomous system behavior in simulation before deploying to physical hardware, reducing costly real-world testing cycles. By open-sourcing the toolchain on GitHub, Microsoft lowers the barrier for developers building simulation pipelines without assembling each component from scratch. Azure Machine Learning, Azure IoT, and Azure Digital Twins all connect into the workflow.
Microsoft’s hardware push sits within a broader surge of industry commitment to NVIDIA’s Rubin platform. Nebius and Meta Platforms announced a US$27 billion AI infrastructure agreement built on Rubin architecture, signaling demand well beyond any single cloud provider. NVIDIA also introduced Vera, a new CPU designed for agentic AI and reinforcement learning, further expanding the Rubin platform’s scope beyond GPUs alone.
Azure’s pattern of early adoption is not new. Azure was the first cloud to deploy NVIDIA Blackwell chips, and NVIDIA detailed its Vera Rubin roadmap at GTC 2025 in March of that year. In October 2025, Microsoft and NVIDIA jointly launched what was reported as the first GB300 supercomputer, built for OpenAI, the same pattern of early hardware activation now repeating with Vera Rubin NVL72.
Looking Ahead
Together, Foundry’s general availability and the Vera Rubin power-on position Azure for the industry shift from AI training to inference. Foundry Agent Service provides the orchestration layer to manage agentic workloads, while Vera Rubin delivers the raw compute to run them at scale.
As AI agents move from experimental prototypes to production systems handling customer-facing tasks, enterprises are likely to look for both capabilities from a single provider. With Vera Rubin NVL72 racks rolling into liquid-cooled Azure datacenters over the coming months, the buildout will test whether Microsoft can sustain its hardware deployment speed advantage across yet another GPU generation.


