Microsoft has explained an Azure outage that affected many of its enterprise services. Starting at 15:43 EDT, Microsoft 365, Dynamics, DevOps, Azure AD, and other offerings experienced significant issues until a fix was in place at 18:35.
The Azure status history (via ZDNet) reveals the issue was caused by a faulty DNS migration. Microsoft was moving services from a legacy DNS system to Azure DNS when incorrect updates led to serious issues.
“During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated,” reads the statement. “No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services.”
To resolve the Azure outage, engineers corrected the nameserver delegation at 17:30. However, because some applications had cached the incorrect domain records, users didn't see a complete fix until those caches were flushed about an hour later.
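The gap between the corrected delegation and the full recovery is typical of cached DNS answers. The following is a minimal sketch of that effect, not a description of Microsoft's systems: the `TtlDnsCache` class, the `service.example` name, the nameserver strings, and the one-hour TTL are all hypothetical values chosen for illustration.

```python
class TtlDnsCache:
    """Toy resolver cache: an answer is reused until its TTL expires."""

    def __init__(self, upstream, ttl_seconds):
        self.upstream = upstream        # dict standing in for authoritative DNS (assumption)
        self.ttl_seconds = ttl_seconds  # hypothetical TTL, not a value from the incident
        self._entries = {}              # name -> (answer, expiry time in seconds)

    def resolve(self, name, now):
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]            # still fresh: serve the cached answer
        answer = self.upstream[name]    # expired or missing: ask upstream again
        self._entries[name] = (answer, now + self.ttl_seconds)
        return answer

    def flush(self):
        """Drop every cached answer so the next lookup goes upstream."""
        self._entries.clear()


# Hypothetical record that starts out pointing at the wrong nameserver.
authoritative = {"service.example": "ns-broken.example"}
cache = TtlDnsCache(authoritative, ttl_seconds=3600)

first = cache.resolve("service.example", now=0)        # caches the bad answer
authoritative["service.example"] = "ns-fixed.example"  # the delegation is corrected
stale = cache.resolve("service.example", now=600)      # cache still serves the bad answer
fresh = cache.resolve("service.example", now=3600)     # TTL elapsed: corrected answer

print(first, stale, fresh)
```

In this sketch a client keeps receiving the old answer until the entry expires or `flush()` is called, which mirrors why correcting the record at the source did not immediately restore service for everyone.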
Root Cause Still Not Clear
Microsoft says it will continue to investigate the issue in a bid to find the root cause and prevent such occurrences in the future. Currently, all affected services appear to be working correctly, including Microsoft Teams, Power BI, Planner, and Intune.
However, it's worth noting this isn't the first time the company has run into DNS issues with Azure. In January, users experienced sign-in issues in Office and Dynamics 365 after Microsoft ran into problems “with Level 3 as an internal network provider”. DNS issues also surfaced way back in 2016.
Outages happen with any service provider due to unavoidable or unexpected issues. Microsoft's Service Level Agreement (SLA) promises an uptime of at least 99.5% across many of its services. If uptime falls below that, customers can submit a claim for a Microsoft service credit of up to 100% for the month.
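To put the numbers in context, the arithmetic can be sketched as follows. This is a rough illustration only: it assumes a 30-day month, treats the whole 15:43 to 18:35 window as downtime, and uses the 99.5% figure quoted above. The calendar date is a placeholder, and actual SLA calculations are defined per service and may differ.

```python
from datetime import datetime


def monthly_uptime_percent(downtime_minutes, days_in_month=30):
    """Uptime as a percentage of all minutes in the month (30 days is an assumption)."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(sla_percent, days_in_month=30):
    """Minutes of downtime a month can absorb before dropping below sla_percent."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


# Placeholder date; only the times of day come from the article.
start = datetime(2000, 1, 1, 15, 43)
end = datetime(2000, 1, 1, 18, 35)
outage_minutes = (end - start).total_seconds() / 60

uptime = monthly_uptime_percent(outage_minutes)
budget = allowed_downtime_minutes(99.5)

print(f"outage: {outage_minutes:.0f} min, uptime: {uptime:.2f}%, budget: {budget:.0f} min")
```

Under these assumptions the outage lasts 172 minutes against a monthly budget of 216 minutes, so this incident alone would leave uptime at roughly 99.6%, just above the quoted threshold.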