Microsoft yesterday suffered a major outage across its cloud services, including Exchange, Outlook, Teams, and OneDrive. The outage affected thousands of users around the world, who reported problems accessing their email accounts and other productivity tools.
According to Microsoft's status page, the outage started around 10:00 AM ET on Monday, June 5th, and lasted for several hours. The company said that it was caused by a network issue that impacted multiple Microsoft 365 services. Microsoft said that it reverted an update that was applied to its services and that it restored normal functionality for most users by 2:00 PM ET.
“We've confirmed that service availability has returned to healthy levels. We'll continue to monitor service health while we analyze system logs to determine the cause of the problem. More details under MO572252 in the admin center.”
However, the outage was not completely resolved, as Microsoft reported another issue with Outlook on the web around 4:15 PM ET. The company said that it was investigating the cause of the problem and that it was implementing changes to improve the user experience. As of 6:30 PM ET, some users were still experiencing issues with Outlook on the web.
At the time of writing, the Microsoft 365 Status Twitter account had confirmed that the problem had returned for some users and that a further investigation was underway:
We've identified that the impact has started again, and we're applying further mitigation. Telemetry indicates a reduction in impact relative to earlier iterations due to previously applied mitigations. Further details about the workstreams are in the admin center via MO572252.
— Microsoft 365 Status (@MSFT365Status) June 6, 2023
The outage has been one of the worst for Microsoft this year, as it affected several core services that are widely used by businesses and individuals. Microsoft has faced at least three outages since the beginning of the year, raising questions about the reliability and security of its cloud platform. Microsoft has apologized for the inconvenience caused by the outage and said that it is working to prevent such incidents from happening again in the future.
Recent Outage with Microsoft Azure DevOps
Yesterday we reported on a code upgrade gone wrong that caused a major outage for Microsoft Azure DevOps in the South Brazil region on May 24. The outage lasted for more than 10 hours and affected 17 production databases.
According to Eric Mattingly, Microsoft's principal software engineering manager, the culprit was a typo in a snapshot deletion job. The bug made the job delete the entire Azure SQL Server instead of just one Azure SQL Database. “This also deleted all seventeen production databases for the scale unit,” Mattingly wrote in a post-mortem article.
No Data Loss, But Slow Recovery
Microsoft said that no data was lost during the incident, but the recovery process was lengthy and complicated. Customers could not restore their own Azure SQL Servers, so they had to wait for the Azure SQL team to intervene. This took about an hour. Backup redundancy issues and web server problems delayed the recovery further.