Microsoft has done a lot to keep its Azure cloud services performing at an optimal level. However, outages still happen, and downtime can be costly for customers. To help customers cope when Azure services run into problems, Microsoft says it will improve how it communicates with them.
Specifically, Microsoft is pointing admins and IT professionals to the Service Health view in the Azure Portal. Here, outage information is readily available to anybody who has “owner, contributor, or reader access.”
Mark Russinovich, CTO of Azure, concedes that problems occur but wants customers to feel better informed when outages happen.
“Service incidents like outages are an unfortunate inevitability of the technology industry. Of course, we are constantly improving the reliability of the Microsoft Azure cloud platform… In spite of these efforts, we acknowledge the unfortunate reality that—given the scale of our operations and the pace of change—we will never be able to avoid outages entirely. During these times we endeavor to be as open and transparent as possible to ensure that all impacted customers and partners understand what's happening.”
Microsoft has been improving the accuracy of the outage reporting on its Azure Status page by using artificial intelligence (AI). That said, Microsoft admits the status page is mainly intended for major outages that affect a broad set of customers.
Many users have grown accustomed to checking the status page for any outage information. Microsoft says the Service Health page is better suited to showing how downtime affects an individual customer. The Service Health section of the Azure Portal combines AI with DevOps, an approach Microsoft calls AIOps.
AIOps “includes working towards improving automatic detection, engagement, and mitigation of cloud outages.” It also notifies organizations when an outage may directly affect their workflows. Microsoft says many notifications will reach Service Health within 10 minutes of an outage.