A bug in Microsoft’s cloud services caused the loss of security logs for more than two weeks, potentially leaving customer networks exposed to undetected threats. Businesses using Microsoft Entra, Microsoft Sentinel, and other services lost access to critical data, disrupting their ability to monitor for unauthorized access or potential breaches between September 2 and September 19, 2024.
Missing Data Impacts Key Services
Between September 2 and September 19, 2024, a logging failure resulted in incomplete security logs for a number of major Microsoft products. The issue was caused by a malfunction in Microsoft’s internal monitoring agents, which failed to upload log data to the company’s servers. Affected customers were notified that their logs might be incomplete or unavailable, hindering their ability to track unusual activity on their networks.
Microsoft’s internal monitoring agents are software components that collect data about the performance and health of systems within Microsoft’s infrastructure. They gather information about hardware usage, software performance, network traffic, and other relevant metrics. This data is then sent to central monitoring systems, where it is analyzed to identify potential problems and optimize system performance.
Several important Microsoft services were hit, including Entra, which saw incomplete sign-in logs and gaps in activity data. Microsoft Sentinel users also faced problems with missing security alerts, making it harder to detect suspicious behavior during the outage. Microsoft reported that gaps in logs from Azure Monitor and Power Platform might have disrupted data exports and analytics.
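For readers who export their own logs, one rough way to look for holes like the ones described above is to scan record timestamps for unusually long silences. The sketch below is illustrative only: the function name find_log_gaps, the max_silence threshold, and the sample timestamps are assumptions made for this example, not part of any Microsoft tool or API.

```python
from datetime import datetime, timedelta


def find_log_gaps(timestamps, max_silence):
    """Return (start, end) pairs where consecutive records are further
    apart than max_silence. Hypothetical helper for illustration."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each record with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_silence:
            gaps.append((earlier, later))
    return gaps


# Made-up sample data: steady records, then a long silence.
sample = [
    datetime(2024, 9, 1, 22, 0),
    datetime(2024, 9, 1, 23, 0),
    datetime(2024, 9, 19, 18, 0),
    datetime(2024, 9, 19, 19, 0),
]
gaps = find_log_gaps(sample, max_silence=timedelta(hours=6))
for start, end in gaps:
    print(f"No records between {start} and {end}")
```

A quiet period is not proof of lost data, since some systems are simply idle, so the threshold would need tuning against each log source's normal volume.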
Technical Issue: The Deadlock Bug
The issue stemmed from a bug introduced while Microsoft was fixing an unrelated problem in its log collection system. Microsoft’s fix accidentally triggered a “deadlock” in the system’s telemetry dispatch, a condition in which components wait on one another indefinitely so that none can proceed. This stopped some monitoring agents from uploading logs.
Although the agents continued to gather data, they couldn’t send it back to Microsoft’s servers, and in some cases, older log data was overwritten before the agents were restarted, making it impossible to recover the lost information. The company has now fixed the bug and assured customers that the issue is resolved.
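The overwrite behavior can be illustrated with a small simulation. This is a minimal sketch that assumes the agent keeps pending entries in a fixed-size local buffer; Microsoft has not published the agent's internals, so the BoundedLogBuffer class, the simulate function, and the buffer capacity are all hypothetical. It models only the effect of a stalled uploader, not the deadlock itself.

```python
from collections import deque


class BoundedLogBuffer:
    """Hypothetical fixed-size local buffer (not Microsoft's actual design)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self._entries = deque(maxlen=capacity)
        self.overwritten = 0

    def collect(self, entry):
        if len(self._entries) == self._entries.maxlen:
            self.overwritten += 1  # the oldest entry is about to be lost
        self._entries.append(entry)

    def drain(self):
        """Hand over everything currently buffered and empty the buffer."""
        batch = list(self._entries)
        self._entries.clear()
        return batch


def simulate(capacity, total_events, uploader_stalled):
    buffer = BoundedLogBuffer(capacity)
    uploaded = []
    for i in range(total_events):
        buffer.collect(f"event-{i}")
        if not uploader_stalled:
            uploaded.extend(buffer.drain())
    # Stand-in for restarting the agent: whatever survived is uploaded.
    uploaded.extend(buffer.drain())
    return uploaded, buffer.overwritten


healthy_uploaded, healthy_lost = simulate(3, 10, uploader_stalled=False)
stalled_uploaded, stalled_lost = simulate(3, 10, uploader_stalled=True)
print(len(healthy_uploaded), healthy_lost)
print(stalled_uploaded, stalled_lost)
```

With the uploader stalled, only the entries still in the buffer at restart survive. Everything older has already been overwritten, which is why a restart could restore log flow but not the missing history.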
Although Microsoft detected the bug on September 5, the issue wasn’t fully addressed until October 3. Temporary fixes, such as restarting the affected monitoring agents, were rolled out from mid-September, improving log collection for some services. However, the full recovery took nearly a month, during which some customers continued to experience delayed or incomplete logs.
During the recovery, Microsoft applied a series of patches to prevent the bug from affecting other regions and services. By late September, the company had restored most functionality but continued monitoring its systems to ensure the issue would not recur.
Long-Term Concerns for Businesses
This isn’t the first time Microsoft’s logging practices have come under fire. In 2023, Chinese-backed hackers infiltrated Microsoft’s cloud systems using a stolen signing key to forge authentication tokens, gaining access to government email accounts.
The breach went undetected for longer than it might have, as only premium-tier customers had access to the advanced logging features that could have spotted the intrusion earlier. Following that incident, Microsoft expanded access to advanced logs in 2024, allowing more customers to monitor their systems effectively.
Microsoft’s latest outage has raised concerns among cybersecurity professionals about the reliability of cloud-based logging services, which are critical to detecting and responding to security incidents. With logs missing, businesses are left blind to any attacks that may have occurred during the period when data wasn’t being uploaded or retained.
Last Updated on November 7, 2024 2:28 pm CET