Microsoft Leak Exposed 38TB of AI Training Data with Passwords and Keys

The leaked data included the intended AI models and personal computer backups of Microsoft employees

Microsoft’s AI research division is coming under scrutiny after inadvertently exposing 38 terabytes of sensitive internal data. TechCrunch reports that the exposure happened while researchers were publishing open-source AI training data on GitHub. Alongside the intended AI models, the exposed storage contained personal computer backups of Microsoft employees, passwords to various Microsoft services, secret keys, and a vast archive of internal Microsoft Teams messages.

The exposure was traced back to an Azure “SAS token” that was configured to grant “full control” over the entire storage account rather than the intended “read-only” access.

Misconfigured SAS Tokens

Shared Access Signature (SAS) tokens are a feature of Microsoft’s Azure cloud that lets users create links granting access to data in an Azure Storage account. When misconfigured, these tokens can pose significant security risks. In this case, the Microsoft AI developers included an overly permissive SAS token in the shared URL, which led to the unintended exposure. Cloud security firm Wiz, which discovered the misconfiguration, emphasized how difficult such tokens are to monitor and revoke. Because the Azure portal offers no centralized management for them, the tokens are hard to track. They can also be set to last effectively indefinitely, which makes them a potential security hazard when used for external sharing.
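
To illustrate the kind of check that could catch such a token before it is shared, here is a minimal Python sketch that inspects the query string of a SAS URL. It relies on the documented SAS query parameters sp (signed permissions), se (signed expiry), and srt (present on account-level SAS tokens). The function name audit_sas_url, the seven-day lifetime limit, the choice of which permission letters count as risky, and the example URL are all assumptions made for illustration; this is not a tool used by Microsoft or Wiz.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, parse_qs

# Permission letters treated as "more than read" in this sketch (assumed subset):
# w = write, d = delete, a = add, c = create.
WRITE_LIKE = set("wdac")


def audit_sas_url(url, max_lifetime=timedelta(days=7), now=None):
    """Return a list of warnings for a SAS URL (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    params = parse_qs(urlsplit(url).query)
    warnings = []

    # sp holds one letter per granted permission, e.g. "r" or "racwdl".
    perms = set(params.get("sp", [""])[0])
    risky = sorted(perms & WRITE_LIKE)
    if risky:
        warnings.append("grants more than read access: " + "".join(risky))

    # srt appears on account-level SAS tokens, which are not limited
    # to a single container or blob.
    if "srt" in params:
        warnings.append("account-level SAS: scope is the whole storage account")

    # se is the expiry time in ISO 8601 form, usually UTC with a trailing "Z".
    expiry_raw = params.get("se", [""])[0]
    if not expiry_raw:
        warnings.append("no expiry found")
    else:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        if expiry - now > max_lifetime:
            warnings.append("expires too far in the future: " + expiry_raw)

    return warnings


if __name__ == "__main__":
    # Made-up URL with a fake signature, for demonstration only.
    example = (
        "https://example.blob.core.windows.net/models"
        "?sv=2020-08-04&ss=b&srt=sco&sp=racwdl"
        "&se=2051-10-06T07:09:59Z&sig=FAKE"
    )
    for warning in audit_sas_url(example):
        print(warning)
```

A check like this only examines what the URL itself declares. It cannot tell whether a token has been revoked, which is part of the tracking problem Wiz describes.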

Aftermath and Microsoft’s Response

Upon discovering the oversight, Wiz promptly reported the issue to Microsoft in June 2023. Microsoft acted swiftly, revoking the SAS token within two days, thereby blocking external access to the Azure storage account. Following an internal investigation, Microsoft confirmed that no customer data was compromised, and no other internal services were jeopardized due to the incident. As a preventive measure, Microsoft expanded GitHub’s secret scanning service to monitor public open-source code changes for potential exposure of credentials and other secrets, especially those related to SAS tokens.
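
As a rough illustration of what scanning code for exposed SAS tokens can look like, the following Python sketch flags HTTPS URLs whose query string contains both sv (signed version) and sig (signature), two parameters that SAS URLs carry. This is a simple heuristic written for this article, with a hypothetical function name find_sas_like_urls; it does not reflect how GitHub’s secret scanning service is actually implemented.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Match an HTTPS URL up to the first whitespace, quote, or bracket.
URL_RE = re.compile(r"https://[^\s\"'<>)]+")


def find_sas_like_urls(text):
    """Return (line_number, url) pairs for URLs that look like SAS links."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in URL_RE.finditer(line):
            url = match.group(0)
            query = parse_qs(urlsplit(url).query)
            # Heuristic: a SAS URL carries a signed version and a signature.
            if "sv" in query and "sig" in query:
                hits.append((lineno, url))
    return hits


if __name__ == "__main__":
    # Made-up file content with a fake signature, for demonstration only.
    sample = (
        "# download the models\n"
        'MODEL_URL = "https://acct.blob.core.windows.net/c?sv=2020-08-04&sp=r&sig=abc"\n'
        "# docs: https://example.com/?q=1\n"
    )
    for lineno, url in find_sas_like_urls(sample):
        print(lineno, url)
```

A heuristic like this can produce false positives and misses tokens that are split across lines or assembled at runtime, so it complements rather than replaces short-lived, narrowly scoped tokens.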

Last Updated on November 8, 2024 11:18 am CET

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
