The never-ending story: Microsoft AI team accidentally exposed 38 Terabytes of internal data

Updated October 30, 2023
Published September 27, 2023

The accidental sharing of cloud access is an all-too-familiar story.

In the latest incident, Microsoft’s AI research team accidentally exposed 38 terabytes of private data to the public, including internal messages, private keys, and passwords, according to a recent report [1]. All it took to cause this enormous exposure was a few errant clicks in a configuration menu.

How did it happen?

To share some of their AI models with the public, Microsoft’s AI research team configured a shared access URL to their models stored in Azure Storage and shared the URL on GitHub.

The URL was intended to grant access only to the models, but a misconfiguration of the embedded signature in fact allowed complete access (read and write) to the entire Azure Storage account, which included complete workstation backups, private keys, passwords, and more than 30,000 internal Microsoft Teams messages.

The mistake was enabled by Azure’s Shared Access Signature feature. To make matters worse, the existence of active Shared Access Signatures is difficult to monitor and audit. In fact, the Wiz research team that discovered the exposure believes the exposure had been in place since July 20, 2020!
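
To make the failure mode concrete, here is a minimal sketch of how a Shared Access Signature (SAS) URL could be inspected before it is published. It relies on the documented SAS query parameters: `sp` (signed permissions), `se` (expiry), `ss` and `srt` (present on account-level tokens), and `sr` (present on service-level tokens). The function name `audit_sas_url`, the seven-day threshold, and the example URLs are assumptions made for illustration; they are not taken from the report or from any Azure SDK.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Letters in the `sp` (signed permissions) field that go beyond read/list.
RISKY_PERMISSIONS = {"w": "write", "d": "delete", "c": "create", "a": "add"}


def audit_sas_url(url, max_days=7, now=None):
    """Return a list of warnings for a SAS URL (illustrative helper, not an Azure API)."""
    now = now or datetime.now(timezone.utc)
    params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    warnings = []

    # Account SAS tokens carry `ss` (services) and `srt` (resource types);
    # service SAS tokens carry `sr` (one resource) instead.
    if "ss" in params and "srt" in params:
        warnings.append("account-level SAS: scope is the whole storage account")

    risky = [name for flag, name in RISKY_PERMISSIONS.items() if flag in params.get("sp", "")]
    if risky:
        warnings.append("grants more than read/list: " + ", ".join(risky))

    expiry = params.get("se")
    if expiry is None:
        warnings.append("no expiry in the URL (it may come from a stored access policy)")
    else:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if (expires_at - now).days > max_days:
            warnings.append(f"expires {expires_at.date()}, more than {max_days} days away")
    return warnings


if __name__ == "__main__":
    # Hypothetical URL with a fake signature, for demonstration only.
    example = (
        "https://example.blob.core.windows.net/models"
        "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&se=2040-01-01T00:00:00Z&sig=FAKE"
    )
    for warning in audit_sas_url(example):
        print(warning)
```

A check like this only catches URLs that pass through it. SAS tokens are signed on the client side, so it cannot enumerate tokens that already exist, which is part of why they are hard to audit.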

We all strive to be careful in configuring cloud storage, but as the unending string of similar incidents shows, costly slip-ups are bound to happen sooner or later, especially when there are no systematic guardrails in place to prevent them.

Does your organization use cloud storage? Here are my recommendations to minimize your risk of similar exposure.

How to minimize your risk of exposure

Disable Shared Key authorization for your organization’s Azure Storage accounts. Shared Key authorization makes the actual breadth of people with access opaque and difficult to govern and monitor.
For cases where anonymous public access is needed, use Azure Storage containers explicitly configured for public access.
Restrict the ability to configure public access to only dedicated accounts to be used for public sharing. Implement governance workflows to avoid the accidental sharing of private data.
If signature-based shared access is truly necessary, consider using a Service Shared Access Signature along with stored access policies that can be configured per container and monitored for compliance.
For the highest level of fine-grained control and governance over your cloud storage sharing, consider using a policy-as-code solution that can be governed by both automated and human code review to adhere to the principle of least privilege.
Putting infrastructure-as-code guardrails in place can help prevent insecure configurations from ever being deployed. Consider a code-to-cloud policy solution that helps your organization deploy virtual infrastructure quickly and safely.
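
As a rough illustration of such a guardrail, the sketch below checks storage account settings before deployment. It assumes the settings are available as a plain dictionary using the Azure Resource Manager property names `allowSharedKeyAccess` and `allowBlobPublicAccess`; verify those names against your own templates. The function `check_storage_account`, the allowlist `PUBLIC_SHARING_ACCOUNTS`, and the account names are hypothetical. A real policy-as-code setup would typically express the same rules in a dedicated policy language and run them in CI, not in a standalone script.

```python
# Hypothetical allowlist of accounts dedicated to public sharing.
PUBLIC_SHARING_ACCOUNTS = {"publicmodels"}


def check_storage_account(name, properties):
    """Return policy violations for one storage account's settings.

    A missing setting is treated as non-compliant, so the safe value
    has to be stated explicitly in the configuration.
    """
    violations = []
    if properties.get("allowSharedKeyAccess", True):
        violations.append(f"{name}: Shared Key authorization is not disabled")
    if properties.get("allowBlobPublicAccess", True) and name not in PUBLIC_SHARING_ACCOUNTS:
        violations.append(f"{name}: public access is allowed outside the dedicated sharing accounts")
    return violations


if __name__ == "__main__":
    # Example settings, invented for demonstration.
    accounts = {
        "research": {"allowSharedKeyAccess": True, "allowBlobPublicAccess": True},
        "publicmodels": {"allowSharedKeyAccess": False, "allowBlobPublicAccess": True},
    }
    for account_name, settings in accounts.items():
        for violation in check_storage_account(account_name, settings):
            print(violation)
```

Failing the deployment whenever the returned list is non-empty turns the first three recommendations above into an automated check instead of a manual review step.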

[1] 38TB of data accidentally exposed by Microsoft AI researchers
