Protect Your Keys - Lessons from the Azure Key Breach

On July 11, 2023, Microsoft released details of a coordinated attack by a threat actor identified as Storm-0558. This state-sponsored espionage group infiltrated email systems to collect information from targets such as the U.S. State and Commerce Departments. While this was a fairly sophisticated attack leveraging multiple vulnerabilities, there are several lessons DevOps and security teams can take from this incident to improve their organization's security posture.

What happened

Starting on May 15, 2023, the China-based state actor identified as Storm-0558 gained access to Azure-based Office 365 email systems. The attack was discovered after Office 365 customers began to report unusual mail activity. On June 16, Microsoft began its investigation and remediation process.

Microsoft discovered that the authentication tokens the attackers used were not actually issued by its systems. Instead, Storm-0558 had forged tokens using a stolen Microsoft account (MSA) consumer signing key. At the same time, they exploited a now-patched token validation flaw that allowed a key intended only for consumer accounts to be trusted for enterprise Azure Active Directory (AD) tokens, so one key could be used across multiple systems.
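To make the cross-system flaw more concrete, here is a deliberately simplified sketch of token validation. It is not Microsoft's code: the key IDs, issuers, and token format are hypothetical, and it uses HMAC from the Python standard library where real identity providers use asymmetric signing keys. What it shows is that a validator which accepts any key it knows, without checking which issuer that key belongs to, lets whoever holds a consumer key mint tokens the enterprise side will accept.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key registry: each signing key belongs to exactly one issuer.
# Real identity providers publish asymmetric public keys; HMAC secrets are used
# here only to keep the sketch runnable with the standard library.
KEYS = {
    "consumer-key-1": {"secret": b"consumer-secret", "issuer": "consumer"},
    "enterprise-key-1": {"secret": b"enterprise-secret", "issuer": "enterprise"},
}


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def sign_token(claims: dict, kid: str) -> str:
    """Build a toy 'kid.payload.signature' token (JWT-like, not a real JWT)."""
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    mac = hmac.new(KEYS[kid]["secret"], f"{kid}.{payload}".encode(), hashlib.sha256)
    return f"{kid}.{payload}.{_b64(mac.digest())}"


def _check_signature(token: str):
    """Verify the signature and return (key record, claims)."""
    kid, payload, sig = token.split(".")
    key = KEYS.get(kid)
    if key is None:
        raise ValueError("unknown signing key")
    mac = hmac.new(key["secret"], f"{kid}.{payload}".encode(), hashlib.sha256)
    if not hmac.compare_digest(_b64(mac.digest()), sig):
        raise ValueError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)
    return key, json.loads(base64.urlsafe_b64decode(padded))


def validate_flawed(token: str, expected_issuer: str) -> dict:
    # Flaw: any known key is trusted, and the issuer claim is taken at face value.
    _, claims = _check_signature(token)
    if claims.get("iss") != expected_issuer:
        raise ValueError("wrong issuer")
    return claims


def validate_strict(token: str, expected_issuer: str) -> dict:
    # Fix: the key that signed the token must itself belong to the expected issuer.
    key, claims = _check_signature(token)
    if key["issuer"] != expected_issuer or claims.get("iss") != expected_issuer:
        raise ValueError("key not valid for this issuer")
    return claims


# An attacker holding only the consumer key mints an "enterprise" token.
forged = sign_token({"iss": "enterprise", "sub": "victim@example.com"}, "consumer-key-1")
print(validate_flawed(forged, "enterprise")["sub"])  # victim@example.com
try:
    validate_strict(forged, "enterprise")
except ValueError as err:
    print("rejected:", err)  # rejected: key not valid for this issuer
```

In practice you would rely on a maintained token library and configure which keys are trusted for which issuer and audience, rather than hand-rolling validation like this.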

There is an excellent blog post from Prof. Bill Buchanan OBE, "Losing The Keys To The Castle: Azure Key Breach Should Worry Every Organisation," that goes into more detail on how token signing works and how this attack succeeded. More technical details about the incident can be found in the Microsoft incident analysis.

What you can do

Microsoft reported that they found no evidence indicating any additional unauthorized access after they completed their mitigation efforts. If you are a Microsoft customer, they assure us there is no further reason for alarm.

Their team identified several places where they needed to improve security, including increased isolation of the systems, refined monitoring of system activity, and moving to consistent use of their enterprise key store. Let's take a closer look at these and other lessons we can take away from this incident to help us all stay safer.

Listen for reports of anything suspicious

One fact that stands out in all the reporting is that the attack was not discovered by internal scanning or alarms; it was first identified by users who noticed something wrong. There is real value in listening to user feedback for signs that something might be happening. Relying only on human reporting is not a good solution, but relying only on automated alerts is just as flawed.

Remind customers and internal team members to report anything out of the ordinary when using your product. Not only can this help with security, it can also help find and squash non-security bugs. Review playbooks regularly to make sure the process for spotting possible security issues, and the escalation routes, are clear and up to date. Your customer success and security teams should work together to identify potential security issues.

Store all keys safely

When Microsoft's team first investigated the situation, they assumed that the attackers were using stolen customer keys. There is good reason they would think this, as the problem of secret sprawl continues to grow. A key was indeed stolen, but it was not an end-user key; it was the signing key that vouches for whether credentials are legitimate. Still, no matter how powerful a credential is, it is still a key that needs to be stored properly, which means encrypting it.

While we don't know exactly how this key was acquired, Microsoft's statements about improving its use of an "enterprise key store" lead us to wonder how these very important signing keys were stored. Storing keys in a vault system, such as HashiCorp Vault, Azure Key Vault, or Doppler, provides several security advantages:

Encryption at rest - the keys are stored in an unreadable and unusable state.
Programmatic access - only references to the secret within the vault are ever shared in the codebase.
Encryption in transit - keys are only decrypted upon arrival, at runtime.
Access and change management logging - from a central location, you can see who has added or modified any credential, as well as when keys were accessed.
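The programmatic access and access logging properties listed above can be sketched in a few lines. This is a hypothetical in-memory stand-in, not the API of HashiCorp Vault, Azure Key Vault, or Doppler, and it does not model encryption at rest or in transit. The `vault://` reference scheme and the `resolve` helper are assumptions made for illustration: application config holds only a reference, the value is fetched at runtime, and every read or write is recorded with who performed it.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class InMemoryVault:
    """Hypothetical stand-in for a real secrets manager.

    A real vault encrypts values at rest and serves them over TLS; this class
    only models two properties: code holds references, and every access is
    written to an audit log.
    """
    _secrets: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def put(self, path: str, value: str, actor: str) -> None:
        self._secrets[path] = value
        self._record(actor, "write", path)

    def get(self, path: str, actor: str) -> str:
        self._record(actor, "read", path)  # record the attempt before returning
        return self._secrets[path]

    def _record(self, actor: str, action: str, path: str) -> None:
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.audit_log.append((stamp, actor, action, path))


# Application config contains only a reference, never the secret value.
CONFIG = {"db_password": "vault://payments/db-password"}


def resolve(ref: str, vault: InMemoryVault, actor: str) -> str:
    """Resolve a hypothetical 'vault://<path>' reference at runtime."""
    prefix = "vault://"
    if not ref.startswith(prefix):
        raise ValueError(f"not a vault reference: {ref!r}")
    return vault.get(ref[len(prefix):], actor)


vault = InMemoryVault()
vault.put("payments/db-password", "example-only", actor="ops-admin")
password = resolve(CONFIG["db_password"], vault, actor="payments-api")
for entry in vault.audit_log:
    print(entry)  # (timestamp, actor, action, path)
```

Because the codebase only ever contains `vault://payments/db-password`, a leaked repository exposes a pointer rather than the credential, and the audit log answers "who read this, and when?"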

Another approach worth considering is storing secrets in a Hardware Security Module (HSM). As the name suggests, an HSM is a dedicated piece of hardware in your infrastructure that provisions, stores, and manages cryptographic keys. Because the physical device itself enforces additional security layers, an HSM can go beyond what can be accomplished in software alone.
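The key idea behind an HSM is that applications send data in and get signatures out, while the key itself never leaves the device. The class below is only a software analogy of that boundary; it is hypothetical and is not a real HSM interface (real HSMs are commonly accessed through standards such as PKCS#11 and often hold asymmetric keys). Python cannot truly prevent code from reading a private attribute, which is exactly the gap that tamper-resistant hardware closes.

```python
import hashlib
import hmac
import secrets


class SigningModule:
    """Software analogy of an HSM boundary (hypothetical, not a real HSM API).

    The key is generated inside the module and callers only ever receive
    signatures. In real hardware, the device enforces that the key cannot
    be exported; here it is merely name-mangled.
    """

    def __init__(self) -> None:
        self.__key = secrets.token_bytes(32)  # generated inside, never returned

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self.__key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(message), signature)


hsm = SigningModule()
signature = hsm.sign(b"token-payload")
print(hsm.verify(b"token-payload", signature))     # True
print(hsm.verify(b"tampered-payload", signature))  # False
```

Callers depend only on `sign` and `verify`, so swapping this stand-in for a real HSM client would not require code elsewhere to ever handle raw key material.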

Do not reuse keys for multiple services

Another reason the attackers succeeded was that a key for one system could grant access to multiple other systems. The lesson from this vulnerability is that every key should be unique and specific to one job, which Microsoft referred to as "isolation of the systems."

When dealing with complex DevOps environments, it can be tempting to think that because someone or something was allowed access to one trusted service, they should be allowed access to other sensitive systems. This is always a bad decision from a security standpoint. Attackers will almost always try to laterally move throughout an environment, and they know how common it is to reuse credentials. They will most assuredly try to use the same credentials that worked once at every other lock they encounter.

Tightening the scope of your credentials to allow only the minimum access needed to complete the work is at the core of the Zero Trust architectural philosophy. Just as reusing passwords is bad, broadly scoped credentials that allow access to multiple systems should be avoided.
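A minimal sketch of what "one credential, one job" can look like in an authorization check follows. The `Credential` type, the service names, and the action names are hypothetical; real platforms express the same idea through IAM policies or token scopes. The check denies by default: a credential works only for the single service it was issued for, and only for the actions explicitly granted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """A hypothetical credential bound to one service and an explicit action set."""
    credential_id: str
    service: str
    actions: frozenset


def authorize(cred: Credential, service: str, action: str) -> bool:
    # Deny by default: the credential must name this exact service and action.
    return cred.service == service and action in cred.actions


# One credential per job, with only the permissions that job needs.
mail_reader = Credential("cred-mail-ro", "mail", frozenset({"read"}))

print(authorize(mail_reader, "mail", "read"))      # True
print(authorize(mail_reader, "mail", "delete"))    # False: action not granted
print(authorize(mail_reader, "calendar", "read"))  # False: wrong service
```

If `mail_reader` leaks, an attacker can read mail and nothing else; trying it "at every other lock" fails by construction.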

Regularly check your logs

One of the first steps the Microsoft team took in their investigation was to examine the logs to identify when unexpected access occurred and which keys were used. Logs can provide a lot of information once an attack is over, but they can also help identify an attack in progress.

If regularly reviewing your access logs is not part of your security practice, adding it to your daily or weekly tasks is a great step in the right direction. Many log monitoring solutions, such as Sumo Logic and Datadog, can identify and surface unusual activity much faster than manual review. It is your data, and you are already collecting it; make sure you are taking advantage of it.
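Even without a dedicated platform, a small script can surface the kind of signal discussed above. The sketch below assumes a hypothetical whitespace-separated log format (`timestamp key_id source_ip action`) and uses made-up sample lines with IP addresses from documentation ranges. It compares recent activity against a baseline and flags key IDs never seen before, as well as known keys used from a new source IP. Commercial tools apply far richer detection at scale; this only illustrates the idea.

```python
from collections import defaultdict

# Hypothetical access-log format: "<timestamp> <key_id> <source_ip> <action>"
BASELINE = """\
2024-01-01T09:00:00Z key-app-1 10.0.0.5 read
2024-01-01T09:05:00Z key-app-1 10.0.0.5 read
2024-01-01T10:00:00Z key-batch-2 10.0.1.9 write
"""

RECENT = """\
2024-01-08T08:00:00Z key-app-1 10.0.0.5 read
2024-01-08T08:01:00Z key-app-1 203.0.113.7 read
2024-01-08T08:02:00Z key-unknown-9 198.51.100.4 read
"""


def parse(log: str):
    """Yield (timestamp, key_id, source_ip, action) for each non-empty line."""
    for line in log.splitlines():
        if line.strip():
            timestamp, key_id, ip, action = line.split()
            yield timestamp, key_id, ip, action


def find_anomalies(baseline: str, current: str) -> list:
    """Flag keys never seen before, and known keys used from new source IPs."""
    known_ips = defaultdict(set)
    for _, key_id, ip, _ in parse(baseline):
        known_ips[key_id].add(ip)

    findings = []
    for timestamp, key_id, ip, _ in parse(current):
        if key_id not in known_ips:
            findings.append((timestamp, key_id, ip, "unknown key"))
        elif ip not in known_ips[key_id]:
            findings.append((timestamp, key_id, ip, "new source IP for key"))
    return findings


for finding in find_anomalies(BASELINE, RECENT):
    print(finding)
```

A key ID that no system in your environment ever issued showing up in the logs is precisely the kind of event worth alerting on immediately.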

Hone your secrets rotation plan

A major part of Microsoft's remediation was revoking and replacing its signing keys. No matter which key has been compromised, key rotation is central to any incident response plan. One of the clearest signs of expert-level Secret Management Maturity is when secrets are scheduled for regular, automated rotation. Most cloud providers, such as AWS, make automated key rotation a straightforward process.

Level 4 Experts in Secret Management Maturity criteria

If you are not quite ready to embrace automatic rotation, you should strive for at least regular rotation. The more often you rotate credentials, especially when the stakes are low and there is no real pressure, the easier it will be to rotate them when an incident occurs.
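A simple first step toward regular rotation is knowing which secrets are overdue. The sketch below assumes a hypothetical inventory mapping secret names to their last rotation time and an assumed 90-day maximum age; choose a policy that fits your own risk profile. It only reports what is stale. Actually rotating each secret depends on your provider and tooling.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical inventory: secret name -> when it was last rotated.
INVENTORY = {
    "signing-key": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "db-password": datetime(2024, 3, 20, tzinfo=timezone.utc),
    "ci-deploy-token": datetime(2023, 11, 15, tzinfo=timezone.utc),
}

MAX_AGE = timedelta(days=90)  # assumed policy, not a universal recommendation


def overdue(inventory: dict, now: datetime, max_age: timedelta = MAX_AGE) -> list:
    """Return names of secrets older than max_age, oldest first."""
    stale = [(name, now - rotated) for name, rotated in inventory.items()
             if now - rotated > max_age]
    return [name for name, _ in sorted(stale, key=lambda item: item[1], reverse=True)]


now = datetime(2024, 4, 1, tzinfo=timezone.utc)
print(overdue(INVENTORY, now))  # ['ci-deploy-token', 'signing-key']
```

Running a check like this on a schedule turns rotation into routine maintenance, so the procedure is already familiar when an incident forces an emergency rotation.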

Actively monitor for credentials in your environments

Knowing about a stolen secret only after the attack means knowing too late. One of the best ways to stay safe is to be aware when your keys are exposed as plaintext at any point in the software development lifecycle. This is where tools like GitGuardian Secret Scanning are needed. GitGuardian runs a historical scan of each repository in your perimeter when it is added, giving you a baseline of which secrets exist in your code history, in every branch, and in every commit going back to the beginning of the project.

After that initial historical scan, GitGuardian scans each new commit that enters your repositories and alerts you immediately when plaintext credentials show up, so you can act to remediate the issue. We provide validation checks and quickly notify you if the code is public, helping your team triage the incident. If you discover and remediate an exposed secret before an attacker can try it, the attacker never gets the chance to use it.
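To illustrate the basic idea of scanning new commits, here is a toy scanner that checks only the added lines of a unified diff against a few regular expressions. It is not how GitGuardian works internally; production scanners use many more detectors plus validity checks to cut false positives. The `AKIA` plus 16 uppercase alphanumeric characters pattern is the widely used shape of long-term AWS access key IDs, and `AKIAIOSFODNN7EXAMPLE` is the example key from AWS documentation. The password heuristic and the sample diff are assumptions made for illustration.

```python
import re

# A few illustrative detectors; real scanners use hundreds plus validity checks.
DETECTORS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    # Heuristic: a password/secret-like name assigned a quoted value of 8+ chars.
    "hardcoded password": re.compile(
        r"(?i)(?:password|passwd|secret)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_diff(diff: str) -> list:
    """Scan lines added in a unified diff ('+' prefix, excluding '+++' headers).

    Returns (line number within the diff text, detector name) pairs.
    """
    findings = []
    for lineno, line in enumerate(diff.splitlines(), start=1):
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in DETECTORS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


SAMPLE_DIFF = """\
+++ b/config.py
+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
+db_password = "hunter2hunter2"
-old_line = 1
+timeout = 30
"""

for lineno, name in scan_diff(SAMPLE_DIFF):
    print(f"diff line {lineno}: possible {name}")
# diff line 2: possible AWS access key ID
# diff line 3: possible hardcoded password
```

Wired into a pre-commit hook or CI job, even a check this simple catches some leaks before they reach a shared repository, though it will miss anything its few patterns do not describe.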

Keeping all your secrets secret is important

While the attack on Microsoft customers by Storm-0558 relied on a number of specific vulnerabilities, there are still valuable lessons we can all take away. Consistently storing all your secrets properly is a very good start. Scoping each credential to a single use case, and never reusing credentials across multiple services, helps ensure that if an attack based on a leaked key does occur, the attacker's access will be extremely limited. Regular key rotation and active monitoring of your logs for unusual activity are critical for staying safe in a world of ever-evolving security challenges. Being alerted to plaintext credentials in your environments will help keep you one step ahead of attackers.

Remember, security is a journey; nobody knows all there is to know on the topic. Make steps in the right direction at every opportunity to increase your security posture. GitGuardian is here to help you with your security journey.

For a further look into signing authorities, check out this episode of The Security Repo podcast with guest Billy Lynch from Chainguard to hear him explain the importance of signing in the software supply chain.

Also, check out our cheat sheet for best practices for managing and storing secrets.