The Hidden Challenges of Automating Secrets Rotation: Why Automatic Credential Rotation Isn’t a One-Click Solution
When talking to potential customers at GitGuardian, we often get asked, "Can we use GitGuardian to automate the rotation process?" The person asking often has in mind a single-button solution that auto-magically marks an incident as resolved and handles the "other details" somehow. Those other details are exactly why this is such a hard problem for an enterprise to address and why this feature is not something you can ask a single vendor to provide.
We all want to automate things as far as possible, but the reality is that the larger the organization and the more developers touch a shared codebase, the harder this process becomes. Imagine trying to coordinate a single change throughout a complex, interdependent system deployed across several data centers worldwide, all without breaking anything. Now multiply this by the number of secrets you are trying to manage, and you see the real scope of the issue.
The good news is the answer to their original question is actually, "Yes, we can be part of your automation process, but…" It is entirely possible to automate your entire remediation process, but this requires a lot of work, coordination, and the right technologies to be in place before you should ever attempt this in production.
Let's look at what can happen if you try to automate remediation too soon. Then let's make a plan to take you from wishing for a magic button to applying the right solutions on the way to complete automation of secrets rotation at scale.
What are we trying to automate with secrets?
On the surface, rotation merely replaces an existing functional credential. The deeper you look, though, the more steps and processes become apparent, and they add up quickly.
Rotation requires:
- Creating a new secret.
- Properly storing it.
- Potentially modifying the codebase to safely call it from the new location.
- Replacing the existing secret inside the application, which might mean a restart.
- Invalidating the old credential.
- Optionally removing the old secret from your code.
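The steps above can be sketched as an ordered workflow. This is a minimal sketch, assuming in-memory stand-ins for both the issuing service and the secrets manager; `FakeIssuer`, `FakeVault`, `rotate`, and the `health_check` callback are hypothetical names for illustration, not the API of any real product.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class FakeIssuer:
    """Hypothetical stand-in for the service that issues credentials."""
    valid: dict = field(default_factory=dict)  # token -> scope

    def create(self, scope: frozenset) -> str:
        token = secrets.token_hex(16)
        self.valid[token] = scope
        return token

    def revoke(self, token: str) -> None:
        self.valid.pop(token, None)


@dataclass
class FakeVault:
    """Hypothetical stand-in for a secrets manager."""
    store: dict = field(default_factory=dict)  # secret name -> token


def rotate(name: str, issuer: FakeIssuer, vault: FakeVault, health_check) -> str:
    old = vault.store[name]
    scope = issuer.valid[old]        # reuse exactly the same scope
    new = issuer.create(scope)       # create the new secret
    vault.store[name] = new          # store it centrally
    if not health_check(vault.store[name]):
        vault.store[name] = old      # roll back; the old key is still valid
        issuer.revoke(new)
        raise RuntimeError("rotation aborted: health check failed")
    issuer.revoke(old)               # invalidate the old credential last
    return new
```

The ordering is the point of the sketch: the old credential is revoked only after the new one has been verified, so a failed check leaves the application running on the key it already had.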
Along the way, we need to make sure nothing breaks and there is as little downtime as possible, with zero downtime being the goal. That is a tall order and is one of the largest reasons it takes, on average, 27 days for an enterprise to rotate exposed secrets. The potential for taking your mission-critical application offline is often seen as the larger risk, even when you know one of your most important keys protecting your most valuable data has been exposed publicly.
Also, the new credential needs to be scoped exactly like the old one, with the same permissions, so the application will connect and run as expected. From a security perspective, you must also be careful not to grant more access than is absolutely needed to get the job done. Setting permissions too wide can introduce new ways attackers can do harm. For example, giving "write" permissions to a non-human identity where "read-only" would have worked would let an attacker encrypt that data without any real resistance in a ransomware attack.
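One way to picture the scoping requirement is as a comparison of two permission sets. The sketch below assumes permissions can be represented as plain strings; `scope_drift` and `is_safe_replacement` are hypothetical helper names.

```python
def scope_drift(old_scope: set[str], new_scope: set[str]) -> dict[str, set[str]]:
    """Return the permissions added or dropped by the replacement credential."""
    return {
        "added": new_scope - old_scope,    # wider than before: security risk
        "missing": old_scope - new_scope,  # narrower than before: outage risk
    }


def is_safe_replacement(old_scope: set[str], new_scope: set[str]) -> bool:
    """True only when the new credential is scoped exactly like the old one."""
    drift = scope_drift(old_scope, new_scope)
    return not drift["added"] and not drift["missing"]
```

For example, `is_safe_replacement({"read"}, {"read", "write"})` is false because "write" was added, which is the over-permissioned case described above.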
What can go wrong if we automate rotation too early
While there are many dangers along the way, the ultimate fear is that you will take down production if something goes wrong. This alone is the largest reason not to rush to automation, especially if you have not locked down the essentials of secrets management yet and are still a maturing organization in this area. But beyond the nightmare scenario of accidentally taking your application offline, let's take a look at the challenges automation faces at every step.
Issues with creating a new credential
While most systems will let you make a new credential with a simple API call, this assumes you already know all about that credential in the first place: how it is scoped, what it connects to, and what depends on it working. If you don't fully know, then you cannot automate this process. We need a way to inventory our non-human identities before we do anything else.
We need a way to scope the existing credentials quickly. Ideally, the original developer would have commented about the permissions they granted this non-human identity somewhere in the project, but this assumes a lot. This is one of the reasons GitGuardian is building our new Secret Analyzer (currently in private Beta) to understand what permissions were set.
This is where Secrets Managers like CyberArk's Conjur or Vault by HashiCorp come in. Having a centralized management platform for all your secrets is essential to automating the process. These tools offer a way to attach descriptions of the secret when placing it in the vault, and this can include the needed information about their scope. Without a vault to call the new secret from, you might replace a hardcoded value for another hardcoded value, not solving secret sprawl at all.
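To make the inventory idea concrete, here is a minimal sketch of the information we would want on record before attempting to automate a rotation. The `SecretRecord` fields and the `rotation_blockers` helper are assumptions made for illustration; they do not mirror the metadata schema of Conjur, Vault, or GitGuardian.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SecretRecord:
    name: str
    issuer: str                                         # system that issued it
    permissions: frozenset                              # documented scope
    consumers: list[str] = field(default_factory=list)  # apps that depend on it
    vault_path: Optional[str] = None                    # None: not in a vault yet


def rotation_blockers(record: SecretRecord) -> list[str]:
    """List the gaps that make automated rotation unsafe for this secret."""
    blockers = []
    if not record.permissions:
        blockers.append("scope unknown")
    if not record.consumers:
        blockers.append("consumers unknown")
    if record.vault_path is None:
        blockers.append("not stored in a secrets manager")
    return blockers
```

A record with an empty blocker list is a candidate for automation; anything else still needs a human to fill in the missing knowledge first.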
The dangers when replacing the old credential with the new one
While replacing a secret may be as simple as updating the value in your secrets manager, it depends on how the application and system were architected. If it is a Blue/Green deployment that uses a pull model, then you should be able to handle this through your secrets manager alone. If you have no failover built into your system, though, you will need to wait for a maintenance window to replace it, which is unlikely to be something you can fully automate.
Also, if your application used a 'push' model, where the secret was loaded into memory when the application was built, then you will need to wait for the next deployment. These application architecture issues go well beyond the scope of just the security team and demand a larger conversation.
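The difference between the two models is easier to see in code. The following is a minimal sketch of the pull model, assuming the application is handed a `fetch` callable that reads the current value from its secrets manager; `PulledSecret` and its parameters are hypothetical names.

```python
import time
from typing import Callable


class PulledSecret:
    """Fetches a secret on demand and caches it for at most ttl_seconds."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = float("-inf")  # forces a fetch on first use

    def get(self) -> str:
        now = self._clock()
        if now >= self._expires_at:
            self._value = self._fetch()   # pull the current value
            self._expires_at = now + self._ttl
        return self._value
```

With this shape, a rotated value is picked up within one TTL and no redeploy is needed. In a push model the value is fixed when the application is built, so nothing in the running process would ever call `fetch` again.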
Deactivating the old secret is where the largest dangers lie
Just because you replaced the old secret in the application does not mean the old one automatically stops working. If an attacker still has it, and it is still valid, then the attacker does not care what the code in production is doing; they only care that they have access to your data and machine resources. You still need to revoke the old key.
The question now becomes: when can it be deleted or decommissioned from the service or system that issued it? What would happen if the codebase was rolled back to a point where it contained the old secret?
Unless you fully understand the lifecycle of your non-human identities, this is a hard question to answer. If you have already standardized on a secrets management platform, and all calls go to that service rather than to scattered env files or plaintext secrets in your codebase, you might be in a position to think about automating the deletion of old keys, but not before.
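The rollback question can be framed as a simple check. This sketch assumes you can tell, for each release, whether it is still a rollback target and which secrets are baked into it; `Release` and `releases_blocking_revocation` are hypothetical names, and a release that resolves every secret from the secrets manager at runtime has nothing embedded.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    version: str
    rollback_eligible: bool
    # IDs of secrets baked into the artifact; empty when the release
    # resolves every secret from the secrets manager at runtime.
    embedded_secret_ids: frozenset = field(default_factory=frozenset)


def releases_blocking_revocation(releases, old_secret_id):
    """Versions that would break if rolled back to after the old key is revoked."""
    return [
        r.version
        for r in releases
        if r.rollback_eligible and old_secret_id in r.embedded_secret_ids
    ]
```

An empty result is what this sketch treats as the signal that revocation could be automated; a non-empty one calls for a human decision about those releases.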
Remove it from the codebase
You might think, "Can't we just remove the secret from our codebase on the next push?" We wish it were that simple. Just removing the secret from the next commit does not protect you if the secret is still valid. Given how Git works, this does not remove the secrets from your Git history. The secret will live there forever or until you completely rewrite your history.
There are some scripts and solutions on the market that make the process simpler. After all, rewriting your git history is a well-understood set of commands. But this assumes you have the only copy of your codebase, and no one else would overwrite your changes. Also, it assumes everyone will pull your changes and live with the modified version of history. While the commands are straightforward, getting the collective buy-in and avoiding potential git merge conflicts will take manual efforts far beyond what any platform alone can offer.
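A toy model shows why a new commit does not help. Here each commit is represented as a snapshot mapping file paths to contents, which is a simplification for illustration and not Git's real object format; `commits_exposing` is a hypothetical helper.

```python
def commits_exposing(history: list[dict[str, str]], secret: str) -> list[int]:
    """Indexes of commits whose snapshot still contains the secret."""
    return [
        i for i, snapshot in enumerate(history)
        if any(secret in content for content in snapshot.values())
    ]


history = [
    {"config.py": 'API_KEY = "abc123"'},                # original commit
    {"config.py": 'API_KEY = os.environ["API_KEY"]'},   # "fix" commit
]
exposed = commits_exposing(history, "abc123")
```

The second commit is clean, but `exposed` still contains index 0: anyone with a clone can read the earlier snapshot until history is rewritten everywhere and the key itself is revoked.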
How to automate secrets rotation at scale
To be clear, GitGuardian is very much in favor of automating the rotation process for your secrets. This is why we partner with enterprise secrets management platforms like Conjur and Vault. We know some of our customers are actively rotating secrets with a simple script run or a scheduled job call. The one thing they have in common is that they are all at a high level of secrets management maturity.
If you want to automatically rotate secrets, here are the steps we suggest:
- Inventory your secrets - Search through code and systems involved in the software development lifecycle to identify existing plaintext credentials, gathering as much information as possible about each.
- Standardize your secrets management - Account for all known secrets through a centralized vault platform.
- Document all the permissions set for your secrets - Develop a process to document every permission each new credential is granted. This is critical for replacing a secret without widening the scope.
- Develop a communication plan in case something goes wrong - You and your enterprise do not want to be surprised by an unannounced system change that causes an outage. Make sure you have a communication plan in place and raise awareness that you are automating this element of DevOps.
- Make sure your applications account for automated credential swapping - Rearchitect any applications that rely on a push model and any that do not have failover for credentials in place. Ideally, credentials should be pulled in as needed, existing there for only the time needed in the application's memory.
- Get internal agreement on codebase cleanup and maintenance - While it is possible to remove the secret from your Git history locally, the process to remove it from a shared codebase is much more complex.
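The steps above can double as a gate in front of any rotation job. This is a minimal sketch, assuming each step is tracked as a boolean somewhere in your tooling; the prerequisite names and both functions are hypothetical.

```python
PREREQUISITES = (
    "secrets_inventoried",
    "vault_standardized",
    "permissions_documented",
    "communication_plan_ready",
    "apps_pull_credentials",
    "history_cleanup_agreed",
)


def missing_prerequisites(status: dict[str, bool]) -> list[str]:
    """Prerequisites not yet met; an absent key counts as not met."""
    return [name for name in PREREQUISITES if not status.get(name, False)]


def ready_to_automate(status: dict[str, bool]) -> bool:
    """True only when every prerequisite has been marked as done."""
    return not missing_prerequisites(status)
```

A scheduled rotation job could call `ready_to_automate` first and refuse to run for any secret or team that has not cleared every item.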
How GitGuardian can help
We have worked with thousands of developers at some of the largest companies on Earth to mature their secrets and security practices and reach a point where they can safely automate any and all secrets at scale.
The GitGuardian Secrets Detection platform is legendary for helping people discover their plaintext credentials throughout their codebases, project management tools, communication platforms, and other sources. Working with secret managers like CyberArk's Conjur, we can help you make sure the secrets you expect to be exclusively stored in your vault are there and only there.
We have also released the Beta version of our Secret Analyzer, which can show you scoped permissions for a select few of the most common API keys used to manage projects at scale. Many more will be supported soon, making the permissions problem as painless as possible, freeing your developers from being eternally on the hook to remember exactly how each secret was scoped, and giving your team the insight it needs to move at the speed of automation.
The GitGuardian platform is API-driven and can trigger the automated communications plan you need when there is an incident and your remediation process has started. Since you can communicate with GitGuardian from other services via this same API, managing incidents from systems like Jira and ServiceNow helps complete the automation loop, from the alert on discovery to closing the incident once you have finished rotating the secret.
You are not alone on the path to automatic remediation
Hopefully, you can now see why there is no 'automatic remediation' button in GitGuardian and why there is not likely to be one anytime soon. While we can certainly help with many parts of the remediation process, helping you prioritize and streamline steps along the way, a fully automated remediation system requires multiple parts and a lot of planning.
Secrets remediation itself is fraught with peril. We look forward to the day when every enterprise, no matter how big or small, reaches a level of secrets management maturity that lets it completely automate the remediation process. We will keep making progress, and we are here for you every step of the way as you work to get secrets sprawl under control and automate the parts you can today.