I recently had the opportunity to speak at PyCon Italia in the beautiful Italian city of Florence. This ancient city has played host to many fierce battles over the centuries, but in 2023 a new intense battle was shaping up, this time in the conference center halls. Should we use environment variables to store secrets or not? (and by intense battle, I really mean polite discussion).
Alexander Darby from Palo Alto Networks delivered a fascinating talk that walked through the steps attackers take when attacking your applications, particularly Python applications. During his presentation, Alexander suggested that using environment variables to store secrets like API keys is bad practice. As if it were planned, I was on stage immediately after his presentation, sharing my opposing view on why you should use environment variables to store secrets. In this article, I will break down the key arguments for both and present a totally unbiased explanation as to why I am right (or perhaps why we are both right).
What are environment variables?
At their core, environment variables are key-value pairs that are available to a running process or program. Environment variables play a crucial role in configuring applications and defining their behavior. These variables are dynamic values that are set within the operating system or runtime environment, and they can be accessed by programs to customize their functionality.
They provide a flexible way to pass information and configuration data to applications without hardcoding them into the code. This flexibility allows for easier deployment and configuration management, as the same codebase can adapt to different environments without modification. Storing configuration in the environment is one of the 12 ‘factors’ of the Twelve-Factor App methodology, which influenced much of the cloud-native paradigm.
A few examples of how environment variables can be used are below:
- Path Definitions - Environment variables are commonly used to define paths to important directories or files.
- Configuration Flags - Environment variables can be used to toggle certain features or modify the behavior of an application.
- API Keys and Secrets - API keys and secrets are sensitive information that should not be exposed in the source code. Environment variables provide a secure way to store and access such credentials.
- Database Configuration - Many applications rely on databases to store and retrieve data. Instead of hardcoding the database connection details into the code, developers can use environment variables to specify the database hostname, port, username, password, and other relevant parameters.
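As a minimal sketch of the configuration-flag and database bullets above, the snippet below reads settings from the environment with Python's standard `os.environ`. The variable names (`DB_HOST`, `DB_PORT`, `DB_PASSWORD`, `APP_DEBUG`) and the `Settings` class are hypothetical and chosen only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_password: str
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (names are hypothetical)."""
    return Settings(
        # Non-sensitive values can fall back to sensible defaults.
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        # No default for the secret: a missing value raises KeyError
        # instead of silently falling back to a hardcoded password.
        db_password=env["DB_PASSWORD"],
        # Treat a few common spellings as "on" for the feature flag.
        debug=env.get("APP_DEBUG", "false").lower() in {"1", "true", "yes"},
    )
```

Passing the mapping in as a parameter keeps the function easy to test, and the same code runs unchanged in development, staging, and production as long as each environment sets its own values.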
These examples represent just a fraction of the possibilities offered by environment variables. Their true power lies in their versatility and ability to adapt to various environments without code modifications. But now let’s dive into why these may help or hinder security.
The security concerns with environment variables
During his presentation, Alexander showed the path an attacker might take when attacking an application. The steps were: enumeration (the discovery of assets and useful information), sending the payload (the injection of malicious code or exploitation of a vulnerability), credential harvesting (the discovery of secrets inside the infrastructure), and escalation of privileges (moving into different areas of the application or infrastructure). Unsurprisingly, the argument against using environment variables came up at the credential harvesting stage.
“Every penetration test I know runs [the command] env once they get access to a machine, it is literally the first command attackers run” Alexander Darby
“It is really easy as an attacker to get usernames and passwords because so often they store them as environment variables” Alexander Darby
Example of environment variable dump as a result of running env
Alexander stated that when an attacker makes it into the infrastructure running an application, the very first thing they will do is run env. You can try this in your own terminal, and you will see a list of all the environment variables set in your current session. His point holds up: I can almost guarantee that on many machines, doing so will quickly surface secrets like API keys that let an attacker move laterally into other systems and escalate privileges. Put like this, using environment variables seems to mean putting all our secrets, our keys, in one place in a nice little package for the attackers. All true. Except we must remember one key thing: to do this, the attacker must already have compromised the system running the application.
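To see what that harvesting step looks like from the defender's side, here is a rough sketch that lists the environment variable names in the current process that look like they hold secrets. The name pattern is an assumption and only a heuristic: it will miss secrets with innocuous names and flag harmless variables that happen to match.

```python
import os
import re
from typing import List, Mapping

# Heuristic only: names containing these words often hold credentials.
SUSPICIOUS = re.compile(
    r"(KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL)", re.IGNORECASE
)


def likely_secret_names(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the sorted variable names that look like they hold secrets."""
    return sorted(name for name in env if SUSPICIOUS.search(name))


if __name__ == "__main__":
    # Print names only, never values, so the audit itself does not leak.
    for name in likely_secret_names():
        print(name)
```

Running this on a build server or container gives a quick picture of how much an attacker would gain from a single env call.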
They already have access to the code, the RAM, and basically everything. Essentially, to get to this point, the attacker must have already exploited a series of vulnerabilities and will certainly have lots of avenues of attack. Does it really matter then if we are using environment variables?
Alexander was asked this exact question by the audience.
“You can’t make something 100% secure, but you add friction” “If developers are lazy, then hackers are lazier” Alexander Darby
In security, the goal is to make something impenetrable, but the reality is that there are always vulnerabilities to exploit. A valid line of defense is to add moats: obstacles that might deter or slow down an attacker even though they can be bypassed. This is the argument against environment variables. By using them, we make it very easy for an attacker to escalate quickly. By not using them, we may deter a less motivated attacker, or slow them down to the point where we can take defensive action to stop their movement.
If not environment variables, then what?
You can avoid using environment variables, but how depends on the infrastructure hosting your application. The example given during the presentation was Google Cloud's built-in secrets manager; AWS and Azure offer equivalent tools. With these, the application makes an encrypted call to retrieve secrets from where they are securely stored in the cloud and loads them into application memory. This approach certainly adds some additional steps for the attacker, and if they are following a standard attack path, it may even be enough to stop them from finding more secrets. But as stated above, the attacker has already gained access to the infrastructure running the application, so they will still be able to retrieve the secrets by other means, for example by dumping the process memory.
There are other, even more secure ways of handling secrets too. One is to use a tool like HashiCorp Vault, which can create dynamic secrets (ephemeral secrets that are generated on demand and revoked after use or when their lease expires). In this case, even if the attacker were to find the secrets, they would likely already be invalid.
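As a hedged sketch of this approach, the code below fetches a secret at runtime and keeps it only in process memory. `fetch_from_gcp` assumes the third-party `google-cloud-secret-manager` package and valid cloud credentials; `SecretStore` is a hypothetical helper written for this example, not part of any SDK.

```python
from typing import Callable, Dict


def fetch_from_gcp(resource_name: str) -> str:
    """Fetch one secret version from Google Cloud Secret Manager.

    resource_name looks like
    "projects/<project>/secrets/<secret>/versions/latest".
    """
    # Imported lazily so the rest of the sketch runs without the package.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(request={"name": resource_name})
    return response.payload.data.decode("utf-8")


class SecretStore:
    """Hypothetical helper: fetch on first use, then serve from memory."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, str] = {}

    def get(self, resource_name: str) -> str:
        # The secret never touches the environment or the disk, but it
        # does live in this process's memory once fetched.
        if resource_name not in self._cache:
            self._cache[resource_name] = self._fetch(resource_name)
        return self._cache[resource_name]
```

In an application you would construct `SecretStore(fetch_from_gcp)`. Injecting the fetch function keeps the sketch testable without network access, and the in-memory cache illustrates the point made above: the secret is absent from env output, yet still recoverable from a memory dump.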
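The idea behind ephemeral secrets can be illustrated with a toy model. This is not Vault's API: `Lease` and `issue_lease` are hypothetical names, and a real system enforces expiry on the server side rather than trusting the client. The sketch only shows why a credential harvested after its lifetime is useless.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    """A short-lived credential together with its expiry time."""

    value: str
    expires_at: float

    def is_valid(self, now: float) -> bool:
        # Once the lease has expired, the credential is rejected.
        return now < self.expires_at


def issue_lease(now: float, ttl_seconds: float) -> Lease:
    """Mint a fresh random credential that lives for ttl_seconds."""
    return Lease(value=secrets.token_urlsafe(32), expires_at=now + ttl_seconds)
```

With a 60-second lifetime, a lease issued at time 1000 is accepted at 1030 and rejected from 1060 onward, so an attacker who finds the value later gains nothing from it.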
So is this more secure? From a security perspective, absolutely, but let me lay down my arguments as to why I still think you should use environment variables.
The argument for environment variables
During my presentation at PyCon, I, too, focused on attackers, but I took a slightly different approach when it came to secrets. Alexander’s presentation focused on how attackers harvest credentials once they have already moved deep into your systems. My approach focused on how attackers find secrets early, even in the enumeration stage of an attack.
Secrets are notoriously leaked inside source code. Source code is an asset that is quite easy, or at least easier, for an attacker to gain access to, because so many employees have access to it and it is cloned into so many different locations (developers' machines, backup drives, messaging systems, wikis, etc). The biggest factor here is hard-coded credentials. GitGuardian discovered 10 million secrets in public repositories on GitHub last year, and a company with 400 developers will typically have 13,000 secrets in their Git repositories. Why is this? Because handling secrets correctly is difficult, especially when we add in complicated secrets management solutions. Using environment variables and a .env file is a simple way for developers to centralize their secrets in a single file, which is easier to protect (even just with a .gitignore entry) and means there is no excuse for ever hard-coding a secret into source code. This adds a basic layer of protection against attackers finding secrets they can use to launch attacks. Essentially, it reduces the risk of secrets leakage and, thus, the attack surface.
So who is correct? As tempted as I am to say that I, of course, am right on the topic, like so many things in security, it really depends. Storing your secrets as environment variables makes it easier for an attacker to discover them once they have already breached your infrastructure, but avoiding environment variables wouldn’t stop that attacker from discovering the secrets by other methods. Using environment variables can help prevent secrets from being hardcoded, giving attackers fewer opportunities in the first place.
From a pure security perspective, the ideal case would be to use vaults and secrets managers to store secrets and ensure they are not inside your source code. In this sense Alexander is right. But the reality is that while mature organizations should absolutely implement this, we are still fighting basic security mistakes where developers mishandle secrets, and adding complex solutions when simple practices are not yet in place will not result in positive outcomes. One thing is absolutely true, though: secrets management remains a persistent and prominent problem in security.
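A minimal sketch of the .env approach, using only the standard library, might look like the following. In practice many Python projects use the third-party python-dotenv package for this; the simplified parser here is an assumption for illustration and ignores edge cases such as multi-line values, escapes, and `export` prefixes.

```python
import os
from typing import Dict, MutableMapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_dotenv_text(
    text: str, env: MutableMapping[str, str] = os.environ
) -> None:
    """Copy parsed values into env without overriding existing variables."""
    for key, value in parse_dotenv(text).items():
        env.setdefault(key, value)
```

The other half of the practice is a one-line `.env` entry in the repository's .gitignore file, so the file holding the real values is never committed alongside the code that reads them.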