Today developers need to deal with an increasing amount of sensitive data. This includes what we collectively call secrets: API tokens, security certificates and credentials, to name a few. These are the modern-day keys to the kingdom. Because developers handle them regularly, they are commonly found within source code and, subsequently, anywhere source code lives. Git repositories are a very common place for these secrets to end up. There are 1,001 reasons why secrets leak into public git repositories, some of which we have covered in other articles on this blog. This article reviews data gathered by GitGuardian over 2020 to uncover which file extensions most frequently leak secrets. In reviewing this information we share some insights into common reasons why secrets leak and some general practices developers should implement.

What is GitGuardian

GitGuardian is a security company that scans git repositories for secrets and alerts developers when they are discovered. Since 2017 GitGuardian has scanned every public commit made on GitHub, which is nearly 1 billion commits each year. The data gives detailed insights into how secrets are leaked. GitGuardian released its annual State of Secret Sprawl report, which analyzed data gathered by scanning all public commits in 2020. This article is a deep dive into results taken from that report, specifically around file extensions that commonly contain secrets.

Overview

Before we go into specifics of each file extension we will first review the results as a whole and outline shared similarities.

Most common file extensions to leak secrets

As you might expect, with the many programming languages, frameworks and coding practices adopted throughout the world, there is a very long list of extensions that can contain secrets. For this article we have focused only on the top 10.

From these results we can see:

  • The top 10 file extensions account for 81% of all the results.
  • The top 3 account for over 56% of the results.

We can separate the results into 3 categories:

  • Programming languages (Python, JavaScript, PHP, TypeScript)
  • Data serialization files (JSON, XML, YAML, .properties)
  • Forbidden or sensitive files (.env, .pem)


Programming language files leak secrets both through secrets hardcoded within application scripts and through secrets hardcoded in configuration files. Hardcoding secrets directly into source code is never a good practice: in addition to being exposed in git repositories, hardcoded secrets can also be exposed through web browsers or reverse engineered from application files. All secrets should be kept outside the code and loaded into the application at runtime, for example through environment variables. Application configuration files often contain important data relating to running and setting up an application and therefore should be included within a repository. But when secrets are mixed into these files they become incredibly sensitive. It is therefore very important to keep secrets centralized and separate from source code, including configuration files.

Data storage or serialization formats like XML or YAML make it easy to store data in an organized structure, which makes them very useful for storing configuration settings. As you will see in the results, secrets are very commonly found alongside configuration information, which is why these file extensions appear so often. As mentioned above, keeping secrets out of source code, including configuration files, is very important in preventing secret sprawl.

Forbidden files include environment variable files (.env) and Privacy-Enhanced Mail files (.pem). These files are specifically used to store and send sensitive information. They are frequent causes of leaked secrets and should never be stored inside a git repository.

Twelve-Factor App says, “A litmus test for whether an app has all config correctly factored out of the code is whether the codebase could be made open source at any moment, without compromising any credentials.”

Best practices to prevent leaking secrets

Managing secrets can be complicated. There is no single universal set of best practices to adopt in all scenarios; it changes with the technology stack, size of teams and coding practices. You can read more about API best practices in our blog post and cheatsheet here. That said, there are good practices that should be applied in most cases.

1. Always use a .gitignore file.

A .gitignore file tells git which files to leave untracked, which keeps them out of your commits and therefore out of the remote repository. This should include all forbidden files like .env and .pem, log files and other sensitive files as they relate to your project.

GitHub has a resource of .gitignore templates you can use for your project. These are a great starting point, but you may wish to modify them or create your own. In a .gitignore file you can exclude literal filenames and whole directories, and use wildcards to exclude file extensions.

[Figure: Simple example .gitignore file]
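The three kinds of exclusion pattern described above (literal filename, directory, wildcard) can be sketched as a small Python helper that makes sure a project's .gitignore contains them. This is only an illustration under stated assumptions: the pattern list, the secrets/ directory and the ensure_ignored function are hypothetical names chosen for the example, not part of any GitHub template.

```python
from pathlib import Path

# Hypothetical starter patterns; adjust them to your own project.
FORBIDDEN_PATTERNS = [
    ".env",      # literal filename
    "*.pem",     # wildcard: every file with the .pem extension
    "*.log",     # wildcard: log files
    "secrets/",  # a whole directory (hypothetical name)
]


def ensure_ignored(gitignore: Path, patterns: list[str]) -> list[str]:
    """Append any missing patterns to the .gitignore file and return the ones added."""
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [pattern for pattern in patterns if pattern not in existing]
    if missing:
        # Make sure we start on a fresh line before appending.
        if text and not text.endswith("\n"):
            text += "\n"
        gitignore.write_text(text + "\n".join(missing) + "\n")
    return missing
```

Calling ensure_ignored(Path(".gitignore"), FORBIDDEN_PATTERNS) from the repository root would add only the entries that are not already present. Note that a .gitignore entry does not untrack a file that has already been committed.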

2. Use environment variables.

Hardcoding secrets, either within your application script or within configuration files, is a bad practice. Where appropriate, developers should use environment variables. Environment variables are set at the operating system level and read by the application at runtime, so the secret never needs to appear in the source code or be sent to the client. In this case secrets are typically centralized in a .env file.
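As a minimal sketch of this practice in Python, the helper below reads a secret from the process environment and fails loudly when it is missing, instead of falling back to a hardcoded value. The function name require_env and the variable name PAYMENT_API_KEY used in the note after it are assumptions made for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail loudly if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Failing early is safer than silently using a hardcoded default.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

An application would then call something like require_env("PAYMENT_API_KEY") at startup, with the value supplied by the shell, the deployment platform or a .env loader rather than by the source code.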

3. Implement security scanning of your git repositories.

It is impossible to reduce the risk of exposed credentials to zero; it is, after all, a matter of human error. We can, however, put protections in place so that we are alerted when it does happen. In addition, we want to be able to ban certain files, such as .pem or .env files, from our git repositories. We can do this by adding real-time secret scanning and rule-based scanning to our repositories. GitGuardian offers developers a free product to protect public and private repositories.
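To make the idea of rule-based scanning concrete, here is a deliberately simple Python sketch that flags forbidden file names and one naive hardcoded-secret pattern. It is not how GitGuardian's detection works; the file rules, the regular expression and the scan_file function are illustrative assumptions, and a real scanner uses many more detectors and validity checks.

```python
import re
from fnmatch import fnmatch

# File names that should never be committed (illustrative list).
FORBIDDEN_FILES = [".env", "*.pem"]

# Naive pattern: a secret-looking name assigned a quoted literal of 8+ characters.
ASSIGNMENT_PATTERN = re.compile(
    r"(?i)\b(api_?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
)


def scan_file(filename: str, content: str) -> list[str]:
    """Return human-readable findings for a single file."""
    findings = []
    basename = filename.rsplit("/", 1)[-1]
    if any(fnmatch(basename, pattern) for pattern in FORBIDDEN_FILES):
        findings.append(f"{filename}: forbidden file type")
    for lineno, line in enumerate(content.splitlines(), start=1):
        if ASSIGNMENT_PATTERN.search(line):
            findings.append(f"{filename}:{lineno}: possible hardcoded secret")
    return findings
```

A check like this could run before each commit or in CI, but its simplicity means it will both miss secrets and raise false alarms, which is why dedicated scanning is still worth having.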

Implement repository scanning today

Reviewing each file extension

Now that we have the general results out of the way, we can dive into the details of each file extension: common examples of how secrets are leaked, best practices and resources.

1 .py - Python files

Python is the second most popular language on GitHub, so it is no wonder that it features in our list. It may, however, surprise some that Python actually overtakes JavaScript (the most popular language on GitHub) as the most common source of leaked secrets.

Percentage of total secrets: .py files make up 27.9%
Common filenames:
  • settings.py
  • main.py
  • application.py
  • config.py

When it comes to Python, and other programming languages, we see two common culprits for exposing secrets: secrets that are hardcoded directly within the Python script and secrets hardcoded in configuration files. An example of hardcoding secrets directly into a Python script is below. The problem is that anyone with access to the script has access to the secrets within it.

[Figure: Hardcoded Python secrets]

Because the secrets aren’t stored centrally, they are also more likely to sprawl across multiple files, which makes it hard to control their movement and who has access to them. Another common approach, which avoids having secrets directly in the script, is to store them in a configuration file and then import them.

[Figure: Hardcoded Python secrets in a configuration file]
[Figure: Hardcoded Python secrets imported into a script]

Handling secrets this way means that the config file becomes very sensitive and cannot be shared within a git repository, or you risk exposing the secrets.

Best Practices

Where appropriate, use environment variables to load secrets. They are read from the operating system environment at runtime, so the secrets do not have to appear in the code you publish. Storing them in a .env file means they can be kept separate from the code you want to share.
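One way to apply this to a settings.py or config.py style module is to build the settings object from the environment instead of from literals. The sketch below is an assumption-laden illustration: the Settings class and the DB_HOST and DB_PASSWORD variable names are hypothetical.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    db_host: str
    # repr=False keeps the secret out of logs and tracebacks that print the object.
    db_password: str = field(repr=False)

    @classmethod
    def from_env(cls, env=None) -> "Settings":
        """Build settings from a mapping of environment variables (os.environ by default)."""
        env = os.environ if env is None else env
        return cls(
            db_host=env.get("DB_HOST", "localhost"),  # not a secret, so a default is fine
            db_password=env["DB_PASSWORD"],           # secret: no default, KeyError if unset
        )
```

With this shape the module can be committed safely, because it names the variables it needs without containing their values.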



2 .js - JavaScript files

JavaScript remains the most popular language on GitHub, so it is not so surprising that the .js file extension is the second-highest contributor of leaked credentials.

Percentage of total secrets: .js files make up 18.8%
Common filenames:
  • app.js
  • config.js
  • dev.js


With the addition of Node.js, JavaScript became both a frontend and a backend language. While hardcoding secrets in source code can happen at both the frontend and backend of projects, with JavaScript we also often see exposed secrets inside inline scripts embedded within other file types such as HTML. These segments don’t appear within our results, as they fall under the .html extension.

[Figure: Example of an inline JavaScript API key exposed in a .html file]

In addition to being exposed within your git repository, it is important to remember that anything passed in plain text to the client or browser is visible to the public. So regardless of whether you are embedding the secret in an .html file or hardcoding it in a .js file and referencing it, if not handled correctly it will be exposed to the client.

Turning away from frontend JavaScript, we can look at Node.js. Just as with Python, there are two core culprits for exposing secrets: hardcoding them directly into the application script and hardcoding them into configuration files.

[Figure: Example Node.js script with a hardcoded API key]

The best practice is to use environment variables, which are operating-system-level variables whose values can be used by one or more applications. As the value remains in the system, your code itself no longer exposes the credentials. The most popular package for loading environment variables in JavaScript is dotenv. Again, make sure the .env file is included in the .gitignore file.

Resources
  • GitHub Node .gitignore templates
  • JavaScript dotenv package for loading environment variables


3 .env - Environment files

Coming in third are .env, or environment variable, files. A .env file stores environment variables on disk; they are loaded into the application's environment in memory at runtime, when the application uses them.

Percentage of total secrets: .env files make up 9.7%
Common filenames:
  • .env



Strictly speaking, .env is not a file extension, but it is still a dominant contributor to leaks and so important to mention. Having just recommended .env files as a best practice for storing secrets, it may seem counterintuitive to have them on this list, but the key distinction is that a .env file should never enter a git repository. This is the first example of our forbidden files.

Packages like py-dotenv allow you to store secrets as variables within a .env file and load them into local memory, so they can then be used throughout the project. This does, however, mean that the .env file is extremely sensitive. As with all things, the individual requirements of the project need to be considered when deciding how and where to store secrets. In any case, your .env file should be tightly protected and should never appear in your git repository. It is a good habit to make sure it is included in your .gitignore file even when it holds nothing sensitive. You should also consider implementing protections on your git repository, such as adding a GitHub Action that prevents .env files from entering your repo (see GitGuardian's solution for this).
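To show roughly what such a package does, here is a minimal standard-library sketch of loading a .env file into the process environment. It is not the implementation of any particular dotenv package; parse_env and load_env_file are hypothetical helpers, and real packages handle more syntax (multi-line values, variable expansion, export prefixes).

```python
import os


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines, comments and malformed lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> None:
    """Load a .env file into os.environ without overriding variables that are already set."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle.read()).items():
            os.environ.setdefault(key, value)
```

Using setdefault means values set by the real environment (for example in production) take priority over the local file, which is a design choice rather than a requirement.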

[Figure: Example .env file with secrets]


4 .json - JSON files

Coming in fourth is JSON, a popular data serialization format for storing and sending data. JavaScript Object Notation is an open standard file format that uses human-readable text to store and transmit data objects made up of key-value pairs and arrays.

Percentage of total secrets: .json files make up 7.5%
Common filenames:
  • default.json
  • config.json
  • appsettings.json
  • credentials.json


Many API and SaaS providers, including Google and Salesforce to name a few, offer quick setup files that let you download your credentials and plug them into your application. These often come in the form of a JSON file; common names for these downloadable files include credentials.json and config.json. Often these files are imported into the project and then accidentally included in the git repository. These files should definitely not be included in a git repository. Instead, secrets should be centralized in a single location, such as an environment file, that is easier to track and protect.

Another common contributor, which will sound very familiar by now, is including secrets inside application configuration files. JSON is widely used and most languages have built-in JSON libraries, which means developers will often be familiar with JSON regardless of the specific technology stack. JSON carries the same issues as all configuration files: store secrets centrally and pass them to your application at the operating system level (for example, using environment variables).
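A sketch of that separation in Python: the JSON file keeps only non-sensitive settings and can be committed, while the secrets are merged in from the environment at load time. The load_config function, the secrets key and the DB_PASSWORD name are assumptions for illustration.

```python
import json
import os


def load_config(json_text: str, secret_names, env=None) -> dict:
    """Combine non-secret JSON settings with secrets taken from the environment."""
    env = os.environ if env is None else env
    config = json.loads(json_text)
    # Secrets are looked up by name; a missing one raises KeyError instead of
    # quietly falling back to a value stored in the JSON file.
    config["secrets"] = {name: env[name] for name in secret_names}
    return config
```

The committed config.json then lists things like hosts and ports, and the list of required secret names documents what each environment must provide.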


5 .properties - Properties files

.properties is a file extension mainly used in Java-related technologies to store configurable application settings.

Percentage of total secrets: .properties files make up 4%
Common filenames:
  • db.properties
  • application.properties


A very common cause of leaked secrets in .properties files is native Android development (Java) and Java-based Spring applications. We will focus on these two areas.

Android: .properties files are often used in Android development, and tools like Android Studio make it very easy to store and visualize data inside .properties files, making it convenient for developers to store secrets within them.

The best practice for hiding secrets in Android development is to use the Android Keystore system, which uses “cryptographic keys in a container to make it more difficult to extract from the device”. But another common way to store secrets is to put them in a .properties file and expose them through the BuildConfig object. These keys can then be used anywhere within your source code via the BuildConfig object provided by Gradle. Not only do these files frequently get committed into a repository, the secrets can still be recovered by reverse engineering a compiled APK. As a rule, any files containing secrets should be added to a .gitignore file, but in this case consider using Android's built-in keystore to store secrets more securely. (Read more here)

Spring: Spring applications also regularly use the .properties extension to store configuration settings, which can include secrets, particularly database properties. A file often associated with Spring that commonly contains secrets is application.properties. This is a built-in mechanism for application configuration and is a basic key-value text file, which makes it convenient to store secrets in. This is a poor practice: again, environment variables should be used, with secrets saved in a .env file that is, at a minimum, added to the .gitignore file. See external configuration for more information.
[Figure: Example application.properties file with database credentials]
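The general idea of keeping only placeholders in a committed properties file can be illustrated with a short Python sketch that fills ${NAME} placeholders from the environment in memory. This is an illustration of the concept only, not Spring's own resolution mechanism; render_properties and the DB_PASSWORD name are hypothetical.

```python
from string import Template


def render_properties(template_text: str, env) -> str:
    """Replace ${NAME} placeholders with values from env; raises KeyError if one is missing."""
    return Template(template_text).substitute(env)


# The committed file contains only the placeholder, never the real password.
COMMITTED_PROPERTIES = (
    "spring.datasource.username=app\n"
    "spring.datasource.password=${DB_PASSWORD}\n"
)
```

Calling render_properties(COMMITTED_PROPERTIES, os.environ) after importing os would produce the effective settings at runtime, while the repository only ever sees the placeholder.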


6 .pem - PEM files

Privacy Enhanced Mail (PEM) files are used today as a format to store cryptographic objects, ranging from private and public keys to certificates, in a serialized way.

Percentage of total secrets: .pem files make up 3.6%
Common filenames:
  • private.pem
  • privatekey.pem


.pem files can contain both private and public keys, but in all cases these are not appropriate files to store inside git repositories.

.pem files should be included in your .gitignore file, but you should also make sure that you are scanning your git repositories for such files, so that if any do get accidentally included you will be alerted.
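Because a private key can also be pasted into a file with any other extension, a content check is a useful complement to banning the extension. The Python sketch below looks for the standard PEM private key header; the contains_private_key helper is a hypothetical name, and this simple check does not validate that the key material itself is well formed.

```python
import re

# Matches headers such as "-----BEGIN PRIVATE KEY-----",
# "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN OPENSSH PRIVATE KEY-----".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def contains_private_key(text: str) -> bool:
    """Return True if the text contains a PEM private key header."""
    return PRIVATE_KEY_HEADER.search(text) is not None
```

Public keys and certificates use different headers and are not matched, which keeps the check focused on the material that must never be committed.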
Add rules to protect your repository now with GitGuardian.


7 .php - PHP files

PHP is a general-purpose scripting language that is especially suited to web development and is currently the 4th most popular programming language on GitHub. Secrets are commonly found in .php files when they are hardcoded into the application files or placed in a configuration file that is committed to a git repository. This is the same as with other programming languages, but there are some additional considerations that are specific to PHP.

Percentage of total secrets: .php files make up 2.2%
Common filenames:
  • index.php
  • config.php


[Figure: Example of secrets hardcoded into a .php script]

Storing secrets in a .env file with PHP is a much better practice than hardcoding them into an application script. In addition to keeping the .env file out of the git repository, you must also be careful that the .env file is not inside the document root when the application is hosted. By default, anything inside your document root is directly readable by the outside world. If the .env file were placed inside the document root, someone could request http://mycoolsite.com/.env and access the file directly. If the document root is /var/www/mycoolsite/public, then you must move the file up one level to /var/www/mycoolsite/.env. This way PHP can still access the file but it cannot be reached via the web.
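The document root rule above can be expressed as a small deployment check. The sketch below is written in Python rather than PHP, purely to illustrate the path logic; is_web_accessible is a hypothetical helper, and it compares paths lexically without resolving symlinks or ".." segments.

```python
from pathlib import PurePosixPath


def is_web_accessible(file_path: str, document_root: str) -> bool:
    """Return True when file_path sits inside document_root (lexical comparison only)."""
    try:
        PurePosixPath(file_path).relative_to(PurePosixPath(document_root))
        return True
    except ValueError:
        # relative_to raises ValueError when the file is outside the root.
        return False
```

With the paths from the example above, /var/www/mycoolsite/public/.env is reported as web accessible for the root /var/www/mycoolsite/public, while /var/www/mycoolsite/.env is not.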
[Figure: Example of a PHP file with secrets imported from environment variables]


The other common file we see in PHP is a configuration file. Usually this is a database configuration, and many different applications use such a file, including WordPress (wp-config.php) and PHP-Nuke. If you are using a PHP configuration file, make sure you still use environment variables stored in a .env file, do not hardcode the secrets directly into the configuration file, and ensure the .env file is included in the .gitignore file.

Resources
  • Dotenv package for PHP applications
  • GitHub collection of .gitignore templates


8 .xml - XML files

Extensible Markup Language (XML) was designed to store and send data; its primary job is to separate information from presentation. It has no predefined tags, meaning you can define your own, which makes it an attractive choice for storing configuration data.

Percentage of total secrets: .xml files make up 2%
Common filenames:
  • config.xml
  • strings.xml


A consistent contributor to secrets within XML files is, again, Android development. XML is a file format regularly used to store data and can be easily edited from within Android Studio. The risk with XML files is much larger than just storing them in git: keeping secrets inside strings.xml, for example, also allows attackers to extract them from a compiled APK (Android application package).

The best practice for hiding secrets in Android development is to use the Android Keystore system, which uses “cryptographic keys in a container to make it more difficult to extract from the device”. Don’t be tempted to store secrets inside a .properties file instead. While this is more secure than storing them in XML files, your API key can still be recovered by someone via reverse engineering. Hence, this isn't a very secure way to store your secrets.


9 .yml & .yaml - YAML files

YAML is a human-readable data serialization standard that can be used in conjunction with all programming languages and is often used to write configuration files.

The recursive acronym YAML stands for “YAML Ain’t Markup Language”, denoting it as flexible and data-oriented.

Percentage of total secrets: .yaml files make up 2%
Common filenames:
  • config.yml
  • .travis.yml
  • docker-compose.yml
  • secrets.yml


YAML can be used with nearly any application that needs to store or transmit data. Its flexibility is partially due to the fact that YAML is made up of bits and pieces of other languages. A few examples of these similarities include:

  • Scalars, lists, and associative arrays are based on Perl.
  • The document separator “---” is based on MIME.
  • Escape sequences are based on C.
  • Whitespace wrapping is based on HTML.


Of course, at the scale of GitHub, there are lots of examples of secrets appearing in YAML files. One of the more frequent contributors is CI/CD tooling, which uses YAML for declarative configuration. Some examples are CircleCI and Travis CI, which both use .yml files to describe infrastructure configurations and testing instructions.

Travis CI: When you are using Travis CI to deploy, be careful how you define variables in the .travis.yml file so that secrets are not written in plain text. Read more about the .travis.yml file here.

Docker: Secrets are also frequently found inside docker-compose.yml files. Sometimes you may build a Docker image that needs a build secret, for example to download a package from a DevPi repository. These secrets can be added to the docker-compose.yml file, but it is a much better practice to use Docker secrets so that your compose file can be included in a repository without fear of leaking secrets. Learn more about using secrets in docker compose files here.

Ruby: Another file that can often be found is secrets.yml (although the name should be a giveaway that it doesn’t belong in git). Since version 4.1, Rails (a Ruby framework) has generated a default file in its configuration directory called secrets.yml. This file contains a SECRET_KEY_BASE that is used to derive keys for encrypted cookies and HMAC-signed cookies. However, you could add additional keys to this file. This file should never appear in your git repository; make sure it is always added to your .gitignore file if you are developing in Ruby.


10 .ts - TypeScript files

TypeScript is an enhanced version of JavaScript with some additional features, including the ability to add static typing.

Percentage of total secrets: .ts files make up 2%
Common filenames:
  • app.module.ts
  • environment.ts


TypeScript is compiled to JavaScript. Therefore, TypeScript can be used anywhere JavaScript can be used: both the frontend and the backend. Because TypeScript is a superset of JavaScript, it is also subject to many of the same situations that affect .js files. One of the most popular TypeScript-based frameworks that contributes to leaks is Angular, which can be seen in how frequently environment.ts is a culprit. This is a default file created to store environment variables within an Angular application. It is common to see secrets included in the environment.ts file, but this is a bad coding practice; just as with JavaScript, all secrets should be stored in the .env file.

Wrap up

You made it to the end! By now you can see why we only covered the top 10 file extensions. Each extension, just like each language or framework, has its own set of rules. Like everything in programming, coming up with universal truths that cover everything is difficult. That said, we can also find common causes of leaked credentials that apply in multiple cases.

If we want to prevent our secrets from leaking, then we must factor in the general advice given at the start: always include a .gitignore file and ban certain file types, centralize secrets, use environment variables where possible and implement secret scanning to detect when they leak. But after going through all the different examples, you should also see that it is important to take a step back and identify a secret management system and practices that work for your team, your technology and your coding practices. It won’t be the same for everyone, but secrets are the keys to the kingdom. We want to protect them carefully, and this requires thoughtful consideration.