Secret detection for internal code repositories

Although it is widely considered very bad practice, storing secrets in internal Version Control Systems remains the current state of the world. But why is that?

API keys, database connection strings, private keys, certificates, usernames and passwords… As organizations embrace cloud architectures, SaaS integrations and microservices, developers handle more sensitive information than ever before.

On top of that, companies are pushing for shorter release cycles to keep up with the competition, and developers have many technologies to master. The complexity of enforcing good security practices also grows with the size of the organization, the number of repositories, and the number of developer teams and their geographies.

As a result, secrets are spreading across organizations, particularly within source code. The problem is widespread enough to have earned its own name: “secret sprawl”.

At GitGuardian, we’ve been monitoring every single commit pushed to public GitHub since July 2017. Two and a half years later, we’ve sent over 500k alerts to developers, including pro bono alerting:

@GitGuardian is like the app that warns you that YOU LEFT A BABY IN YOUR CAR!!!!!!!!!! https://t.co/JA5GErEdSC
— Bald Mike (@BaldMikeSays) July 22, 2019

@GitGuardian is like your secret friend that will warn you before you get into trouble. https://t.co/K5sxiLsTRU
— Aditya Agarwal (@aditya81070) July 22, 2019

@GitGuardian is a life saviour for any developer! It detected and warned me about a production API key which I had mistakenly pushed on GitHub. https://t.co/117dPec7uN
— Faisal Alam (@ifaisalalam) July 19, 2019

After months of product iteration with security teams and developers, we’re now proud to officially introduce GitGuardian for internal repositories!

Credentials in private repositories: how much should you care?

Storing secrets in Version Control Systems is the current state of the world, yet a VCS is not a suitable place for secrets, for the following reasons:

Everyone who has access to the source code has access to the secrets it contains. This often includes too many developers. It would take just a single compromised developer account to compromise all the secrets that account can access!
You never know where your source code is going to end up. Because git is distributed by design, versioned code is made to be cloned in multiple places. It could end up on a compromised workstation, be inadvertently exposed on public GitHub, or be released to customers.

Storing secrets in source code is a bit like storing unencrypted credit card numbers, or usernames and passwords in a Google Doc shared within the organization: good friends would not let you do this!

Good friends don't let their friends push code without GitGuardian @GitGuardian https://t.co/lyRmlXOR8b
— Ami Amigo (@amiamigo97) December 15, 2019

As a developer or security professional, what should I do after a secret has been pushed to centralized version control?

Every time I see a secret pushed to the git server, I consider it compromised.

From one developer to another :)

When a secret reaches centralized version control, it is always a good practice to revoke it. At this point, depending on the size of your organization, remediating is often a shared responsibility between Development, Operations and Application Security teams.

You might need special rights and approval to revoke the secret, and some secrets are harder to revoke than others. You must also make sure that the secret is properly rotated and redistributed without impacting your running systems.

Apart from that, depending on your organization’s policies, you might want to clean your git history as well. This will require a ‘git push --force’, which comes with some risks as well, so there is definitely a tradeoff to consider, with no correct answer!

(Hint: if your secret is buried deep in your code, BFG Repo-Cleaner is a great Open Source project to help you get rid of it without having to use the intimidating ‘git-filter-branch’ command. Plus it is in Scala! We have Roberto Tyley to thank for this.)

When should I do secret detection?

With the nature of git comes a unique challenge: whereas most security vulnerabilities can only express themselves in the current (and deployed) version of your source code, old commits can contain valid secrets, including secrets that were later deleted from the code and therefore went unnoticed during code reviews.
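To make this concrete, here is a minimal sketch in Python of why scanning only the latest version of the code is not enough. The `Commit` type, the two-commit history and the single AWS-style pattern are illustrative assumptions, not GitGuardian's detection algorithms, and the key shown is the example key from AWS documentation rather than a real credential.

```python
import re
from dataclasses import dataclass

# Illustrative pattern only: AWS access key IDs commonly start with "AKIA"
# followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


@dataclass
class Commit:
    sha: str                 # hypothetical commit identifier
    added_lines: list[str]   # lines introduced by this commit
    tree: dict[str, str]     # file path -> full content after this commit


def scan_head(commits):
    """Scan only the files as they exist in the latest commit."""
    head = commits[-1]
    return [path for path, text in head.tree.items() if AWS_KEY_ID.search(text)]


def scan_history(commits):
    """Scan the lines added by every commit, including later-deleted ones."""
    return [
        (c.sha, line.strip())
        for c in commits
        for line in c.added_lines
        if AWS_KEY_ID.search(line)
    ]


# Commit "a1" hardcodes a key; commit "b2" replaces it with an environment lookup.
history = [
    Commit("a1", ['KEY = "AKIAIOSFODNN7EXAMPLE"'],
           {"config.py": 'KEY = "AKIAIOSFODNN7EXAMPLE"'}),
    Commit("b2", ['KEY = os.environ["AWS_KEY_ID"]'],
           {"config.py": 'KEY = os.environ["AWS_KEY_ID"]'}),
]

print(scan_head(history))     # nothing found in the current version
print(scan_history(history))  # the key from commit "a1" is still reported
```

The key was removed in the second commit, so a scan of the current files comes back clean, yet anyone who clones the repository can still read it from the history.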

First, you want to make sure that you start on a clean basis by scanning existing code repositories in depth.

Then, you want to continuously scan all incremental changes, i.e. every new commit in every branch of every repository.

When to do incremental scanning?

In his presentation about “Improving your Security Posture with the Cloud”, Sébastien Stormacq, Developer Evangelist @ AWS, advocates implementing security checks post-event in every case, and pre-event when possible.

We at GitGuardian share Sébastien's views. You should always implement automated secrets detection server-side, in your CI/CD for example or via a native integration with GitHub / GitLab / Bitbucket repositories. It is also good to encourage your fellow developers to implement pre-commit hooks, but we often hear that this is hard to scale across an entire organization.
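For readers who have never written one, a pre-commit hook can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not GitGuardian's product: the two patterns and the function names `find_secrets_in_diff` and `main` are hypothetical, and a real detector needs far more patterns and validation to keep false positives down.

```python
#!/usr/bin/env python3
import re
import subprocess
import sys

# Illustrative patterns only; a real detector covers many more secret types.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def find_secrets_in_diff(diff_text):
    """Return (pattern name, line) pairs for lines added in a unified diff."""
    findings = []
    for line in diff_text.splitlines():
        # Added lines start with "+". Lines starting with "+++" are treated
        # as file headers and skipped (a simplification).
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((name, line[1:].strip()))
    return findings


def main():
    # "git diff --cached" shows what is staged for the next commit.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    findings = find_secrets_in_diff(diff)
    for name, line in findings:
        print(f"Possible {name}: {line}", file=sys.stderr)
    # A non-zero exit status makes git abort the commit.
    return 1 if findings else 0


# When installed as .git/hooks/pre-commit, end the file with:
#     sys.exit(main())
```

Because `find_secrets_in_diff` only needs diff text, the same function could in principle run server-side in a CI job on each pushed commit. That is where the check cannot be skipped by a developer who never installed the hook.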

Try it out!

Our product lets you scan existing code as well as incremental changes, and benefit from secrets detection algorithms that were battle-tested at scale on the whole public GitHub activity for over two years! GitGuardian has a native integration with GitHub (GitLab and Bitbucket coming soon), and there is an on-premises version available.

We offer a free version of our solution for individual developers and Open Source organizations, as well as a free trial for companies that you can access in SaaS here: https://dashboard.gitguardian.com/auth/signup.