GitGuardian has been scanning every single public commit made on GitHub for secrets since 2017. Now we are releasing our findings in the most comprehensive study on secrets sprawl ever conducted.
The community that has been built around GitHub, the Octoverse as it has come to be known, has been fundamental in changing how we use and build open-source components and software. Today more than 50 million developers use GitHub, 60 million repositories were created in a single year, and over 2 billion commits were pushed. The scale of the Octoverse is remarkable.
GitHub has become a place for developers to showcase their work and contribute to the millions of projects that make up many of the building blocks of modern software development. With such a vast amount of data publicly available, it is no surprise that a huge amount of sensitive data is also unknowingly or accidentally pushed to the platform, namely secrets such as API keys, credentials and other digital authentication strings. Attackers can use these secrets to gain access to infrastructure, systems and PII. When these secrets are spread across multiple systems and services, the result is a problem we collectively call secrets sprawl. Because code is so widely distributed through GitHub, and because git keeps a complete record of a repository's history (so a secret remains retrievable even after it is deleted from the latest version of the code), a public repository is arguably the worst place for a secret to end up.
But how big a problem is secrets sprawl on public GitHub? This has been very difficult to quantify accurately, until now!
The State of Secrets Sprawl report measures the exposure of secrets within public repositories on GitHub and how this serious threat is evolving from year to year. The report shows a 20% year-over-year increase in the number of secrets found on public GitHub. Interestingly, 15% of leaks on GitHub occur within public repositories owned by organizations, while 85% occur in developers' personal repositories. This data clearly illustrates the complexity of secrets sprawl:
The majority of leaked secrets belonging to organizations are leaked in personal repositories, which are repositories where the organizations have no authority to enforce security policies and standards.
This report was compiled using data GitGuardian obtained by scanning every single public commit pushed to GitHub during 2020. In total, almost 1 billion commits were scanned.
The report also outlines:
- How many secrets were discovered every day on public GitHub during 2020
- The percentage of secrets discovered in personal vs. professional repositories
- Which countries are the biggest leakers of secrets
- The types of secrets that are most commonly found
- The file extensions that most frequently contain secrets
- Tips to prevent secrets sprawl
Want to discover more? As you might expect, there is a lot to unpack in the report. Over the coming weeks we are going to publish deep dives into each topic, starting with the Top 10 file extensions in which secrets are found. Be sure to subscribe to the newsletter so you don't miss them.