Today, we're excited to share the State of Secrets Sprawl 2024 report, GitGuardian's annual deep dive into the secrets exposed on public GitHub repositories. This year's findings issue a stark reminder of the escalating challenge we face, with a staggering 12.8 million new secrets leaked in 2023—a 28% increase from the previous year.

Secrets Sprawl on GitHub

Our research, now more comprehensive than ever, shows that the number of exposed secrets has quadrupled since our first report in 2021. With GitHub's repository count growing by 50 million in the last year alone, the risk of both accidental and deliberate secret exposure keeps rising.

In 2023 alone, GitGuardian scanned 1.1 billion new commits and found that secrets sprawl is not just widespread but deepening, affecting a vast range of industries from IT to Education, Retail, and Finance:

  • 7 commits out of 1,000 exposed at least one secret;
  • 4.6% of active repositories leaked a secret;
  • 11.7% of authors who contributed leaked a secret.
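
To put these headline numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the rounded figures quoted in this post (12.8 million secrets, a 28% increase, 1.1 billion commits, 7 commits in 1,000), so the results are approximations rather than figures taken from the report; the variable names are ours.

```python
# Rounded figures quoted in this post.
new_secrets_2023 = 12_800_000
yearly_growth = 0.28
commits_scanned = 1_100_000_000
leaky_commits_per_thousand = 7

# Implied number of secrets detected the previous year:
# a 28% increase means the 2023 count is 1.28 times the earlier one.
implied_previous_year = round(new_secrets_2023 / (1 + yearly_growth))

# Approximate number of commits exposing at least one secret
# (integer arithmetic to avoid floating-point noise).
implied_leaky_commits = commits_scanned * leaky_commits_per_thousand // 1000

print(f"Implied secrets the previous year: {implied_previous_year:,}")
print(f"Implied commits with a secret:     {implied_leaky_commits:,}")
```

Because the inputs are rounded, treat both outputs as orders of magnitude, not exact counts.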

GenAI Secrets Leaks

A particularly alarming trend is the 1212x surge in OpenAI API key leaks, spotlighting the growing allure of AI services among developers—and the risks that accompany their popularity.

Looking at other AI services, another trend emerged: the slow but steady rise of open-source AI.

While OpenAI leads by a wide margin, more and more HuggingFace tokens have been seen on GitHub.

A Hidden Threat: Zombie Leaks

An alarming revelation from our report is the persistence of "zombie leaks," with over 90% of exposed secrets remaining active five days post-leakage.

Share of credentials still valid after being exposed over time

These zombie leaks often result from deleting leaky commits or making repositories private without revoking the exposed secrets, which leaves a gaping security vulnerability.

“Developers erasing leaky commits or repositories instead of revoking are creating a major security risk for companies, which will remain vulnerable to threat actors mirroring public GitHub activity for as long as the credential remains valid. These zombie leaks are the worst,” said Eric Fourrier, CEO and Founder of GitGuardian.
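
As an illustration of how a "share of credentials still valid" figure can be computed, here is a minimal Python sketch. It is not the methodology used in the report: the function name `share_still_valid` and the sample data are hypothetical, and we assume each leak is recorded as a pair of timestamps (when it was exposed, and when it was revoked, or `None` if it never was).

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# A leak record: (exposed_at, revoked_at); revoked_at is None if never revoked.
Leak = Tuple[datetime, Optional[datetime]]


def share_still_valid(leaks: List[Leak], days: int) -> float:
    """Return the fraction of leaked credentials still valid `days` after exposure."""
    if not leaks:
        raise ValueError("need at least one leak record")
    cutoff = timedelta(days=days)
    still_valid = sum(
        1
        for exposed_at, revoked_at in leaks
        # Valid if never revoked, or revoked only after the cutoff.
        if revoked_at is None or revoked_at - exposed_at > cutoff
    )
    return still_valid / len(leaks)


# Hypothetical sample: 10 leaks exposed on the same day.
t0 = datetime(2023, 1, 1)
sample: List[Leak] = (
    [(t0, t0 + timedelta(days=2))]          # 1 revoked after 2 days
    + [(t0, t0 + timedelta(days=30))] * 2   # 2 revoked after 30 days
    + [(t0, None)] * 7                      # 7 never revoked
)

print(share_still_valid(sample, days=5))
print(share_still_valid(sample, days=60))
```

Note that deleting a commit or making a repository private would not change any `revoked_at` value in this model. Only revoking the credential does, which is why zombie leaks stay exploitable.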

And More...

Our investigation also delves into:

  • The most sensitive file types on GitHub
  • The fastest-remediated secrets
  • The use of DMCA notices to stop leaks
  • The potential of Large Language Models (LLMs) in secret detection
  • The intersection of private and public secret leaks
  • Secrets exposure within Python's official package management system, PyPI
  • Strategies to combat secrets sprawl

Read the Press Release

For developers, security professionals, and decision-makers, this report is an essential tool in safeguarding application and data integrity.

Join us for a webinar on March 28th at 11 AM EDT, where we'll present the State of Secrets Sprawl 2024 report in detail. Don't miss this opportunity to gain deeper insights and engage with experts on how to tackle the challenge of secrets sprawl effectively.

Stay tuned for more updates from GitGuardian as we continue to monitor the ever-changing threat landscape and provide the latest insights and recommendations to help you stay ahead of the curve.