Detecting secrets leaked on GitHub for 18 months

November 2018: Here is what we’ve learned, what we’ve achieved, and what’s coming next.

How we reached more than 160k developers and 80 Fortune 500 companies

When Eric and I started GitGuardian in July 2017, we were working full time together as data scientists and software engineers.

GitGuardian was the leisure activity that thrilled us every time we wrote a line of code. Building it made us happy because we knew for sure we were creating value for our users.

Black hat activity is real on GitHub. In November 2017, Uber acknowledged a massive hack that occurred in 2016 and exposed the data of 57M users and drivers. The cause: credentials uploaded to GitHub.

Since the beginning, we have been working hard to make the bad guys' lives a bit tougher. This is both fun and demanding.

We started by collecting all the public commits pushed to GitHub, in real time. With so much data at hand and a bit of manual inspection, we got an order-of-magnitude estimate of the scope of the problem.

So how many credentials do you think we found per day?

There are more than 3,000 leaks per day out of the 2.5M public commits pushed per day. That is more than one leak for every thousand commits.

But frequency is not the only concern. Credential leaks are not easy to detect generically. There are thousands of API providers out there, and a lot of keys are hard to distinguish from non-sensitive information like database IDs.

We use a combination of regular expressions, entropy statistics and machine learning to achieve good performance. We’re constantly improving our algorithms thanks to the labels our users provide.
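To give a rough idea of how the first two ingredients can work together, here is a minimal Python sketch. It is an illustration only, not GitGuardian's actual detector: the pattern, the entropy threshold and the function names are all assumptions made for this example, and the machine learning step is left out entirely.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical pattern: any run of 20+ characters from a key-like alphabet.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_=-]{20,}")

# Hypothetical threshold, in bits per character. A real detector would
# tune this value on labelled data.
ENTROPY_THRESHOLD = 4.0


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_candidate_secrets(source: str) -> List[str]:
    """Return the key-like tokens in `source` that look random enough."""
    return [
        match.group(0)
        for match in CANDIDATE.finditer(source)
        if shannon_entropy(match.group(0)) >= ENTROPY_THRESHOLD
    ]
```

The regular expression keeps only tokens that have the shape of a key, and the entropy filter drops those that are long but repetitive. A filter like this still lets through random-looking values that are not secrets, such as the database IDs mentioned above, which is why these two steps alone are not enough.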

We have now reached over 160,000 developers and 20,000 businesses of all sizes around the world, including more than 80 Fortune 500 companies.

This was a great opportunity for us to share knowledge and viewpoints with great developers and IT professionals at Airbnb, Couchbase, Datadog, … As companies cannot ignore the issue of leaked credentials anymore, some of them go as far as developing software internally, and then struggle to maintain and update it.

What’s next?

#developers-first #cloud #cybersecurity

Continue to grow our developer community

If you're a developer, secrets in source code are an issue that concerns you.

The only scalable way to handle this issue, and a lot of security concerns in general, is to build a culture in which developers want to do the right thing.

Developers must and will own security. To achieve this, we are empowering every developer with security tools that are both transparent in what they do and easy to use.

Transparent because we invite developers to look at our work, use it and judge it so that it can be improved, and share it to raise awareness in the community.
Easy to use because we respect developers’ time. The surest path to poor security is to prevent people from doing their job.

Scale our business by securing your cloud and SaaS integrations

Given the massive adoption of cloud and SaaS, combined with increasing development velocity and the security concerns this raises, we are confident in positioning ourselves in the cloud cybersecurity market and tackling it developers-first.

So first things first: the current state of the world is that a lot of secrets are checked into source control, private and sometimes even public.

Let’s face it. Secrets are shared via private messages. They are found in unexpected places such as log files, command line history or the IPython history of one of your most widely accessed servers (go and check!).

As is often the case in security, there’s no silver-bullet solution. We encourage a layered security approach. Here is what we're working on:

Detect secrets in public code: the worst place for them to be
Detect secrets in private code: your first line of defense
Detect and map secrets anywhere in your containers, infrastructure, messaging systems, …

We’ve got great news, data and facts to share… So stay tuned!