Twitter’s 2023 leak - why source code should never be sensitive

What happened?

This week it was announced that internal Twitter source code was leaked publicly in a GitHub repository belonging to the user FreeSpeechEnthusiast. The user created their account on January 3rd, committing only a README file, but between Jan 4th and March 10th they made 47 commits containing thousands of documents with Twitter's internal source code.

Jan 4th - 30 minutes after the first commit, they committed a large amount of source code inside a directory called abp, in a commit named ‘chunk 0’
February 25th - after a gap of several weeks, they committed hundreds of new documents across 23 commits
March 8th - 3 more commits were made, containing thousands of documents
March 9th - 10 more commits were made, including new directories named ‘t-cmd’ and ‘ua-parser’
March 24th - Twitter made a DMCA request to GitHub, which later removed the source code

Is this malicious or a mistake?

It is impossible to say definitively. It is easy to push code to the wrong repository and accidentally leak it publicly. It is possible, for example, that this person was backing up code in personal repositories to work on it from home (not a good practice, but not malicious either). However, the username “FreeSpeechEnthusiast” is a clear reaction to the comments on free speech made by Elon Musk, Twitter's owner, during the controversial period when Musk bought Twitter and began firing many of its employees.

What is really interesting is that the commits were made over a long period of time (3 months), which could indicate a disgruntled worker who is currently, or at least was recently, employed at Twitter. Of course, this is just speculation.

What is the risk associated with leaked source code?

There are lots of potential risks when it comes to leaked private source code. Public source code (open-source code) is not necessarily a risk, so long as it is intentionally public. Many people argue that open-source code is more secure than private code. The problem arises when code that is assumed to be private is made public, accidentally or maliciously. Such code has not come under the same level of scrutiny and could therefore expose lots of vulnerabilities.

Some common examples of vulnerabilities exposed by the accidental publishing of source code are:

Exposed secrets - it is very common for secrets like passwords, API keys and certificates to be inside source code. This is especially true when using version control like git, because secrets often remain buried in the commit history even after they have been removed from the current files.
Exposed logic flaws - there may be vulnerabilities in the way Twitter handles functions and data, and these would be visible in the source code.
Exposed application architecture - we often expect the architecture of our applications to be hidden, a concept called security by obscurity, which acts as an additional layer of security. When source code is exposed, it hands attackers a map of how our applications work and the opportunity to find hidden assets.

There are many more examples. The key point is that code that was assumed to be private and is then made public often contains lots of vulnerabilities.
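To make the first item in the list above concrete, here is a minimal sketch of how a scanner might flag hardcoded secrets in a block of text, such as a file or a diff. The rule names, the three regular expressions and the `find_secrets` helper are illustrative assumptions, not the rule set of any real tool; production scanners use far larger and more carefully tuned detectors.

```python
import re

# Illustrative rules only (assumptions for this sketch, not a real tool's rule set).
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM-encoded private keys begin with a recognisable header line.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    # A quoted value of 8+ characters assigned to a suspicious-looking name.
    "hardcoded_assignment": re.compile(
        r"(?i)(?:password|secret|api_key|token)\s*[:=]\s*['\"][^'\"\s]{8,}['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name) pairs for every line matching a rule."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings
```

Because secrets tend to survive in history, such a function would need to be run over the full patch history (for example, the output of `git log -p --all`) rather than only over the current files.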

What is next?

At this stage, the code has been taken down from GitHub, but anything that has been public on the internet will almost certainly exist in many other places outside of GitHub and Twitter. It is therefore possible that attackers will pursue whatever security vulnerabilities the code contains.

The identity of the user has not been revealed. However, Twitter is now attempting to use a subpoena to force GitHub to provide identifying information about the FreeSpeechEnthusiast user and anyone who accessed and distributed the leaked Twitter source code, which would be used for further legal action.

"All identifying information, including the name(s), address(es), telephone number(s), email address(es), social media profile data, and IP address(es), for the user(s) associated with the following GitHub username: FreeSpeechEnthusiast. Please include all identifying information provided when this account was established, as well as all identifying information provided subsequently for billing or administrative purposes."

Excerpt from a Twitter legal filing to GitHub

Key Takeaways

What is key in this story is understanding that source code is a very leaky asset that can also contain lots of sensitive information, making it a key target for attackers. Source code is accessed by many different employees, it is cloned and backed up in many different places, and it offers extremely limited logging to show who accessed it once it has been leaked. Finally, it is important to remember that, while rare, disgruntled employees can act maliciously by leaking your code or selling access to a hacker.

To prevent source code from becoming a security risk, it is far better to adopt a posture that assumes your source code will be leaked than to spend time trying to protect a leaky asset. Some key takeaways from this:
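One way to act on this posture, shown here as a minimal sketch, is to keep secrets out of the repository entirely and load them at runtime, so that a leaked codebase exposes no usable credentials. The `get_secret` helper, the `MissingSecretError` class and the `EXAMPLE_API_KEY` variable name are hypothetical names chosen for illustration; a real deployment might use a dedicated secrets manager instead of environment variables.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_secret(name: str) -> str:
    """Read a secret from the process environment instead of from source code."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly, naming only the variable and never echoing a secret value.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Hypothetical usage: the key never appears in the repository or its history.
# api_key = get_secret("EXAMPLE_API_KEY")
```

With this approach the source code only ever contains the name of the secret, so a leak of the repository reveals where credentials are used but not the credentials themselves.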

Before you go

Here is another breach investigation showing how sensitive code can be leaked: Sumo Logic Breach Shows Leaked Credentials Still a Persistent Threat
