Developers are building features at an unprecedented speed using what they need from the software ecosystem. These ever-expanding options include open-source libraries and packages, SaaS tools, deployment systems, cloud services, and more. To keep things secure, we always need the same thing: a secret.
What is a secret?
Secrets are digital authentication credentials (API keys, certificates, and tokens) used in applications, services, or infrastructures. Just as a password (plus a device in the case of MFA) authenticates a person, a secret authenticates a system to enable interoperability.
Why are secrets a problem in CI/CD environments?
Software engineers need to handle more and more credentials as they use CI/CD pipelines to deploy artifacts, apps, and infrastructure to multiple environments. There are many places where secrets can be insecurely exposed:
- Source code
- Build, test, or deployment CI/CD workflows
- Container image layers
- Runner console output
Leaked credentials aren’t just a security problem; rotating a leaked secret interrupts CI/CD workflows.
How do secrets end up in source code?
Two words: human error.
The vast majority of leaked credentials are mistakes and do not spring from malicious intent. Hardcoding credentials can be a temporary solution, and sometimes developers don’t realize Git actually keeps track of a deleted secret. New developers don’t know proper procedures, or a test is skipped. The list of possible mistakes is enormous.
Read more about secrets sprawl in the 2022 State of Secrets Sprawl report.
Why are hardcoded secrets different from other types of vulnerabilities?
Most vulnerabilities can be pinpointed as a specific weakness in the current code. Detecting secrets, by contrast, requires scanning a project’s entire commit history.
When a developer mistakenly commits a secret, there are two possibilities: either they notice the mistake, or they do not.
In the former case, a very common mistake is to delete the secret and simply commit the change. The secret disappears from the current state of the source code, but it is still in the commit history!
In the latter case, the secret will likely reach the remote version control system (VCS). At that point, it should already be considered leaked. In the best case it is detected at the code review stage, but even then it may need to be rotated.
It is not uncommon to find valid secrets hidden deep inside the codebase history. Secrets detection needs to take into account this attack surface and scan for incremental changes to the repository to prevent these kinds of leaks.
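To make this concrete, here is a minimal Python sketch of why scanning only the current state of the code is not enough. The commit history is modelled as a hypothetical list of (commit ID, added lines) pairs, and the single regular expression for AWS access key IDs is a simplification for illustration only; it is not GitGuardian’s detection engine, and the key shown is fake.

```python
import re
from typing import List, Tuple

# Simplified pattern for an AWS access key ID (illustrative assumption only).
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Hypothetical history model: (commit_id, lines added by that commit).
Commit = Tuple[str, List[str]]


def scan_history(history: List[Commit]) -> List[Tuple[str, str]]:
    """Return (commit_id, match) for every key-like string added in any commit."""
    findings = []
    for commit_id, added_lines in history:
        for line in added_lines:
            for match in AWS_KEY_ID.findall(line):
                findings.append((commit_id, match))
    return findings


def scan_current_state(files: List[str]) -> List[str]:
    """Return key-like strings present in the current file contents only."""
    return [m for text in files for m in AWS_KEY_ID.findall(text)]


# Commit c1 hardcodes a (fake) key; commit c2 "fixes" it by reading an env var.
history = [
    ("c1", ['client = connect("AKIAABCDEFGHIJKLMNOP")']),
    ("c2", ['client = connect(os.environ["AWS_KEY_ID"])']),
]
current_files = ['client = connect(os.environ["AWS_KEY_ID"])']

print(scan_current_state(current_files))  # → []
print(scan_history(history))              # → [('c1', 'AKIAABCDEFGHIJKLMNOP')]
```

The current files look clean, yet the key is still recoverable from commit c1, which is why the history has to be part of the scan.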
Getting started with GitGuardian
In this tutorial, you will learn how to add GitGuardian real-time monitoring to a CircleCI workflow to scan every new commit for secrets.
GitGuardian detects secrets in your repositories, both in the existing commit history and in incremental commits. Detection can occur at multiple stages of the development lifecycle: on the developer’s local machine with a pre-commit or pre-push hook, on the server with a pre-receive hook, or in a CI environment.
The GitGuardian dashboard gives company-wide visibility, so you can secure all your repositories at once.
The dashboard also empowers developers and AppSec engineers to collaborate through the full remediation process. We will not cover this in the tutorial, but you can learn more in the documentation.
To follow this tutorial, you will need:
- A GitHub account
- A CircleCI account
- A GitGuardian account
Fork the sample repository
In this tutorial, you will use the sample_secrets test repository from GitGuardian. This repository contains a variety of secrets for testing purposes. Fork it to your GitHub user account or to a GitHub organization where you are an admin.
Then, open the CircleCI Projects page, click the sample_secrets name, then select Faster: Commit a starter CI pipeline to a new branch.
This creates the new branch circleci-project-setup in the repository, containing the demo workflow say-hello-workflow, configured in the CircleCI config.yml file.
Create a GitGuardian API token
You need a GitGuardian API token to use the GitGuardian orb. From the GitGuardian dashboard, go to API > Personal access tokens and then click Create Token. Give the token a scan scope and a memorable name.
Copy the token and keep it handy; it’s the only time you can view it.
Note: If you are on GitGuardian’s Business plan or the 30-day Business trial, create a service account instead of a personal access token. A service account is a special type of API key intended to represent a non-human user like a CI runner. To create one, go to API > Service accounts and follow the same steps.
From the CircleCI dashboard, click the sample_secrets project, then Project settings > Environment Variables. Click Add Environment Variable. Name it, and give it the same value as the token you copied earlier.
- GitGuardian Personal access tokens docs
- GitGuardian Service accounts docs
- CircleCI environment variables docs
Scan incremental changes with ggshield
You now need to add a workflow in your CircleCI config.yml to use the ggshield orb. Replace the contents of the file with this:
version: 2.1
orbs:
  ggshield: gitguardian/ggshield@volatile
workflows:
  scan_my_commits:
    jobs:
      - ggshield/scan:
          name: ggshield-scan
          base_revision: <<pipeline.git.base_revision>>
          revision: <<pipeline.git.revision>>
You can also find this snippet on the ggshield orb registry page.
The base_revision and revision values will be populated when the pipeline is triggered:
- base_revision is the commit ID of the first commit to scan.
- revision is the commit ID of the last commit to scan.
In this configuration, only the latest commits are scanned, which is convenient for a CI pipeline. You might not want to scan the whole git history on every pipeline launch. The scan operates on all the commits since the last revision to ensure that no secrets were committed and then deleted.
When you are done with the config.yml file, commit it, push it, and go to the CircleCI dashboard to watch the pipeline launch. You may have to accept using third-party orbs in Organization settings > Security > Orb Security Settings if this is the first time you have used them.
Click the job to view its output, which reports Commits to scan: 1.
#!/bin/bash -eo pipefail
ggshield secret scan -v ci
CIRCLE_RANGE: dea39f827dfe23f06f4ea63d7fb16ab0c363db9d...90220851160dcf018f372536da223dc0396aa247
CIRCLE_SHA1: 90220851160dcf018f372536da223dc0396aa247
Commits to scan: 1
Scanning Commits  [####################################]  100%
secrets-engine-version: 2.71.0
No secrets have been found
commit 90220851160dcf018f372536da223dc0396aa247
Author: ***
Date: ***
CircleCI received exit code 0
To verify that ggshield is working as expected, commit a single change to one of the test repository’s files. For example, open the sample_secrets/bucket_s3.py file and append or remove trailing whitespace, then commit this change (make sure you are on the branch that contains your CircleCI configuration).
The pipeline will fail because ggshield scans the latest commit and detects two secrets in the file:
#!/bin/bash -eo pipefail
ggshield secret scan -v ci
CIRCLE_RANGE: 90220851160dcf018f372536da223dc0396aa247...2d08c13226628ecfb3ee9a07001c185915a84adf
CIRCLE_SHA1: 2d08c13226628ecfb3ee9a07001c185915a84adf
Commits to scan: 1
Scanning Commits  [####################################]  100%
secrets-engine-version: 2.71.0
commit 2d08c13226628ecfb3ee9a07001c185915a84adf
Author: XXXX
Date: XXXX
🛡️ ⚔️ 🛡️ 2 incidents have been found in file bucket_s3.py
>>> Incident 1(Secrets detection): AWS Keys (Validity: Invalid) (Ignore with SHA: 9f2785cab705507aaea637b8b38d8e1ff9ce8a4334dda586187cbb018ed33163) (1 occurrence)
 8  8 |
 9  9 | def aws_upload(data: Dict):
10    |     database = aws_lib.connect("AKIA************WSZ5", "hjshnk5**************************89sjkja")
                                        |_____client_id____|
10    |     database = aws_lib.connect("AKIA************WSZ5", "hjshnk5**************************89sjkja")
   10 |     database = aws_lib.connect("AKIA************WSZ5", "hjshnk5**************************89sjkjb")
11 11 |     database.push(data)
>>> Incident 2(Secrets detection): AWS Keys (Validity: Invalid) (Ignore with SHA: e8077f59453457d2b3d980be4d8655eaa901c7aa8810a6079b429477e07a57f9) (1 occurrence)
 9  9 | def aws_upload(data: Dict):
10    |     database = aws_lib.connect("AKIA************WSZ5", "hjshnk5**************************89sjkja")
   10 |     database = aws_lib.connect("AKIA************WSZ5", "hjshnk5**************************89sjkjb")
                                        |_____client_id____|
   10 |     database = aws_lib.connect("AKIA************WSZ5", "hjshnk5**************************89sjkjb")
                                                                |_____________client_secret____________|
11 11 |     database.push(data)
Exited with code exit status 1
CircleCI received exit code 1
Validity: Invalid tells you two things:
- The secret could be checked (this is not always the case).
- The secret isn’t valid anymore.
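If you want to summarize a job log rather than read it by eye, the incident header lines can be parsed. The following is a minimal Python sketch; the regular expression is derived only from the two header lines in the output above, so treat it as an assumption about the format rather than a stable interface (other ggshield versions may print this line differently). The Incident type and parse_incident_header function are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple, Optional


class Incident(NamedTuple):
    number: int
    detector: str
    validity: str
    ignore_sha: str
    occurrences: int


# Pattern inferred from the incident headers in the sample job output.
INCIDENT_HEADER = re.compile(
    r">>> Incident (?P<number>\d+)\(Secrets detection\): (?P<detector>.+?)\s+"
    r"\(Validity: (?P<validity>[^)]+)\)\s+"
    r"\(Ignore with SHA: (?P<sha>[0-9a-f]+)\)\s+"
    r"\((?P<count>\d+) occurrences?\)"
)


def parse_incident_header(line: str) -> Optional[Incident]:
    """Parse one incident header line, or return None if the line is not one."""
    m = INCIDENT_HEADER.search(line)
    if m is None:
        return None
    return Incident(
        number=int(m["number"]),
        detector=m["detector"],
        validity=m["validity"],
        ignore_sha=m["sha"],
        occurrences=int(m["count"]),
    )


header = (
    ">>> Incident 1(Secrets detection): AWS Keys (Validity: Invalid) "
    "(Ignore with SHA: 9f2785cab705507aaea637b8b38d8e1ff9ce8a4334dda586187cbb018ed33163) "
    "(1 occurrence)"
)
incident = parse_incident_header(header)
print(incident.detector, incident.validity)  # → AWS Keys Invalid
```

A validity of Invalid still means the secret was committed, so it is worth treating the incident as a process failure even when no rotation is needed.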
Going further: scanning the commit history
But what if you would like to scan all past commits for secrets? The historical scan was done for you by GitGuardian when you forked the sample_secrets repository (this is the default behavior).
Go to your GitGuardian dashboard and search for the sample_secrets source on the Perimeter page. You should see that GitGuardian detected nine open secret incidents in the repository.
If needed, you can Scan the selected source again.
Click the source to display the Table of secrets. Incidents detected during a historical scan are tagged.
You can scan any arbitrary Git history with the ggshield secret scan repo command, but there is no dedicated orb for it.
Going further: remediation and developer workflow
If you made it this far, congratulations! You can be sure that any secret committed to this repository will break the pipeline and be reported in the dashboard, along with all the other past incidents. You can read more about how to use the dashboard to assign incidents, collaborate, and organize the cleanup of your repositories’ leaked secrets.
Here is a recommendation we give to all GitGuardian users: prevention will always be preferable to remediation, so aim at integrating secrets detection as early as possible in the developer workflow.
To understand why, imagine for a moment that GitGuardian detects a widely used secret in the CircleCI workflow. Best practice would be to immediately revoke and rotate it as if it were compromised, even if it wasn’t. But rotating a secret is almost always a tough job. It could mean workflow interruptions for many people. It could cause unexpected failures all along the CI/CD chain, or even in production.
That’s why we always advocate for integrating GitGuardian in the developer workflow with ggshield as a pre-commit, pre-push (client-side), or pre-receive (server-side) hook, making sure no secret can reach the version control system in the first place.
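To illustrate the client-side mechanism, here is a minimal Python sketch that writes a Git hook calling ggshield. It assumes ggshield is installed and on the PATH of whoever commits, and that the hook simply forwards its arguments to the matching ggshield secret scan subcommand. The helper names hook_script and install_hook are hypothetical; in practice you would normally rely on ggshield’s own hook installation or the pre-commit framework rather than a hand-rolled script.

```python
import stat
from pathlib import Path

# Client-side hook types mentioned in the text.
CLIENT_SIDE_HOOKS = ("pre-commit", "pre-push")


def hook_script(hook_type: str) -> str:
    """Return a shell script that runs the matching ggshield scan."""
    if hook_type not in CLIENT_SIDE_HOOKS:
        raise ValueError(f"unsupported hook type: {hook_type}")
    # Assumption: ggshield is on PATH and accepts the hook's arguments as-is.
    return f'#!/bin/sh\nexec ggshield secret scan {hook_type} "$@"\n'


def install_hook(repo_dir: str, hook_type: str = "pre-commit") -> Path:
    """Write the hook into repo_dir/.git/hooks, replacing any existing hook."""
    git_dir = Path(repo_dir) / ".git"
    if not git_dir.is_dir():
        raise FileNotFoundError(f"{repo_dir} does not look like a Git repository")
    hooks_dir = git_dir / "hooks"
    hooks_dir.mkdir(exist_ok=True)
    hook_path = hooks_dir / hook_type
    hook_path.write_text(hook_script(hook_type))
    # Git only runs hooks that are executable.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Calling install_hook(".") from the root of a clone would make every local commit run the scan; a non-zero exit code from ggshield then aborts the commit before the secret ever enters the history.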
You can also integrate GitGuardian natively into source control management platforms.
This tutorial demonstrated how easily secrets can be leaked. Unlike runtime vulnerabilities, leaked secrets can persist in old commits and represent a real threat. That’s why using a secrets detector in your CI workflows is a must-have for code security.
This awareness is an essential first step toward building a culture of shared responsibility between security, operations, and developers for preventing production issues, keeping pipelines running, and remediating issues as soon as possible.