On March 29, which seemed to be another normal Friday, a Microsoft developer shocked the world by revealing an XZ Utils (data-compression utilities) backdoor. This backdoor could potentially enable unauthorized access via SSH and remote code execution (read the full story here).
But wait a minute: how on earth does compression have anything to do with SSH access? Short answer: dependencies. Part of XZ Utils is a compression library, liblzma, which isn't used directly by OpenSSH. However, Debian and several other distributions patch OpenSSH to support systemd notifications, and that patch links sshd against the libsystemd C library, which depends on liblzma.
Got it? No? Don't worry: dependencies are complicated, and that's one of the reasons why the attack happened in the first place.
But wait a minute again: how could the project maintainers miss the malicious commits in the first place? It appears this backdoor was planned well in advance, starting about three years ago when a user began contributing to open-source projects. Over the years, they gained control of the project by becoming increasingly involved, gaining trust, and pressuring the founder.
Luckily, the backdoor's blast radius wasn't huge because although XZ Utils is included in many Linux distributions, the malicious version hadn't been widely deployed; it was only present in development versions of major distributions. Still, the attack served as a timely wake-up call in the open-source security world.
So, today, let's examine open-source software security: what it is, why you should care, and how to improve it.
An Introduction to Open Source Software Security
What is Open Source Software Security
I could refer to Wikipedia for a formal definition of this subject matter, but I guess it won't interest you that much. Just in case you wonder what the definition is: "Open-source software security is the measure of assurance or guarantee in the freedom from danger and risk inherent to an open-source software system."
To explain it in my own words, open-source software security is two things:
- the risks that come with third-party, open-source software;
- the tools and measures taken to improve open-source software's security.
Why You Should Care about Open Source Software Security
Although the XZ Utils backdoor story sounds terrifying, many people are actually pretty confident about open-source software security for some reason.
You may think you don't use open-source software, so you don't care about its security. But the fact is, you do. Whether you like to admit it or not, as much as 90% of the code in modern software is open source. This is especially true in the cloud-native era, where almost all apps rely on some open-source components.
Or, you might be pretty confident that your applications don't have any libraries with known vulnerabilities.
I know I was, until a minute ago, because out of curiosity just now, when writing this very paragraph, I ran a check on a project I have been working on recently, only to find 6 known vulnerabilities. There was no automated check in the project to make sure there weren't any known vulnerabilities, either. God, I wish I was making this up just for the sake of this article, but unfortunately, it is very real.
And I'm not alone: according to a survey done by Sonatype, as many as two-thirds of the surveyed users feel confident that their applications do not rely on known vulnerable libraries, even though 10% of respondents reported that their organizations had security breaches due to open-source vulnerabilities in the past year.
Starting to feel a bit worried? Here are more statistics to up your anxiety levels:
- 1 in 8 open-source downloads has known risks.
- 245,000 malicious packages were discovered in 2023 alone. If you can't tell whether that's "a lot" or "maybe not so much compared to previous years," let me put it this way: it was twice the number found in all previous years combined.
- 18.6% of open source projects across Java and JavaScript that were maintained in 2022 are no longer maintained today, just one year later. How can you trust something when it is not even actively maintained? What's more, how can you know if something you depend on is still maintained?
- If you think these numbers are already frightening, consider this: open-source software is growing rapidly. Across just three ecosystems, Java (Maven), JavaScript (npm), and Python (PyPI), there are 3.6 million projects with 54 million different versions, requested 3.8 trillion times in 2023 alone.
OK, now that I think I have scared you enough and caught your attention, let's take a look at open-source software dependencies, which only make the security problem worse.
The Complexity of Open Source Dependencies
As mentioned in the previous section, you write only about 10% of your application's code, and the other 90% comes from dependencies, that is, open-source components. The situation is more or less the same if you take a closer look at one of your open-source dependencies: the author probably didn't write 100% of their code either but used many dependencies.
Let's say you are writing a simple Python web application using Flask. If we have a look at the dependencies of Flask itself, it depends on 6 other packages:
Flask==3.0.3
├── blinker [required: >=1.6.2, installed: 1.7.0]
├── click [required: >=8.1.3, installed: 8.1.7]
├── itsdangerous [required: >=2.1.2, installed: 2.1.2]
├── Jinja2 [required: >=3.1.2, installed: 3.1.3]
│   └── MarkupSafe [required: >=2.0, installed: 2.1.5]
└── Werkzeug [required: >=3.0.0, installed: 3.0.2]
    └── MarkupSafe [required: >=2.1.1, installed: 2.1.5]
The list isn't so long, but it's 1 package depending on 6. And let's not forget that Flask is a micro web framework (emphasis on micro), meaning it doesn't require particular tools or libraries. The same can't be said for many bigger frameworks or platforms. For example, say you are creating a state-of-the-art machine learning model using TensorFlow. TensorFlow may look like your only dependency, but under the hood it depends on:
tensorflow==2.16.1
├── absl-py [required: >=1.0.0, installed: 2.1.0]
├── astunparse [required: >=1.6.0, installed: 1.6.3]
│   ├── six [required: >=1.6.1,<2.0, installed: 1.16.0]
│   └── wheel [required: >=0.23.0,<1.0, installed: 0.43.0]
├── flatbuffers [required: >=23.5.26, installed: 24.3.25]
├── gast [required: >=0.2.1,!=0.5.2,!=0.5.1,!=0.5.0, installed: 0.5.4]
├── google-pasta [required: >=0.1.1, installed: 0.2.0]
│   └── six [required: Any, installed: 1.16.0]
├── grpcio [required: >=1.24.3,<2.0, installed: 1.62.1]
├── h5py [required: >=3.10.0, installed: 3.11.0]
│   └── numpy [required: >=1.17.3, installed: 1.26.4]
├── keras [required: >=3.0.0, installed: 3.2.1]
│   ├── absl-py [required: Any, installed: 2.1.0]
│   ├── h5py [required: Any, installed: 3.11.0]
│   │   └── numpy [required: >=1.17.3, installed: 1.26.4]
│   ├── ml-dtypes [required: Any, installed: 0.3.2]
│   │   ├── numpy [required: >1.20, installed: 1.26.4]
│   │   ├── numpy [required: >=1.21.2, installed: 1.26.4]
│   │   └── numpy [required: >=1.23.3, installed: 1.26.4]
│   ├── namex [required: Any, installed: 0.0.7]
│   ├── numpy [required: Any, installed: 1.26.4]
│   ├── optree [required: Any, installed: 0.11.0]
│   │   └── typing_extensions [required: >=4.0.0, installed: 4.11.0]
│   └── rich [required: Any, installed: 13.7.1]
│       ├── markdown-it-py [required: >=2.2.0, installed: 3.0.0]
│       │   └── mdurl [required: ~=0.1, installed: 0.1.2]
│       └── Pygments [required: >=2.13.0,<3.0.0, installed: 2.17.2]
├── ... # omitted, since it's too long...
You know TensorFlow depends on Keras since you use it very often. But are you even aware that Keras, in turn, depends on rich, which depends on markdown-it-py, which then depends on mdurl? Do you know what mdurl actually is? I don't, if I'm being honest.
If the dependency tree looks neat, let's convert it to a graph:
There is a colloquial term in software called "dependency hell," where users' installed software packages don't work because they depend on specific versions of other software packages.
A piece of software may rely on a multitude of large libraries. It might depend on product A, but A relies on B to function, and B needs C to work properly. Then there are conflicts: app X requires some lib at v1, but app Y requires the same lib at v2. There could even be circular dependencies.
Now, we don't have to work these out ourselves, since most languages, such as Python, Go, and JavaScript, have package managers that do this for us. But this doesn't change the fact that our dependencies depend on other packages, which become indirect dependencies of our project. These are known as "transitive dependencies."
Transitive dependencies are a special concern because they are less visible to us, to security tools, and to audits. The complicated nature of dependencies means that your project's direct dependencies may be fine while their own dependencies contain CVEs or malicious code.
Although many factors can lead to vulnerabilities in open-source software, such as inconsistent quality or unsupported and unmaintained packages, transitive dependencies cause by far the most trouble. According to the State of Dependency Management report by Station 9, the Endor Labs research team, 95% of all vulnerabilities are found in transitive dependencies rather than direct dependencies, making it extremely difficult for developers to assess the real impact of these issues, or whether they're even reachable.
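To make the direct/transitive distinction concrete, here is a minimal Python sketch that walks a dependency graph and separates the two groups. The graph is hand-copied from the Flask tree shown earlier, and the function name split_dependencies is made up for this illustration; a real project would read this data from its package manager instead.

```python
from collections import deque

# Direct dependencies per package, hand-copied from the Flask tree above.
DEPENDENCY_GRAPH = {
    "Flask": ["blinker", "click", "itsdangerous", "Jinja2", "Werkzeug"],
    "Jinja2": ["MarkupSafe"],
    "Werkzeug": ["MarkupSafe"],
    "blinker": [],
    "click": [],
    "itsdangerous": [],
    "MarkupSafe": [],
}


def split_dependencies(graph, root):
    """Return (direct, transitive_only) dependency sets for `root`."""
    direct = set(graph.get(root, []))
    seen = set(direct)
    queue = deque(direct)
    # Breadth-first walk; the `seen` set also protects against
    # circular dependencies.
    while queue:
        package = queue.popleft()
        for child in graph.get(package, []):
            if child not in seen and child != root:
                seen.add(child)
                queue.append(child)
    return direct, seen - direct


if __name__ == "__main__":
    direct, transitive = split_dependencies(DEPENDENCY_GRAPH, "Flask")
    print("direct:", sorted(direct))
    print("transitive only:", sorted(transitive))
```

Even in this tiny example, MarkupSafe never appears in your own requirements file, yet it ships with your application.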
Open Source Security Tools
I hope the situation doesn't scare you away from open source and push you toward proprietary software. Some argue that open-source software is less secure because of the inconsistent quality of different contributors, human errors, etc., but the very same causes of vulnerabilities exist in proprietary software as well.
It's not all bad news about open source software, though, because according to the same State of the Software Supply Chain report done by Sonatype:
- 96% of downloaded vulnerable releases have a fixed version available.
- For every nonoptimal component upgrade that could potentially cause security risks, there are 10 superior versions of components available.
We just need to figure out how to improve security with the right tools. So, next, let's look at some security tools that can help improve open-source software security.
Vulnerability Scanning
To improve the security of our projects, let's start with the vulnerabilities themselves. We need to know whether our projects contain any known CVEs before we can fix them. So, the first step is to get some visibility into known CVEs, and luckily, there are some very good tools for this.
One of the most popular open-source security scanners is trivy by aquasec. It's an easy-to-use and fast CLI tool written in Go that can scan container images, file systems, remote git repositories, and more. Besides known CVEs, it can also scan for secrets and misconfigurations. On a Mac, you can install it with brew install trivy, then run trivy fs --scanners vuln myproject/ to scan your project for vulnerabilities. Trivy can also be integrated with many popular platforms and applications, such as your CI pipelines or even your IDE.
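If you want to act on the scan results in a script, trivy can also write its findings as JSON (--format json). The Python sketch below counts findings by severity and decides whether a build should fail. The sample report is made up, and the field names (Results, Vulnerabilities, Severity) reflect my understanding of trivy's JSON layout, so check them against the output of your own trivy version before relying on this.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down report for illustration only; the IDs and
# package names are invented.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {
      "Target": "requirements.txt",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "PkgName": "examplepkg",
         "InstalledVersion": "1.0.0", "FixedVersion": "1.0.1",
         "Severity": "HIGH"},
        {"VulnerabilityID": "CVE-0000-0002", "PkgName": "otherpkg",
         "InstalledVersion": "2.3.0", "FixedVersion": "",
         "Severity": "LOW"}
      ]
    },
    {"Target": "package-lock.json"}
  ]
}
""")


def count_by_severity(report):
    """Count findings per severity; targets without findings are skipped."""
    counts = Counter()
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def should_fail_build(report, blocking=("HIGH", "CRITICAL")):
    """Return True if any finding has one of the blocking severities."""
    counts = count_by_severity(report)
    return any(counts[level] > 0 for level in blocking)


if __name__ == "__main__":
    print(dict(count_by_severity(SAMPLE_REPORT)))
    print("fail build:", should_fail_build(SAMPLE_REPORT))
```

For simple cases you may not need a script at all, since trivy's own --severity and --exit-code options can gate a pipeline directly.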
SBOM and Software Composition Analysis
Since we rely heavily on dependencies, most of which are open-source components, it's important to know which dependencies are there. Open-source packages, especially small ones, are usually maintained by a small team with only a few developers (or even a single developer), if they are maintained at all. Developers of open-source packages do not commit to maintaining the software and can decide to stop at any time, for any reason. If that happens, there will be no one updating the package to eliminate known CVEs. Therefore, organizations must be able to inventory their open-source components.
This is where the SBOM (Software Bill of Materials) comes into play. It has emerged as a critical component of software security, providing visibility into software components and supply chains and helping identify and avoid vulnerabilities.
Trivy can also generate SBOMs in different formats with one simple command: trivy fs --format cyclonedx --output result.json /app/myproject. Another CLI tool and library for generating a Software Bill of Materials from container images and filesystems is syft. It's easy to install (for Mac users, brew install syft), supports many ecosystems such as JavaScript, Python, Go, Ruby, etc., and can be integrated with CI. For more on SBOM, see this blog post here.
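Once you have a CycloneDX file such as result.json, turning it into a readable inventory takes only a few lines. The sketch below is a minimal Python example run against a hand-written sample document. It assumes the common CycloneDX JSON layout with a top-level components array whose entries carry name, version, and purl; the helper name list_components is made up for illustration.

```python
import json

# Hand-written sample in CycloneDX JSON style, for illustration only.
SAMPLE_SBOM = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "flask", "version": "3.0.3",
     "purl": "pkg:pypi/flask@3.0.3"},
    {"type": "library", "name": "markupsafe", "version": "2.1.5",
     "purl": "pkg:pypi/markupsafe@2.1.5"}
  ]
}
""")


def list_components(sbom):
    """Return sorted (name, version, purl) tuples from a CycloneDX document."""
    if sbom.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX document")
    return sorted(
        (c.get("name", "?"), c.get("version", "?"), c.get("purl", ""))
        for c in sbom.get("components", [])
    )


if __name__ == "__main__":
    # For a real file: sbom = json.load(open("result.json"))
    for name, version, purl in list_components(SAMPLE_SBOM):
        print(f"{name} {version} {purl}")
```

A flat list like this is a convenient input for the vulnerability and license checks discussed in the rest of this article.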
Generating an SBOM alone isn't enough: we need to make sure that what's in the SBOM (our dependencies) is OK. We need to scan the dependencies to detect vulnerabilities, and this is known as SCA (Software Composition Analysis): the process of analyzing dependencies to determine whether they are affected by known security vulnerabilities.
One tool on the rise here is OSV, a database of open-source vulnerabilities built by Google in 2021. It has a CLI tool, osv-scanner, that serves as the frontend to the OSV database, and you can install it with a simple command:
go install github.com/google/osv-scanner/cmd/osv-scanner@v1
Then, your project's list of dependencies and the vulnerabilities that affect them is only one command away:
osv-scanner -r path/to/your/project
OSV also provides an API that lets you query vulnerabilities for a particular project at a given commit hash or version. At the time of writing, it is not rate-limited.
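As a rough illustration of the API, the Python sketch below builds a query for either a commit hash or a package version and posts it to the public query endpoint. The endpoint URL and payload shape follow my reading of the OSV API documentation, so treat them as assumptions to verify; the helper names are made up, and the network call is left commented out so the example runs offline.

```python
import json
import urllib.request

# Assumed public endpoint of the OSV API; verify against the OSV docs.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name=None, ecosystem=None, version=None, commit=None):
    """Build a query body for a commit hash or a package version."""
    if commit:
        return {"commit": commit}
    if not (name and ecosystem and version):
        raise ValueError("need a commit hash, or name, ecosystem and version")
    return {"package": {"name": name, "ecosystem": ecosystem},
            "version": version}


def query_osv(payload, timeout=10):
    """POST the payload and return the IDs of matching vulnerabilities."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # An empty response body means no known vulnerabilities matched.
    return [vuln.get("id") for vuln in body.get("vulns", [])]


if __name__ == "__main__":
    payload = build_osv_query("jinja2", "PyPI", "3.1.3")
    print(json.dumps(payload))
    # Uncomment to perform the actual network call:
    # print(query_osv(payload))
```

Separating payload construction from the HTTP call keeps the first part easy to test without network access.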
Finally, GitGuardian recently integrated an SCA module into their code security platform. It scans your apps' dependencies to detect vulnerabilities and allows you to prioritize incidents by criticality.
It also makes it easy to generate a per-project SBOM.
Check out the complete features here!
Licensing
Despite being open-source, most open-source applications and packages come with their own usage licenses, describing how you can use the code. Risks could occur if how you use it doesn't match its intended purposes and usages, and a single dependency component could violate laws or requirements the company needs to follow.
So, it's important to understand the different types of licenses so that the code is used compliantly. This is no trivial effort, since it requires knowledge of both the license types and the project's requirements throughout the software development life cycle. Given the complicated nature of dependencies, the licenses of some dependency components could even be incompatible with each other, making matters even more grim.
For container license security, trivy scans any container image for license files and offers an opinionated view on the risk. By default, trivy scans licenses for packages installed by apk, apt-get, dnf, npm, pip, gem, etc.
Another tool that can help scan for license issues is fossa. It has a free plan, which is limited to 5 projects. To use it, install the CLI with brew update && brew install --cask fossa, get an API key from the platform and set it in an environment variable, then simply run fossa analyze in the root directory of your project. It shows the direct and indirect licenses used in the project and flags potential issues. For example, one dependency might declare itself under one license while containing copied open-source code declared under another.
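For a quick first look at a Python environment, you can also read the licenses that installed packages declare in their own metadata. The sketch below is a minimal example of that idea. The keyword list is a hypothetical policy invented for illustration (not legal advice, and not taken from trivy or fossa), and the function names are made up.

```python
from importlib import metadata

# Hypothetical policy for illustration only. Note that "GPL" also
# matches AGPL and LGPL because of the substring check below.
REVIEW_KEYWORDS = ("GPL", "SSPL")


def needs_review(license_text, keywords=REVIEW_KEYWORDS):
    """Return True if the declared license mentions a flagged keyword."""
    upper = (license_text or "").upper()
    return any(keyword in upper for keyword in keywords)


def installed_licenses():
    """Yield (package name, declared license) for installed distributions."""
    for dist in metadata.distributions():
        meta = dist.metadata
        declared = (meta.get("License-Expression")
                    or meta.get("License")
                    or "UNKNOWN")
        yield meta.get("Name", "?"), declared


if __name__ == "__main__":
    for name, declared in sorted(installed_licenses()):
        if needs_review(declared):
            # Print only the first 60 characters; some packages put the
            # full license text in this field.
            print(f"review: {name} ({declared[:60]})")
```

This only sees what each package declares about itself, so it would miss the copied-code case described above; that gap is exactly what dedicated license scanners aim to cover.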
PS: GitGuardian SCA also lets you easily keep an eye on the open-source licenses in use in your codebases.
Conclusion
As software development increasingly relies on open-source components, understanding the complexity and potential risks becomes crucial. The transitive nature of open-source dependencies presents a significant challenge because it amplifies the potential for vulnerabilities within the software supply chain.
Fortunately, various tools and techniques, such as vulnerability scanning, license analysis, and software composition analysis, are available to mitigate these risks. By proactively addressing security concerns and adopting best practices, developers and organizations can safeguard their projects and maintain the integrity of their software ecosystem.
Embracing open-source software security is not only a responsibility but also an opportunity to build trust and collaboration in the software industry.
FAQ
Which is better for security, open source or proprietary software?
Open-source software is not less secure than proprietary code, and a commercial license doesn't guarantee security: Both can include vulnerabilities leading to security issues.
With proprietary software, the only choice is to trust the vendor; with open-source projects, you should not unquestioningly trust community-validated code. And since open source is transparent about potential vulnerabilities, you can make an effort on your part and improve its security.
How do you make open-source software secure?
Scan for vulnerabilities, inventory your project, map your open source dependencies to known security vulnerabilities, and use automation to continuously monitor for new risks.
What is the difference between Software Composition Analysis (SCA) and Open Source Software (OSS)?
Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Software composition analysis (SCA) is the practice of analyzing custom-built software applications to detect embedded open-source software and detect if they are up-to-date, contain vulnerabilities, or have licensing requirements.