
Tiexin Guo

Senior DevOps Consultant, Amazon Web Services
Author | 4th Coffee

A Story

Not long ago, I was working on a huge B2C financial project that affects hundreds of millions of users in Europe.

The project had started long before I joined. When I came on board as the infra guy in July, I was told I had only three months before the release, which would happen in October.

So, yeah, I had three months; that should be more than enough, I told myself. The project wasn't much different from any other modern project: cloud-native, containerized, infrastructure as code, everything deployed in the cloud, you know, standard stuff.

I delivered the infrastructure for the dev, test, staging, and production environments well before the planned go-live date. Of course I did; that's what I do. Things should go super smoothly, I told myself.

One month before the release, a "security guy" stepped in. I had no idea where he came from; I only knew he was from the same organization but maybe from a different operational unit. I also had no idea what he was working on, but I guess it was some document reviewing and some report writing, of course.

Two weeks before the release, an external QA team jumped in as well, starting to do more security-related tests. It was two crazy weeks because there was a lot of fixing and re-testing, of course.

The release was delayed, although by only one day. You would think this isn't a huge issue for such a long waterfall project, but the real nightmare was the three months that followed the big bang release: there were issues to fix all the time, and the engineers on call didn't sleep well for months.

You might think this story happened a really long time ago, but sadly, it is more recent than you would imagine. I had seen many similar projects before this one, where security was handled only at the very end, causing problems and chaos even after the release.


How Did We Do Security in the Old Days

DevSecOps Cartoon

In the past, security-related tasks were only tackled at the very end of the software development lifecycle.

Right before the software was deployed, a security or auditing team, sometimes hired externally for only a short period, would step in, do some reviews, and produce reports and improvement plans.

Then, maybe a separate QA team would also step in and try to do some tests on the topic of security, but that was all.

Security is, in fact, "Job Zero" because of its importance, so there are consequences if you only think about it at the very end of the software development lifecycle. This approach brings several challenges.

Project Delays

Traditionally, potential security issues could lead to huge delays.

Oftentimes, the external teams don't have an in-depth understanding of the whole system and cannot possibly figure out all potential security issues. Even if they could, generating a full list of potential risks and improvement items for every aspect of the system is time-consuming, not to mention implementing all the fixes.

Fixing the code and security issues right before going to production can be time-consuming and expensive, if possible at all.

It is like writing unit tests. What is the best time to write unit tests for your code? Right after you write the code (or even before you write the first line, aka test-driven development). When the code is hot and fresh, it's a lot easier and quicker to write those tests because you still have the whole thing in your mind. If you do it retrospectively, you have probably forgotten what you had in mind when you wrote that piece of code, and you will struggle to cover all possible scenarios.

Similarly, the best time to fix a potential security issue is right after you discover it, not months later when the software is about to be released.

Fixing an issue that was introduced months ago can be very expensive because many components might depend on the affected code, so the scope of the change is much larger, if a fix is still possible at all. If you never do any security work and only do it once right before the release, you are going to find a lot of issues, and fixing them can delay the release.

Not Agile

Although this traditional way is hard, it can still be somewhat manageable when you only release once or twice a year, i.e., if you are doing waterfall development.

But as software developers adopted agile methodologies and DevOps culture, with the goal of reducing the software development lifecycle to weeks or even days so that they can increase velocity and serve their customers better, the traditional approach to security was no longer acceptable. What's worse, it became an external blocker and a bottleneck.

Not Scalable

If another team is working on another project in parallel in the traditional way and also handles security only at the end, the chaos only gets worse. You spend more time or money asking more security people to step in and do pretty much the same things, and you have to do many more hotfixes right before the release.

This model simply isn't scalable when you have multiple cross-functional teams, each working on its own product.

The way we tackle security needed to improve. The traditional centralized security team model had to give way to a federated model that lets each delivery team factor the correct security controls into its Agile and DevOps practices.


Enter DevSecOps

This term gained significant popularity in 2020. Sometimes, it's also called Secure DevOps. It's a natural and necessary result of the evolution of software development to fit the Agile methodology and DevOps culture.

From the name alone, it's not hard to see what it is: it builds on DevOps principles and best practices, augmenting DevOps with an increased focus on security so that security practices are integrated into the DevOps approach.

Engage in Every Stage of the Software Development Lifecycle (SDLC)

DevSecOps automatically "bakes in" security in every stage of the software development lifecycle, enabling the development of secure software at the speed of Agile and DevOps. Within DevSecOps, security is a central part of the entire lifecycle of the software development process.

DevSecOps tries to solve security issues earlier rather than at the end. It removes the silo that is the final security audit. Instead, security is integrated into the Agile and DevOps processes and tools, and security issues are addressed as soon as they emerge, at whichever stage of the lifecycle that happens. At that point they are still easier, faster, and less expensive to fix than they would be retrospectively, right before production.

Shift Left

At the very beginning of the lifecycle, when the product is only being planned, developers are responsible for thinking about security rather than leaving it to the auditing team right before production.

When code is being written, developers think about potential security issues, for example, where to store secrets and credentials and how to fetch them safely from the code.
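As a minimal sketch of that idea, assume the secret is injected into the process environment by a secrets manager or the CI/CD system. The variable name `DB_PASSWORD` and the helper `get_secret` are hypothetical and only serve as illustration; the point is that the credential is read at runtime and never hardcoded in the repository.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    Fails loudly instead of falling back to a hardcoded default, so a
    misconfigured deployment is caught at startup.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


if __name__ == "__main__":
    # Simulate the deployment platform injecting the secret (illustration only).
    os.environ.setdefault("DB_PASSWORD", "example-only")
    password = get_secret("DB_PASSWORD")
    # Never log the secret itself; log only that it was loaded.
    print("DB_PASSWORD loaded:", bool(password))
```

Failing at startup when a secret is missing is one possible design choice; it keeps a missing credential from silently turning into a default password.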

When building virtual machine images or container images, you run a vulnerability scan on every build and fix any vulnerability as soon as it is found, instead of running a one-time scan before production only to find out that many things need to be fixed and re-tested.
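A per-build scan is usually wired in as a gate that fails the pipeline when serious findings appear. The sketch below assumes the scanner writes its findings as a JSON list of objects with `id` and `severity` fields; that report format, the `gate` function, and the blocking severities are assumptions made for illustration, not the output of any particular scanner.

```python
import json
from typing import Dict, List

# Severities that should fail the build (an assumed team policy).
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report_json: str) -> List[Dict[str, str]]:
    """Return the findings in the report whose severity blocks the build."""
    findings = json.loads(report_json)
    return [
        f for f in findings
        if f.get("severity", "").upper() in BLOCKING_SEVERITIES
    ]


def gate(report_json: str) -> int:
    """Return a process exit code: 0 to continue the build, 1 to fail it."""
    blockers = blocking_findings(report_json)
    for finding in blockers:
        print(f"BLOCKING: {finding.get('id')} ({finding.get('severity')})")
    return 1 if blockers else 0


if __name__ == "__main__":
    sample_report = json.dumps([
        {"id": "EXAMPLE-0001", "severity": "LOW"},
        {"id": "EXAMPLE-0002", "severity": "CRITICAL"},
    ])
    # In a pipeline, this value would be passed to sys.exit().
    print("exit code:", gate(sample_report))
```

Because the gate runs on every build, a new vulnerability shows up in the commit that introduced it rather than in a report weeks later.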

This is "shift left."

It means shifting security responsibilities and tasks to the "left" side of the software development spectrum rather than tackling them in the end.

By shifting left, security practices and testing are performed earlier in the development lifecycle rather than at the end. This lets you discover security issues earlier and fix them while they are hot and fresh, when the fix is easy and fast and has a much smaller impact.

This is the essence of DevSecOps: instead of security audit as a silo at the end, do it on the fly.

Shared Responsibility: No Single Security or Auditing Team

To achieve "shift left," instead of having a stand-alone security/auditing/QA team which only steps in right before the release into production, every team and every person working on a project is required to consider security.

This is the shared responsibility model of DevOps: the security of the application and infrastructure is shared among all members of the project, across the whole cross-functional team, rather than left to a silo.

In DevOps, simply putting the Dev team and the Ops team together won't give you a DevOps team. The same is true for DevSecOps: if you simply insert the "security guys" into your existing Dev team or DevOps team, you don't really get DevSecOps. What is essential is that, at every stage of the lifecycle, every member of the cross-functional team is responsible for considering and working on security.

Velocity: Security Automation / Security as Code / Policy as Code

The main goal of DevOps is velocity: to speed up the software development lifecycle, iterate faster, get feedback from your customers faster, and improve faster so that you can finally serve your customers better.

DevSecOps is the same: the added focus on security should not slow you down but rather speed you up with the power of automation.

DevSecOps is well integrated into the DevOps process; it automates security at every stage of the software development lifecycle, from the initial design and planning to development, CI/CD, testing, integration, and all the way to production. You automate your security policies as code so that they are enforced in every stage of the development lifecycle.
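To make "policies as code" concrete, here is a small sketch in which each policy is a named rule stored in version control and evaluated against planned resources, for example in a CI step. The resource fields (`encrypted`, `tags`), the two example policies, and the `violations` helper are all hypothetical; a real setup would more likely use a dedicated policy engine.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Resource = Dict[str, Any]


@dataclass(frozen=True)
class Policy:
    """A named rule; `check` returns True when the resource complies."""
    name: str
    check: Callable[[Resource], bool]


# Hypothetical organization policies, kept in version control next to the code.
POLICIES: List[Policy] = [
    Policy("encryption-enabled", lambda r: r.get("encrypted") is True),
    Policy("owner-tag-present", lambda r: bool((r.get("tags") or {}).get("owner"))),
]


def violations(resources: List[Resource]) -> List[str]:
    """Return one message per (resource, policy) pair that fails."""
    return [
        f"{r.get('name', '<unnamed>')}: violates {p.name}"
        for r in resources
        for p in POLICIES
        if not p.check(r)
    ]


if __name__ == "__main__":
    planned = [
        {"name": "db-prod", "encrypted": True, "tags": {"owner": "payments"}},
        {"name": "db-test", "encrypted": False, "tags": {}},
    ]
    for message in violations(planned):
        print(message)
```

Since the rules are plain code, they can be reviewed, tested, and versioned like any other change, and the same list can be reused across projects.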

What is Policy-as-Code? An Introduction to Open Policy Agent
Learn the benefits of policy as code and start testing your policies for cloud-native environments.

A Comparison between the Traditional Way and the DevSecOps Way

Let's say you have a project where you need some S3 buckets to store some objects.

They might be created by several different teams; there might be tens or even hundreds of buckets in total.

The Traditional Way

Maybe you have a central "infra" team that is responsible for cloud resource provisioning, or maybe you have several agile teams, and each team could do it on their own. Either way, many buckets are created in the process of developing this project.

One month before the release, a security team jumps in and starts to review the whole codebase and the whole infrastructure. After the review, they point out that, due to company policies, no S3 bucket should be open to the public internet; they should all be private.

Some less experienced teams have created buckets that are open to the whole internet, though. Those teams get the security report and start changing the permissions of each S3 bucket.

This isn't easy. First of all, because you do infrastructure as code, there could be tens or even hundreds of buckets defined separately in tens or hundreds of code modules or repos, so you need tens or hundreds of commits and re-tests to fix them.

What makes things worse is that, for each bucket, you need to figure out which other components use it, check whether private access would break them, and if so, work out how to fix them and how to set proper role-based access control for the bucket.

It could take days or even weeks; you could even end up rewriting some other dependencies so that they could still work with a private bucket.

The DevSecOps Way

In the DevSecOps way, even before the start of the project, during the planning phase, you would figure out the corporate policies regarding data privacy.

Then you enforce it with automation: security as code or policy as code.

You would implement some simple logic in code:

When a bucket is created, it generates an event. The event, in turn, triggers a simple piece of code. The code checks the permissions of the bucket. If the bucket is not private, the code updates the permissions to private, then sends a push notification to the Slack channel of the team that created the bucket in the first place.
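The flow described above can be sketched as follows. This is only an outline under stated assumptions: the event shape (`bucket`, `team_channel`), the `InMemoryBucketStore` stand-in, and the `notify` callback are hypothetical and replace the real cloud storage API and Slack integration so that the logic can run on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Notifier = Callable[[str, str], None]  # (team_channel, message)


@dataclass
class InMemoryBucketStore:
    """Stand-in for the cloud storage API, used only for illustration."""
    public: Dict[str, bool] = field(default_factory=dict)

    def is_public(self, bucket: str) -> bool:
        return self.public.get(bucket, False)

    def make_private(self, bucket: str) -> None:
        self.public[bucket] = False


def on_bucket_created(event: Dict[str, str],
                      store: InMemoryBucketStore,
                      notify: Notifier) -> bool:
    """Enforce the private-bucket policy; return True if a fix was applied."""
    bucket, channel = event["bucket"], event["team_channel"]
    if not store.is_public(bucket):
        return False  # already compliant, nothing to do
    store.make_private(bucket)
    notify(channel, f"Bucket '{bucket}' was public and has been set to private.")
    return True


if __name__ == "__main__":
    store = InMemoryBucketStore(public={"team-a-uploads": True})
    event = {"bucket": "team-a-uploads", "team_channel": "#team-a"}
    on_bucket_created(event, store, lambda channel, msg: print(channel, msg))
```

In a real deployment, the store would wrap the cloud provider's SDK and the handler would run as an event-triggered function; passing the store and the notifier in as parameters keeps the policy logic testable without cloud access.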

Then you have no worries at all. No matter which team creates however many buckets, they will know immediately if they violate the privacy policy, before they build anything on top of the bucket.

The best part is that this solution is reusable because it's already automated. Next time you have another project, you can put the same policy in place with minimal to no effort to ensure it's secure.

You don't risk delaying the project, you don't need extra time for the retrospective fixes, and you potentially have just sped up the future projects.


DevSecOps Benefits

Clearly, the biggest benefit is velocity, which is the same goal as DevOps. That is why DevSecOps fits perfectly into the Agile and DevOps process. The rapid, secure delivery of DevSecOps not only saves time but also reduces costs by minimizing the need to repeat a process to address security issues and by shifting security responsibility left.

If the S3 example above isn't very convincing, think of any other scenario, for example, container image vulnerability scanning: compare the traditional way, right before the big bang release, with the proactive, shift-left DevSecOps way.

DevSecOps accelerates the development process. It automates everything related to security or policy, and more importantly, it's a repeatable process. The artifacts are reusable for future projects and can be well integrated with your CI/CD pipelines.

Security is no longer handled passively at the end by an external team just because it is a requirement; instead, security is enhanced proactively and dealt with much sooner, as soon as issues occur.

You might not like the word DevSecOps because it seems like wordplay on DevOps, but it doesn't matter; call it whatever you want: security as code, policy as code, shift-left, secured DevOps, etc. It's the methodology and the process that matter. It has fundamentally changed the way security is tackled in the software development lifecycle, and it will continue doing so.

Embrace security, make it your Job Zero, and handle it now rather than later.

Edit 2021/11/02: if you are using Kubernetes, you might want to check our guided tour of the NSA/CISA best practices on Hardening Your Kubernetes Cluster:

Hardening Your Kubernetes Cluster - Threat Model (Pt. 1)

Hardening Your Kubernetes Cluster - Guidelines (Pt. 2)