
Tiexin Guo

Senior DevOps Consultant, Amazon Web Services
Author | 4th Coffee
Edit 01/22: If you want a beginner-friendly introduction to the topic, start here
Edit 04/23: ... and don't miss out on this awesome IaC security cheat sheet!

1 - Infrastructure Security and Infrastructure Code Security

With DevOps, we try to manage our infrastructure as pure code, often deploying it into the cloud. It’s automated, which makes it easier to manage, faster to deploy, and reusable, reducing human labor and the room for error.

Since all our infrastructure is managed by code, the security of the code that actually manages the infrastructure is crucial.

We often say that security is job zero; when it comes to infrastructure, it’s even more so. If, for example, the database password is included in the code and someone else gets access to the code, the infrastructure, especially with cloud deployments, might be compromised, simple as that.

So, while adding security group rules to make sure only the least-privilege access is allowed to your database helps to improve the security around the infrastructure, handling the code properly (for example, not storing the database password directly in the code) to improve the security around the code is also one of our top priorities.

Today, we will have a look at possible security leaks and enhancements in the infrastructure code.

Nowadays, Terraform is so popular that it is basically the de facto tool for orchestrating infrastructure as code. So, we will use Terraform as an example. Still, most of the principles we will talk about also apply to other IaC tools, like CloudFormation, AWS CDK, etc.

2 - Securely Managing and Separating Multiple Environments

When working on real-world projects, unless it is a simple personal project or a start-up at a very early stage, chances are you run some tests before you actually deploy your application to the production environment. The same goes for infrastructure code: you test it first somewhere else, like a “develop” or a “staging” environment.

Different environments are separated from each other, and that separation makes your production environment more secure: no access comes in from other ENVs to the production ENV, and the passwords, access keys, etc., are different in each environment. So, if we make the right decisions when separating environments, we are already halfway to a secure Infrastructure as Code setup.

2.1 - Testing with Terraform Workspace

One common feature you might reach for to test your infrastructure code before deploying is the Terraform workspace.

Spoiler alert: you might not want to use this, and here’s why.

Terraform Workspace Explained:

Each Terraform configuration has an associated backend that defines where the Terraform state is stored. For example, you can use a default local backend, which stores your state file locally; or you can use a remote backend, such as S3, to store the state files in an S3 bucket.

The state belongs to a “workspace.” Initially, the backend has only one workspace, called “default” (although you may not be aware of its existence), and thus there is only one Terraform state associated with that configuration.

Many backends, like local or S3, support multiple named workspaces, allowing multiple states to be associated with a single configuration.

Note that the configuration still has only one backend, but multiple distinct instances of that configuration can be deployed without configuring a new backend or changing authentication credentials.

For example, say we want to create an S3 bucket in the production environment with this simple setup:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.45.0"
    }
  }
}

provider "aws" {
  region = "eu-central-1"
}

resource "aws_s3_bucket" "b" {
  bucket = "my-tf-test-bucket-${terraform.workspace}-1623578630"
  acl    = "private"
}

Before creating this bucket in the production ENV, I want to test it without any effect on the production state, which we can do in another workspace:

`terraform workspace new test`

In this example, we are using a local backend, so the state file is stored locally. If you run “terraform apply” in the “test” workspace we just created, you will find that a new folder named “terraform.tfstate.d/test” is created, containing its own state file that doesn’t interfere with the default one at all.

As you may have already noticed, you can interpolate the workspace’s name as part of the resource’s name, like:

resource "aws_s3_bucket" "b" {
  bucket = "my-tf-test-bucket-${terraform.workspace}-1623578630"
  acl    = "private"
}

In this way, even if the bucket has already been created in the default workspace, it can be created again in the test workspace.

Following the same approach, you can even write some if/else logic. For example, if the workspace is production, the EC2 instance count is 5; otherwise, it’s 1, as in the sketch below.
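Here is a minimal sketch of what that could look like (the AMI ID and instance type are placeholders):

resource "aws_instance" "app" {
  # 5 instances in production, 1 in every other workspace
  count         = terraform.workspace == "production" ? 5 : 1
  ami           = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = "t3.micro"
}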

2.2 - When to Use Workspace

Named workspaces allow convenient switching between multiple instances of a single configuration within its single backend.

A common use for multiple workspaces is already shown above: to create a parallel, distinct copy of a set of infrastructure in order to test a set of changes before modifying the main production infrastructure.

2.3 - When NOT to Use Workspace

When we use Terraform to manage larger systems, we should use multiple separate Terraform configurations so that different environments can be managed separately. Workspace is not a suitable tool for system decomposition because each subsystem should have its own separate configuration and backend.

In particular, we commonly want to create a strong separation between multiple environments (like the aforementioned example, staging vs. production), and maybe they are even managed by different teams. Workspaces are not a suitable isolation mechanism for this scenario.

And maybe you don’t want the workspace’s name as part of the resource names or the logic anyway: after all, it adds complexity and makes the code harder to read.

We need a stronger separation.

2.4 - Using Terraform Modules to Manage Multiple Environments

A better approach is to create a reusable module and use variables to manage the differences between different environments, as follows:

tiexin@Tiexins-Mini ~/work/iac-security $ tree
.
├── modules
│   └── s3
│       ├── main.tf
│       └── variables.tf
├── production
│   ├── config.tf
│   ├── main.tf
│   └── variables.tf
└── staging
    ├── config.tf
    ├── main.tf
    └── variables.tf

The idea is to create a module that handles the S3 bucket creation and abstracts the bucket name into an input variable of the module.
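A minimal sketch of what the module might look like (the variable and resource names are just examples):

# modules/s3/variables.tf
variable "bucket_name" {
  type = string
}

# modules/s3/main.tf
resource "aws_s3_bucket" "b" {
  bucket = var.bucket_name
  acl    = "private"
}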

Then we create a separate folder for each environment, each containing its own configuration, its own variables, and a reference to that module. In each environment, we simply reference the module:

module "bucket" {
  source      = "../modules/s3"
  bucket_name = "my-tf-test-bucket-${var.environment}-1623578630"
}

In this way, we have created truly separated environments without duplicating code.

Ideally, we would even want to separate environments by AWS accounts and switch to a different access key when deploying, to make sure the environments are truly and securely separated. There are tools to make this process easier for you, like “awsume.”
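One way to express the account separation in Terraform itself is to have each environment’s provider assume a role in its own account; here is a minimal sketch, where the account ID and role name are placeholders:

provider "aws" {
  region = "eu-central-1"

  assume_role {
    # each environment points at a deploy role in its own AWS account (placeholder ARN)
    role_arn = "arn:aws:iam::123456789012:role/terraform-deploy"
  }
}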

3 - Code Security

In the previous example, we created an S3 bucket; no sensitive information was involved.

What if we are creating, for example, a database where we need to set the admin password within the infrastructure code?

3.1 - Protect Sensitive Input

For example, we create an RDS instance with the following code:

variable "password" {
  type    = string
  default = "foobarbaz"
}

resource "aws_db_instance" "default" {
  allocated_storage    = 10
  engine               = "mysql"
  engine_version       = "5.7"
  instance_class       = "db.t3.micro"
  name                 = "mydb"
  username             = "foo"
  password             = var.password
  parameter_group_name = "default.mysql5.7"
  skip_final_snapshot  = true
  db_subnet_group_name = aws_db_subnet_group.default.name
}

This breaks a fundamental security rule: DO NOT store sensitive data in the code (see the next chapter). Still, if you are sure you want to leave this secret visible in the source code, more precautions must be taken, because the variable’s value might be printed out when you run “terraform plan” or when it’s referenced in an output. Let’s say we have an output like this:

output "pwd" {
  value = var.password
}

When we apply, we will literally see it:


Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

pwd = "foobarbaz"

This isn’t nice; we don’t want people passing by the screen to see the database password! And even if you don’t put the value in the code but read it from an ENV var, that doesn’t help; it still shows up in the output.

Luckily, Terraform 0.14 and later has a solution: the “sensitive” flag:

variable "password" {
  type      = string
  default   = "foobarbaz"
  sensitive = true
}

output "pwd" {
  value     = var.password
  sensitive = true
}

If we mark the variable and the output as “sensitive,” it won’t be printed out when you do terraform plan and apply:

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

pwd = <sensitive>

3.2 - DO NOT Store Sensitive Data in the Code

Even with the sensitive flag enabled, this still isn’t secure, because the password remains in the source code, visible to anyone who has read access to your infrastructure code repo.

As mentioned, it’s possible not to store the password in the variables file but to read it from an environment variable. While this works, it adds a lot of manual labor: every time you need to run your infra code, you have to set a bunch of ENV vars first. And where do you store those values anyway, in a physical notebook?
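For reference, this is what the ENV var approach looks like; Terraform automatically reads any variable whose name matches a TF_VAR_* environment variable:

# Terraform maps TF_VAR_password to var.password
export TF_VAR_password="foobarbaz"
terraform apply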

We need something better for storing this sensitive data, and luckily, there is: enter Secret Managers. For a detailed introduction to Secret Managers, check out this blog post.

In essence, we store the secrets in a remote manager and read the value using Terraform code:

# AWS Secrets Manager
data "aws_secretsmanager_secret" "my_db_secret" {
  name = "my_db_secret"
}

# secret version
data "aws_secretsmanager_secret_version" "my_db_secret" {
  secret_id = data.aws_secretsmanager_secret.my_db_secret.id
}

# store the value in a local variable
# (this assumes the secret is stored as a JSON object with a "password" key)
locals {
  password = jsondecode(data.aws_secretsmanager_secret_version.my_db_secret.secret_string)["password"]
}

# use it like:

resource "aws_db_instance" "default" {
  allocated_storage    = 10
  engine               = "mysql"
  engine_version       = "5.7"
  instance_class       = "db.t3.micro"
  name                 = "mydb"
  username             = "foo"
  password             = local.password
  parameter_group_name = "default.mysql5.7"
  skip_final_snapshot  = true
  db_subnet_group_name = aws_db_subnet_group.default.name
}

In this way, we avoid storing sensitive data in the code at all; we can even make our infrastructure code repo public and share it with the community if we want.


3.3 - Client-side Encryption

Maybe you don’t want to pay for a Secret Manager. Maybe you don’t want the hassle of hosting and maintaining a HashiCorp Vault instance because you only have one or two secrets to store. For smaller teams, there is another possible solution: “git-crypt.”

“git-crypt” enables transparent encryption and decryption of files in a git repository. Files that you choose to protect are encrypted when committed and decrypted when checked out. git-crypt lets you freely share a repository containing a mix of public and private content.

In essence, it works similarly to Ansible Vault: files are encrypted before being committed to the git repo. It relies on GPG keys for encryption, and you can add other people’s GPG keys to the repo so that they can decrypt the files too. The encrypt-before-commit step happens automatically.
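The setup is roughly as follows (a minimal sketch; the .tfvars file name is just an example):

# initialize git-crypt in the repo
git-crypt init

# in .gitattributes, mark which files should be encrypted, e.g.:
# secrets.auto.tfvars filter=git-crypt diff=git-crypt

# grant a teammate access by adding their GPG key
git-crypt add-gpg-user their.email@example.com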

As previously said, if there are only a few secrets to store, this might work. But be aware that this solution doesn’t scale well: it needs to be set up in every repo; every new team member has to create a GPG key, which you then need to add to each repo; and everyone has to store their keys securely and safely. At a larger scale, Secret Managers are still the way to go; I only cover this for the sake of completeness.

3.4 - Secret Scanning

Whether you use a Secret Manager or git-crypt, chances are you will still make the mistake of committing credentials into a git repo at some point, because we are humans, not machines, and humans make mistakes.

We all have the good intention of keeping things as secure as possible and not doing dangerous stuff like committing a password into a repo, but these things still happen, more often than I’d like to admit. And I’ll be open enough to admit that I’ve made mistakes like these in my personal projects too, although I knew very well that I shouldn’t.

What this means is that good intentions alone won’t work; what works is a mechanism, often an automated one.

This is where repository security scanner tools kick in. There are tools, both open-source (like Gitleaks) and commercial (like GitGuardian, which is free for small teams), that help us scan for possible secrets in our repos. They all integrate well with CI pipelines, and some even provide a nice UI console and alerting features so you can easily audit all detected leaks and get notified when a new one is found. Gitleaks, for example, can run a quick local scan, as shown below.
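A minimal example, assuming Gitleaks v8’s command syntax:

# scan the current repository, including its git history, for hardcoded secrets
gitleaks detect --source . --verbose

But that's not all...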

4 - Infrastructure Security

We’ve covered both environment separation and code security, but there is more: the infrastructure itself.

Each team member should continuously accumulate knowledge on this subject and try to follow best practices as much as possible; after all, DevOps is all about a continuous learning mindset. For example, when creating a security group, you may not want to open SSH port 22 to the whole world; see the sketch below for a more restrictive rule.
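Here is a minimal sketch of such a rule; the CIDR range and the referenced security group are placeholders:

resource "aws_security_group_rule" "ssh" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.0/16"] # an internal range, NOT ["0.0.0.0/0"]
  security_group_id = aws_security_group.example.id # placeholder reference
}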

But, as with secret scanning, good intentions alone don’t work! We need automated processes to help us find possible security issues, and that’s where a static code scanner/analyzer can help.

For Terraform users, you can use the GitGuardian CLI, ggshield, to detect 70+ security vulnerabilities. It can be used as a pre-commit check and can also be integrated into your CI pipelines to prevent serious misconfigurations from being deployed. Once installed, you can run your first scan super quickly:

# Create a token to authenticate your GitGuardian workspace
ggshield auth login

# Scan your local repositories
ggshield iac scan REPO

[Screenshot: a ggshield scan output showing 2 incidents]

You can read detailed information about each misconfiguration and see exactly which policies were violated, all in one place, and you can also export the results to a JSON file and save it somewhere in your CI/CD for further analysis.

You can learn more about it in the announcement post “Introducing Infrastructure as Code Security Scanning,” which shows how to protect your infrastructure at the source code level.

For CloudFormation, there is a similar tool, “cfn-lint,” which you can even extend with your own custom rules.
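Running it is straightforward (a minimal example; template.yaml is a placeholder file name):

# install cfn-lint and run it against a CloudFormation template
pip install cfn-lint
cfn-lint template.yaml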

5 - Summary

  • Environment separation: physically and logically separating different environments helps define security boundaries and limits access between them; Terraform modules (rather than workspaces) can help achieve this.
  • Code security: protect sensitive variables with the “sensitive” flag; do not store sensitive data in the code; use either client-side encryption or a secret manager instead; and run secret scanning tools to catch mistakes.
  • Infrastructure security: follow best practices, keep learning continuously, and use static analysis tools like ggshield and cfn-lint.