Getting Started With SPIFFE For Multi-Cloud Secure Workload Authentication

Mattias Gees

Director of Tech Workload Identity Architecture at Venafi | Cloud Native & Kubernetes | DevSecOps | SPIFFE/SPIRE | Workload Identity | Zero Trust

SPIFFE is rapidly emerging as the preferred identity framework and industry standard for secure service-to-service communication and authentication in cloud-native environments. Its growing popularity benefits application developers, and the major cloud providers are adopting it as a powerful “identity federation” tool for authenticating workloads across customer-controlled multi-cloud setups.

SPIFFE stands for Secure Production Identity Framework for Everyone, and it is a specification that provides the following:

  • A universal way to identify workloads (e.g., applications, services, scripts).
  • The identity of each workload is encoded in a cryptographically verifiable document: an X.509 certificate or a JSON Web Token (JWT).
  • The identities are short-lived and continuously attested and verified, which limits the impact of a compromise.
  • The potential to federate with external systems such as cloud providers and SaaS services.

SPIFFE aims to replace the need for long-lived API keys, usernames & passwords, and other single-factor access credentials with a highly scalable identity solution. This identity solution can be applied universally across multi-cloud and hybrid environments. 
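To make the "universal way to identify workloads" concrete, here is a minimal Go sketch of the shape of a SPIFFE ID: a URI with the `spiffe` scheme, a trust domain, and a path. The `SPIFFEID` type and `ParseSPIFFEID` function are hypothetical names used only for this illustration, and the checks are deliberately simplified. In real code you would rely on `spiffeid.FromString` from the go-spiffe library, which implements the full specification.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// SPIFFEID is a hypothetical helper type for this sketch only.
type SPIFFEID struct {
	TrustDomain string // e.g. "spiffe-demo.internal"
	Path        string // e.g. "/ns/spiffe-demo/sa/spiffe-demo-customer"
}

// ParseSPIFFEID performs a simplified check of the
// spiffe://<trust domain>/<path> shape. It does not cover every rule
// of the SPIFFE ID specification.
func ParseSPIFFEID(raw string) (SPIFFEID, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return SPIFFEID{}, fmt.Errorf("not a valid URI: %w", err)
	}
	if u.Scheme != "spiffe" {
		return SPIFFEID{}, errors.New("scheme must be spiffe")
	}
	if u.Host == "" || u.User != nil || u.Port() != "" {
		return SPIFFEID{}, errors.New("trust domain must be a bare host name")
	}
	if u.RawQuery != "" || u.Fragment != "" {
		return SPIFFEID{}, errors.New("query and fragment are not allowed")
	}
	if strings.HasSuffix(u.Path, "/") {
		return SPIFFEID{}, errors.New("path must not end with a slash")
	}
	return SPIFFEID{TrustDomain: u.Host, Path: u.Path}, nil
}

func main() {
	for _, raw := range []string{
		"spiffe://spiffe-demo.internal/ns/spiffe-demo/sa/spiffe-demo-customer",
		"https://spiffe-demo.internal/ns/spiffe-demo", // wrong scheme, rejected
	} {
		id, err := ParseSPIFFEID(raw)
		if err != nil {
			fmt.Printf("%s: rejected (%v)\n", raw, err)
			continue
		}
		fmt.Printf("trust domain=%s path=%s\n", id.TrustDomain, id.Path)
	}
}
```

The trust domain identifies who issued the identity, and the path identifies the workload within that domain. A check like this also catches simple typos in the scheme before they end up in an authorization rule.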

If this is your first encounter with SPIFFE, I highly recommend reading my blog post “Can SPIFFE Solve the Secret Zero Problem?” explaining the specifications.

However, many potential adopters we've spoken to are still working out how to use SPIFFE productively in their environments and need help evaluating its benefits. At Cloud Native SecurityCon 2024, TestifySec’s Tom Meadows and I presented some practical applications of SPIFFE in real-world environments. This blog post provides a detailed summary of our discussion, covering the following use cases:

  1. Use SPIFFE natively in your workload
  2. Use SPIFFE for applications you can’t modify with an Envoy proxy
  3. Authorize with a SPIFFE identity to AWS
  4. Use a SPIFFE Verifiable Identity Document (SVID) for PostgreSQL authorization

Use SPIFFE natively in your workload

When you have complete control over your application and use a popular programming language such as Go, Python, or Java, SPIFFE libraries make it easy to integrate SPIFFE into your workload. In many cases, only minimal changes are needed.

On the server side, the following Go code is required:

// This gets called from the main function and actually starts an mTLS server that is SPIFFE capable.
func (b *BackendService) run(ctx context.Context) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Set up a `/` resource handler
	http.HandleFunc("/", b.rootHandler)

	// Create a `workloadapi.X509Source`, it will connect to Workload API using provided socket.
	// If socket path is not defined using `workloadapi.SourceOption`, value from environment variable `SPIFFE_ENDPOINT_SOCKET` is used.
	source, err := workloadapi.NewX509Source(ctx)
	if err != nil {
		return fmt.Errorf("unable to create X509Source: %w", err)
	}
	defer source.Close()

	// Allowed SPIFFE ID
	clientID := spiffeid.RequireFromString(b.spiffeAuthz)

	// Create a `tls.Config` to allow mTLS connections, and verify that the presented certificate has the allowed SPIFFE ID (for example `spiffe://example.org/client`).
	tlsConfig := tlsconfig.MTLSServerConfig(source, source, tlsconfig.AuthorizeID(clientID))
	server := &http.Server{
		Addr:              b.serverAddress,
		TLSConfig:         tlsConfig,
		ReadHeaderTimeout: time.Second * 10,
	}

	// Serve the SPIFFE mTLS server.
	if err := server.ListenAndServeTLS("", ""); err != nil {
		return fmt.Errorf("failed to serve: %w", err)
	}
	return nil
}

In the above example, we set up a regular Go web server. The difference is that, as part of our tlsConfig, we supply a SPIFFE certificate retrieved from the Workload API socket and a simple authorization rule that only admits clients presenting the given SPIFFE ID.
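To show what such an authorization rule boils down to, here is a standard-library-only Go sketch that reads the SPIFFE ID from the URI SAN of a certificate and compares it with the allowed ID. The function names `authorizePeer` and `newTestCert` are hypothetical and exist only for this illustration. This is not how go-spiffe is implemented: the library additionally verifies the certificate chain against the trust bundle, which this sketch skips entirely.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// authorizePeer is a hypothetical, simplified stand-in for an ID-based
// authorizer: it only compares the SPIFFE ID in the URI SAN.
func authorizePeer(cert *x509.Certificate, allowed string) error {
	// An X.509 SVID carries exactly one URI SAN, which is the SPIFFE ID.
	if len(cert.URIs) != 1 {
		return fmt.Errorf("expected exactly one URI SAN, got %d", len(cert.URIs))
	}
	if got := cert.URIs[0].String(); got != allowed {
		return fmt.Errorf("unexpected SPIFFE ID %q", got)
	}
	return nil
}

// newTestCert creates a throwaway self-signed certificate that carries the
// given SPIFFE ID as its URI SAN. Real SVIDs are issued by SPIRE instead.
func newTestCert(id string) (*x509.Certificate, error) {
	uri, err := url.Parse(id)
	if err != nil {
		return nil, err
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		URIs:         []*url.URL{uri},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newTestCert("spiffe://example.org/client")
	if err != nil {
		panic(err)
	}
	fmt.Println("allowed client:", authorizePeer(cert, "spiffe://example.org/client"))
	fmt.Println("other client:  ", authorizePeer(cert, "spiffe://example.org/other"))
}
```

The point of the sketch is that the authorization decision is made on the workload identity inside the certificate rather than on a host name or IP address.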

Initiating a connection from the client side to our server looks like the following:

// Handles requests for connecting to the SPIFFE native backend
func (c *CustomerService) mtlsHandler(w http.ResponseWriter, r *http.Request) {
	log.Printf("Handling a request in the mtlsHandler from %s", r.RemoteAddr)
	mTLSCall(w, c.spiffeAuthz, c.backendService)
}

// General mTLS call to SPIFFE enabled servers. This can be either a SPIFFE native application or a webserver/apiserver that is fronted by a SPIFFE proxy like Envoy.
func mTLSCall(w http.ResponseWriter, spiffeAuthZ string, backendAddress string) {
	w.Header().Set("Content-Type", "text/html")
	ctx := context.Background()
	ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()

	// Create a `workloadapi.X509Source`, it will connect to Workload API using provided socket path
	// If socket path is not defined using `workloadapi.SourceOption`, value from environment variable `SPIFFE_ENDPOINT_SOCKET` is used.
	source, err := workloadapi.NewX509Source(ctx)
	if err != nil {
		http.Error(w, fmt.Sprintf("Unable to create X509Source: %v", err), http.StatusInternalServerError)
		return
	}
	defer source.Close()

	// Allowed SPIFFE ID
	serverID := spiffeid.RequireFromString(spiffeAuthZ)

	// Create a `tls.Config` to allow mTLS connections, and verify that presented certificate has SPIFFE ID.
	tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(serverID))
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}

	// Do a GET call to the backend and get the response.
	resp, err := client.Get(backendAddress)
	if err != nil {
		http.Error(w, fmt.Sprintf("Error connecting to %q: %v", backendAddress, err), http.StatusInternalServerError)
		return
	}

	defer resp.Body.Close()
	// Read the body from the response.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		http.Error(w, fmt.Sprintf("Unable to read body: %v", err), http.StatusInternalServerError)
		return
	}

	// Retrieve the server SPIFFE ID from the connection.
	serverSPIFFEID, err := spiffetls.PeerIDFromConnectionState(*resp.TLS)
	if err != nil {
		http.Error(w, fmt.Sprintf("Wasn't able to determine the SPIFFE ID of the server: %v", err), http.StatusInternalServerError)
		return
	}

	// Showcase the retrieved information and send it back to the customer.
	fmt.Fprintf(w, "<p>Got a response from: %s</p>", serverSPIFFEID.String())
	fmt.Fprintf(w, "<p>Server says: %q</p>", body)
}


This code shows a typical client that issues a GET request, with the addition of the SPIFFE certificate details retrieved from the Workload API socket. It also validates the server's SPIFFE identity as part of the authorization step. This is easy to add to greenfield projects, and retrofitting existing applications is also straightforward.

Use SPIFFE for applications you can’t modify with an Envoy proxy

Adding SPIFFE natively to your workloads isn’t always possible. A common reason is an older application that is not worth the effort or investment to modify. Another is that you can’t change the workload because you don’t own the source code (e.g., enterprise software).

A proxy is required to use SPIFFE for these workloads. A popular choice is Envoy, which SPIRE supports through Envoy's secret discovery service (SDS).

Configuring Envoy to work with SPIFFE requires the following:

  • A connection to the SPIFFE workload API
  • Definition of the certificate to request from the SPIFFE workload API
  • Authorization of the SPIFFE ID from the incoming connection

We first need to tell Envoy where it can retrieve its certificates. To do that, we define a cluster configuration in Envoy:

clusters:
- name: spire_agent
  connect_timeout: 0.25s
  http2_protocol_options: {}
  load_assignment:
    cluster_name: spire_agent
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            pipe:
              path: /spiffe-workload-api/spire-socket.sock

Now that we have a cluster defined that points to our SPIRE agent, we can request a certificate from it with the following parameters:

transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    common_tls_context:
      tls_certificate_sds_secret_configs:
      - name: "spiffe://spiffe-demo.internal/ns/spiffe-customer/sa/spiffe-demo-httpbackend"
        sds_config:
          resource_api_version: V3
          api_config_source:
            api_type: GRPC
            transport_api_version: V3
            grpc_services:
              envoy_grpc:
                cluster_name: spire_agent

The above config will request an X.509 certificate for the `spiffe://spiffe-demo.internal/ns/spiffe-customer/sa/spiffe-demo-httpbackend` SPIFFE identity. The SPIFFE workload API will validate whether it can hand out that identity to the Envoy proxy and, if allowed, will return a valid X.509 SVID. Now that we have a certificate, we can also configure Envoy to validate incoming connections to our web server and check for a specific SPIFFE ID:

transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    common_tls_context:
      combined_validation_context:
        # validate the SPIFFE ID of incoming clients (optionally)
        default_validation_context:
          match_typed_subject_alt_names:
          - san_type: URI
            matcher:
              exact: "spiffe://spiffe-demo.internal/ns/spiffe-demo/sa/spiffe-demo-customer"
        # the validation_context_sds_secret_config entry that fetches the
        # trust bundle over SDS is omitted from this excerpt

The above Envoy configuration makes your server SPIFFE-aware without any modifications to the application. It is also how service meshes like Istio work: they use Envoy to encrypt the traffic and make authorization decisions based on rules defined in Istio.

On the client side, you can use the same code as when connecting to a SPIFFE native application; no different implementation is needed. If your client is also not SPIFFE native, an Envoy proxy can be used as an outbound proxy as well.

Authorize with a SPIFFE identity to AWS

Authenticating to AWS with an SVID can be done either with an X.509 certificate through AWS IAM Roles Anywhere or with a JWT through OpenID Connect (OIDC) federation. The recommended way is the X.509 certificate option, but we wanted to showcase the SPIRE JWT SVID capabilities for the talk, so we used OIDC federation. Although AWS supports X.509 authentication, not all external SaaS/PaaS/IaaS services do, so there will always be cases where JWT functionality is required.

To federate with AWS (and any other service that supports JWT authentication), we need to expose a JWKS endpoint. SPIRE can do this through the OIDC Discovery Provider, which hosts a public endpoint that any service can query to retrieve the public keys matching the private keys that signed our JWTs.

Once we have a public endpoint, we need to configure AWS so that it becomes aware of our OIDC Discovery endpoint. This is only required once per SPIRE deployment. We must also create the necessary IAM roles and permissions so our workload can access services like AWS S3. In Terraform, it can look like the following:
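As a rough illustration of why a JWKS endpoint is enough for a relying party, the following standard-library-only Go sketch signs a JWT with a private key, publishes only the public half as a JWKS document, and verifies the token by looking up the key through its `kid`. All function names (`publishJWKS`, `signJWT`, `verifyJWT`, `newDemoToken`) and the key ID `demo-key` are hypothetical. RS256 is used here for brevity; a SPIRE deployment may sign with a different algorithm such as ES256, and a production verifier would also check claims like `exp`, `aud`, and `sub`.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"math/big"
	"strings"
)

var b64 = base64.RawURLEncoding

type jwk struct {
	Kty string `json:"kty"`
	Kid string `json:"kid"`
	N   string `json:"n"`
	E   string `json:"e"`
}

type jwks struct {
	Keys []jwk `json:"keys"`
}

// publishJWKS renders the public key the way a JWKS endpoint would serve it.
func publishJWKS(kid string, pub *rsa.PublicKey) ([]byte, error) {
	return json.Marshal(jwks{Keys: []jwk{{
		Kty: "RSA", Kid: kid,
		N: b64.EncodeToString(pub.N.Bytes()),
		E: b64.EncodeToString(big.NewInt(int64(pub.E)).Bytes()),
	}}})
}

// signJWT builds an RS256-signed token from the given claims.
func signJWT(kid string, key *rsa.PrivateKey, claims map[string]any) (string, error) {
	header, err := json.Marshal(map[string]string{"alg": "RS256", "kid": kid, "typ": "JWT"})
	if err != nil {
		return "", err
	}
	payload, err := json.Marshal(claims)
	if err != nil {
		return "", err
	}
	input := b64.EncodeToString(header) + "." + b64.EncodeToString(payload)
	digest := sha256.Sum256([]byte(input))
	sig, err := rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA256, digest[:])
	if err != nil {
		return "", err
	}
	return input + "." + b64.EncodeToString(sig), nil
}

// verifyJWT checks the token signature against the JWKS key with a matching kid.
func verifyJWT(token string, jwksJSON []byte) error {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return errors.New("malformed token")
	}
	headerJSON, err := b64.DecodeString(parts[0])
	if err != nil {
		return err
	}
	var header struct{ Alg, Kid string }
	if err := json.Unmarshal(headerJSON, &header); err != nil {
		return err
	}
	if header.Alg != "RS256" {
		return fmt.Errorf("unsupported alg %q", header.Alg)
	}
	var set jwks
	if err := json.Unmarshal(jwksJSON, &set); err != nil {
		return err
	}
	sig, err := b64.DecodeString(parts[2])
	if err != nil {
		return err
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	for _, k := range set.Keys {
		if k.Kid != header.Kid || k.Kty != "RSA" {
			continue
		}
		n, errN := b64.DecodeString(k.N)
		e, errE := b64.DecodeString(k.E)
		if errN != nil || errE != nil {
			return errors.New("malformed JWK")
		}
		pub := &rsa.PublicKey{N: new(big.Int).SetBytes(n), E: int(new(big.Int).SetBytes(e).Int64())}
		return rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig)
	}
	return fmt.Errorf("no key with kid %q", header.Kid)
}

// newDemoToken creates a throwaway key, a signed token, and the matching JWKS.
func newDemoToken() (string, []byte, error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return "", nil, err
	}
	set, err := publishJWKS("demo-key", &key.PublicKey)
	if err != nil {
		return "", nil, err
	}
	token, err := signJWT("demo-key", key, map[string]any{"aud": []string{"demo"}})
	return token, set, err
}

func main() {
	token, set, err := newDemoToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("verify error:", verifyJWT(token, set))
}
```

The relying party never sees the private key. It only needs to be able to reach the published key set, which is why the endpoint has to be publicly resolvable for a service like AWS.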

data "tls_certificate" "oidc-certificate" {
  url = var.oidc-url
}

resource "aws_iam_openid_connect_provider" "oidc-spire" {
  url = var.oidc-url

  client_id_list = [
    "demo",
  ]

  thumbprint_list = [data.tls_certificate.oidc-certificate.certificates[0].sha1_fingerprint]
}

resource "aws_iam_role" "oidc-spire-role" {
  name = "demo-spiffe-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity",
        Effect = "Allow",
        Principal = {
          Federated = aws_iam_openid_connect_provider.oidc-spire.arn,
        },
        Condition = {
          StringEquals = {
            "oidc-discovery.mattias-gcp.jetstacker.net:aud" = "demo",
            "oidc-discovery.mattias-gcp.jetstacker.net:sub" = "spiffe://spiffe-demo.internal/ns/spiffe-demo/spiffe-demo/customer"
          }
        }
      },
    ],
  })
}

resource "aws_iam_role_policy" "s3" {
  name        = "demo-spiffe-policy"
  role        = aws_iam_role.oidc-spire-role.name

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutAccountPublicAccessBlock",
                "s3:GetAccountPublicAccessBlock",
                "s3:ListAllMyBuckets",
                "s3:ListJobs",
                "s3:CreateJob",
                "s3:HeadBucket"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-s3-bucket",
                "arn:aws:s3:::my-s3-bucket/*",
                "arn:aws:s3:*:*:job/*"
            ]
        }
    ]
}
EOF
}

The code will do the following:

  1. Set up OIDC federation with the SPIRE OIDC Discovery Provider.
  2. Create an IAM role that specifically allows connections from that federation endpoint. The request also needs to match a specific SPIFFE ID and a JWT audience.
  3. Give the role permissions to write and retrieve data from an S3 bucket.
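The role condition in step 2 can be pictured as two equality checks on the claims of the JWT. The Go sketch below is only an approximation of that idea, using hypothetical names (`claimsAllowed`, `fakeToken`) and an example SPIFFE ID; it is not how AWS evaluates trust policies internally, and it deliberately skips signature verification, which must happen first in any real verifier.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// audience accepts both JSON forms of the "aud" claim: a string or an array.
type audience []string

func (a *audience) UnmarshalJSON(b []byte) error {
	var one string
	if err := json.Unmarshal(b, &one); err == nil {
		*a = audience{one}
		return nil
	}
	var many []string
	if err := json.Unmarshal(b, &many); err != nil {
		return err
	}
	*a = many
	return nil
}

type claims struct {
	Sub string   `json:"sub"`
	Aud audience `json:"aud"`
}

// claimsAllowed reports whether the token's subject equals wantSub and its
// audience contains wantAud. The signature is NOT checked in this sketch.
func claimsAllowed(token, wantSub, wantAud string) (bool, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return false, errors.New("malformed token")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return false, err
	}
	var c claims
	if err := json.Unmarshal(payload, &c); err != nil {
		return false, err
	}
	if c.Sub != wantSub {
		return false, nil
	}
	for _, a := range c.Aud {
		if a == wantAud {
			return true, nil
		}
	}
	return false, nil
}

// fakeToken wraps a JSON payload in an unsigned, JWT-shaped string for the demo.
func fakeToken(payload string) string {
	enc := base64.RawURLEncoding.EncodeToString
	return enc([]byte(`{"alg":"none"}`)) + "." + enc([]byte(payload)) + "." + enc([]byte("sig"))
}

func main() {
	const sub = "spiffe://spiffe-demo.internal/ns/spiffe-demo/sa/spiffe-demo-customer"
	token := fakeToken(`{"sub":"` + sub + `","aud":["demo"]}`)
	ok, err := claimsAllowed(token, sub, "demo")
	fmt.Println(ok, err)
}
```

Because the subject comparison is an exact string match, a single misspelled character in the expected SPIFFE ID means the role can never be assumed.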

No special configuration is required on the application side; we write the application as if it consumed AWS credentials natively. Two extra things are needed:

  1. A small binary that retrieves the JWT SVID from the SPIRE Workload API and exchanges it for credentials that AWS understands. It is based on Square's work on this.
  2. An AWS config file that defines the credential_process setting, which lets us source the credentials from an external source (in this case, SPIRE):

[default]
credential_process = /usr/bin/spiffe-aws-assume-role credentials --role-arn ${AWS_ROLE_ARN} --audience demo --workload-socket /spiffe-workload-api/spire-socket.sock

And that is all that is required to make federation work with AWS through OIDC. The same method can also be applied to other services that support JWT authentication.
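For context on the credential_process mechanism: the AWS CLI and SDKs run the configured command and expect a small JSON document on its standard output. The Go sketch below shows that output format with placeholder values. The names `processCredentials`, `renderCredentials`, and `exampleOutput` are hypothetical, and this is not the code of the helper binary used in the demo, which additionally has to fetch the JWT SVID and exchange it with AWS STS.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// processCredentials mirrors the JSON document that AWS tooling expects on
// stdout from a credential_process command.
type processCredentials struct {
	Version         int    `json:"Version"` // must be 1
	AccessKeyID     string `json:"AccessKeyId"`
	SecretAccessKey string `json:"SecretAccessKey"`
	SessionToken    string `json:"SessionToken,omitempty"`
	Expiration      string `json:"Expiration,omitempty"` // ISO 8601 timestamp
}

// renderCredentials serializes temporary credentials in that format.
func renderCredentials(accessKey, secret, token string, expires time.Time) (string, error) {
	out, err := json.Marshal(processCredentials{
		Version:         1,
		AccessKeyID:     accessKey,
		SecretAccessKey: secret,
		SessionToken:    token,
		Expiration:      expires.UTC().Format(time.RFC3339),
	})
	return string(out), err
}

// exampleOutput renders placeholder credentials with a fixed expiry time.
func exampleOutput() (string, error) {
	expires := time.Date(2024, 6, 26, 12, 0, 0, 0, time.UTC)
	return renderCredentials("AKIAEXAMPLE", "placeholder-secret", "placeholder-token", expires)
}

func main() {
	out, err := exampleOutput()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Including an expiration lets the AWS tooling call the command again when the credentials are about to expire, which fits well with short-lived SVIDs.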

Use a SPIFFE Verifiable Identity Document (SVID) for PostgreSQL authorization

The last use case we will cover in this blog post is authentication to PostgreSQL with an X.509 SVID. PostgreSQL doesn’t natively support SPIFFE ID authorization; luckily, it supports certificate-based authentication. Although we can’t use the SPIFFE ID that is stored in the SAN of the X.509 certificate, we can use the common name for authentication and authorization within PostgreSQL. SPIRE supports setting the common name as part of the workload registration: the first name passed with the -dns flag is also used as the certificate's common name. An example of such a registration is the following:

bin/spire-server entry create \
    -parentID spiffe://spiffe-demo.local/node \
    -spiffeID spiffe://spiffe-demo.local/ns/spiffe-demo/spiffe-demo-customer \
    -selector unix:user:spiffe-demo \
    -dns customer.spiffe-demo.local

Although this solution will not use the SPIFFE ID for authorization, it will still be able to leverage a lot of the SPIRE advantages:

  • Attestation and verification of workloads
  • Short-lived identities
  • Retrieval of the SVID through the universal SPIRE socket

Another problem we must solve is that PostgreSQL can’t talk to the SPIRE Workload API to retrieve its identities. Luckily, the SPIFFE project has an answer, and it is called the spiffe-helper. The spiffe-helper can retrieve the SVID on behalf of the workload, make it available on the local filesystem, and signal the workload so it can reload its configuration with the new certificate. For PostgreSQL, our spiffe-helper configuration looks like the following:

# SPIRE agent unix socket path
agent_address = "{{- .Values.spiffe.socketPath -}}"

# psql binary path
cmd = "/usr/bin/psql"

# Query for configuration reloading
cmd_args = "-h 127.0.0.1 -p 5432 -c \"SELECT pg_reload_conf();\""

# Directory to store certificates (must match the ssl settings in postgresql.conf)
cert_dir = "/opt/postgresql-certs"

# No renew signal is used in this example
renew_signal = ""

# Certificate, key and bundle names must match those configured in postgresql.conf
svid_file_name = "svid.pem"
svid_key_file_name = "svid.key"
svid_bundle_file_name = "svid_bundle.pem"
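The pattern behind this configuration is "drop the files, then trigger a reload". The Go sketch below illustrates that pattern with placeholder bytes and a callback in place of the psql command. It is not spiffe-helper's actual implementation; the function names (`writeAtomically`, `dropAndReload`, `runDemo`) are hypothetical, and only the three file names are taken from the configuration above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomically writes data to a temporary file in dir and renames it into
// place, so a reader never observes a half-written file.
func writeAtomically(dir, name string, data []byte, mode os.FileMode) error {
	tmp, err := os.CreateTemp(dir, name+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(mode); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, name))
}

// dropAndReload writes the SVID, key, and bundle into dir and then calls reload.
func dropAndReload(dir string, svid, key, bundle []byte, reload func() error) error {
	files := []struct {
		name string
		data []byte
		mode os.FileMode
	}{
		{"svid.pem", svid, 0o644},
		{"svid.key", key, 0o600}, // private key: owner-only permissions
		{"svid_bundle.pem", bundle, 0o644},
	}
	for _, f := range files {
		if err := writeAtomically(dir, f.name, f.data, f.mode); err != nil {
			return fmt.Errorf("writing %s: %w", f.name, err)
		}
	}
	return reload()
}

// runDemo exercises dropAndReload in a temporary directory with placeholder bytes.
func runDemo() (names []string, reloaded bool, err error) {
	dir, err := os.MkdirTemp("", "svid-demo-")
	if err != nil {
		return nil, false, err
	}
	defer os.RemoveAll(dir)
	err = dropAndReload(dir, []byte("cert"), []byte("key"), []byte("bundle"), func() error {
		reloaded = true // stand-in for running the configured reload command
		return nil
	})
	if err != nil {
		return nil, false, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, false, err
	}
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, reloaded, nil
}

func main() {
	names, reloaded, err := runDemo()
	fmt.Println(names, reloaded, err)
}
```

Writing to a temporary file and renaming it is one way to make sure the database never reads a partially written certificate or key during a rotation.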

Thanks to the spiffe-helper, our PostgreSQL will always have an up-to-date X.509 certificate. On the application side, we need to do the following to be able to talk to PostgreSQL and authenticate ourselves:

	ctx := context.Background()
	ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()

	// Create a `workloadapi.X509Source`, it will connect to Workload API using provided socket path.
	// If socket path is not defined using `workloadapi.SourceOption`, value from environment variable `SPIFFE_ENDPOINT_SOCKET` is used.
	source, err := workloadapi.NewX509Source(ctx)
	if err != nil {
		return nil, fmt.Errorf("unable to create X509Source: %v", err)
	}
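	// Do not close the source here: the returned database handle needs it for
	// every new connection. Keep a reference and close it on shutdown instead.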

	tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeAny())

	connStr := fmt.Sprintf(
		"postgres://%s@%s:%s/%s?sslmode=require",
		c.postgreSQLUser, c.postgreSQLHost, dbPort, dbName)

	// Parse the PostgreSQL config settings
	config, err := pgx.ParseConfig(connStr)
	if err != nil {
		return nil, fmt.Errorf("unable to parse connection string: %v", err)
	}

	// Set the TLS config to the SPIFFE TLS config that we retrieved earlier from the Workload API.
	config.TLSConfig = tlsConfig

	// Open the connection to the PostgreSQL database.
	db := stdlib.OpenDB(*config)

	// Ping the PostgreSQL database to test the connection
	err = db.Ping()
	if err != nil {
		return nil, fmt.Errorf("error pinging database: %v", err)
	}

	// Return the DB connection.
	return db, nil

The above code sets up and tests the DB connection by requesting an SVID from the SPIRE workload API. Once we have the SVID, we can use it as part of the TLSConfig for our PostgreSQL connection.

That is all that is required to use X.509 SVIDs for authenticating and authorizing to PostgreSQL.

Conclusion

The above four use cases give a minimal view of what is required to move to a Zero Trust Networking model backed by SPIFFE & SPIRE, and many of the patterns are adaptable to other workloads. If you are keen to try it out, all of the source code for this blog post, along with instructions, can be found on GitHub.
