Multicloud Security Architecture
A Brief Introduction to Multicloud Security
Six years ago, I was working on a "containers-as-a-service" public cloud project in a Fortune 500 company, providing everything related to containers' lifecycle as a service to internal users: Kubernetes clusters, building/deploying containerized applications, container security runtime platforms, secret management, automation pipelines, and more. The goal was to enable end users to deploy various applications without knowing much about container technologies and the public cloud so that they could concentrate on their applications.
Theoretically, since the users didn't need to know about the nitty-gritty implementation details to run their workloads, we could have chosen to run everything in one single public cloud service, right?
Yes, but no. The company chose to build it on not just one but two public cloud service providers, for some solid reasons: avoiding vendor lock-in, data storage requirements, meeting users' needs, cost optimization, and more.
This is one of the early examples of multicloud: the use of multiple clouds. But using multiple cloud service providers isn't all benefits; it has its challenges. Today, let's have a look at multicloud: what it is, what the challenges are (especially the security challenges), and what the best practices are for a secure multicloud architecture.
1. What is Multicloud?
Multicloud is, as the name and the example shown in the beginning suggest, the use of cloud infrastructures and/or services offered by different providers (such as Azure, AWS or Google Cloud).
Multicloud is increasingly popular. According to research done by 451 Research (part of S&P Global Market Intelligence; read the full report here), multicloud is no longer a choice but a reality we already live in: 98% of the enterprises using or planning to use public cloud in the next six months have already adopted a multicloud strategy (the research studied 1,500 companies). The top five reasons are:
- data sovereignty (the data organizations use is subject to the legal and regulatory regimes of the localities where it is collected)
- cost optimization
- best-of-breed cloud services and applications
- business agility and innovation
- vendor lock-in concerns
For sure, multicloud architecture empowers enterprises, but with great power comes great responsibility, and multicloud architecture also brings its own unique challenges.
Different public clouds may look alike in that they provide similar infrastructure and higher-level services, but they differ in how you create resources with automation, how you configure access and identity, how you enforce security policies, how you deploy, and so on. They are, after all, different public clouds, and this added complexity can lead to security gaps.
Since multicloud is already essential in many enterprises and a fact to live with, next, let's have a look at the challenges and possible solutions.
2. Multicloud Security Challenges
Compared to a single cloud, multicloud brings many security challenges.
2.1 Greater Attack Surface
The attack surface of a multicloud architecture is bigger.
An attack surface is all the points an attacker can exploit to enter a system. So, by definition, the more moving pieces there are in the infrastructure, the bigger the attack surface is, and the more vulnerabilities there potentially could be. Unfortunately, for security, more providers could mean more problems.
When adopting a multicloud architecture for its benefits, we have to accept the fact that we will have a greater attack surface. Of course, some measures can be taken to mitigate the risks, but the fact is that multicloud = bigger attack surface. Being aware of this is important because it reminds us to pay more attention and handle multicloud security with more care.
2.2 Identity and Access Management (IAM)
Cloud providers rely on different IAM frameworks. Although functionally they do more or less the same things, there are differences in the configuration and automation details. For example, AWS IAM has its own concepts and best practices, which might not all apply in another public cloud, say, GCP.
The absence of a centralized IAM system that can bridge different clouds together brings significant challenges, because, without it, DevSecOps may struggle to effectively define and manage user identities, access levels, trusted devices, password policies, password rotations, and more, which could introduce security vulnerabilities and increase the risk of unauthorized access.
2.3 Infrastructure as Code (IaC) Security
With DevOps, we manage our infrastructure using code for automation. Since the infrastructure is orchestrated by code, the security of the IaC tool and IaC code is crucial, because if not properly configured and secured, security vulnerabilities can be introduced in the very early stage of a project at the infrastructure level.
And yes, that's right, with multicloud, IaC misconfigs could also have a bigger security impact. We have to work out how to automate not just one, but two (or even more) clouds, and automate them securely no less. This could potentially mean that there is a knowledge and expertise gap, which requires continuous learning to be closed.
2.4 Insufficient Visibility/Monitoring
Another security challenge of multicloud lies in visibility and centralized monitoring. Or, to be more precise, the lack of it.
Although different cloud providers have comprehensive monitoring solutions, they only work within their cloud. For example, in AWS there is CloudWatch, and in Azure, there is Azure Monitor, and they both do more or less the same thing, except you can't view your Azure workload status in AWS CloudWatch.
There is essentially no centralized place to monitor across all clouds, meaning that we can't gain a comprehensive view of our entire cloud infrastructure and workload, leading to blind spots and potential security gaps.
Although, theoretically, we can configure different watches and rules in different clouds, that not only adds operational overhead but also means we don't have a central bird's-eye view. As a result, we may be unable to promptly detect and respond to security incidents in different clouds, and malicious activities may even go unnoticed.
2.5 Lack of Standardized Security Policies and Auditing/Security Scanning Across Clouds
There are certain security best practices that we would like to enforce no matter where we run our workloads.
For example, we probably don't want SSH to be opened for most of our VMs, we want to limit what ports can be opened on certain VMs, and we want to configure data encryption security policies for both in transit and at rest (for more information on the last point, see this blog here).
Technically, we can achieve all of this in every cloud, but each is configured differently. Take SSH as an example: we can restrict it with a security group in AWS and a network security group in Azure, but the two differ in definition and configuration, whether you use the web console or Terraform code.
There isn't a universal configuration language that lets you define such a security rule once as a policy and have it translated into every public cloud's implementation. There is no single source of truth: we must define the same rules again and again in different clouds, and we need to implement separate auditing and scanning for these rules and policies in each cloud, all of which brings overhead and challenges.
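To make the difference concrete, here is a minimal Python sketch that renders one "allow SSH from this CIDR" intent into the two providers' shapes. The `IngressRule` schema and both function names are made up for illustration; the output field names follow the EC2 `IpPermissions` entry and the Azure NSG `securityRules` entry as I understand them, so check them against the provider documentation before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Cloud-neutral description of one inbound rule (hypothetical schema)."""
    name: str
    port: int
    source_cidr: str
    protocol: str = "tcp"


def to_aws_ip_permission(rule: IngressRule) -> dict:
    # Shape of one entry in the IpPermissions list of an EC2 security group.
    return {
        "IpProtocol": rule.protocol,
        "FromPort": rule.port,
        "ToPort": rule.port,
        "IpRanges": [{"CidrIp": rule.source_cidr, "Description": rule.name}],
    }


def to_azure_security_rule(rule: IngressRule, priority: int) -> dict:
    # Shape of one securityRules entry on an Azure network security group.
    # Azure wants an explicit priority, direction, and access, which the
    # AWS shape has no fields for.
    return {
        "name": rule.name,
        "properties": {
            "protocol": rule.protocol.capitalize(),  # "tcp" -> "Tcp"
            "sourceAddressPrefix": rule.source_cidr,
            "sourcePortRange": "*",
            "destinationAddressPrefix": "*",
            "destinationPortRange": str(rule.port),
            "access": "Allow",
            "direction": "Inbound",
            "priority": priority,
        },
    }


# Hypothetical rule: SSH allowed only from a bastion subnet.
ssh = IngressRule(name="ssh-from-bastion", port=22, source_cidr="10.0.0.0/24")
aws_rule = to_aws_ip_permission(ssh)
azure_rule = to_azure_security_rule(ssh, priority=100)
```

Even in this tiny case the two outputs don't line up field for field: the port is an integer range in one and a string in the other, and the Azure rule needs a priority that the neutral rule never mentioned. That mismatch is exactly the overhead described above.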
2.6 Secret Management Across Clouds
Applications require configuration, and according to the twelve-factor app methodology, the modern, DevOps-y way to handle it is to store config in environment variables. If the config contains sensitive information, we need to fetch it from a secure place, like a secret manager.
For more information, see my series of blogs on secret managers and how to use them in various places:
- How to Handle Secrets with AWS Secrets Manager
- How to Handle Secrets with Google Cloud Secret Manager
- How to Handle Secrets with Azure Key Vault
Although there are solutions in all public clouds, the problem remains: what if I have the same workload running in different clouds requiring the same configuration? Do I use AWS Secrets Manager from an app running in Azure? How do I grant access? How do I transfer sensitive information from one cloud to the other? These are all the challenges we need to solve in a multicloud architecture.
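One way to keep the workload code identical across clouds is to hide the secret backend behind a small interface. The following is only a sketch of that idea: `SecretStore`, `EnvSecretStore`, `InMemorySecretStore`, and the `DB_PASSWORD`/`DB_HOST` names are all hypothetical, and the in-memory class merely stands in for a real client of whichever secret manager you use. It does not answer the harder questions of cross-cloud access and transfer raised above.

```python
import os
from typing import Protocol


class SecretStore(Protocol):
    """Minimal interface the application depends on (hypothetical)."""

    def get(self, name: str) -> str: ...


class EnvSecretStore:
    """Reads secrets from environment variables injected by the platform."""

    def get(self, name: str) -> str:
        return os.environ[name]


class InMemorySecretStore:
    """Stand-in for a real backend such as a cloud secret manager client."""

    def __init__(self, secrets: dict[str, str]) -> None:
        self._secrets = dict(secrets)

    def get(self, name: str) -> str:
        return self._secrets[name]


def database_url(store: SecretStore) -> str:
    # Application code only knows the interface, not which cloud it runs in.
    password = store.get("DB_PASSWORD")
    # Non-sensitive config still comes straight from the environment.
    host = os.environ.get("DB_HOST", "localhost")
    return f"postgresql://app:{password}@{host}:5432/app"
```

With this shape, moving the same workload from one cloud to another means swapping the store implementation at startup, while `database_url` and the rest of the application stay untouched.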
2.7 Data Transfer Security
Securely transferring data between different cloud services is also a key challenge.
As mentioned in the previous section, maybe you need to fetch secrets defined in one cloud and use them in another. Or maybe some data stores are configured in one cloud and need to be backed up to, or consumed from, another.
These all involve securely transferring data from one cloud to another, which wasn't a problem before the adoption of multicloud.
2.8 Deployment/Workload Security
Last but not least, let's take a closer look at our workload.
A multicloud deployment means when deploying stuff, we need to interact with several different vendors, each with different processes and services. For example, deploying a container in AWS can vary drastically compared to deploying a container in Azure, and this is something to be worked out if we want to operate efficiently in multiple clouds.
3. Multicloud Security Best Practices
After reading all these challenges of a multicloud architecture, you might feel discouraged and ask yourself: is it all worth it?
I feel that too, but don't worry, because if handled properly, we can mitigate the security risks and maximize the benefits of multicloud.
Next, let's have a look at some of the best practices you can adopt to improve your quality of life.
3.1 Use Centralized IAM/SSO
While it’s challenging to manage multiple accounts in different clouds, the problem can be mitigated to some extent by using single sign-on (SSO). Major public cloud providers like AWS, GCP, and Azure all allow you to configure SSO with an external identity provider. You can map different identities to different roles in each cloud so that access is managed at a single source of truth: your external identity provider (IdP).
This only solves half of the problem, because we still need to define and manage roles and access rules in different clouds, probably using different sets of IaC code. Still, this is already quite helpful because at least our identities are centralized.
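As a rough illustration of that mapping, the sketch below keeps one table from IdP groups to per-cloud roles and resolves a user's roles from their group membership. The group names, the `roles_for` helper, and the AWS role names are invented for this example (`123456789012` is just a placeholder account ID); `roles/viewer`, `roles/owner`, `Reader`, and `Owner` are built-in roles in GCP and Azure.

```python
# Hypothetical mapping from IdP groups to the role each group assumes per cloud.
GROUP_ROLE_MAP = {
    "platform-admins": {
        "aws": "arn:aws:iam::123456789012:role/PlatformAdmin",
        "gcp": "roles/owner",
        "azure": "Owner",
    },
    "developers": {
        "aws": "arn:aws:iam::123456789012:role/ReadOnly",
        "gcp": "roles/viewer",
        "azure": "Reader",
    },
}


def roles_for(groups: list[str], cloud: str) -> set[str]:
    """Return the roles a user gets in one cloud, given their IdP groups."""
    roles = set()
    for group in groups:
        role = GROUP_ROLE_MAP.get(group, {}).get(cloud)
        if role is not None:
            roles.add(role)
    return roles


# A developer who is not in any admin group.
dev_gcp_roles = roles_for(["developers"], "gcp")
```

Identity lives in one place (the IdP groups), but the right-hand side of the table is still three different role vocabularies that have to be created and maintained per cloud.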
I sincerely hope the next big thing in the cloud-native, open-source, DevSecOps landscape will be a unified IAM/security policy engine that supports “define once, apply multiple times in different clouds” - one human-readable configuration to rule them all, in which you define roles, access, security policies, and network policies, and which then converts them into automation for different clouds. Something like this would solve so many problems once and for all. While there seem to be a few commercial products for this, an open-source solution has yet to emerge.
3.2 Build Centralized Monitoring
To address the challenge of effectively monitoring and gaining insights into multiple clouds, we need to build some centralized monitoring solution that aggregates data from all cloud platforms to provide a real-time (or at least near real-time) unified view of all cloud environments, enabling proactive threat detection and swift incident response across the multicloud perimeter.
And luckily, this is already a reality with open-source/CNCF technologies, and yes, that is Prometheus/Grafana.
For example, in AWS, the CloudWatch agent with Prometheus support automatically collects metrics from several services and workloads, and CloudWatch Container Insights monitoring for Prometheus automates the discovery of Prometheus metrics from containerized systems and workloads.
It’s the same story with other clouds. For example, with Azure, you can use the azure metrics exporter for Prometheus to collect and send Azure metrics to your Prometheus server, achieving continuous visibility and monitoring from a centralized monitoring solution for all your cloud environments.
In this way, if you are using multiple clouds, you don’t have to rely on each cloud’s own monitoring/alerting system (like AWS’s CloudWatch), but rather, use the Prometheus/Grafana stack which you probably already have and are familiar with.
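Once metrics from every cloud land in one Prometheus server, a single query can cover all of them. The sketch below parses the JSON returned by an instant query for the built-in `up` metric (for example, `GET /api/v1/query?query=up`) and groups unhealthy targets by cloud. It assumes each target carries a `cloud` label, which is not there by default and would have to be added in your scrape configuration; the function name and the sample data are made up.

```python
from collections import defaultdict


def down_targets_by_cloud(response: dict) -> dict[str, list[str]]:
    """Group targets whose `up` value is 0 by their (assumed) `cloud` label."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    down = defaultdict(list)
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        # An instant-vector sample is [unix_timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            cloud = labels.get("cloud", "unknown")
            down[cloud].append(labels.get("instance", "?"))
    return dict(down)


# Hand-written sample shaped like an instant-query response for `up`.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "cloud": "aws", "instance": "10.0.1.5:9100"},
                "value": [1700000000.0, "1"],
            },
            {
                "metric": {"__name__": "up", "cloud": "azure", "instance": "10.1.0.7:9100"},
                "value": [1700000000.0, "0"],
            },
        ],
    },
}
down = down_targets_by_cloud(sample_response)
```

In practice you would more likely express this as a Grafana panel or an alerting rule, but the point is the same: one query, one view, all clouds.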
3.3 Use Open-source, Cloud-Agnostic Tools and Solutions
As always, open-source software is crucial in today's cloud-native world, and it could be the key to many problems we have mentioned above:
- On the workload side, we can use Kubernetes to bridge the gap between different clouds’ workload platforms. Instead of using ECS in AWS, Cloud Run in GCP, or Container Apps in Azure, we can use Kubernetes in all of them to orchestrate our workloads so that we don’t depend on any fully managed platform that is only available in a certain cloud.
- On the deployment security side, we can use a unified continuous integration/continuous deployment solution that fits all clouds. For example, we can achieve this by using GitHub Actions for CI and Argo CD for continuous deployment/GitOps. Both are cloud-agnostic CI/CD solutions that will work with any cloud, allowing a loose coupling between your code and where you run it.
- On the workload security side, we can utilize open-source solutions, like native Kubernetes network policies and container runtime security platforms (for example, sysdig/falco).
- On the security scanning side, it’s the same story: although different cloud providers have different offerings (for example, AWS has a fleet of security-related services like Inspector, GuardDuty, and Shield, and Azure has Defender for Cloud, Sentinel, etc.), certain tools work across clouds. For example: for hard-coded secrets and misconfiguration scanning, there are ggshield, SecretScanner, and trivy; for CVE scanning, there are trivy (yes, it does multiple things), clair, and grype; and for software composition analysis, there are tools like snyk and osv-scanner.
- On the auditing side, there are even some dedicated multicloud security auditing and posture management tools out there, like ScoutSuite and cloudsploit.
- On the secrets management side, there are also cloud-agnostic secret managers like the HashiCorp Vault.
Using cloud-agnostic, open-source tools can greatly improve your multicloud experiences because, in that way, it’s as if you are using just one unified cloud instead of many.
3.4 Infrastructure-as-Code (IaC) Security
Since we need to use IaC to orchestrate multiple clouds, there is no shortcut; continuous learning is the only solution because we need expertise in multiple clouds to make the IaC part of it more secure.
Although there are differences regarding the implementation details for different clouds, there are still some very good generic guidelines and best practices regarding IaC security that you could benefit from.
And, for a specific cloud, there are best practices for managing it with IaC tools, like Managing AWS IAM with Terraform. It may not 100% apply to GCP or Azure, but it's a good starting point that you can build upon and port the principles to other clouds.
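As a small example of porting one principle across clouds, the sketch below checks a Terraform plan (the JSON produced by `terraform show -json`) for SSH exposed to the internet in both AWS and Azure resources. It is a simplification under stated assumptions: it only looks at `aws_security_group` inline `ingress` blocks and `azurerm_network_security_rule` resources, the helper names are made up, and it is no replacement for a real scanner such as trivy.

```python
# Source values that mean "anyone on the internet" in an Azure NSG rule.
AZURE_WORLD = {"0.0.0.0/0", "*", "Internet"}


def _aws_open_ssh(after: dict) -> bool:
    for rule in after.get("ingress") or []:
        all_traffic = str(rule.get("protocol")) == "-1"
        in_range = (rule.get("from_port") or 0) <= 22 <= (rule.get("to_port") or 0)
        if (all_traffic or in_range) and "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
            return True
    return False


def _azure_open_ssh(after: dict) -> bool:
    return (
        after.get("direction") == "Inbound"
        and after.get("access") == "Allow"
        and after.get("destination_port_range") in {"22", "*"}
        and after.get("source_address_prefix") in AZURE_WORLD
    )


# One check per resource type; other types are ignored in this sketch.
CHECKS = {
    "aws_security_group": _aws_open_ssh,
    "azurerm_network_security_rule": _azure_open_ssh,
}


def find_open_ssh(plan: dict) -> list[str]:
    """Return addresses of planned resources that expose SSH to the internet."""
    findings = []
    for change in plan.get("resource_changes", []):
        check = CHECKS.get(change.get("type"))
        after = (change.get("change") or {}).get("after")
        if check and after and check(after):
            findings.append(change["address"])
    return findings


# Hand-written fragment shaped like plan JSON; resource names are hypothetical.
plan = {
    "resource_changes": [
        {
            "address": "aws_security_group.web",
            "type": "aws_security_group",
            "change": {"after": {"ingress": [
                {"protocol": "tcp", "from_port": 22, "to_port": 22,
                 "cidr_blocks": ["0.0.0.0/0"]},
            ]}},
        },
        {
            "address": "azurerm_network_security_rule.ssh",
            "type": "azurerm_network_security_rule",
            "change": {"after": {
                "direction": "Inbound", "access": "Allow", "protocol": "Tcp",
                "destination_port_range": "22",
                "source_address_prefix": "10.0.0.0/8",
            }},
        },
    ]
}
findings = find_open_ssh(plan)
```

The principle ("no SSH from the whole internet") is stated once in `find_open_ssh`, while the per-cloud knowledge sits in two small functions. That is roughly what porting a best practice from one cloud to another looks like in code.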
Summary
As businesses increasingly adopt multicloud strategies to enhance flexibility and resilience, understanding the unique security challenges and implementing robust best practices is key. It's all about knowing the risks, staying on top of things, and putting good habits in place.
Multicloud environments offer good benefits but also pose complex security risks that require comprehensive solutions. By prioritizing visibility/monitoring, automation across all cloud platforms, and using open-source tech to help out, we can fortify our defences in the ever-evolving landscape of multicloud. Just a few simple steps can go a long way in making sure our system stays intact in this multicloud adventure.