Small Team, Big Wins: Why Size Doesn’t Matter for Self-Hosted
Philippe Gablain
Philippe is Engineering Lead at GitGuardian. He likes to share his passion for engineering and a more meaningful technology.
You may not believe me, but during the last millennium, our ancestors used to ship software on physical devices like floppy disks or CD-ROMs (Windows 95 famously shipped on a whopping 35 floppy disks). We forgot all that when SaaS appeared in the early 2000s, but back then, when you were sending out 10,000 disks with your new software, you had to be sure it was working flawlessly (unless you were prepared to ship another 10,000 disks with a “service pack”). Moreover, you had no means of investigating issues that cropped up on your customers' systems.
Today, when we think about Self-Hosted, we usually envision a similar scenario, and it’s not far from reality. Self-hosted software is complex to distribute and troubleshoot. Most of the time, you have no visibility into what happens on your customers' installations. That’s why many SaaS companies can be reluctant to offer a self-hosted version of their product.
At GitGuardian we recognized very early on that self-hosted could be a crucial requirement to meet the needs of our largest customers, so we were determined to find a solution. And we did. For the past three years, we've been releasing a self-hosted version of our platform every single month.
In this post, we’ll try to explain how we've accomplished this.
Self-Hosted is Hard
There are many reasons why SaaS had such instant success in the industry. First, it eliminates the need for software distribution, as new versions are deployed centrally. Second, you own the platform, giving you precise control over the software environment. You also have extensive visibility into how your software is performing (or not). All these technical advantages have been a key driver in one of the most significant changes in the engineering process: the rise of Agile methodologies. When releasing is cheap, you can do it often and iterate frequently.
In contrast, with self-hosted software, you don’t control when customers will install your software or where they will run it. Fixing bugs is costly, so iterations must be carefully planned.
Even if the rise of containerization and orchestrators like Kubernetes allows you to have more control over how your software will run, you still need to ensure compatibility across each of your customers' clouds, each with their own network and security policies. So our first challenge was to guarantee software quality and reliability in a context we don’t fully control.
Our second challenge was to be able to help customers at any stage of their GitGuardian journey. How do you troubleshoot issues when you don’t have access to the platform? The third, and perhaps less expected, challenge was to integrate our self-hosted requirements into our development and delivery processes. The two approaches are fundamentally different, so how do you make them work together harmoniously?
Getting the right team
A mix of competencies
Having the right people on board is the key to success. But what should the self-hosted team look like?
Should it be a developer team (we provide setup and configuration tools to our customers)? Or more of a DevOps team (we deal with Helm charts, Docker images, network, and security)? And what about reliability? Should it be a QA team (automating the testing and ensuring the quality of each release)? You guessed it, the answer is all of the above: full-stack, DevOps, and QA engineers, but first and foremost, a product team. Self-hosted is considered a product in its own right, with DevOps users and decisions driven by product and customer needs. Mixing such diverse talents allows the team to cover complex subjects and encourages its members to learn from and push one another.
Sharing the burdens, ensuring the expertise
To keep the team small (under 10 people), we need to ensure that all the product expertise is shared between the team members. That's why the team handles level 3 customer support on a rotating basis, with a designated "support master" in charge of answering customer questions at any given time. This makes us more efficient at fixing issues and keeps us customer-focused. The monthly release process works similarly, with team members taking turns writing the "New release available!" Slack message that details the latest feature pack.
However, we can't take all the credit for making the self-hosted release possible. We benefit from the expertise of all the engineering departments. The Dev Efficiency team supports our work on the CI, while the platform teams (SRE, Security) share knowledge and provide feedback. Other teams within the department are responsible for porting their features to self-hosted. And dozens of engineers at Replicated and Chainguard are creating the tools we rely on to distribute, troubleshoot, and secure our application.
Choosing the right tools
Leveraging Kubernetes, Helm, and the right partners is what made Self-Hosted possible for us. We can concentrate on application architecture and configurations while benefiting from Replicated's license management, distribution, and troubleshooting capabilities, as well as Chainguard’s secured, FIPS-compliant base images.
Replicated also enables us to reach a wider audience by offering packaging options like “embedded” (where you can spin up a full-fledged Kubernetes cluster with our application running on it with just one line of Shell from a bare Linux instance), or air-gapped (where you can deploy our software from a single tarball without being connected to the internet). It also provides an optional web UI for configuration, allowing teams without Helm knowledge to install and maintain the software.
Troubleshooting is made easier with Replicated's integrated Support Bundles, which allow customers to package logs, Kubernetes metrics, and custom diagnostics into a tar file they can securely share with us.
As a security company, protecting our software against Common Vulnerabilities and Exposures (CVEs) is paramount. By partnering with Chainguard and using their distroless base-OS images to build the GitGuardian images we provide to our customers, we can release software with nearly zero vulnerabilities. This saves us valuable time that would otherwise be spent constantly addressing new CVEs.
Walking The Tightrope to Balance Self-Hosted and SaaS
Automate everything
We made an intense effort to automate everything possible to eliminate error-prone manual tasks and to extend test coverage. GitLab CI/CD pipelines manage the release process, while Terraform and Ansible tools have been developed to spin up temporary customer-specific environments and run smoke tests against them. On each release, we’re now able to test installations and upgrades on most cloud providers with various configurations (embedded, air-gapped, etc.). Thanks to automation, validating a release candidate takes no more than a day. Additional nightly jobs ensure that the DB compatibility matrix is respected and alert us if a released Docker image is affected by a new CVE.
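To make the idea of testing installations and upgrades across providers and configurations concrete, here is a minimal sketch of how such a test matrix could be enumerated. It is not our actual pipeline: the provider names, the mode names, and the `build_test_matrix` helper are all hypothetical and only illustrate the combinatorics.

```python
from itertools import product

# Hypothetical values: the post only mentions "most cloud providers" and
# "various configurations (embedded, air-gapped, etc.)".
PROVIDERS = ["aws", "gcp", "azure"]
INSTALL_MODES = ["helm", "embedded", "air-gapped"]
SCENARIOS = ["install", "upgrade"]


def build_test_matrix(providers, modes, scenarios, skip=frozenset()):
    """Return every (provider, mode, scenario) combination not listed in skip."""
    return [
        combo
        for combo in product(providers, modes, scenarios)
        if combo not in skip
    ]


if __name__ == "__main__":
    # Each tuple would map to one temporary environment and one smoke-test run.
    for provider, mode, scenario in build_test_matrix(
        PROVIDERS, INSTALL_MODES, SCENARIOS
    ):
        print(f"smoke-test: {scenario} on {provider} ({mode})")
```

The `skip` argument is there because, in practice, some combinations may not make sense or may not be worth the cost of an environment on every release.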
Supporting upgrades from old versions
In self-hosted, unlike SaaS, you have to support the fact that some customers will upgrade only once in a while, from a very old release to the latest version. Supporting such migration paths can be challenging for developers, who need to deliver code that supports these paths, and for QA, who must test multiple migration scenarios. That’s why we decided to narrow the problem down by making every third version ‘required.’ If a customer wants to upgrade from an earlier version, the deployment will stop and ask them to upgrade to the next required version first, then the following, until the application reaches its final state. Thanks to these required versions, QA only needs to test upgrades from the latest required version to ensure all upgrade paths are possible. As for the developers, a required version is where they can clean up their code and manage breaking changes in the database or other stateful components.
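The required-version rule can be sketched in a few lines. This is an illustration only, under simplifying assumptions: releases are numbered with plain integers, a release is "required" when its number is a multiple of three, and `upgrade_path` is a hypothetical helper, not the check our deployment actually runs.

```python
def is_required(version: int, interval: int = 3) -> bool:
    """Assumption for this sketch: every `interval`-th release is required."""
    return version % interval == 0


def upgrade_path(current: int, target: int, interval: int = 3) -> list[int]:
    """List the releases to install, in order, to go from current to target.

    Every required release strictly between the two must be installed
    before moving on; the target itself is always the last stop.
    """
    if target < current:
        raise ValueError("downgrades are not supported")
    if target == current:
        return []
    stops = [v for v in range(current + 1, target) if is_required(v, interval)]
    return stops + [target]


if __name__ == "__main__":
    # A customer on release 4 who wants release 11 must stop at 6 and 9.
    print(upgrade_path(4, 11))  # [6, 9, 11]
    # A customer already on the latest required release upgrades directly.
    print(upgrade_path(9, 11))  # [11]
```

The second call shows why QA can focus on upgrades from the latest required version: every longer path is a chain of hops that were each validated when they were the latest.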
Self-Hosted 101 for SaaS developers
As SaaS and Self-Hosted have different timeframes, it can be very challenging for SaaS developers to understand how their features will function in self-hosted. Enabling them to understand the self-hosted lifecycle and foresee their changes’ behavior is a key success criterion. In that sense, releasing a self-hosted version of your software may impact most of your development processes. At GitGuardian, developers own their features from design to support, and porting to self-hosted is an important part of that. Strict rules have been established, unlocking better ownership and quality:
- No breaking changes outside required releases
- Ownership of customer troubleshooting for their features
- Every change is behind a feature flag. Don’t bother asking if you should cherry-pick (or revert) a change to self-hosted; just activate the feature flag when the Product Manager says so. This greatly reduces the risk of drift between SaaS and self-hosted, as long as you remember to activate your feature flags.
- Troubleshooting and telemetry are part of the Definition of Done. Developers must provide ways for the customer to understand what’s wrong and for GitGuardian to gain usage insights.
- Health Checks and Telemetry frameworks are delivered for these purposes.
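The feature-flag rule above can be illustrated with a minimal sketch. The `FeatureFlags` class, the `new_report_layout` flag, and the `render_report` function are hypothetical names made up for this example; they are not part of the GitGuardian codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFlags:
    """Holds the set of flags enabled for one deployment (hypothetical)."""

    enabled: frozenset = frozenset()

    def is_enabled(self, name: str) -> bool:
        return name in self.enabled


def render_report(flags: FeatureFlags) -> str:
    # The new code path ships in every release but stays dormant
    # until the flag is switched on for that deployment.
    if flags.is_enabled("new_report_layout"):
        return "report (new layout)"
    return "report (legacy layout)"


if __name__ == "__main__":
    saas = FeatureFlags(enabled=frozenset({"new_report_layout"}))
    self_hosted = FeatureFlags()  # flag not yet activated for this release
    print(render_report(saas))
    print(render_report(self_hosted))
```

Because both deployments run the same code, there is nothing to cherry-pick or revert: the only difference between SaaS and self-hosted is which flags are on.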
Thorough and clear documentation
Since we can't directly access customers' environments, we must rely heavily on the Sales team and documentation to give customers the tools and knowledge they need to install and configure the application themselves. Documentation must be considered part of the product, designed with users in mind, and thoroughly tested.
Due to the diversity of environments where a self-hosted application can be deployed, it is key to detail all the necessary requirements. For example, what database is needed? (We use PostgreSQL, with Redis for caching.) What version of Kubernetes is supported? How can the application be deployed without a cluster available or when using specific tools like the popular ArgoCD? We provide all this information in our public documentation, where users can find details on installing, managing, and troubleshooting the GitGuardian application.
Simple is beautiful
On a broader scale, we must make sure new features don’t damage an instance’s stability. This is perhaps the most important challenge: in an era of microservices and hexagonal architectures, a product that wants to be shipped as self-hosted must remain boringly simple. You must be able to run it on a small infrastructure, and you won’t succeed if you have dozens of microservices, each with its own database, event brokers, etc. At GitGuardian, we only ship two images, frontend and backend, relying on a single PostgreSQL database and a Redis instance for caching and brokering. Thanks to Helm's powerful templating capabilities, the application topology allows scalability, but the smallest customers only need 9 pods to run the GitGuardian Platform. Each new feature (and the teams are adding a lot of them) must be carefully designed to comply with simplicity requirements.
It turns out that self-hosted and SaaS are not so opposed, after all. They can actually benefit each other, with self-hosted bringing robust processes and high technical standards to the table. And when a SaaS has those in place, even a small team can successfully deliver to the biggest and most demanding customers.