Terraform vs Google Cloud Deployment Manager

January 20, 2021

This short article contains a few practical observations on how Google Cloud Deployment Manager (DM) compares to Terraform as an Infrastructure-as-Code (IaC) tool on Google Cloud Platform (GCP). It uses the deployment of a demo web application (the free and open-source feed reader Miniflux) as an example. These observations were the result of a technology evaluation study I did at work to choose between these two tools. I won’t dwell too much on the procedural aspects, such as step-by-step setup instructions, which can be found in the accompanying GitHub repo.

IaC is commonly understood to mean the process of managing and provisioning cloud computing resources through declarative, plain-text definition files rather than physical hardware configuration or interactive configuration tools. If you come from a front-end background, you can think of IaC as “React for cloud infrastructure.” You submit to the IaC tool a declarative definition of what cloud resources you want. The tool then performs a “diff” of the desired state of resources against the actual state and performs reconciliation (by creating, destroying or updating cloud resources as necessary) to make the two match. On GCP, the choice of IaC tools boils down to either Terraform or DM. Additionally, if you use Google Kubernetes Engine, Config Connector is another IaC option because it allows you to declaratively manage GCP infrastructure using Kubernetes. However, I won’t discuss Config Connector here because many GCP users won’t ever need to touch Kubernetes.
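To make the “diff and reconcile” idea concrete, here is a minimal sketch in Python. It is not how Terraform or DM is implemented; the `plan` function, the dictionary-based resource model and the resource names are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical resource model: resource name -> properties.
State = Dict[str, dict]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource name) steps that make actual match desired."""
    steps = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif desired[name] != actual[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("destroy", name))
    return steps


# Illustrative states only; the names and properties are invented.
desired = {
    "vpc-network": {"auto_subnets": False},
    "sql-instance": {"tier": "db-f1-micro"},
}
actual = {
    "vpc-network": {"auto_subnets": True},
    "old-bucket": {"location": "US"},
}

for action, name in plan(desired, actual):
    print(action, name)
# create sql-instance
# update vpc-network
# destroy old-bucket
```

A real tool also has to order these steps by dependency and call the cloud provider’s APIs, which is where most of the complexity lives.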

Miniflux is chosen as the demo web application for this case study because it’s both simple and complex in just the right ways:

First, Miniflux is laser-focused on a small core set of functionality, which keeps it small and simple. In fact, it’s unabashedly minimalist and rejects feature creep.
Second, despite the simplicity, Miniflux has many of the ingredients that a realistic web application has, such as a browser-based user interface, a web server and a database. On top of this, my setup adds security-minded features such as completely firewalling the database from the public internet. Taken together, these features allow me to demonstrate the usage of GCP resources that would be used by a realistic production web application, such as virtual private cloud (VPC) networks and Cloud SQL databases. Furthermore, in addition to GCP resources that are well-supported by DM like VPC networks, my setup also uses less well-supported resources like service networking connections and serverless VPC access connectors, which allows me to “stress test” DM.
Third, Miniflux is developed according to the 12-factor principles. This makes it ready for cloud deployment out of the box and saves me from having to figure out how to retrofit a legacy web application to work in the cloud. This allows me to focus on the IaC tools and how they interact with the cloud infrastructure instead of focusing on the demo application and its functionality.
Fourth, Miniflux is a useful application in its own right. This means that instead of being a purely academic exercise, this article can give you something that’s personally useful. In fact, I use Miniflux myself and contributed the Terraform module created in the course of writing this article to their documentation.

Architecture

Below is a diagram showing how all the cloud resources are wired together based on the sample Terraform and DM configurations:

The infrastructure requirements to run Miniflux are fairly minimal: a Linux operating system and a PostgreSQL database. The end-user accesses Miniflux through a browser-based user interface served by an App Engine instance on a public IP address. My setup uses private services access, which allows service providers (Google itself in this case) to provide services (a PostgreSQL database) on internal IP addresses (192.168.16.3). This is a win in terms of security (my database is never exposed to the public internet and its associated risks), performance (communication using private IP addresses has lower latency than that using public IP addresses) and cost (no network egress traffic is charged).

The database’s privacy is guaranteed because it actually resides in a completely separate VPC network managed by Google. That VPC network is in turn created by a project also managed by Google. Communication between my project’s network and the Google-managed VPC network containing the database is enabled by VPC network peering. To make this peering work, a private IP address range (192.168.16.0/20) is reserved in my VPC network so that Google can use that range to provision an IP address for the database (notice that 192.168.16.3 is within 192.168.16.0/20). Because the App Engine instance is not part of the VPC network and the database is only reachable via an internal IP address, I also create a serverless VPC access connector to allow the App Engine instance to communicate with the database. You can think of this connector as a tiny network address translation (NAT) machine just for the App Engine instance. (In fact, this connector is priced by Google as “one e2-micro instance per 100 Mbps of throughput”.)
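If you want to double-check that the database address really falls inside the reserved range, Python’s standard `ipaddress` module can do the arithmetic. This is only a quick sanity check using the two values quoted above; the variable names are mine.

```python
import ipaddress

# The range reserved for private services access and the database's address.
reserved_range = ipaddress.ip_network("192.168.16.0/20")
database_ip = ipaddress.ip_address("192.168.16.3")

print(database_ip in reserved_range)          # True
print(reserved_range.num_addresses)           # 4096
print(reserved_range[0], reserved_range[-1])  # 192.168.16.0 192.168.31.255
```

A /20 leaves 12 host bits, so the range holds 4096 addresses, running from 192.168.16.0 to 192.168.31.255.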

Observations

Terraform pros:

It’s much more mature than DM.
It can be executed on any platform and can provision resources for all major cloud providers.

Terraform cons:

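Supporting a new kind of resource requires writing custom imperative provider code in Go, which is more complex than creating a DM type provider. As a result, cloud providers need to write Go code to support their resources in Terraform. This is never a problem in practice if you only consume existing providers.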

DM pros:

In theory, it can manage any kind of resource, not just cloud infrastructure, that exposes CRUD functionalities through OpenAPI- or Google Discovery-compatible APIs. DM makes it fairly easy to create a custom type provider for such APIs. However, realistically, you’ll only use it to manage GCP resources.
Unlike the restricted syntax of HCL (Terraform’s language), DM templates enjoy the full power of a general-purpose programming language (Python).
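To give a flavor of what a Python-based DM template looks like, here is a minimal sketch rather than the configuration from the accompanying repo. DM calls a `generate_config(context)` function in the template and expects a dictionary of resources back. The `subnets` property, the resource names and the `FakeContext` class are assumptions of mine, added so the sketch can be run locally without DM.

```python
def generate_config(context):
    """Entry point that DM calls; returns the resources to create."""
    prefix = context.env["deployment"]
    network_name = prefix + "-network"
    resources = [{
        "name": network_name,
        "type": "compute.v1.network",
        "properties": {"autoCreateSubnetworks": False},
    }]
    # An ordinary Python loop: one subnetwork per entry in the
    # (hypothetical) "subnets" property supplied by the caller.
    for region, cidr in sorted(context.properties["subnets"].items()):
        resources.append({
            "name": "{}-subnet-{}".format(prefix, region),
            "type": "compute.v1.subnetwork",
            "properties": {
                "region": region,
                "ipCidrRange": cidr,
                # DM reference syntax pointing at the network defined above.
                "network": "$(ref.{}.selfLink)".format(network_name),
            },
        })
    return {"resources": resources}


class FakeContext:
    """Local stand-in for the context object DM passes in (testing only)."""
    env = {"deployment": "miniflux"}
    properties = {
        "subnets": {
            "us-central1": "10.0.0.0/24",
            "europe-west1": "10.0.1.0/24",
        }
    }


config = generate_config(FakeContext())
for resource in config["resources"]:
    print(resource["name"], resource["type"])
```

Because the template is plain Python, it can be exercised locally with a stand-in context like the one above before it is ever submitted to DM.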

DM cons:

It can only be executed on GCP infrastructure and requires GCP credentials to use.
It’s not very well supported even by Google. For example, using DM, serverless VPC access connectors and service networking connections require undocumented hacks called “actions” (which were supposed to be released in 2018 but never seem to have made it). In contrast, these GCP resources are supported out of the box with Terraform.
The reliance on CRUD REST APIs means that if one or more of the CRUD capabilities are missing, management of that resource gets very tricky. For example, because the service networking API only provides the C (“create”) in CRUD, it’s not possible to tear a service networking connection down with DM. In fact, deleting a deployment that uses a service networking connection with DM always errors out, whereas this is not a problem with Terraform, which I suspect is due to some imperative logic to handle this special case.
Google itself doesn’t recommend DM for serious usage. Google published a very well-written Google Cloud security foundation white paper, which describes a comprehensive IaC strategy using Terraform. There’s no equivalent guidance using DM. If even Google recommends using Terraform over their own product DM, it’s a strong strike against DM.

My recommendation: choose Terraform instead of Google Cloud Deployment Manager.