Why Your Cloud is a Mess
Any sizable cloud infrastructure maintained manually is usually a mess.
What I’ve seen happen is this: a developer manually sets up an instance or two to get a project going, along with the firewall rules (security groups on AWS) and subnets to support them.
Then someone else comes along and brings up some other instances, and adds some other subnets and firewall rules.
Once this goes on a few times, you have lots of firewall rules and subnet configurations. Left unchecked, this grows into a big mess where no one knows what is connected to what.
Quite often, everyone is too scared to touch anything because it might break something, so compute instances, databases, subnets, and firewall rules keep piling on, making matters worse.
How do I know all this? Well, because I’m often that guy who set things up by hand in the very beginning. The cliché, of course, is “I’ll just get by.”
Guilty as charged, but I’m sure you’ve been there too, no?
Start Off Right
So how do we avoid ending up with this mess?
The root cause is that the initial setup is not automated and you start off creating instances by hand.
We all know DevOps automation is a tremendous help in maintaining cloud infrastructure. The trick is to use automation from day one, and it really isn’t that hard.
We are all lazy creatures, but if we make it easier to add a few lines to an automation script than to futz with the cloud console UI, we will probably do the right thing.
So I wrote a simple set of Terraform scripts to bootstrap a generic environment, so no manual setup is ever needed.
What follows is an example using Terraform on Google Cloud Platform. It bootstraps a multi-region network (VPC) with public/private subnets and instances.
Creating the Environment
The example creates a standalone environment that contains the following on Google Cloud Platform:
- A VPC Network
- Two public subnets, one in region 1 and another in region 2
- Two private subnets, one in region 1 and another in region 2
- A firewall rule to allow traffic between all subnets
- Firewall rules to allow SSH and HTTP
- A compute instance in region 1 on public-subnet-1
- A compute instance in region 2 on public-subnet-2
Terraform Scripts
Prerequisites
In order to run the Terraform scripts, you need a Google Cloud Platform project.
You also need a service account with the proper permissions and its credentials downloaded locally. The credentials file is referenced in main.tf.
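If you don’t have a service account yet, something like the following should work; the account name and role here are assumptions, so scope them to whatever your scripts actually manage:

```sh
# Hypothetical service account bootstrap.
gcloud iam service-accounts create terraform-sa --display-name "Terraform"

# Let it manage compute resources (networks, firewalls, instances).
gcloud projects add-iam-policy-binding gcp-project-id \
  --member "serviceAccount:terraform-sa@gcp-project-id.iam.gserviceaccount.com" \
  --role "roles/compute.admin"

# Download the key file referenced in main.tf.
gcloud iam service-accounts keys create your_service_acct.json \
  --iam-account terraform-sa@gcp-project-id.iam.gserviceaccount.com
```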
Directory Organization
To make things easier, the Terraform scripts are separated into:
- main.tf – configures the Terraform providers for GCP
- variables.tf – defines variables used by the scripts
- vpc.tf – defines the VPC and firewall rules
- r1_network.tf – defines the subnets for region 1
- r2_network.tf – defines the subnets for region 2
- r1_instance.tf – defines the instance to start in region 1
- r2_instance.tf – defines the instance to start in region 2
main.tf
We define two providers for GCP. Many Google Cloud features are made available while still in beta, and accessing them requires the google-beta provider.
The credentials belong to the service account that is allowed to create/delete compute resources in the GCP project.
provider "google" { project = "${var.project}" credentials = "${file("your_service_acct.json")}" } provider "google-beta" { project = "${var.project}" credentials = "${file("your_service_acct.json")}" }
variables.tf
The variables should be pretty obvious. Note that these can be overridden at apply time, as shown after the definitions below.
variable "project" { default = "gcp-project-id" } variable "region1" { default = "us-west2" } variable "region2" { default = "us-central1" } variable "env" { default = "dev" } variable "company" { default = "akiatoji" } variable "r1_private_subnet" { default = "10.26.1.0/24" } variable "r1_public_subnet" { default = "10.26.2.0/24" } variable "r2_private_subnet" { default = "10.28.1.0/24" } variable "r2_public_subnet" { default = "10.28.2.0/24" }
vpc.tf
We create the global VPC network. We also define and assign firewall rules to it.
resource "google_compute_network" "vpc" { name = "${format("%s","${var.company}-${var.env}-vpc")}" auto_create_subnetworks = "false" routing_mode = "GLOBAL" } resource "google_compute_firewall" "allow-internal" { name = "${var.company}-fw-allow-internal" network = "${google_compute_network.vpc.name}" allow { protocol = "icmp" } allow { protocol = "tcp" ports = ["0-65535"] } allow { protocol = "udp" ports = ["0-65535"] } source_ranges = [ "${var.r1_private_subnet}", "${var.r1_public_subnet}", "${var.r2_private_subnet}", "${var.r2_public_subnet}" ] } resource "google_compute_firewall" "allow-http" { name = "${var.company}-fw-allow-http" network = "${google_compute_network.vpc.name}" allow { protocol = "tcp" ports = ["80"] } target_tags = ["http"] } resource "google_compute_firewall" "allow-bastion" { name = "${var.company}-fw-allow-bastion" network = "${google_compute_network.vpc.name}" allow { protocol = "tcp" ports = ["22"] } target_tags = ["ssh"] }
r1_network.tf
As you most likely already know, with GCP you can create multiple VPC networks within a project. If you are coming from AWS, this looks the same at first, but there is a big difference: a VPC network is global on GCP, whereas a VPC is regional on AWS.
This means your environment is multi-regional from the very beginning on GCP. To place resources in multiple regions, you create a subnet in each region; all subnets route to each other globally by default, so that is all it takes. (As a side note, each project comes with a default network that has a subnet in every GCP region.)
In this example, we create two subnets (private/public) in each of two regions (r1/r2). The regions are defined in variables.tf.
The idea here is to attach private instances (with no public IP address) to the private subnet, while public instances are assigned to the public subnet.
This is one area where manual configuration often gets out of hand, because the intended network topology isn’t always followed. Having the subnets and instances defined in the Terraform scripts makes the topology far easier to maintain.
resource "google_compute_subnetwork" "public_subnet_r1" { name = "${format("%s","${var.company}-${var.env}-${var.region1}-pub-net")}" ip_cidr_range = "${var.r1_public_subnet}" network = "${google_compute_network.vpc.name}" region = "${var.region1}" } resource "google_compute_subnetwork" "private_subnet_r1" { name = "${format("%s","${var.company}-${var.env}-${var.region1}-pri-net")}" ip_cidr_range = "${var.r1_private_subnet}" network = "${google_compute_network.vpc.name}" region = "${var.region1}" }
r1_instance.tf
A compute instance is brought up in each region, and Nginx is installed. These instances are handy when developing the Terraform scripts: we can log into them to test the network configuration, and they serve as boilerplate for creating more instances. GCP also gives you a pretty good indication of the applied firewall rules in the console.
resource "google_compute_instance" "web_ssh_r1" { name = "${format("%s","${var.company}-${var.env}-${var.region1}-instance1")}" machine_type = "n1-standard-1" #zone = "${element(var.var_zones, count.index)}" zone = "${format("%s","${var.region1}-b")}" tags = [ "ssh", "http"] boot_disk { initialize_params { image = "debian-9-stretch-v20190326" } } labels { webserver = "true" } metadata { startup-script = <<SCRIPT apt-get -y update apt-get -y install nginx export HOSTNAME=$(hostname | tr -d '\n') export PRIVATE_IP=$(curl -sf -H 'Metadata-Flavor:Google' http://metadata/computeMetadata/v1/instance/network-interfaces/0/ip | tr -d '\n') echo "Welcome to $HOSTNAME - $PRIVATE_IP" > /usr/share/nginx/www/index.html service nginx start SCRIPT } network_interface { subnetwork = "${google_compute_subnetwork.public_subnet_r1.name}" access_config { // Ephemeral IP } } scheduling { preemptible = true automatic_restart = false } }
Running Terraform
Once the Terraform scripts are in place, bringing up the environment is easy:
```sh
$> terraform init
$> terraform apply
```
This also executes really fast on GCP: the whole environment came up in a mere 70 seconds!
Once created, you can see in the console that two sets of subnets are created in two regions.
Also, two instances are created in two regions. At this point you can SSH to one instance and access the other.
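For example, assuming the naming patterns above:

```sh
# SSH into the region 1 instance.
gcloud compute ssh akiatoji-dev-us-west2-instance1 --zone us-west2-b

# From there, the allow-internal rule lets us reach the region 2
# instance's nginx over its private IP.
curl http://<r2-instance-private-ip>/
```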
You can delete this environment with:
```sh
$> terraform destroy
```
Summary
There aren’t many examples of setting up a GCP environment with Terraform, so initially I had some issues and was a bit skeptical as to how well Terraform would work with GCP.
Once I got the basics up and running, though, I was pleasantly surprised by how well it worked, and by how fast GCP created resources and spun up instances.
These Terraform scripts also let you create a completely separate clone of the VPC by overriding a variable on the command line, so you can build dev, test, integration, or production environments with ease.
All this, of course, is possible if you start off your project with an automated build of the cloud environment.
Cheers,
This is a handy guide, but it misses one big part of automation: the ability for others to use it. The Terraform state will be saved on the executing machine, meaning that without importing the resources later, there’s no way for another developer to deploy infrastructure updates.
Do you have any basic bootstrap code for setting up remote state in GCP (such as TF code to create the necessary users, roles, bucket, etc)?
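A minimal sketch of that bootstrap, assuming a GCS backend (bucket name and prefix are hypothetical):

```sh
# One-time setup: create and version a state bucket.
gsutil mb -l us-west2 gs://akiatoji-terraform-state
gsutil versioning set on gs://akiatoji-terraform-state
```

```hcl
# backend.tf -- after adding this, `terraform init` offers to migrate the
# local state into the bucket so other developers can share it.
terraform {
  backend "gcs" {
    bucket      = "akiatoji-terraform-state"
    prefix      = "dev"
    credentials = "your_service_acct.json"
  }
}
```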