Using Terraform to provision infrastructure resources

As we create more and more new services and require more and more infrastructure resources to support them, we have started to use Terraform to manage our infrastructure.

In this article, I would like to give an overview of how we structure our Terraform setup.

It’s designed to build up a common vocabulary and understanding of why we do things the way we do them, and to provide a little background on how and why we made the decisions that led to the current setup.

As we’re using AWS to deploy our cloud infrastructure, most of the examples will relate to AWS, but the principles are provider-agnostic and can apply to other providers as well.

Structure

Environments

The first level of structure within our Terraform configurations is the environment.

There are multiple ways to structure Terraform configuration; separating by environment is just one of them. We also considered separating by use case, but a couple of factors led us to choose environment-based configuration. One of the key contributors to that decision was that we want to avoid leaking anything from our staging environment into our production environment, to make sure that whatever we do is verified, safe and secure before it touches the production setup. There is an excellent article describing why separating the configuration by environment is a good idea.

For us an environment is a logical grouping of resources that are related to (and may depend on) each other.

We currently have three environments that we set up using Terraform:

Production

The production environment is - as the name suggests - the real thing: All resources created with Terraform for the production environment are used to support our day-to-day business. It’s where our services and data live.

Staging

The staging environment is our shared test system. It is used both by us (the product development team) and by other teams within BetterDoc that try new functionality and give us feedback before a feature is promoted to production. We try to have the staging environment mirror the production environment as closely as possible.

A word on data security: because the staging environment mirrors the production environment so closely, we also have the same data security procedures in place for the staging environment as we do for the production environment.

Playground

The playground environment is where we try out new things regarding our configuration. It gives us a safe space to experiment with setting up new infrastructure components, finding out what we actually need and how to configure new resources. It will not mirror either the production or the staging environment but instead might look like some sort of a mess - which is okay, as it’s for us to experiment on.

Systems

The second level within our configuration hierarchy is called a system. A system corresponds to a piece of business functionality and groups the infrastructure resources that are necessary to support that business functionality.

Examples of systems are the website or the patient system. The website system groups all resources that are necessary to support a website (which could be resources like an API gateway or a webserver, but also things like TLS certificates or some file storage).

A system is always connected to an environment, so the actual infrastructure resources for a system live in an environment. As described in the environments section, we have two environments which are used for our day-to-day business: the production environment (where our actual patient data is stored and where the “real” processes live) and the staging environment in which we can test new processes and logic.

Taking the example of the website as described above, all infrastructure resources needed for the website would exist in both environments. For example, there would be one webserver associated with the production environment and a second webserver associated with the staging environment (and the same is true for all other resources).

All resources should have a clear reference to their system and their environment. So if you see a webserver in the list of all AWS resources, you should be able to infer (from the way the webserver is named and configured) which system the webserver belongs to (in our example the website) and which environment it is associated with (e.g. staging).

AWS basic settings

Region

AWS provides the option to host resources in different regions. A region specifies the location where the actual physical hardware is located.

As we’re not only processing personal data but also personal health data we need to take every effort to make sure that both the data storage as well as the data processing is as secure as it can be.

This means that all data that we store, as well as all resources that access data, need to be located in the EU. AWS provides a series of regions within the EU. For all our resources created at AWS we only use the eu-* regions.

We prefer the eu-central-1 (Frankfurt) region for our staging and production resources as it’s closest to where most of our users are located.

There are very few exceptions to this rule: for example, TLS certificates need to be hosted in the us-east-1 region so that they can be used with Cloudfront. If we encounter such scenarios we need to make an informed and conscious decision to depart from our general rule that all resources must be located in the EU. Whatever that decision may be, we still enforce that patient data is never stored, processed or accessed from any infrastructure resource outside the eu-* regions.

As the whole Brexit drama is still unfolding and the legal situation is unclear, we also exclude the eu-west-2 (London) region from our hosting options.
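In Terraform, this region policy can be expressed with provider aliases: a default AWS provider pinned to eu-central-1, and a second, explicitly aliased provider for us-east-1 that is only referenced by the few exceptional resources. A minimal sketch (the alias name and domain are illustrative assumptions, not part of our actual setup):

```hcl
# Default provider: all regular resources are created in the EU.
provider "aws" {
  region = "eu-central-1"
}

# Aliased provider for the rare us-east-1 exceptions
# (e.g. ACM certificates that are used with Cloudfront).
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

# Only resources that explicitly opt in via the provider
# argument end up outside the EU.
resource "aws_acm_certificate" "cloudfront" {
  provider          = aws.us_east_1
  domain_name       = "example.org"
  validation_method = "DNS"
}
```

Because every us-east-1 resource has to name the aliased provider explicitly, departing from the EU-only rule stays a conscious decision rather than an accident.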

Terraform configuration

Statefile

Terraform stores its state (what resources have been created, how they have been configured) in the statefile (see https://blog.gruntwork.io/how-to-manage-terraform-state-28f5697e68fa). To make the state of our environments available not only on a single machine but to all potential users, we use a series of private AWS S3 buckets as storage backend for the statefile.

A separate bucket is used for each environment.

Within the bucket for an environment, the object identifier itself defines the actual storage location: the prefix statefile/ plus the name of the system.

Let’s take an example: The bucket name for all systems in the staging environment is bd-blog-example-terraform-aws-staging. The system with all resources relevant for the website is called website so the object name of the statefile within the S3 bucket is statefile/website.

When configuring Terraform to use this statefile from the S3 bucket, the statefile storage backend is defined as follows:

terraform {
  backend "s3" {
    bucket = "bd-blog-example-terraform-aws-staging"
    key    = "statefile/website"
    region = "eu-central-1"
  }
}

Storage

All Terraform configuration files for AWS resources are stored in a GitHub repository.

The repository is structured using the following directory structure:

Environments

Every subfolder within the environments folder contains the resources for a specific environment as defined in the environments section above, so the first and second level structures within the repository are:

ROOT
- environments
  - staging
  - production

The next level below a specific environment is the system for which the resources are defined. Assuming we have the two systems website and case the file structure looks like this:

ROOT
- environments
  - staging
    - case
    - website
  - production
    - case
    - website

Below the directory of each system the actual .tf configuration files are stored:

ROOT
- environments
  - staging
    - case
      - main.tf
      - vars.tf
    - website
      - main.tf
      - vars.tf
  - production
    - case
      - main.tf
      - vars.tf
    - website
      - main.tf
      - vars.tf

When applying a specific configuration, you always move into the system directory and run the Terraform commands from there (this also implies that a Terraform configuration is always applied for a system as a whole).

So let’s assume that we want to update resources for the website system in the staging environment. We will need to cd into the environments/staging/website folder and execute the Terraform command:

$ cd environments/staging/website
$ terraform apply
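Note that on a fresh checkout, terraform init has to be run once in the same directory before the first apply, so that Terraform configures the S3 backend and downloads the required providers and modules:

```shell
$ cd environments/staging/website
$ terraform init   # configures the S3 backend, fetches providers and modules
$ terraform plan   # review the planned changes first
$ terraform apply
```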

Modules

A module represents a set of resources that are reusable (see https://blog.gruntwork.io/how-to-create-reusable-infrastructure-with-terraform-modules-25526d65f73d for a great introduction).

In our configuration setup we typically include modules from a system configuration inside an environment. The combination of an environment and a system defines key properties and settings that are passed to the module.

An example would be a database. Setting up a database might require several resources on AWS, but the setup of these resources would always be the same and only differ in tiny details. So we create a module that contains the details of all the resources needed and reference that module from within the .tf files for an environment/system combination.

The modules themselves live inside the first level modules folder:

ROOT
- environments
  - ...
- modules

There is no hard rule of how the subfolders of the modules folder must be structured but we try to keep them logically grouped.

For example, if we have two database modules, where the first module contains all the resources for a PostgreSQL database and the second module contains all the resources for a MySQL database, the folder structure inside the modules folder should look like this:

ROOT
- environments
  - ...
- modules
  - databases
    - mysql
      - main.tf
      - vars.tf
    - postgresql
      - main.tf
      - vars.tf
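As a sketch of what such a module could contain (the variable names and the default value are illustrative assumptions, not our actual configuration), the vars.tf of the mysql module might declare the few details that differ between environment/system combinations:

```hcl
# modules/databases/mysql/vars.tf (illustrative)
variable "database_name" {
  description = "Name of the database, e.g. <system>-<environment>"
  type        = string
}

variable "instance_class" {
  description = "Size of the underlying database instance"
  type        = string
  default     = "db.t3.micro"
}
```

Everything that is identical across environments stays inside the module itself; only the values declared as variables have to be provided by the caller.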

References to a module from within the environment/system configuration should be made relative to the overall GitHub project directory.

So let’s assume that we want a MySQL database for the case system on the staging environment.

The configuration file for the environment/system combination would be stored at:

ROOT
- environments
  - staging
    - case
      - main.tf

The content of the main.tf file could look like this:

module "database" {
  source        = "../../../modules/databases/mysql"
  database_name = "case-staging"
}
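If the module declares outputs (for example the database endpoint), the system configuration can reference them to wire the database to other resources; a hypothetical sketch (the output name and resource address are assumptions):

```hcl
# modules/databases/mysql/outputs.tf (hypothetical)
output "endpoint" {
  value = aws_db_instance.this.endpoint
}
```

Within environments/staging/case/main.tf the endpoint would then be available as module.database.endpoint.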

Resources

Tagging

AWS allows adding arbitrary tags to most resources, and Terraform supports adding tags when defining the resources. To allow a clear identification of which environment and which system a resource belongs to, all resources created by Terraform should have the tags betterdoc-environment and betterdoc-system added (with the corresponding names of the environment and the system).

In addition, to signal that the resource was created using Terraform, a third tag named betterdoc-creator should always be set to terraform.

For example, an S3 bucket created for the website system in the staging environment should contain the following tags:

resource "aws_s3_bucket" "betterdoc-example" {
  bucket = "bd-blog-example-staging-website-example"
  tags = {
    "betterdoc-creator"     = "terraform"
    "betterdoc-environment" = "staging"
    "betterdoc-system"      = "website"
  }
}

Naming

As AWS enforces distinct names for certain kinds of resources, both the environment and the system should be part of the name of a specific resource, as in the S3 bucket example above, where the name bd-blog-example-staging-website-example includes both the environment (“staging”) and the system (“website”).

The combination of environment and system should always be kept close together, separated by a dash (e.g. staging-website). Depending on the actual resource type and its usage, it might make sense to put this at the beginning, the end or somewhere in the middle of the resource name. We try to use good judgement but keep the naming scheme consistent for each type of resource.
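One way to keep the scheme consistent is to build the environment/system part of the name once and reuse it everywhere; a sketch using Terraform locals (the variable and local names are assumptions, not our actual conventions):

```hcl
variable "environment" {
  type    = string
  default = "staging"
}

variable "system" {
  type    = string
  default = "website"
}

locals {
  # e.g. "staging-website" - reused in resource names and tags
  name_suffix = "${var.environment}-${var.system}"

  common_tags = {
    "betterdoc-creator"     = "terraform"
    "betterdoc-environment" = var.environment
    "betterdoc-system"      = var.system
  }
}

resource "aws_s3_bucket" "example" {
  bucket = "bd-blog-example-${local.name_suffix}-example"
  tags   = local.common_tags
}
```

This way a rename only has to happen in one place, and the tags stay in sync with the resource names automatically.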

This article was originally posted at BetterDoc Product Development Blog