Up until recently, we had been using Terraform modules in a fairly “normal”, basic way: either direct path references, or git references, depending on whether the module is in the local repository or not.

Our folder structure ends up like this:
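A sketch of that layout (the environment directory names here are hypothetical; the modules/ path is implied by the relative source reference below):

```
.
├── modules/
│   ├── foo/
│   │   └── main.tf
│   └── bar/
│       └── main.tf
├── staging/
│   └── main.tf
└── production/
    └── main.tf
```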


So the module imports end up looking like this:

module "foo" {
  source = "../modules/foo"
}

module "bar" {
  source = "git@github.com:hashicorp/example.git"
}

This is “fine”, but presents a few problems. The first approach only works if everything is in the same repository, and we have at least 3 repos which pull things in from our central repository.

The second works well for cross-repository use, and you can also specify a tag for the version. However, if you pull in 50 copies of the same module, or 50 modules from the same repository, you get 50 copies of the entire repository - history included - in your .terraform folder. A 2GB .terraform folder which takes a good 40 mins to init from nothing isn’t too uncommon. Plus, even with a terraform init or terraform get, these do not update to the latest version in a sensible manner.

Hashicorp have the concept of a Terraform registry, too - it’s a specific API, served over HTTPS, which lets you request a specific version of a module, query it for newer versions, and manage your modules. Hashicorp provide their own public (free) and private (paid) module registry with their Terraform Cloud offering.

# public registry
module "foo" {
  source  = "fastchicken/api-gateway/aws"
  version = "1.0.5"
}

# named private registry, normally at tf.fastchicken.co.nz, but you can
# configure that.
module "bar" {
  source  = "tf.fastchicken.co.nz/fastchicken/api-gateway/aws"
  version = "1.0.5"
}

This is a lot better. The server can return exactly the version you ask for, answer queries about newer versions for updates, it only stores that version (and only one copy of it if you reference it multiple times), and it’s as fast as the back-end server allows. You can do the usual semver things with the version number, like ~> 1.0 and the like.

Hashicorp originally wrapped their registry up in a paid plan, and backed it onto Github, but last year they also put it on the free tier, as well as their “Team” tier. As we didn’t want the other features of their cloud, the cost for us would have been somewhat prohibitive for just the registry.

As the protocol for the registry is public and published, a few people have had a run at writing a server for it. We looked at a few previously, but they were either unmaintained with bugs we’d have to fix, or not quite complete.

Then we found https://github.com/apparentlymart/terraform-aws-tf-registry, which is a “no-code” implementation of the registry. It uses AWS services, and is set up using Terraform. This ticks almost all our boxes, as that’s largely our stack for everything else we do.

The basic architecture is:

  • API Gateway providing the front end
  • The API Gateway DynamoDB integration querying a fairly basic Dynamo table
  • Static storage of the module files in S3

While there is a bit of setup, the layout is pretty straightforward and effectively zero maintenance.

There are a couple of problems with this for us. We need to have some form of authentication, as the service needs to be on the public internet. And we need a way to get module code into the registry in the first place.

API Gateway supports calling a Lambda authorizer for authentication, so we wrote a simple Lambda which looks up a given secret we store centrally, and matches it against the provided token. We then did some tooling for developers to pull that secret down (using their IAM credentials), and another Lambda to generate a new token every week. This works rather well. AWS have samples of authorizers in all the major Lambda languages, and the original repo’s documentation also details it.
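A minimal sketch of such an authorizer, in Python. Our real one pulls the secret from central storage; here it comes from an environment variable to keep the example self-contained, and the event/response shapes are those of an API Gateway TOKEN-type authorizer.

```python
import hmac
import os

# Sketch of a TOKEN-type API Gateway custom authorizer. Terraform sends the
# registry token as "Authorization: Bearer <token>"; API Gateway hands that
# header to us as event["authorizationToken"].
def handler(event, context):
    expected = os.environ.get("REGISTRY_TOKEN", "")
    provided = event.get("authorizationToken", "")
    if provided.startswith("Bearer "):
        provided = provided[len("Bearer "):]

    # Constant-time comparison, then an Allow/Deny IAM policy back to API Gateway.
    effect = "Allow" if hmac.compare_digest(provided, expected) else "Deny"
    return {
        "principalId": "terraform",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```

The weekly token rotation just overwrites the stored secret; in-flight callers pick up the new one the next time the developer tooling syncs it down.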

Largely, that’s it for the front end part - weekly generated, token based authentication, and nice and fast semver-based module download.

Getting the modules into the registry is always going to be a problem, regardless of which registry we use.

A module is just a zip file, stored under a specific name/path. In most cases, the path is the combination of the namespace, name and provider, and the filename is just the version. The zip contains the .tf files (with no embedded folders) which make up the module.
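As a sketch, composing that S3 key looks like this - the namespace/name/provider ordering mirrors how the registry addresses modules, but the exact bucket layout is a choice, not a protocol rule:

```python
# Sketch: build the S3 key for a module artifact from its properties.
def module_key(namespace: str, name: str, provider: str, version: str) -> str:
    return f"{namespace}/{name}/{provider}/{version}.zip"

print(module_key("fastchicken", "api-gateway", "aws", "2.0.17"))
# fastchicken/api-gateway/aws/2.0.17.zip
```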


We use Github to store our main repository, and Jenkins to build most things. At the root of each module we want to expose, we’ve added a tf-module.json file - a convention of our own, not a Terraform thing - which looks like so:

{
  "version": "2.0.x",
  "name": "api-gateway",
  "provider": "aws",
  "namespace": "fastchicken"
}

Jenkins is set up to, for each merged PR, look for any folders with a tf-module.json file in the change set, building a list of folders from that.

Those folders are then passed to terraform validate to ensure they are valid Terraform, and finally, each one is zipped, uploaded to a specific location in S3 based on the module properties, and a record is added to Dynamo which tells the API Gateway that the version exists. These steps are just done using the AWS CLI. The version is just the Jenkins build number, combined with the version above.
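The version arithmetic is trivial - our reading of the “2.0.x” placeholder above is that the x is swapped for the Jenkins build number. A sketch:

```python
import json

# Sketch: derive the published version from tf-module.json plus the
# Jenkins build number, by substituting the trailing "x" placeholder.
def published_version(tf_module_json: str, build_number: int) -> str:
    meta = json.loads(tf_module_json)
    return meta["version"].replace("x", str(build_number))

print(published_version('{"version": "2.0.x"}', 143))
# 2.0.143
```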

To get the modules down, we have to tell Terraform where to look.

When Terraform sees the registry name tf.fastchicken.co.nz, it looks in ~/.terraformrc for a host matching that name. The host name doesn’t need to be real - it can map to another URL if needed.

# Associate the name with our API Gateway endpoint
host "tf.fastchicken.co.nz" {
  services = {
    "modules.v1" = "https://xxxxxx.execute-api.us-west-2.amazonaws.com/live/modules.v1/"
  }
}

credentials "tf.fastchicken.co.nz" {
  token = "xxxxxx" # the token, which is passed to the lambda function eventually
}

Terraform then goes off and queries the modules.v1 endpoint, which returns a list of available module versions, amongst other bits. Hashicorp has good documentation around this API.
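For reference, the versions listing looks roughly like this - a sketch of parsing it, with the payload shape taken from the published registry protocol (Terraform picks the newest version matching the constraint, then asks the download endpoint for that version’s URL):

```python
import json

# Sketch: the body of GET <modules.v1>/:namespace/:name/:provider/versions,
# per the module registry protocol.
sample = json.loads("""
{
  "modules": [
    { "versions": [ {"version": "1.0.5"}, {"version": "2.0.143"} ] }
  ]
}
""")

versions = [v["version"] for v in sample["modules"][0]["versions"]]
print(versions)
# ['1.0.5', '2.0.143']
```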

The end URL that the API serves is a direct link to the module in S3 - we could do a pass-thru proxy, but we didn’t think it was worth it from a risk perspective. We have a policy of not having secrets in Terraform (enforced by code review), so the risk would only be someone getting the module source, not the parameters we pass in, any of the naming, account numbers, or other secrets.

This has, so far, proven to be quite a good way to do it. The tools Hashicorp provides would suit most people, but they didn’t really fit our needs. I could also see this working with Github Actions rather than Jenkins, or any other CI system.