A common way to use AWS CloudFront is as the front end for an Application Load Balancer (ALB). You would normally set it up like so:

Person talks to CloudFront talks to ALB talks to Containers/Backend

CloudFront does the perimeter work, including caching and WAF, then passes the request to the origin (the ALB), which distributes it to the back end, in this case a set of containers.

However, there is nothing stopping someone scanning the AWS IP space, connecting to every IP on ports 80 and 443 to see what they can find. If they find your ALB, they have bypassed the WAF, and with it the SQL injection and cross-site scripting protection, amongst other features. This is usually something you want to avoid.

We recently implemented one of the techniques below, with logging, and it was surprising how much of this happens in the “background noise” of operating a service on the internet. The best way to stop someone breaking in is to not let them get to the front door in the first place, so let's try to stop that.

Option 0: If you like it put a hostname on it

The ALB rules allow you to specify a hostname, so you could set that to your domain name, and blackhole anything which doesn’t match. This covers about 80% of cases, as all they have is the IP and a port, but doesn’t stop someone with some basic knowledge of your infrastructure from bypassing the WAF. This is really a good idea regardless.

locals {
  environment = "test"
  
  domains = [
    "foo.com",
    "bar.com",
  ]
  
  route_priority = 10
  vpc_id         = "vpc-123456"
}

resource "aws_lb" "alb" {
  name = "${local.environment}-alb"

  ...
  
  load_balancer_type = "application"
  
  ...
}

resource "aws_lb_listener" "https_listener" {
  load_balancer_arn = aws_lb.alb.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2015-05"

  certificate_arn = var.certificate_arn

  # return a 403 by default
  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      status_code  = "403"
    }
  }
}

resource "aws_lb_target_group" "target_group" {
  name     = "${local.environment}-service-name-tg"
  protocol = "HTTPS"
  port     = 4430

  health_check {
    ...
  }

  vpc_id = local.vpc_id
}

resource "aws_lb_listener_rule" "application_rule" {
  count        = length(local.domains)
  listener_arn = aws_lb_listener.https_listener.arn
  priority     = local.route_priority + count.index

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.target_group.arn
  }

  # a rule can only match one host header value here, so create one rule per domain
  condition {
    field  = "host-header"
    values = [element(local.domains, count.index)]
  }
}

Option 1: CloudFront IP list

The official AWS way to do this is, of course, with a Lambda.

AWS publish a list of CIDR ranges for their services - EC2, CloudFront and others - and they send an SNS message when this list is updated. You can trigger a Lambda from that SNS message, and have it update the Security Group on the ALB to only allow traffic from the CloudFront ranges.

This works, however it has a couple of problems.

  • If the SNS message is dropped, new or changed IP ranges may start sending you traffic before you have allowed them. AWS add new CloudFront PoPs on a regular basis, so this list changes fairly often.
  • The list is fairly long, and a security group can only have 50 entries, so you have to manage splitting it over a number of security groups.

AWS have handled the second one in their sample lambda, but this has a number of moving parts which would be difficult to diagnose in the event of a problem.

I’d also recommend you add your own IPs to this list, or put them in another security group, in case you need to bypass CloudFront for some reason.
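If you would rather not run the Lambda at all, the same allow-list can be sketched in Terraform. This is a swapped-in technique, not the AWS sample: it uses the aws_ip_ranges data source, so the list only refreshes when Terraform runs, not when the SNS message arrives. The resource names are made up for illustration, local.environment and local.vpc_id are assumed to be the locals from Option 0, and the chunk size of 50 is simply the per-group limit quoted above (check your own account's quota).

```hcl
# Read the published AWS IP ranges, filtered to the CloudFront service.
data "aws_ip_ranges" "cloudfront" {
  services = ["cloudfront"]
}

locals {
  # Assumption: 50 rules per security group, as described above.
  cloudfront_cidr_chunks = chunklist(data.aws_ip_ranges.cloudfront.cidr_blocks, 50)
}

# One security group per chunk of CIDR ranges (hypothetical names).
resource "aws_security_group" "cloudfront_only" {
  count  = length(local.cloudfront_cidr_chunks)
  name   = "${local.environment}-cloudfront-only-${count.index}"
  vpc_id = local.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = local.cloudfront_cidr_chunks[count.index]
  }
}
```

The resulting groups would then be attached to the ALB through its security_groups argument, for example aws_security_group.cloudfront_only[*].id. The first problem above still applies in a different form: the ranges are only as fresh as your last apply.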

Option 2: Injected header and WAF

CloudFront also has the option to inject a header with a fixed value before calling the origin. The header name and value can be anything (prefixing the name with x- is a common convention). This makes it easy, for example, to inject a header with

x-my-token: a1b2c3d4

You can then pick that up on the ALB side with a WAF attached to the ALB, which blocks all traffic by default and only allows requests carrying that header.

You do need to plan to keep the header value in sync between CloudFront and the WAF, but as it’s not a real secret, you can put it into Terraform, or manually switch it on both sides to rotate it.

Note that the value may end up in your logs, if you are logging headers. This can be more of a volume/size issue than anything. I’ve had to avoid using something like x-cloudfront-token: 7c9c2d3a-8033-4295-9b28-af8b0683353c in practice. It makes sense, as the name is readable and a UUID is fairly unique, but it also adds 60-odd bytes (or 120 if everything is UTF-16 encoded) per request. It’s not massive, but it’s redundant and unnecessary.

I’d strongly suggest you set this up by hand - add the header in CloudFront, verify that it’s being sent correctly, then connect the WAF, possibly in COUNT mode rather than BLOCK, until you’re confident that it’s working.

locals {
  environment        = "test"
  origin_domain_name = "loadbalancer_origin.foo.com"
  
  cloudfront_header_name = "x-cloudfront-token"
  cloudfront_token       = "7c9c2d3a-8033-4295-9b28-af8b0683353c"
}

resource "aws_cloudfront_distribution" "cloudfront" {
  
  ...
  
  enabled = true
  
  ...
  
  origin {
    domain_name = local.origin_domain_name
    
    
    ...
    
    custom_header {
      name  = local.cloudfront_header_name
      value = local.cloudfront_token
    }
  }
  ...
}

resource "aws_alb" "alb" {
  ...
}

resource "aws_wafregional_web_acl" "alb_waf" {
  metric_name = "albwaf"
  name        = "${local.environment}-web-alb-waf"

  default_action {
    type = "BLOCK"
  }

  rule {
    priority = 100
    rule_id  = aws_wafregional_rule.header_match_rule.id

    action {
      type = "ALLOW"
    }
  }
}

# bind it to the ALB
resource "aws_wafregional_web_acl_association" "web_waf_binding" {
  resource_arn = aws_alb.alb.arn
  web_acl_id   = aws_wafregional_web_acl.alb_waf.id
}

resource "aws_wafregional_rule" "header_match_rule" {
  depends_on  = [aws_wafregional_byte_match_set.header_match_set]
  name        = "${local.environment}_web_alb_header_match_rule"
  metric_name = "albheadermatchrule"

  predicate {
    negated = false
    type    = "ByteMatch"
    data_id = aws_wafregional_byte_match_set.header_match_set.id
  }
}


resource "aws_wafregional_byte_match_set" "header_match_set" {
  name  = "${local.environment}-web-alb_header_match_set"

  byte_match_tuples {
    text_transformation   = "NONE"
    target_string         = local.cloudfront_token
    positional_constraint = "EXACTLY"

    field_to_match {
      type = "HEADER"
      data = local.cloudfront_header_name
    }
  }
}

If you need to bypass this with curl, you can use the -H option to set the header yourself:

curl -k -H 'x-cloudfront-token: 7c9c2d3a-8033-4295-9b28-af8b0683353c' https://origin.server.com/foo/bar

This is the option we went with, and it's been working rather well. Disabling it if needed is a matter of just disconnecting the WAF-ALB association.

Option 3: Injected header and ALB rules

The last option combines a few of the above. It’s a little less flexible if you are using ALB routing rules for other things, but it’s a lot cheaper (WAF charges per request, though not a lot) and it has one fewer moving part: no WAF involved.

This involves setting the same header in CloudFront as in Option 2, but checking it in the ALB listener rule that forwards to the target group, so you are matching on both the host name and the header value.

You need to have a fall-through for this - we’d normally just return 403 or 444 if nothing matches - but it’s fairly simple to implement and reason about.

You would set up CloudFront in the same way as Option 2. The ALB no longer needs a WAF; you'd just add more conditions to your listener rule.

At the time of writing, Terraform didn't support this - it only had host-header and path-pattern conditions - but I'd assume that it's coming (or I need to write some Go code).
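For what it's worth, newer releases of the Terraform AWS provider do have an http_header condition for listener rules. A minimal sketch, assuming such a provider version and reusing the listener, target group and locals from Options 0 and 2 (in those versions the host match is written as a nested host_header block rather than field/values, so this would replace the Option 0 rule, not sit next to it):

```hcl
resource "aws_lb_listener_rule" "application_rule" {
  listener_arn = aws_lb_listener.https_listener.arn
  priority     = local.route_priority

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.target_group.arn
  }

  # the request must be for one of our domains...
  condition {
    host_header {
      values = local.domains
    }
  }

  # ...and must carry the header that CloudFront injects
  condition {
    http_header {
      http_header_name = local.cloudfront_header_name
      values           = [local.cloudfront_token]
    }
  }
}
```

Both conditions have to match for the request to be forwarded. Anything else falls through to the listener's default action, which in the Option 0 example is the fixed 403 response.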


There are several ways you can stop people bypassing your CloudFront protections, all of which are easy to implement and reason about, and very robust in production.

They will keep your back end more secure, and make sure that all visitors to your sites pass through the correct checks and filters.