A guide to EC2 disaster recovery using Terraform

Tech Community • 4 min read

16 February 2024

In the fast-paced world of cloud computing, everyone's looking for streamlined ways to manage their tech. Enter Terraform, a tool that's changing the game by simplifying how we handle infrastructure with its cool Infrastructure as Code (IaC) features.

In this article, we're diving into a practical Terraform success story that breaks the mould: using Terraform for EC2 disaster recovery.

The task is as follows: the infrastructure (dozens of services hosted on EC2), which cannot use multi-zone deployment because of certain constraints, must be restored from snapshots in the secondary availability zone if the primary one fails. This infrastructure is (you guessed it) managed by Terraform.

So, we need to build logic into the Terraform configuration that determines whether restoration from a snapshot is required and whether it is possible.

This is a basic TF configuration (EC2 instance with additional EBS volume) that we will enhance step by step to get to the desired functionality.


We assume that snapshots are already configured for root and additional volumes, so this part of the configuration is not included in this article.

resource "aws_instance" "ec2_instance" {
 ami                  = var.ami
 instance_type        = var.instance_type
 subnet_id            = var.subnet
 key_name             = var.ssh_key
 root_block_device {
   delete_on_termination = true
   volume_size           = var.root_volume_size
   volume_type           = "gp3"
   tags = {
     Name = "root_disk"
   }
 }
}
resource "aws_ebs_volume" "ebs_disk" {
 availability_zone = var.availability_zone
 size              = var.data_volume_size
 type              = "gp3"
 tags = {
   Name = "data_disk"
 }
}


resource "aws_volume_attachment" "ebs_disk_att" {
 device_name = var.ebs_mount_point
 volume_id   = aws_ebs_volume.ebs_disk.id
 instance_id = aws_instance.ec2_instance.id
}
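The configuration above references several input variables that the article does not show. A minimal sketch of how they might be declared follows; the variable names come from the snippets in this article, while the types and descriptions are assumptions.

```hcl
# Hypothetical declarations for the variables used in this article.
# Types and descriptions are assumptions; adjust them to your environment.
variable "ami" {
  type        = string
  description = "Base AMI ID used when no root snapshot is available"
}

variable "instance_type" {
  type        = string
  description = "EC2 instance type"
}

variable "subnet" {
  type        = string
  description = "ID of a subnet located in the target availability zone"
}

variable "availability_zone" {
  type        = string
  description = "Availability zone where the data volume is created"
}

variable "ssh_key" {
  type        = string
  description = "Name of an existing EC2 key pair"
}

variable "root_volume_size" {
  type        = number
  description = "Root volume size in GiB"
}

variable "data_volume_size" {
  type        = number
  description = "Data volume size in GiB"
}

variable "ebs_mount_point" {
  type        = string
  description = "Device name for the data volume attachment, for example /dev/sdf"
}
```

With declarations like these, failing over to the secondary zone would presumably come down to changing the values of availability_zone and subnet and running Terraform again.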

First, we want to recover the additional EBS from the snapshot when:

  1. EBS volume does not exist
  2. Snapshots for this EBS volume exist

We can do this by adding the following to our initial TF configuration.

# Check whether the EBS volume exists
data "aws_ebs_volumes" "volumes" {
  filter {
    name   = "tag:Name"
    values = ["data_disk"]
  }
  filter {
    name   = "availability-zone"
    values = [var.availability_zone]
  }
}

data "aws_ebs_volume" "volume" {
  count       = length(data.aws_ebs_volumes.volumes.ids) != 0 ? 1 : 0
  most_recent = true
  filter {
    name   = "tag:Name"
    values = ["data_disk"]
  }
  filter {
    name   = "availability-zone"
    values = [var.availability_zone]
  }
}

# Check whether snapshots exist
data "aws_ebs_snapshot_ids" "snapshots" {
  filter {
    name   = "tag:Name"
    values = ["data_disk"]
  }
}

data "aws_ebs_snapshot" "ebs_snapshot" {
  count       = length(data.aws_ebs_snapshot_ids.snapshots.ids) != 0 ? 1 : 0
  most_recent = true
  filter {
    name   = "tag:Name"
    values = ["data_disk"]
  }
}

# Define a local value for the snapshot ID:
# set only when the volume is missing and at least one snapshot exists
locals {
  snapshot_id = length(data.aws_ebs_volumes.volumes.ids) == 0 && length(data.aws_ebs_snapshot_ids.snapshots.ids) != 0 ? data.aws_ebs_snapshot.ebs_snapshot[0].id : ""
}

resource "aws_ebs_volume" "ebs_disk" {
...
  snapshot_id = local.snapshot_id
...
}

This configuration uses data sources for EBS volumes and snapshots to determine whether the resources exist. Note that we query "aws_ebs_volumes" first and only then "aws_ebs_volume". The plural data source returns a list of volume IDs, and an empty list is fine: Terraform will not fail. The singular data source returns exactly one volume, and Terraform raises an error if nothing matches. The count expression makes sure the singular lookup runs only when the plural one has found something.

The same applies to "aws_ebs_snapshot_ids" and "aws_ebs_snapshot".
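To see what these lookups found before anything is applied, the results can be exposed as outputs and inspected in the plan. This is only a sketch: the output names are hypothetical, while the data sources and local.snapshot_id are the ones defined above.

```hcl
# Hypothetical outputs for inspecting the restore decision in `terraform plan`.
output "existing_data_volume_ids" {
  description = "IDs of data volumes already present in the target availability zone"
  value       = data.aws_ebs_volumes.volumes.ids
}

output "available_data_snapshot_ids" {
  description = "IDs of snapshots tagged as belonging to the data volume"
  value       = data.aws_ebs_snapshot_ids.snapshots.ids
}

output "restore_data_volume_from_snapshot" {
  description = "True when the data volume is missing and a snapshot is available"
  value       = local.snapshot_id != ""
}
```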

The snapshot_id attribute specifies the ID of an EBS snapshot from which the volume should be created. The value is assigned from the local variable local.snapshot_id.

This attribute is ignored if its value is set to "".

Now we can do something similar for the root volume.
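An alternative to the empty string is null, which Terraform treats as if the argument had been omitted. The sketch below expresses the same decision with the built-in one() function (available since Terraform 0.15), which returns null for an empty list and the single element otherwise. The local names here are hypothetical and not part of the original configuration.

```hcl
# A sketch under stated assumptions: the same restore decision, using null instead of "".
locals {
  # Null when the counted data source has zero instances, otherwise the snapshot ID.
  latest_data_snapshot_id = one(data.aws_ebs_snapshot.ebs_snapshot[*].id)

  # True when no data volume exists in the target availability zone.
  data_volume_missing = length(data.aws_ebs_volumes.volumes.ids) == 0

  # Restore only when the volume is missing; null leaves snapshot_id unset.
  restore_snapshot_id = local.data_volume_missing ? local.latest_data_snapshot_id : null
}
```

In this variant, the volume resource would set snapshot_id = local.restore_snapshot_id.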

We want to recover EC2 from the snapshot when:

  1. A snapshot for the root volume exists
  2. An AMI for EC2 is registered
  3. EC2 (target VM) does not exist

# Check whether a root volume snapshot exists and register an AMI from it
data "aws_ebs_snapshot_ids" "root_snapshots" {
  filter {
    name   = "tag:Name"
    values = ["root_disk"]
  }
}

data "aws_ebs_snapshot" "ebs_root_volume" {
  count       = length(data.aws_ebs_snapshot_ids.root_snapshots.ids) != 0 ? 1 : 0
  most_recent = true
  filter {
    name   = "tag:Name"
    values = ["root_disk"]
  }
}

resource "aws_ami" "ami" {
  count            = length(data.aws_ebs_snapshot.ebs_root_volume) != 0 ? 1 : 0
  name             = "custom-ami"
  root_device_name = "/dev/xvda"
  ebs_block_device {
    device_name = "/dev/xvda"
    snapshot_id = data.aws_ebs_snapshot.ebs_root_volume[0].id
    volume_size = var.root_volume_size
  }
}

# Define a local value for the AMI and prevent the EC2 instance from being re-created when the AMI changes

locals {
  ami = length(aws_ami.ami) == 1 ? aws_ami.ami[0].id : var.ami
}

resource "aws_instance" "ec2_instance" {
...
  ami = local.ami
  lifecycle {
    ignore_changes = [
      ami
    ]
  }
...
}

ami = local.ami: The ami attribute specifies the Amazon Machine Image (AMI) to use for the EC2 instance. In this case, it's assigned the value of the local variable local.ami. 

lifecycle { ignore_changes = [ ami ] }: This block configures the lifecycle behavior of the resource. Specifically, it instructs Terraform to ignore ami changes to prevent Terraform from attempting to recreate the EC2 instance.
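One detail of the aws_ami resource may be worth checking in your own setup. According to the AWS provider documentation, virtualization_type defaults to "paravirtual", while current-generation instance types generally require HVM images. The sketch below shows the AMI registration with the virtualization type set explicitly. The resource name ami_hvm and the AMI name are hypothetical, and ena_support is an assumption that only matters for instance types that need enhanced networking.

```hcl
# A sketch, not the article's tested configuration: AMI registration with explicit HVM settings.
resource "aws_ami" "ami_hvm" {
  count               = length(data.aws_ebs_snapshot.ebs_root_volume) != 0 ? 1 : 0
  name                = "custom-ami-hvm" # hypothetical name
  virtualization_type = "hvm"            # provider default is "paravirtual"
  ena_support         = true             # assumption: instance type uses enhanced networking
  root_device_name    = "/dev/xvda"

  ebs_block_device {
    device_name = "/dev/xvda"
    snapshot_id = data.aws_ebs_snapshot.ebs_root_volume[0].id
    volume_size = var.root_volume_size
  }
}
```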

That’s it. As you can see, the solution is pretty straightforward.

The provided Terraform code example has been successfully tested. Terraform, recognized as a key player in Infrastructure as Code (IaC), demonstrated its versatility by addressing the non-standard use case of EC2 disaster recovery in a real-world project.


Ilja Summala
Ilja’s passion and tech knowledge help customers transform how they manage infrastructure and develop apps in cloud.
Group CTO