How to create stateful clusters in AWS

With stateful clusters, the idea is to create the storage and the elastic network interface (ENI) before the VM is created. The storage and the ENI are then attached to the VM at startup.

Why do we use ENIs?

We use ENIs instead of reserved IP addresses because we cannot know whether an IP address specified in the template or in a template parameter will actually be available when the instance or ENI is created. When an ENI is created, it is assigned an IP address from the subnet. As long as the ENI is not deleted, that IP address stays reserved and associated with the ENI.

When a security group is assigned to an instance, it is actually assigned to the first ENI on the instance. This means that we can create the security group for the cluster when we create the ENIs. It's also a good idea to create a client security group that is allowed in the ingress rules of the cluster servers.

When the instances are created, you shouldn't assign them to any security groups or subnets, because these all come with the ENI that is attached at device index 0.
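The ENI setup described above can be sketched as CloudFormation resources built from plain Python dicts. This is only a minimal illustration, not the exact template behind this post: the logical names (`ClientSecurityGroup`, `ClusterSecurityGroup`, `Node1Eni` and so on), the `InstanceType` and per-node AMI parameters, and the single TCP port are all assumptions.

```python
import json


def cluster_security_groups(vpc_id, port):
    """Client and cluster security groups; clients are allowed in on one port."""
    return {
        "ClientSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {"GroupDescription": "Cluster clients", "VpcId": vpc_id},
        },
        "ClusterSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Cluster nodes",
                "VpcId": vpc_id,
                "SecurityGroupIngress": [{
                    "IpProtocol": "tcp",
                    "FromPort": port,
                    "ToPort": port,
                    "SourceSecurityGroupId": {
                        "Fn::GetAtt": ["ClientSecurityGroup", "GroupId"]
                    },
                }],
            },
        },
    }


def node_network_resources(name, subnet_id):
    """ENI plus instance for one node (hypothetical naming scheme)."""
    return {
        f"{name}Eni": {
            "Type": "AWS::EC2::NetworkInterface",
            "Properties": {
                # Subnet and security group live on the ENI, not the instance.
                "SubnetId": subnet_id,
                "GroupSet": [{"Fn::GetAtt": ["ClusterSecurityGroup", "GroupId"]}],
            },
        },
        f"{name}Instance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": f"{name}Ami"},        # assumed parameter
                "InstanceType": {"Ref": "InstanceType"},  # assumed parameter
                # No SubnetId or SecurityGroupIds here: both come with the ENI.
                "NetworkInterfaces": [{
                    "NetworkInterfaceId": {"Ref": f"{name}Eni"},
                    "DeviceIndex": "0",
                }],
            },
        },
    }


if __name__ == "__main__":
    resources = cluster_security_groups("vpc-123", 9042)
    resources.update(node_network_resources("Node1", "subnet-123"))
    print(json.dumps(resources, indent=2))
```

Because the ENI is its own resource, replacing the instance leaves the private IP address and the security group membership untouched.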

Creating Storage Volume

To maintain the local state on each machine we need to also create a storage volume for each instance. This is not the root file system volume but an additional volume. You also want to have the option to either use a blank volume or one created from a snapshot.
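As a rough sketch, the extra volume and its attachment could be described like this, again using plain Python dicts for the CloudFormation resources. The function names, the `gp3` volume type and the logical names are assumptions; `/dev/xvdh` is the device used later in this post.

```python
def volume_resource(size_gib, availability_zone, snapshot_id=None, iops=None):
    """Data volume that is blank unless a snapshot ID is given."""
    properties = {
        "Size": size_gib,
        "AvailabilityZone": availability_zone,
        "VolumeType": "gp3",  # assumption, any EBS type would do
    }
    if snapshot_id:
        # Restoring from a snapshot; Size may be larger than the snapshot.
        properties["SnapshotId"] = snapshot_id
    if iops:
        properties["Iops"] = iops
    return {"Type": "AWS::EC2::Volume", "Properties": properties}


def attachment_resource(volume_name, instance_name, device="/dev/xvdh"):
    """Attach the data volume to the instance as an additional device."""
    return {
        "Type": "AWS::EC2::VolumeAttachment",
        "Properties": {
            "Device": device,
            "InstanceId": {"Ref": instance_name},
            "VolumeId": {"Ref": volume_name},
        },
    }
```

For example, `volume_resource(200, "eu-west-1a", snapshot_id="snap-0123")` describes a 200 GiB volume restored from a hypothetical snapshot, while leaving out `snapshot_id` gives a blank volume.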

It's important to check on boot whether the volume has a file system on it. If it doesn't, the volume is blank and should be formatted. If it does, the volume was created from a snapshot. In that case the file system should be grown so that it uses all the available space on the volume, because a volume can be created larger than its snapshot.
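A minimal sketch of that boot-time check, assuming an ext4 file system, a volume that is not mounted yet, and the `blkid`, `mkfs.ext4`, `e2fsck` and `resize2fs` tools on the instance. The function names are made up for this example.

```python
import subprocess


def detect_filesystem(device):
    """Return the file system type reported by blkid, or "" if there is none."""
    result = subprocess.run(
        ["blkid", "-o", "value", "-s", "TYPE", device],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def plan_commands(device, fs_type):
    """Decide what to run: format a blank volume, grow a restored one."""
    if not fs_type:
        # No file system found, so treat the volume as blank.
        return [["mkfs.ext4", device]]
    # Volume came from a snapshot. resize2fs wants a forced check first
    # when the file system is not mounted.
    return [["e2fsck", "-f", "-p", device], ["resize2fs", device]]


def prepare_volume(device):
    """Run the planned commands; intended to run as root before mounting."""
    for command in plan_commands(device, detect_filesystem(device)):
        done = subprocess.run(command, check=False)
        # e2fsck exits with 1 when it corrected errors, which is acceptable.
        accepted = (0, 1) if command[0] == "e2fsck" else (0,)
        if done.returncode not in accepted:
            raise RuntimeError(f"{command[0]} failed with {done.returncode}")
```

Keeping the decision in `plan_commands` separate from the execution makes it easy to check the logic without touching a real disk.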

Scaling your Storage

The storage can be scaled up in size, in IOPS, or both. If you increase the volume size, you also need to resize the file system. To do this we trigger resize2fs after the volume has been updated. To watch the volume for updates we need to configure the cfn-auto-reloader as described below.

Implementation

This pattern is not dependent on any tooling. However, depending on what tool is used, additional features might be available.

Deploying new AMIs

By separating the ENI and disk from the instance we can easily perform a rolling update by having one CloudFormation parameter for each instance AMI. With a three-node cluster, for example, you can then just update the stack three times, changing the AMI parameter of one instance per update.
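The sequence of stack updates can be sketched as a small helper that yields the parameter values for each update. The parameter names (`Node1Ami` and so on) are assumptions, and the actual stack update call is left out.

```python
def rolling_update_steps(current_amis, new_ami):
    """Return one parameter set per stack update, changing one AMI at a time."""
    steps = []
    parameters = dict(current_amis)
    for name in sorted(parameters):
        if parameters[name] == new_ami:
            continue  # this node already runs the new AMI
        parameters[name] = new_ami
        steps.append(dict(parameters))
    return steps


if __name__ == "__main__":
    current = {"Node1Ami": "ami-old", "Node2Ami": "ami-old", "Node3Ami": "ami-old"}
    for number, step in enumerate(rolling_update_steps(current, "ami-new"), 1):
        print(f"update {number}: {step}")
```

Each step would be applied as its own stack update, waiting for the replaced node to report healthy before moving on to the next one.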

CloudFormation:

DeletionPolicy

DeletionPolicy="Snapshot" is used on volumes, so if CloudFormation deletes a volume it will automatically create a final snapshot first.

CreationPolicy

The instance will not reach the CREATE_COMPLETE state until it signals that it is healthy.
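As an illustration, a CreationPolicy and the matching signal could look like the sketch below, with the resource written as a plain Python dict. The 15 minute timeout and the helper names are assumptions. The `cfn-signal` call is meant to run on the instance once its own health check has passed.

```python
def with_creation_policy(instance, timeout="PT15M"):
    """Return a copy of the instance resource that waits for one signal."""
    resource = dict(instance)
    resource["CreationPolicy"] = {
        "ResourceSignal": {"Count": 1, "Timeout": timeout}
    }
    return resource


def signal_command(stack, resource, region):
    """Shell command that reports the exit code of the previous command."""
    return (
        "/opt/aws/bin/cfn-signal -e $? "
        f"--stack {stack} --resource {resource} --region {region}"
    )
```

If no success signal arrives before the timeout, the instance creation fails instead of leaving an unhealthy node in the cluster.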

Online Scaling of Storage

Configuring cfn-hup to watch the volume associated with the instance enables us to scale up the storage without any outage. The storage can be scaled up in size or in IOPS.

To watch the volume we need to configure the cfn-auto-reloader as described below.

"/etc/cfn/hooks.d/cfn-auto-reloader.conf": {
    "content": Join("", [
        "[cfn-auto-reloader-hook]\n",
        # Fire after the watched resource has been updated.
        "triggers=post.update\n",
        # Watch the volume resource that belongs to this instance.
        "path=Resources.{}\n".format(volume),
        # Re-run cfn-init on this instance with the "update" configset.
        "action=/opt/aws/bin/cfn-init ",
        " --stack ", Ref("AWS::StackName"),
        " --resource {} ".format(instance),
        " --configsets update ",
        " --region ", Ref("AWS::Region"), "\n"
    ]),
    "mode": "000400",
    "owner": "root",
    "group": "root"
},

When the volume reaches the UPDATE_COMPLETE state it will trigger the "update" configset, which grows the file system.

resize=cloudformation.InitConfig(
    commands={
        "resize": {
            # Grow the file system to fill the resized volume.
            "command": "/sbin/resize2fs /dev/xvdh",
            "env": {"HOME": "/root"}
        }
    }
),
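The hook file is only read if the cfn-hup daemon itself is configured and running. A minimal sketch of rendering its main configuration file, `/etc/cfn/cfn-hup.conf`, is shown here. The function name and the one minute polling interval are assumptions; the default interval is longer.

```python
def cfn_hup_conf(stack, region, interval_minutes=1):
    """Render /etc/cfn/cfn-hup.conf; interval is how often cfn-hup polls."""
    return (
        "[main]\n"
        f"stack={stack}\n"
        f"region={region}\n"
        f"interval={interval_minutes}\n"
    )


if __name__ == "__main__":
    print(cfn_hup_conf("my-stack", "eu-west-1"))
```

A shorter interval means the file system is grown sooner after the volume update, at the cost of more frequent polling.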

The model would look something like this:


Martin Kåberg, Principal R&D Architect