Part 1 – GCP Networking Philosophy

This is a 2-Part Series on GCP Networking.


When working with cloud architecture, it’s important to see the world from different perspectives. In software development we often use personas, and that approach works well for cloud architecture too, though I tend to use views from different roles and their perspectives. In this story I will use three views.


Developer

Developers working against cloud environments are de facto DevOps engineers, in the sense that they need programmable platforms for scaling, testing, availability, deployment and so on. However, most developers do not have years of system operations experience under their belt, so it is important to remove concerns like networking from their view.

Network Administrator

The network administrator role in the cloud is extremely important. At first glance it might seem that the role has become less important, since you can now create complex networks with declarative infrastructure as code and the hyperscalers have virtually unlimited capacity. However, a network is (mostly) one unit, and trying to align multiple dev teams on network management is a recipe for failure. It is also rarely just a single-vendor public cloud network; very often it is hybrid and multi-cloud. The role of the network administrator is to provide networking as a service that is simply there for every use case.

Security Architect

The final view is that of the security architect.  This view is all about ensuring and proving that all the controls are in place to be compliant with company security policy and standards.

As an example, the company network policy is:

  • Egress traffic to the internet must be filtered
  • Ingress traffic must have some form of DDoS protection
  • Ingress is only allowed to specified ports and protocols (usually 80 and 443)
  • Any traffic crossing our zones must be monitored by IPS/IDS; the following zones exist:
    • Production
    • Non Production
    • Internet

Developer view on networking

  1. Should be out of scope
  2. Should not be a blocker
  3. Should provide connectivity to systems your application depends on
  4. Should be secure
  5. Should have ability to control ingress along with DNS and certificates

Network Administrators view on networking

  1. Only my team should touch networking
  2. We can’t spend all our time on service requests
  3. We need visibility

Security Architects view

  1. Have provable compliance
  2. Secure defaults

Shared VPC

A shared VPC makes it possible to separate the network’s control plane into its own project, called the host project. Your workloads run in service projects. A VM in a service project can place a NIC in a subnet belonging to a VPC in the host project. Three controls decide who, in which project, can use which subnet:

  • Who is controlled by Cloud IAM. A principal needs sufficient permissions in the service project to manage the VM, such as the Compute Admin role. In the host project that same principal needs the Compute Network User role on the subnet.
  • A pairing must be made between the host and service projects, declaring which is host and which is service. A service project can only have ONE host project. Pairing requires organization-level permission.
  • Organization policy enforces which subnets can be used by which project. Without this policy, a user with IAM access to both non-prod and prod subnets would be able to put workloads in production projects on the non-prod network. Organization policy requires organization-level permission.
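The first two controls map to a handful of gcloud commands. A minimal sketch, where the project IDs (net-host, app-service), subnet name, region and principal are all placeholder assumptions:

```shell
# Enable the host project for Shared VPC (requires org-level permission).
gcloud compute shared-vpc enable net-host

# Pair the service project with its (single) host project.
gcloud compute shared-vpc associated-projects add app-service \
    --host-project=net-host

# Grant the developer the Compute Network User role on one specific
# subnet in the host project, rather than on the whole network.
gcloud compute networks subnets add-iam-policy-binding prod-subnet \
    --project=net-host --region=europe-north1 \
    --member=user:dev@example.com \
    --role=roles/compute.networkUser
```

Granting the role per subnet rather than at project level is what lets the network admin keep prod and non-prod subnets in separate hands.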

With just the shared VPC we have solved the following requirements:

  • Should be out of scope – All you need to do is set the network tags on your VM
  • Should not be a blocker – You have full control over your ingress and DNS; all you need to wait for is new firewall configurations
  • Should provide connectivity to systems your application depends on – The network admin ensures that the necessary routes are in place in the host project
  • Should be secure – Now I can’t make it insecure…
  • Should have ability to control ingress along with DNS and certificates – That’s all done in my service project

Network Administrator

  • Only my team should touch networking – The host project makes the separation
  • We can’t spend all our time on service requests – Developers can launch VMs in subnets provisioned for them, as well as manage their own public DNS and load balancers. We only need to create firewall rules for them.

Shared VPC Diagram


Get in Touch.

Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.

    Part 2 – Two Different Types of GCP Network Designs

    This is a 2-Part Series on GCP Networking.


    When designing your network in GCP, you need to decide whether to go fully GCP native or use a virtual network appliance to manage your VPCs. GCP only has simple layer 4 firewalls, and all traffic between our zones must go through our firewall for IPS/IDS functionality. There are two ways to do this:

    GCP Native Solution

    We set up three VPCs in the host project: prod, shared services (sst) and non-prod. We peer prod and non-prod to sst. In the sst network we create a VPN connection to on-premises, where we have a firewall. Since it’s not possible to traverse a VPC in GCP, prod and non-prod cannot reach each other. We set the default route to go via on-premises; from there, traffic can be routed back to GCP or out to the internet. To limit traffic going through the VPN, we enable Private Google Access in the VPCs.
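    As a sketch of the peering and routing (network, tunnel and region names are assumptions), custom routes are exported from sst so the peered networks learn the default route via the VPN. Because peering is non-transitive, this same setup is what keeps prod and non-prod apart:

```shell
# Peer prod to sst (both directions are required); prod imports the
# custom routes that sst exports, i.e. the default route below.
gcloud compute networks peerings create prod-to-sst \
    --network=prod --peer-network=sst --import-custom-routes
gcloud compute networks peerings create sst-to-prod \
    --network=sst --peer-network=prod --export-custom-routes
# (repeat the pair of commands for non-prod)

# In sst, send the default route through the VPN to the on-premises firewall.
gcloud compute routes create default-via-onprem \
    --network=sst --destination-range=0.0.0.0/0 \
    --next-hop-vpn-tunnel=onprem-tunnel \
    --next-hop-vpn-tunnel-region=europe-north1
```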

    • This design is simple, and the only additional cost is that of the VPN/Interconnect
    • It is compliant with company policy
    • On-premises becomes a single point of failure if internet or cross-zone connectivity is needed

    GCP Native Diagram

    Firewall Appliance Solution

    With the firewall in the cloud, our GCP networking gets more complex, but we are no longer dependent on the on-premises connection. For this we need to add four more VPCs and remove the sst. We create a service project that will host the firewall VM. The VM will have one NIC in each VPC except Connection Hub; that VPC is instead peered to Hybrid and Management.

    • DMZ – The NIC in this VPC has an external IPv4 address and acts as the NAT
    • Prod / Non Prod – The two internal zones
    • Management – It’s good practice to use a separate NIC for management of the firewall
    • Hybrid – This is for our on-premises connections
    • Connection Hub – This VPC functions like the sst in the GCP native solution

    Each VPC except DMZ and Connection Hub has its default route set to the corresponding NIC of the firewall. We keep Private Google Access enabled in all the VPCs to minimize traffic through the VM.
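    Creating the firewall VM with one NIC per VPC could look roughly like the sketch below (project, zone, machine type, image and subnet names are assumptions; note that the NIC count is fixed at creation time and limited by the machine type):

```shell
gcloud compute instances create fw-appliance-1 \
    --project=fw-service-project --zone=europe-north1-a \
    --machine-type=n1-standard-8 \
    --image=fw-vendor-image --image-project=fw-vendor-project \
    --can-ip-forward \
    --network-interface=subnet=dmz-subnet \
    --network-interface=subnet=prod-subnet,no-address \
    --network-interface=subnet=nonprod-subnet,no-address \
    --network-interface=subnet=mgmt-subnet,no-address \
    --network-interface=subnet=hybrid-subnet,no-address
```

    --can-ip-forward is required for any appliance that routes traffic between NICs, and only the DMZ interface gets an external address.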

    Firewall Appliance Diagram

    To keep this blog post simple, some important topics have been skipped.

    • It’s possible to exfiltrate data via a GCP service like Cloud Storage; this can be tightened by using VPC Service Controls
    • Multiple firewall VMs should be used; these can be placed behind an internal load balancer that is set up as the gateway in the VPC routing
    • Cloud NAT can be used if egress to the internet can be unfiltered
      • If Cloud NAT is used, the default gateway will be the internet gateway. This means that public IP addresses can be added to VMs. Use organization policy to prevent public IPs on VMs
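    That last point can be enforced with the compute.vmExternalIpAccess list constraint. A sketch (the organization ID and file name are placeholders):

```shell
# Policy file: deny external IPs on all VMs under the organization.
cat > no-external-ip.yaml <<'EOF'
constraint: constraints/compute.vmExternalIpAccess
listPolicy:
  allValues: DENY
EOF

gcloud resource-manager org-policies set-policy no-external-ip.yaml \
    --organization=123456789012
```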



      Next generation networking, food for thought?



      A few of the large announcements included Anthos and Cloud Run. It is easy to get overwhelmed by the sheer number of presentations and announcements.

      This year there were two presentations that I felt may have flown under the radar, but would be a shame to miss out on.


      Istio service mesh for VMs

      Service meshes and overlay networking have been around for a while. Tools like Istio enable engineers to create overlay networks between containers. These networks allow for software-based networking between services, with higher-level features like circuit breaking, latency-aware load balancing, and service discovery.

      One of the drawbacks of these tools was that most of them relied on sidecar containers. As a result, setting this up for non-container workloads like VMs was pretty difficult. In this talk, Chris Crall and Jianfei Hu show an easy way of integrating Istio with VMs. This means that we can now integrate almost anything into our service mesh: databases, legacy workloads, or anything else that runs on a VM.

      Even though it might seem like a minor feature, this is a game-changer. Imagine migrating a large application landscape full of legacy workloads into containers: Istio can do weight-based routing, which means we can set up multiple endpoints for the same service, each receiving only part of the traffic. By doing this for an application we’re trying to migrate, we can compare the performance of the old version to the new containerised one.
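      As a sketch of what that weight-based routing looks like (the service and subset names are made up, and the DestinationRule defining the subsets is omitted), an Istio VirtualService can split traffic between the legacy VM backend and the new containerised version:

```shell
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: legacy-app
spec:
  hosts:
  - legacy-app
  http:
  - route:
    - destination:
        host: legacy-app
        subset: vm-legacy        # existing VM workload
      weight: 90
    - destination:
        host: legacy-app
        subset: containerised    # new container version
      weight: 10
EOF
```

      Shifting the weights gradually while comparing metrics is the canary-style migration described above.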


      Zero-trust networking and PII

      Another video that would be easy to miss, but is definitely worth a watch, is the one by Roy Bryant from Scotiabank. They’ve recently started shifting from a financial institution to ‘a tech company that also does banking’, as shown by their starting to push open-source code to GitHub.

      Being a bank, they deal with a lot of PII (personally identifiable information). As a result, security is one of their main concerns. In the video they mention that besides using ML to tokenise things like credit card numbers, they leverage intent-based zero-trust networking. This might sound complex, but in reality it is quite elegant.

      Traditionally, access between services or computers is enforced through firewalls and network configurations. With the emergence of software-defined networks, and layer-7 routing we can start thinking about other ways.

      In the video, they mention that instead of configuring firewalls, they started expressing intent: “I want service A to be able to read 10 records per second from service B for the next 5 minutes”.

      By versioning these intents and abstracting the logic behind them into libraries, we are no longer maintaining complex sets of firewall rules. Access is now governed in a transparent, maintainable manner, allowing for an intuitive way of approaching security.



      A blog post like this can only cover so much ground, and these are complex subjects. I recommend watching the videos mentioned here and checking out the links in the references below. I’d like to end this post with some food for thought:

      Currently in modern clouds, a large part of the security model relies on network security through firewalls and NACLs in addition to IAM.

      With the increasing usage of layer-7 overlay-networking I expect to see these two amalgamate into new multi-disciplinary security mechanisms.



        If your CloudFormation deployments are failing, this is why



        Update [16:00UTC]: AWS were quick to release a fix (aws-cfn-bootstrap-1.4-26) and -25 is still in the yum repositories. Unless you were unlucky and froze your environment today, the problem should solve itself.

        The latest version of the aws-cfn-bootstrap package, aws-cfn-bootstrap-1.4-25.17.amzn1.noarch, signed November 2 around 21:00 UTC, changed how cfn-signal works. cfn-signal now picks up the instance profile role’s API keys and tries to sign the request by default. This causes the signal to fail if the instance’s IAM role does not have the cloudformation:SignalResource permission.

        cfn-signal has always supported signed requests, but if access keys were not provided, the following authentication method was used:

        cfn-signal does not require credentials, so you do not need to use the --access-key, --secret-key, --role, or --credential-file options. However, if no credentials are specified, AWS CloudFormation checks for stack membership and limits the scope of the call to the stack that the instance belongs to.

        This will only affect users that either build AMIs or update system packages on boot. If you normally do a yum update, replace it with yum -y upgrade --security or yum -y upgrade --exclude=aws-cfn-bootstrap.

        You could also add the IAM policy statement below to your instance role.


        {
            "Action": [
                "cloudformation:SignalResource"
            ],
            "Effect": "Allow",
            "Resource": {
                "Fn::Sub": "arn:aws:cloudformation:${AWS::Region}:${AWS::AccountId}:stack/${AWS::StackName}/*"
            }
        }
        Please contact Nordcloud for more information on CloudFormation.


          How to create stateful clusters in AWS



          With stateful clusters, the idea is to create the storage and network interface before the VM is created. The storage and ENI are then associated with the VM on start-up.

          Why do we use ENIs?

          We use ENIs instead of reserved IP addresses because we cannot know whether the IP address we specify in the template or a template parameter will actually be available when the instance or ENI is created. When the ENI is created, it is assigned an IP address from the subnet. As long as the ENI is not deleted, that IP address remains reserved and associated with the ENI.

          When a security group is assigned to an instance, it is actually assigned to the first ENI on the instance. This means that we can create the security group for the cluster when we create the ENIs. It’s also a good idea to create a client security group that is allowed in the ingress rules of the cluster/servers.

          When the instances are created, you shouldn’t assign them to any security groups or subnets, as all of this comes with the ENI attached at index 0.
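          Launching an instance with a pre-created ENI at device index 0 looks roughly like this with the AWS CLI (the ENI ID, AMI and instance type are placeholders). Note that --subnet-id and --security-group-ids are deliberately absent, since both come from the ENI:

```shell
aws ec2 run-instances \
    --image-id ami-0abcd1234 \
    --instance-type m5.large \
    --network-interfaces "NetworkInterfaceId=eni-0abc123def4567890,DeviceIndex=0"
```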

          Creating Storage Volume

          To maintain local state on each machine, we also need to create a storage volume for each instance. This is not the root filesystem volume but an additional volume. You also want the option to use either a blank volume or one created from a snapshot.

          On boot, it’s important to check whether the volume has a filesystem on it. If it doesn’t, the volume is blank and should be formatted. If it does, the disk was created from a snapshot; in that case the filesystem should be grown so that it uses all the available space on the volume, since it’s possible to create volumes that are larger than the snapshot.
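          That boot-time check can be sketched in shell. The helper below is hypothetical and keys off the output of file -s on the device: a blank volume reports just "data", while anything else indicates an existing filesystem signature.

```shell
# Decide what to do with the data volume at boot, based on the output of
# `file -s DEVICE`. "/dev/xvdh: data" means no filesystem signature
# (blank volume); any other output means a filesystem already exists.
disk_action() {
    case "$1" in
        *": data") echo "format" ;;   # blank volume: run mkfs
        *)         echo "grow"   ;;   # created from snapshot: run resize2fs
    esac
}

# Example boot-time usage:
#   if [ "$(disk_action "$(file -s /dev/xvdh)")" = "format" ]; then
#       mkfs -t ext4 /dev/xvdh
#   else
#       resize2fs /dev/xvdh
#   fi
```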

          Scaling your Storage

          The storage can be scaled up in both size and IOPS. If you increase the volume size, you also need to resize the filesystem by triggering resize2fs after the volume has been updated; the cfn-auto-reloader configuration that watches the volume for updates is shown under Online Scaling of Storage below.


          This pattern is not dependent on any tooling. However, depending on what tool is used, additional features might be available.

          Deploying new AMIs

          By separating the ENI and disk from the instance, we can easily perform a rolling update by having one CloudFormation parameter for each instance’s AMI. You can then update the stack three times, changing the AMI parameter of one instance per update.
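          A rolling update then becomes three parameter-only stack updates. A sketch with assumed stack and parameter names:

```shell
# Roll node 1 to the new AMI; nodes 2 and 3 keep their current values.
aws cloudformation update-stack --stack-name stateful-cluster \
    --use-previous-template \
    --parameters ParameterKey=Node1Ami,ParameterValue=ami-0new12345 \
                 ParameterKey=Node2Ami,UsePreviousValue=true \
                 ParameterKey=Node3Ami,UsePreviousValue=true

# Wait for the node to be replaced and signal healthy,
# then repeat for Node2Ami and Node3Ami.
aws cloudformation wait stack-update-complete --stack-name stateful-cluster
```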



          DeletionPolicy="Snapshot" is used on the volumes, so if CloudFormation deletes a volume it will automatically create a final snapshot.


          The instance will not reach the CREATE_COMPLETE state until it signals healthy.

          Online Scaling of Storage

          Configuring cfn-hup to watch the volume associated with the instance enables us to scale up the storage without any outage. The storage can be scaled up in size or in IOPS.

          To watch the volume, we need to configure the cfn-auto-reloader hook as described below.

          "/etc/cfn/hooks.d/cfn-auto-reloader.conf": {
              "content": Join("", [
                  "action=/opt/aws/bin/cfn-init ",
                  " --stack ", Ref("AWS::StackName"),
                  " --resource {} ".format(instance),
                  " --configsets update ",
                  " --region ", Ref("AWS::Region")
              ]),
              "mode": "000400",
              "owner": "root",
              "group": "root"
          }
          When the volume reaches the UPDATE_COMPLETE stage, it triggers the update configset, which grows the filesystem.

          "resize": {
              "command": "/sbin/resize2fs /dev/xvdh",
              "env": {"HOME": "/root"}
          }
          The model would look something like this:

          If you want to find out more about stateful clusters on AWS and how to create them, get in touch here. 
