Migrating on-prem to AWS with ease, using CloudEndure



The Why

We at Nordcloud have a multitude of customers who turn to us to migrate their on-premises workloads to a public cloud provider. Every migration is different, each with its own unique starting point, technical challenges and requirements. In this particular case, after an in-depth assessment of the current environment, we agreed with the customer that most of the workloads would be rehosted to AWS. During this process the team assessed the available tooling and services that could best facilitate a smooth and efficient migration, and we found CloudEndure to be the best tool for lift-and-shifting more than a hundred servers.

The How

Lift-and-shift, a.k.a. rehosting, means taking servers as they are in the current environment and spinning up exact copies in the new environment. In our case the migration was from an on-premises datacenter to AWS. The on-premises architecture relied on VMware to virtualise the workloads, but the decision was made to migrate directly to EC2 rather than to VMware on AWS. This is exactly where CloudEndure's strength lies: it handles the whole process of snapshotting a server, transferring it to AWS, performing the OS-level changes required by the change of underlying infrastructure, and ultimately starting the new instance up.

It executes the whole process without affecting anything on the source instance, enabling minimal cutover downtime. The source doesn't need to be powered off, restarted or modified in any way, shape or form for a fresh copy of the instance to be started on AWS. Our cutover processes fell into a couple of categories. For example, if the server in scope was a stateful one that also required data consistency (e.g. self-hosted databases), we first shut down the application's processes, waited a short time for CloudEndure to sync the changes to the target environment, and then started up the instance in the new environment.

The way CloudEndure works is really simple. You execute the provided script on the source machine, which installs and configures the CloudEndure agent. This agent in turn starts communicating with CloudEndure's public endpoints, a.k.a. the "Control Tower". You set up a project within the console and the instances automatically get added to it. Once your instance appears, replication starts: CloudEndure takes a 1:1 snapshot of the instance and continuously replicates new changes, as long as it's not stopped manually or the instance is powered off. It offers you an always up-to-date copy – with approx. 5 min latency – that you can spin up any time with a single button. During the cutover process the OS-level changes are performed. It will also ensure that Windows Servers end up on a licensed AMI, which is especially useful because it eases the burden of licensing that many customers face. This functionality, however, is only supported for Windows and no other operating systems (more on this in the Learnings section below).

The Tech

From a technical perspective, CloudEndure's console [Control Tower] makes API calls on your behalf to AWS services. It doesn't deploy a CloudFormation stack or need anything special. It uses an IAM access and secret key pair, for which there is a pre-made IAM policy template following IAM best practices. After you set up your project, a Replicator EC2 instance is spun up in your target region. This instance handles the temporary storage of the to-be-rehosted instances' snapshots: each disk on each source server [you deploy the agent to and start replication] is created as an EBS volume and attached to the Replicator instance. You can specify what instance type you want your Replicator instance(s) to be, and I would highly recommend going with AMD-based instances. In our experience they have been noticeably faster in the cutover, plus they're simply cheaper. Since your Replicator instance(s) will be running constantly throughout the whole period of the migration – which can take a long time – make sure to adjust the sizing to your needs.

The second piece of the puzzle is the Machine Converter instance, which does the heavy lifting for the changes required on the OS and disk level. It is amazing how well it works, especially for Windows instances: it performs all the modifications required by the complete change of the underlying hypervisor. Windows does not even lose its domain join in the process; the new server comes up as if it had simply been rebooted.

From the moment you initiate a cutover from the console [or through the API], it generally takes ~20-30 minutes for Windows instances and 5-10 minutes for Linux instances to be up and running in AWS. At the start of the cutover the latest snapshot is taken, the Machine Converter does its black magic, and voilà! The new server is up and running.

CloudEndure Demo

In this short demo I'll walk through the migration of a single instance. I've set up a simple web server on a RHEL7 instance in GCP, which we'll be migrating to AWS. Here's the instance in the GCP console, and the contents of its web server.

The source instance in GCP
A very simple web server running on the source

First, after you sign up, you create a new project in the CloudEndure console. With the plus sign you can create multiple projects. Each project is tied to a single target AWS account, so if you want to spread your workloads across multiple accounts, this is a must. Then you paste the access and secret key pair of your IAM user, which has the following IAM policy attached.

Paste the credentials of the IAM user here

Then you're presented with the configuration. This is where you set the target region in AWS and your…well…settings. As I mentioned earlier, I recommend AMD instances and a beefy Machine Converter, as the latter only runs for short periods of time but makes a big difference in speed. Sometimes you can throw money at the problem. You can also set it to use a dedicated Direct Connect connection so the migration doesn't impact your regular internet traffic. If you do need to use the internet, you can also restrict the maximum bandwidth CE will use.

Define your replication settings

Under “Machines” you'll get the instructions on how to deploy the agent on the source server(s). It's a simple two-liner and it can be non-interactive. It'll automatically register the server with the console and even start replication to the target region. Or not, as you can turn off pieces of this automation with flags during the execution of the install script.

Instructions on how to install the agent on the source machines

In this example I’m running it manually via a terminal on the source machine. You can however put this into your existing Ansible workflows for example.

After the agent is installed, CE can configure everything automatically and also begin the replication by deploying the Replication Server in your region – if there isn't one already. Each disk on the source servers is attached to the Replication Server as a separate EBS volume. Since there is a maximum number of volumes you can attach to an instance, multiple Replication Servers will be deployed if needed. CE replicates the contents of each disk to its EBS volume, and continuously updates it as changes happen on the source disk.
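As a back-of-the-envelope sketch (the actual per-instance volume attachment limit depends on the instance type, so the default below is an assumption), the number of Replication Servers needed is just a ceiling division over the total disk count:

```python
import math

def replication_servers_needed(total_source_disks: int,
                               max_volumes_per_instance: int = 40) -> int:
    """Each source disk becomes one EBS volume attached to a Replication
    Server, so the server count is a ceiling division over the disk count."""
    return math.ceil(total_source_disks / max_volumes_per_instance)

print(replication_servers_needed(25))   # 1
print(replication_servers_needed(100))  # 3
```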

You can monitor the replication in the console

By clicking on the instance in the console, you get to configure the Blueprint for the cutover. This is where you define your target instance’s details. The options are essentially 1:1 to what you’d be able to customize when launching an EC2 instance directly. It can also add an IAM role to the instance, but be aware that the provided IAM policy does not allow that (gotcha!).

Use your preconfigured resources or create new ones – both work

After the console reports “Continuous Data Replication” for the instance, you can initiate a cutover any time with the “Launch target machine” button. You can choose between “Test Mode” and “Cutover”, however the only difference between the two is how the history is displayed in the CE console. There is no technical difference between the two cutover types. You can monitor the progress in the “Job Progress” section on the left.

Fire away and lean back to witness the magic – for 8 minutes

This is what the process looks like on the AWS side. First, a snapshot is taken of the corresponding [source] EBS volume on the Replication Server. Then the Machine Converter instance comes in, does its work and terminates as it finishes. Finally the cloudendure-source instance is started (even the hostname persists from the source).

What a beautiful herd

Navigating to the DNS name or IP of the new server, we can see that the same page is served by the new instance. Of course this was a very limited demo, but the instance in AWS is indeed an exact replica of the source.

Learnings about CloudEndure

  • We only learned about CloudEndure’s post-launch-script functionality quite late in the project. We utilised it once to run a complex application’s migration, including self-hosted databases and multiple app servers, and it allowed us to complete the cutover and all the post-migration tasks in under two hours. We had set up and tested the cutover process in depth beforehand, so at the time of the cutover this complex environment started in AWS with all the required configuration changes, without any need for manual input. With the necessary preparation, this can allow for minimal service disruption when migrating traditional workloads to the cloud.
  • CloudEndure has more potential. While our team has not explored other usage patterns, it could also be implemented as a disaster recovery tool. For example, your EC2 instances in Frankfurt could be continuously replicating to Ireland; in case of a region-wide, business-impacting outage in Frankfurt, you could spin up your servers in Ireland reliably and [most importantly] fast.
  • Windows is better supported than Linux-based instances. The migration of Windows Server (2008 R2 onwards) also makes sure that the AMI of the AWS instance matches the OS. This is important for licensing, but it’s unfortunately not supported on Linux-based instances. There were quite a lot of Red Hat servers in the scope of this migration, and we realised that AWS was missing the knowledge that they were running RHEL (the “AMI” in the instance details essentially reported “unknown”). Therefore licensing was not automatically channelled through AWS, and the team had to fix the instances afterwards. When we became aware of the issue, we learned that CloudEndure is capable of using pre-made instances as targets instead of creating brand new ones. This way we could specify the AMI by creating a new target instance with the required details, and CE would replace only the drives of the instance, thereby keeping the AMI and the licensing. We had tried to use this “target-instance” functionality before, but received errors every time. This isn’t mentioned in the docs, but we found that the IAM policy assigned to the CE user in the AWS account has limited access to EC2: it uses conditionals that restrict EC2 actions to resources tagged with CloudEndure Creation Time (any tag value works, however). Therefore both the instance and its current disk have to have that tag, otherwise CE will not be able to detach the existing disks and attach the new ones to the instance.
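For illustration, here is a hedged boto3 sketch of that tagging workaround. The resource IDs are placeholders, and the tag value is arbitrary since any value satisfies the policy condition:

```python
# Sketch: tag a pre-made target instance and its current disk with the
# "CloudEndure Creation Time" tag that CE's restricted IAM policy expects.
# The resource IDs below are placeholders.

def cloudendure_tag_request(resource_ids):
    """Build the kwargs for ec2_client.create_tags()."""
    return {
        "Resources": list(resource_ids),
        "Tags": [{"Key": "CloudEndure Creation Time", "Value": "manual"}],
    }

request = cloudendure_tag_request(
    ["i-0123456789abcdef0", "vol-0123456789abcdef0"])

# To actually apply the tags:
# import boto3
# boto3.client("ec2").create_tags(**request)
```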

CloudEndure: the good, the bad and the small amount of ugly

CloudEndure makes lift-and-shift migrations [to AWS] quite effortless. As we discovered, however, it is not as mature as other AWS services – understandably, given that CE was only acquired by AWS in 2019. We could rely on it for all our tasks, but were presented with multiple shortcomings.

One of these is the documentation. It is extensive and covers most scenarios, but less common cases like the “target instance cutover” lacked a crucial piece of information (tagging). The other major pain point was the inconsistency of the web interface. There were multiple times when instances reported an up-to-date “continuous replication” state, but suddenly jumped to “replicating…10 minutes left”. This would have been understandable in cases where a lot of data was changing on the source servers, but it occurred many times when that wasn’t the case. The cutover still proceeds successfully, but seeing the instance suddenly jump to “wait, let me catch up, it’ll take 10 minutes” and shortly after going back to “nope, all good, just kidding” was quite frustrating at times – and especially nerve-wracking just before a sensitive cutover. The UI can also have inconsistencies; e.g. when you save the blueprint, make sure to come back to it and double-check that your configuration has been saved. Getting the AMI right is also limited to Windows Server. Support for other major operating systems, such as the industry-standard RHEL, would be great. The documentation should describe how to fix the AMIs afterwards – or better yet, how to use the Target Instance ID functionality to avoid the issue altogether.

Get in Touch.

Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.

    Extending CloudFormation with custom resource types



    In this post I show how you can extend AWS CloudFormation with new resource types and use them in the same way as AWS native resources – without running your own Lambda functions or EC2 instances like custom resources would require you to do.

    Accessing information outside of a CloudFormation stack

    In the real world it is often not possible to have 100% of your cloud infrastructure defined and maintained in code. Ideally, all your cloud infrastructure would share a single state and allow references to any resource needed.

    In a more realistic scenario your stack is one of many, and some resources aren’t managed in code at all, or are created with a different IaC tool.

    Making references to resources, or data, outside of the stack has never been the strongest point of CloudFormation. You can pass information between stacks with export/import, nested stacks, SSM Parameter Store, stack sets or sometimes even with copy-pasting.

    What has been missing is the ability to reference external resources the same way as Terraform data sources do.

    Extending CloudFormation

    For a long time it has been possible to extend CloudFormation with custom resources. SAM templates made it possible to combine the infrastructure and logic of a custom resource into a single template, assuming your code was compact enough to fit into the template and didn’t depend on libraries outside of the standard Lambda runtimes.

    A more modern way to extend CloudFormation is to use custom resource types. They are real first-class citizens, comparable to AWS-provided resources. The major difference between these options is who is responsible for running the code.

    For a custom resource you must have a Lambda function or an EC2 instance, but for a resource type the CloudFormation service runs your code. This makes resource types, combined with the CloudFormation Registry, much easier to share and consume across multiple projects and AWS accounts.

    I would recommend reading Managing resources using AWS CloudFormation Resource Types to understand how both models work.


    What would be the simplest data provider to test resource type development with? I came up with the idea of a pseudo resource that doesn’t do anything, but allows setting the state when the resource is created/updated and returns the set value with a GetAtt call.

    Here is a sample template using the Nordcloud::Dataprovider::Variable resource. It shows how your resource types can be used in templates exactly the same way as native AWS resources.

    AWSTemplateFormatVersion: 2010-09-09
    Description: Nordcloud-Dataprovider-Variable
    Parameters:
      MyValue:
        Description: MyVar Content
        Type: String
        Default: HelloWorld
    Resources:
      MyVar:
        Type: Nordcloud::Dataprovider::Variable
        Properties:
          Content: !Sub "Simple reference to ${MyValue}"
    Outputs:
      MyVarContent:
        Description: Content of MyVar
        Value: !GetAtt MyVar.Content

    The source code for Nordcloud::Dataprovider::Variable is available on GitHub.

    Resource type development

    The resource type development workflow is:

    • Install CFN CLI and dependencies
    • Initialize a new project
      cfn init
    • Write the resource schema and generate handler skeletons
      cfn generate
    • Implement the logic in handler functions
    • Validate the resource type
      cfn validate
    • Deploy the new version of resource type and set it as default
      cfn submit --set-default
    • Deploy a template using the resource type
    • Remember to clean up old versions of the resource type

    Nordcloud::Dataprovider::Variable doesn’t have much code in its handlers. All handlers are really just dummy functions returning success, except the read_handler, which returns the Content value from the resource Metadata. By storing the resource state in Metadata I didn’t need to deploy any AWS resources to hold the value of the variable.
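Schematically, the handler logic boils down to something like the plain-Python sketch below. The real handlers use the cloudformation-cli-python-lib request and ProgressEvent types, which are replaced with a plain dict here:

```python
# Plain-Python sketch of the handler logic; "progress" stands in for the
# framework's ProgressEvent, and handlers receive the desired state directly.

def progress(status, model=None):
    """Stand-in for the framework's ProgressEvent."""
    return {"status": status, "resourceModel": model}

def create_handler(desired_state):
    # Nothing to provision: the resource's state *is* its metadata.
    return progress("SUCCESS", desired_state)

def delete_handler(desired_state):
    # Nothing to tear down either.
    return progress("SUCCESS")

def read_handler(desired_state):
    # The only interesting handler: echo back the stored Content value.
    return progress("SUCCESS", {"Content": desired_state["Content"]})

print(read_handler({"Content": "HelloWorld"}))
```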

    What Next?

    I have a long list of ideas for more serious and useful data providers. Finding an AMI, VPC or subnet ID based on given attributes, or mapping between HostedZoneName and HostedZoneId. Or maybe, instead of creating separate types for each use-case, build a generic data provider that can get attributes of any resource.


    • CloudFormation Provider Development Toolkit and the repos for the Java/Python/Go plugins. Java seems to be the most mature language for resource type development and popular for AWS resources. It is more difficult to find good examples written in Python or Go.
    • Build your first AWS CloudFormation resource provider, a re:Invent 2020 session, describes the details of how resource types work. I found it helpful in understanding the callback mechanism that is necessary for any non-trivial (de)provisioning process.

    For further insights, follow Petri’s private blog at https://carriagereturn.nl


      Controlling lights with Ikea Trådfri, Raspberry Pi and AWS



      A few months back we purchased Ikea Trådfri smart lights for our home. However, after the initial hype they were almost forgotten, as controlling them was just too complicated via the ready-made tools. For example, the Ikea Home app works only on mobile, and it’s only possible to use it when connected to wifi. Luckily, Trådfri offers an API for controlling the lights, so it was possible to build our own customised UI.

      Connecting to Trådfri

      With some quick googling I found a tutorial by Pimoroni that helped me get started. I already had a Raspberry Pi running as a home server, so all it took was to download and install the required packages listed in the tutorial. However, at the time of writing this article (and implementing my solution), the Pimoroni tutorial was a bit outdated. Because of that I just couldn’t get the communication working, but after banging my head against the wall for a while I found out that Ikea changed the authentication method in 2017. I’ve contacted Pimoroni and asked them to update the article.

      After getting the communication between the Raspberry Pi and the Trådfri Gateway working, I started writing the middleware server on the Raspberry Pi. As I’m a JavaScript developer, I chose to build this with NodeJS. Luckily, there is a node-tradfri-client package available that made the integration simple: connecting to Trådfri and storing the devices in application memory took only a few lines of code.

      I also added ExpressJS to handle requests. With just a few lines of code, I had API endpoints for

      • Listing all devices in our house
      • Toggling a lightbulb or socket
      • Turning lightbulb or socket on
      • Turning lightbulb or socket off
      • Setting brightness of a lightbulb

      Writing the client application

      As we wanted to control the lights from anywhere with any device, we chose to build a web app that can be used on a laptop, mobile and tablet without installing anything. After the first POC we decided on the most common use cases, and Eija designed the UI in Sketch. The actual implementation was done using ReactJS with the help of Ant Design and react-draggable.

      The source code for the client app is available on my GitHub.

      Making the app accessible from anywhere

      In Finland we have fast and unlimited mobile data plans, and because of that we rarely have wifi enabled on mobile (nothing is more irritating than a bad wifi connection in the yard). To solve this, we chose to publish the app to the public web. As the UI is built as a single-page app, it’s basically free to host with AWS S3 and CloudFront. Since CloudFront domains are random strings, we decided that this is enough security for now. It does mean that anyone who knows the CloudFront domain can control our lights; if this becomes a problem, it’s quite simple to integrate some authentication method too.

      The app is also hosted on the Raspberry Pi on our local network, so guests can control the lights if they are connected to our wifi.

      The bridge between physical & digital world is not yet seamless

      Even with this accessible application we quickly figured out that we still need physical control buttons for the lights. For example, when going upstairs and not bringing the phone with you, you might end up in a dark room without the possibility to turn on the lights. Luckily Ikea provides physical switches for the Trådfri lights, so we had to make one more Ikea trip to get the extra controller upstairs.

      Another way to reduce the need for physical switches would be using a smart speaker with voice recognition. Unfortunately, the Apple HomePod is the only speaker that currently understands Finnish, and it’s a tad out of our budget – and probably not possible to integrate into our system either. Once Amazon adds Finnish support for Alexa, we’ll definitely try that.

      …and while writing the previous chapter, I figured that since Apple supports Finnish, it’s possible to create a Siri Shortcut to control our lights. With a few more lines of code in the web app, it now supports anchor links from a Shortcut to trigger a preset lighting mode.


      It’s great that companies like Ikea provide open access to their smart lights since at least for us the ready-made tooling was not enough. Also with the help of the AWS serverless offering, we can host this solution securely in the cloud for free. If you have any questions about our solution, please feel free to get in touch.

      For more tech content, follow Arto and Nordcloud Engineering on Medium.


        How Do Medium-Sized Companies Adopt Amazon Web Services With Success?



        How do companies successfully adopt Amazon Web Services? Which AWS customers did it, and how? How can Nordcloud help medium-sized companies in particular to be successful in the cloud? On October 26th, Nordcloud answered these and other questions together with AWS as part of a specialist lecture on “SMEs in the clouds – If cloud, then right”. The participants in the workshop, held on the premises of the IT system house LEITWERK in Appenweier, were IT managers from regional companies and employees of the host.

        Together with Christopher Ziegler (Industry 4.0 Lead at AWS Germany), our Thomas Baus gave the workshop participants a general introduction to Amazon Web Services and showed them some paths to success in the cloud. The focus was on the following topics, among others:

        A holistic view of cloud projects is the key to success, because cloud is not a purely technological topic. It’s not just about replacing virtual servers with cloud-based instances. The cloud – in particular the AWS Cloud – offers much more than just infrastructure, and requires much more than a purely technical migration. The successful and sustainable use of cloud services is an end-to-end transformation and should therefore not be driven as a pure IT project.

        A bi-modal IT organisation as a concept for dividing corporate IT into modern and traditional areas, enabling fast and efficient adoption of new paradigms such as cloud services. Our customer Husqvarna was mentioned as an example: there, together with AWS and Nordcloud, a digital IT service unit has been established as a backend for all cloud-based innovation topics such as IoT and analytics.

        Successful case studies from the local market were also considered. In the course of this, some of the typical concerns of medium-sized companies (security, costs, employees…) were taken up and eliminated – in the truest sense of the word – by intelligent approaches. In this context, the great case study by IDC on Deutsche Bahn’s use of AWS was also referenced. You can find it here.

        Licensing – a key issue in cloud migrations

        In addition to AWS and us, there was also a LEITWERK speaker on “Licensing in the Cloud”, discussing the possible advantages and challenges around licensing when moving workloads to the cloud. Afterwards, the topics were explored further over finger food and cold drinks in a relaxed atmosphere, and concrete use cases were discussed.

        At AWS and Nordcloud, we deliberately want to help small and medium-sized companies in the German-speaking world succeed sustainably through our services around Amazon Web Services. The innovative strength and agility of the numerous hidden champions, for example among manufacturing companies, is very great.


          Problems with DynamoDB Single Table Design




          DynamoDB is Amazon’s managed NoSQL database service. DynamoDB provides a simple, schemaless database structure and very high scalability based on partitioning. It also offers an online management console, which lets you query and edit data and makes the overall developer experience very convenient.

          There are two main approaches to designing DynamoDB databases. Multi Table Design stores each database entity in a separate table. Single Table Design stores all entities in one big common table.

          This article focuses mostly on the development experience of creating DynamoDB applications. If you’re working with a large scale project, performance and scalability may be more important aspects for you. However, you can’t completely ignore the developer experience. If you apply Single Table Design, the developer experience will be more cumbersome and less intuitive than with Multi Table Design.

          Multi Table Design Overview

          DynamoDB is based on individual tables that have no relationships between each other. Despite this limitation, we tend to use them in the same way as SQL database tables. We name a DynamoDB table according to a database entity, and then store instances of that database entity in that table. Each entity gets its own table.

          We can call this approach Multi Table Design, because an application usually requires multiple entities. It’s the default way most of us create DynamoDB applications.

          Let’s say we have the entities User, Drive, Folder and File. We would typically then have four DynamoDB tables as shown in the database layout below.

          The boldface headers are field names, and the numbers are field values organized into table rows. For simplicity, we’re only dealing with numeric identifiers.

          Drives table:

          UserId(PK)  DriveId(SK)
          1           1
          1           2

          Folders table:

          UserId(PK)  FolderId(SK)  ParentDriveId
          1           1             1
          1           2             2

          Files table:

          UserId(PK)  FileId(SK)    ParentFolderId
          1           1             1
          1           2             2
          1           3             2

          Note: PK means Partition Key and SK means Sort Key. Together they are the table’s unique primary key.

          It’s pretty easy to understand the structure of this database. Everything is partitioned by UserId. Underneath each User there are Drives which may contain Folders. Folders may contain Files.

          The main limitation of Multi Table Design is that you can only retrieve data from one table in one query. If you want to retrieve a User and all their Drives, Folders and Files, you need to make four separate queries. This is particularly inefficient in use cases where you cannot make all the queries in parallel. You need to first look up some data in one table, so that you can find the related data in another table.
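To make this sequential dependency concrete, here is a small in-memory sketch of the tables above, with plain Python lists standing in for real DynamoDB queries – each step needs the previous step's results before it can run:

```python
# In-memory stand-ins for the Drives/Folders/Files tables shown above;
# real code would issue one DynamoDB Query per table, in sequence.
drives  = [{"UserId": 1, "DriveId": 1}, {"UserId": 1, "DriveId": 2}]
folders = [{"UserId": 1, "FolderId": 1, "ParentDriveId": 1},
           {"UserId": 1, "FolderId": 2, "ParentDriveId": 2}]
files   = [{"UserId": 1, "FileId": 1, "ParentFolderId": 1},
           {"UserId": 1, "FileId": 2, "ParentFolderId": 2},
           {"UserId": 1, "FileId": 3, "ParentFolderId": 2}]

def query(table, **key):
    """Stand-in for a DynamoDB Query: exact match on the given attributes."""
    return [item for item in table
            if all(item[k] == v for k, v in key.items())]

# Drives can be fetched directly, but folders need the drive IDs first,
# and files need the folder IDs – three dependent round trips.
user_drives  = query(drives, UserId=1)
user_folders = [f for d in user_drives
                for f in query(folders, UserId=1, ParentDriveId=d["DriveId"])]
user_files   = [f for fo in user_folders
                for f in query(files, UserId=1, ParentFolderId=fo["FolderId"])]

print(len(user_drives), len(user_folders), len(user_files))  # 2 2 3
```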

          Single Table Design Overview

          Single Table Design is the opposite of Multi Table Design. Amazon has advocated this design pattern in various technical presentations. For example, see DAT401 Advanced Design Patterns for DynamoDB by Rick Houlihan.

          The basic idea is to store all database entities in a single table. You can do this because of DynamoDB’s schemaless design. You can then make queries that retrieve several kinds of entities at the same time, because they are all in the same table.

          The primary key usually contains the entity type as part of it. The table might thus contain an entity called “User-1” and an entity called “Folder-1”. The first one is a User with identifier “1”. The second one is a Folder with identifier “1”. They are separate because of the entity prefix, and can be stored in the same table.

          Let’s say we have the entities User, Drive, Folder and File that make up a hierarchy. A table containing a bunch of these entities might look like this:

          PK        SK         HierarchyId
          User-1    User-1     User-1/
          User-1    Drive-1    User-1/Drive-1/
          User-1    Folder-1   User-1/Drive-1/Folder-1/
          User-1    File-1     User-1/Drive-1/Folder-1/File-1/
          User-1    Folder-2   User-1/Drive-1/Folder-2/
          User-1    File-2     User-1/Drive-1/Folder-2/File-2/
          User-1    File-3     User-1/Drive-1/Folder-2/File-3/

          Note: PK means Partition Key and SK means Sort Key. Together they are the table’s unique primary key. We’ll explain HierarchyId in just a moment.

          As you can see, all items are in the same table. The partition key is always User-1, so that all of User-1’s data resides in the same partition.

          Advantages of Single Table Design

          The main advantage that you get from Single Table Design is the ability to retrieve a hierarchy of entities with a single query. You can achieve this by using Secondary Indexes. A Secondary index provides a way to query the items in a table in a specific order.

          Let’s say we create a Secondary Index where the partition key is PK and the sort key is HierarchyId. It’s now possible to query all the items whose PK is “User-1” and that have a HierarchyId beginning with “User-1/Drive-1/”. We get all the folders and files that the user has stored on Drive-1, and also the Drive-1 entity itself, as the result.
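A sketch of that query over the sample items above – with boto3 this would be a Query against the secondary index using a begins_with key condition, but the selection logic is the same:

```python
# The sample items from the single-table layout above.
items = [
    {"PK": "User-1", "SK": "User-1",   "HierarchyId": "User-1/"},
    {"PK": "User-1", "SK": "Drive-1",  "HierarchyId": "User-1/Drive-1/"},
    {"PK": "User-1", "SK": "Folder-1", "HierarchyId": "User-1/Drive-1/Folder-1/"},
    {"PK": "User-1", "SK": "File-1",   "HierarchyId": "User-1/Drive-1/Folder-1/File-1/"},
    {"PK": "User-1", "SK": "Folder-2", "HierarchyId": "User-1/Drive-1/Folder-2/"},
    {"PK": "User-1", "SK": "File-2",   "HierarchyId": "User-1/Drive-1/Folder-2/File-2/"},
    {"PK": "User-1", "SK": "File-3",   "HierarchyId": "User-1/Drive-1/Folder-2/File-3/"},
]

def query_subtree(pk, prefix):
    """Mimics Key('PK').eq(pk) & Key('HierarchyId').begins_with(prefix)."""
    return [i for i in items
            if i["PK"] == pk and i["HierarchyId"].startswith(prefix)]

result = query_subtree("User-1", "User-1/Drive-1/")
print([i["SK"] for i in result])
# ['Drive-1', 'Folder-1', 'File-1', 'Folder-2', 'File-2', 'File-3']
```

Note that the User-1 item itself is excluded, because its HierarchyId "User-1/" does not begin with the drive prefix – exactly as described above.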

          The same would have been possible with Multi Table Design, just not as efficiently. We would have defined similar Secondary Indexes to implement the relationships. Then we would have separately queried the user’s drives from the Drives table, folders from the Folders table, and files from the Files table, and combined all the results.

          Single Table Design can also handle other kinds of access patterns more efficiently than Multi Table Design. Check the YouTube video mentioned in the beginning of this article to learn more about them.

          Complexity of Single Table Design

          Why would we not always use Single Table Design when creating DynamoDB based applications? Do we lose something significant by applying it to every use case?

          The answer is yes. We lose simplicity in database design. When using Single Table Design, the application becomes more complicated and unintuitive to develop. As we add new features and access patterns over time, the complexity keeps growing.

          Just managing one huge DynamoDB table is complicated in itself. We have to remember to include the “User-” entity prefix in all queries when working with the AWS Console. Simple table scans of a single entity type aren’t possible without filtering on the prefix.

          We also need to manually maintain the HierarchyId composite key whenever we create or update entities. It’s easy to cause weird bugs by forgetting to update HierarchyId in some edge case or when editing the database manually.
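A hypothetical helper illustrates the kind of code every write path has to get right (the function and item shapes are illustrative, following the example table, not a fixed API):

```python
def hierarchy_id(*segments: str) -> str:
    """Build the HierarchyId composite key from entity segments.

    Every create/update must build this consistently; if one code path
    forgets a segment or the trailing slash, begins_with() queries
    silently return wrong results.
    """
    return "".join(segment + "/" for segment in segments)

# E.g. a File item nested under a user, a drive and a folder:
file_item = {
    "PK": "User-1",
    "SK": "File-1",
    "HierarchyId": hierarchy_id("User-1", "Drive-1", "Folder-1", "File-1"),
}
# file_item["HierarchyId"] == "User-1/Drive-1/Folder-1/File-1/"
```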

          As we start adding sorting and filtering capabilities to our database queries, things get even more complicated.

          Things Get More Complicated

          Now, let’s allow sorting files by their creation date. Extending our example, we might have a table design like this:

          PK      SK        HierarchyId                      CreatedAt
          User-1  User-1    User-1/                          2019-07-01
          User-1  Drive-1   User-1/Drive-1/                  2019-07-02
          User-1  Folder-1  User-1/Drive-1/Folder-1/         2019-07-03
          User-1  File-1    User-1/Drive-1/Folder-1/File-1/  2019-07-04
          User-1  Folder-2  User-1/Drive-1/Folder-2/         2019-07-05
          User-1  File-2    User-1/Drive-1/Folder-2/File-2/  2019-07-06
          User-1  File-3    User-1/Drive-1/Folder-2/File-3/  2019-07-07

          How do we retrieve the contents of Folder-2 ordered by the CreatedAt field? We add a Global Secondary Index for this access pattern, which will consist of GSI1PK and GSI1SK:

          PK      SK        HierarchyId                      CreatedAt   GSI1PK            GSI1SK
          User-1  User-1    User-1/                          2019-07-01  User-1/           ~
          User-1  Drive-1   User-1/Drive-1/                  2019-07-02  User-1/           2019-07-02
          User-1  Folder-1  User-1/Drive-1/Folder-1/         2019-07-03  User-1/Folder-1/  ~
          User-1  File-1    User-1/Drive-1/Folder-1/File-1/  2019-07-04  User-1/Folder-1/  2019-07-04
          User-1  Folder-2  User-1/Drive-1/Folder-2/         2019-07-05  User-1/Folder-2/  ~
          User-1  File-2    User-1/Drive-1/Folder-2/File-2/  2019-07-06  User-1/Folder-2/  2019-07-06
          User-1  File-3    User-1/Drive-1/Folder-2/File-3/  2019-07-07  User-1/Folder-2/  2019-07-07

          We’ll get to the semantics of GSI1PK and GSI1SK in just a moment.

          But why did we call these fields GSI1PK and GSI1SK instead of something meaningful? Because they will contain different kinds of values depending on the entity stored in each database item. GSI1PK and GSI1SK will be calculated differently depending on whether the item is a User, Drive, Folder or File.

          Overloading Names Adds Cognitive Load

          Since it’s not possible to give GSI keys sensible names, we just call them GSI1PK and GSI1SK. These kinds of generic field names add cognitive load, because the fields are no longer self-explanatory. Developers need to check the development documentation to find out what exactly GSI1PK and GSI1SK mean for a particular entity.

          So, why is the GSI1PK field not the same as HierarchyId? Because in DynamoDB you cannot query for a range of partition key values. You have to query for one specific partition key. In this use case, we can query for GSI1PK = “User-1/” to get items under a user, and query for GSI1PK = “User-1/Folder-1/” to get items under a user’s folder.

          What about the tilde (~) characters in some GSI1SK values? They implement reverse date sorting in a way that also allows pagination. Tilde is the last printable character in the ASCII character set and will sort after all other characters. It’s a nice hack, but it also adds even more cognitive load to understanding what’s happening.

          When we query for GSI1PK = “User-1/Folder-2/” and sort the results by GSI1SK in descending key order, the first result is Folder-2 (because ~ comes after all other keys) and the following results are File-3 and File-2 in descending date order. Assuming there are lots of files, we could continue this query using the LastEvaluatedKey feature of DynamoDB and retrieve more pages. The parent object (Folder-2) always appears in the first page of items.
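The tilde trick can be seen in miniature in plain Python. Sorting the GSI1 rows for Folder-2 in descending key order puts the parent first, then the files newest-first (sample rows copied from the table above; in boto3 the descending order would come from `ScanIndexForward=False`):

```python
rows = [
    {"GSI1PK": "User-1/Folder-2/", "GSI1SK": "~",          "SK": "Folder-2"},
    {"GSI1PK": "User-1/Folder-2/", "GSI1SK": "2019-07-06", "SK": "File-2"},
    {"GSI1PK": "User-1/Folder-2/", "GSI1SK": "2019-07-07", "SK": "File-3"},
]

# "~" sorts after every ISO date string, so the parent folder
# is always the first result of the descending query.
page = sorted(
    (row for row in rows if row["GSI1PK"] == "User-1/Folder-2/"),
    key=lambda row: row["GSI1SK"],
    reverse=True,
)
[row["SK"] for row in page]  # -> ["Folder-2", "File-3", "File-2"]
```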

          Overloaded GSI Keys Can’t Overlap

          You may have noticed that we can now also query a user’s drives in creation date order. The GSI1PK and GSI1SK fields apply to this relationship as well. This works because the relationship between the User and Drive entities does not overlap with the relationship between the Folder and File entities.

          But what happens if we need to query all the Folders under a Drive? Let’s say the results must, again, be in creation date order.

          We can’t use the GSI1 index for this query because the GSI1PK and GSI1SK fields already have different semantics. We already use those keys to retrieve items under Users or Folders.

          So, we’ll create a new Global Secondary Index called GSI2, where GSI2PK and GSI2SK define a new relationship. The fields are shown in the table below:

          PK      SK        HierarchyId                      CreatedAt   GSI1PK            GSI1SK      GSI2PK           GSI2SK
          User-1  User-1    User-1/                          2019-07-01  User-1/           ~
          User-1  Drive-1   User-1/Drive-1/                  2019-07-02  User-1/           2019-07-02  User-1/Drive-1/  ~
          User-1  Folder-1  User-1/Drive-1/Folder-1/         2019-07-03  User-1/Folder-1/  ~           User-1/Drive-1/  2019-07-03
          User-1  File-1    User-1/Drive-1/Folder-1/File-1/  2019-07-04  User-1/Folder-1/  2019-07-04  User-1/Drive-1/  2019-07-04
          User-1  Folder-2  User-1/Drive-1/Folder-2/         2019-07-05  User-1/Folder-2/  ~           User-1/Drive-1/  2019-07-05
          User-1  File-2    User-1/Drive-1/Folder-2/File-2/  2019-07-06  User-1/Folder-2/  2019-07-06
          User-1  File-3    User-1/Drive-1/Folder-2/File-3/  2019-07-07  User-1/Folder-2/  2019-07-07

          Note: Please scroll the table horizontally if necessary.

          Using this new index we can query for GSI2PK = “User-1/Drive-1/” and sort the results by GSI2SK to get the folders in creation date order. Drive-1 has a tilde (~) as the sort key to ensure it comes as the first result on the first page of the query.

          Now It Gets Really Complicated

          At this point it’s becoming increasingly complicated to keep track of all those GSI fields. Can you still remember what exactly GSI1PK and GSI2SK mean? The cognitive load keeps growing because you’re dealing with abstract identifiers instead of meaningful field names.

          The bad news is that it only gets worse. As we add more entities and access patterns, we have to add more Global Secondary Indexes. Each of them will have a different meaning in different situations. Your documentation becomes very important. Developers need to check it all the time to find out what each GSI means.

          Let’s add a new Status field to Files and Folders. We will now allow querying for Files and Folders based on their Status, which may be VISIBLE, HIDDEN or DELETED. The results must be sorted by creation time.

          We end up with a design that requires three new Global Secondary Indexes. GSI3 will contain files that have a VISIBLE status. GSI4 will contain files that have a HIDDEN status. GSI5 will contain files that have a DELETED status. Here’s what the table will look like:

          PK      SK        HierarchyId                      CreatedAt   GSI1PK            GSI1SK      GSI2PK           GSI2SK      Status    GSI3PK                    GSI3SK      GSI4PK                   GSI4SK      GSI5PK                     GSI5SK
          User-1  User-1    User-1/                          2019-07-01  User-1/           ~
          User-1  Drive-1   User-1/Drive-1/                  2019-07-02  User-1/           2019-07-02  User-1/Drive-1/  ~
          User-1  Folder-1  User-1/Drive-1/Folder-1/         2019-07-03  User-1/Folder-1/  ~           User-1/Drive-1/  2019-07-03  VISIBLE   User-1/Folder-1/VISIBLE/  ~           User-1/Folder-1/HIDDEN/  ~           User-1/Folder-1/DELETED/   ~
          User-1  File-1    User-1/Drive-1/Folder-1/File-1/  2019-07-04  User-1/Folder-1/  2019-07-04  User-1/Drive-1/  2019-07-04  VISIBLE   User-1/Folder-1/VISIBLE/  2019-07-04  User-1/Folder-1/HIDDEN/  2019-07-04  User-1/Folder-1/DELETED/
          User-1  Folder-2  User-1/Drive-1/Folder-2/         2019-07-05  User-1/Folder-2/  ~           User-1/Drive-1/  2019-07-05  VISIBLE   User-1/Folder-2/VISIBLE/  ~           User-1/Folder-2/HIDDEN/  ~           User-1/Folder-2/DELETED/   ~
          User-1  File-2    User-1/Drive-1/Folder-2/File-2/  2019-07-06  User-1/Folder-2/  2019-07-06                               HIDDEN    User-1/Folder-2/VISIBLE/              User-1/Folder-2/HIDDEN/  2019-07-06  User-1/Folder-2/DELETED/
          User-1  File-3    User-1/Drive-1/Folder-2/File-3/  2019-07-07  User-1/Folder-2/  2019-07-07                               DELETED   User-1/Folder-2/VISIBLE/              User-1/Folder-2/HIDDEN/              User-1/Folder-2/DELETED/   2019-07-07

          Note: Please scroll the table horizontally if necessary.

          You may think this is getting a bit too complicated. It’s complicated because we still want to be able to retrieve both a parent item and its children in just one query.

          For example, let’s say we want to retrieve all VISIBLE files in Folder-1. We query for GSI3PK = “User-1/Folder-1/VISIBLE/” and again sort the results in descending order as earlier. We get back Folder-1 as the first result and File-1 as the second result. Pagination will also work if there are more results. If there are no VISIBLE files under the folder, we only get a single result, the folder.

          That’s nice. But can you now figure out how to retrieve all DELETED files in Folder-2? Which GSI will you use and what do you query for? You probably need to stop your development work for a while and spend some time reading the documentation.
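For the record, the answer can be sketched in plain Python. The DELETED query uses GSI5, and because DynamoDB GSIs are sparse (an item missing either key attribute is not projected into the index), items with a GSI5PK but no GSI5SK never show up (rows follow the table above):

```python
rows = [
    {"SK": "Folder-2", "GSI5PK": "User-1/Folder-2/DELETED/", "GSI5SK": "~"},
    {"SK": "File-2",   "GSI5PK": "User-1/Folder-2/DELETED/"},  # HIDDEN: no sort key
    {"SK": "File-3",   "GSI5PK": "User-1/Folder-2/DELETED/", "GSI5SK": "2019-07-07"},
]

# Equivalent of querying GSI5 for GSI5PK = "User-1/Folder-2/DELETED/"
# in descending order; the sparse index drops File-2.
page = sorted(
    (row for row in rows
     if row.get("GSI5PK") == "User-1/Folder-2/DELETED/" and "GSI5SK" in row),
    key=lambda row: row["GSI5SK"],
    reverse=True,
)
[row["SK"] for row in page]  # -> ["Folder-2", "File-3"]
```

If you had to pause and work that out, that pause is exactly the cognitive load the article is describing.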

          The Complexity Multiplies

          Let’s say we need to add a new Status value called ARCHIVED. This will involve creating yet another GSI and adding application code in all the places where Files or Folders are created or updated. The new code needs to make sure that GSI6PK and GSI6SK are generated correctly.

          That’s a lot of development and testing work. It will happen every time we add a new Status value or some other way to perform conditional queries.

          Later we might also want to add new sort fields called ModifiedAt and ArchivedAt. Each new sort field will require its own set of Global Secondary Indexes. We have to create a new GSI for every possible Status value and sort key combination, so we end up with quite a lot of them. In fact, our application will now have GSI1-GSI18, and developers will need to understand what GSI1PK-GSI18PK and GSI1SK-GSI18SK mean.

          In fairness, this complexity is not unique to Single Table Design. We would have similar challenges when applying Multi Table Design and implementing many different ways to query data.

          What’s different in Multi Table Design is that each entity will live in its own table where the field names don’t have to be overloaded. If you add a feature that involves Folders, you only need to deal with the Folders table. Indexes and keys will have semantically meaningful names like “UserId-Status-CreatedAt-index”. Developers can understand them intuitively without referring to documentation all the time.

          Looking for a Compromise

          We can make compromises between Single Table Design and Multi Table Design to reduce complexity. Here are some suggestions.

          First of all, you should think of Single Table Design as an optimization that you might be applying prematurely. If you design all new applications from scratch using Single Table Design, you’re basically optimizing before knowing the real problems and bottlenecks.

          You should also consider whether the database entities will truly benefit from Single Table Design or not. If the use case involves retrieving a deep hierarchy of entities, it makes sense to combine those entities into a single table. Other entities can still live in their own tables.

          In many real-life use cases the only benefit from Single Table Design is the ability to retrieve a parent entity and its children using a single DynamoDB query. In such cases the benefit is pretty small. You could just as well make two parallel requests. Retrieve the parent using GetItem and the children using a Query. In an API based web application the user interface can perform these requests in parallel and combine the results in the frontend.
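As a sketch of the two-parallel-requests alternative (the stub functions below stand in for the real GetItem and Query calls against separate tables; their names are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def get_folder(folder_id: str) -> dict:
    # Stand-in for a GetItem call on a dedicated Folders table.
    return {"Id": folder_id}

def list_files(folder_id: str) -> list:
    # Stand-in for a Query on a Files table, keyed by folder.
    return [{"Id": "File-2"}, {"Id": "File-3"}]

# Issue both requests concurrently and combine the results,
# just as a frontend would fire two API calls in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    folder_future = pool.submit(get_folder, "Folder-2")
    files_future = pool.submit(list_files, "Folder-2")

result = {"folder": folder_future.result(), "files": files_future.result()}
```

The total latency is roughly that of the slower of the two calls, which is often an acceptable price for keeping the tables simple.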

          Many of the design patterns related to Single Table Design also apply to Multi Table Design. For instance, overloaded composite keys and secondary indexes are sometimes quite helpful in modeling hierarchies and relationships. You can use them in Multi Table Design without paying the full price of complexity that Single Table Design would add.

          In summary, you should use your judgment case by case. Don’t make blanket policy to design every application using either Single Table Design or Multi Table Design. Learn the design patterns and apply them where they make sense.

          Get in Touch.

          Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.

            Findings from AWS re:Invent 2019, Part 2

            I was expecting the usual set of service and feature announcements in Werner Vogels’ Thursday keynote, but instead he focused on what is happening behind the scenes of AWS, especially the EC2 Nitro architecture and S3. So instead of analyzing Werner’s keynote, I picked two announcements from Wednesday that didn’t make it into the keynotes but are worthy of attention because of how they will simplify building APIs and distributed applications.

            Amazon API Gateway HTTP APIs

            Amazon API Gateway HTTP APIs will lower the barrier of entry when starting to build that next great service or application. It is now trivial to get started with an HTTP proxy for Lambda function(s):

            % aws apigatewayv2 create-api \
                --name MyAPIname \
                --protocol-type HTTP \
                --target arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION

            It is also nice that HTTP APIs have Serverless Application Model (SAM) support from day one. And when your API starts getting attention, pricing is up to 70% cheaper than the existing REST APIs in API Gateway. Compatible API Gateway definitions (i.e. HTTP and Lambda backends with OIDC/JWT based authorization) can be exported and re-imported as HTTP APIs.

            Amplify DataStore

            Amplify DataStore is a queryable, on-device data store for web, IoT, and mobile developers using React Native, iOS and Android. The idea is that you don’t need to write separate code for offline and online scenarios. Working with distributed cross-user data is as simple as using local data. DataStore is available with the latest Amplify JavaScript client; the iOS and Android clients are in preview.

            The DataStore blog post and demo app are a good way to get your feet wet with DataStore and see how simple it can be to create applications using shared state between multiple online and offline clients.

            Interested in reading more about Petri’s views and insights? Follow his blog CarriageReturn.Nl


              Findings from AWS re:Invent 2019, Part 1

              ML/AI was definitely the topic of Andy Jassy’s re:Invent Tuesday keynote. Another area of major investment was service proximity to customers and end users. With that, it was only natural that there were also some new networking features to help build multi-region connectivity.

              Machine Learning for the Masses

              ML/AI received a lot of love in Tuesday announcements. If there is one thing to pick from the group, it would be SageMaker Autopilot:

              “With this feature, Amazon SageMaker can use your tabular data and the target column you specify to automatically train and tune your model, while providing full visibility into the process. As the name suggests, you can use it on autopilot, deploying the model with the highest accuracy with one click in Amazon SageMaker Studio, or use it as a guide to decision making, enabling you to make tradeoffs, such as accuracy with latency or model size.”

              Together with the SageMaker Studio web-based IDE, this aims to democratize the artisan work of data analytics. There were also three interesting real-world applications of ML announced (all in preview):

              • Amazon CodeGuru is a service for automated code reviews and application performance recommendations.
              • Amazon Fraud Detector is a managed service to identify fraudulent activities such as online payment fraud and the creation of fake accounts.
              • Amazon Detective is a service to analyze, investigate and find the root cause of potential security issues or suspicious activities, based on analysis of logs from AWS resources.

              As services these are all very easy to consume and can bring a lot of value in preventing costly mistakes from happening. They also follow the same pattern as SageMaker Autopilot, automating artisan work traditionally performed by skilled (but overloaded) individuals.

              Getting Closer to Customer

              Another theme in Tuesday’s announcements was cloud services getting physically closer to customers. This is important when you must keep your data in a certain country or need very low latencies.

              An AWS Local Zone is an extension of an AWS region. It brings compute, storage and a selected subset of AWS services closer to customers. The very first Local Zone was announced in Los Angeles, but I would expect these to pop up in many cities around the world that don’t yet have their own AWS region nearby.

              If a Local Zone is not close enough, there is AWS Wavelength. This is yet another variation of an (availability) zone. Wavelength has a similar (but not the same?) subset of AWS services as Local Zones. Wavelength Zones are co-located at 5G operators’ edges, which helps in building ultra-low-latency services for mobile networks.

              AWS Outposts is now in GA, and support for EMR and container services like ECS, EKS and App Mesh was added to the Outposts service mix. Pricing starts from $225k paid 3 years upfront, or $7,000/month on a 3-year subscription. I think many customers will want to wait and see how Local Zones expand before investing in on-prem hardware.


              AWS has had a tradition of changing networking best practices every year at re:Invent. This year it wasn’t quite as dramatic, but there were very welcome feature announcements that go nicely with the idea of different flavours of local regions.

              Transit Gateway inter-region peering allows you to build a global WAN within AWS networks. This is a great feature when you are building multi-region services or have your services spread across multiple regions because of differences in the local service mix. That said, please note that inter-region peering is only available in certain regions at launch.

              Transit Gateway Network Manager enables you to centrally manage and monitor your global network, not only on AWS but also on-premises. As networking is getting much more complex, this global view and management capability is going to be a most welcome help. It will also help shift the balance of network management from on-premises towards the public cloud.

              Finally, missing support for multicast traffic was one of the last remaining blockers for moving some applications to a VPC. With the announcement of Transit Gateway multicast support, even that is now possible. The fine print says multicast is not supported over Direct Connect, site-to-site VPN or peering connections.

              Interested in reading more about Petri’s views and insights? Follow his blog CarriageReturn.Nl


                Partner and capacity management with Peter Bakker


                Life at Nordcloud

                1. Where are you from and how did you end up at Nordcloud?

                I’m Dutch, living in Rotterdam.

                I started the Azure relationship between Microsoft and Mirabeau when I was working at Mirabeau.

                I grew their Azure business, we became an MSP and I was asked to join the Partner Advisory team by Microsoft.

                There I met Nordcloud’s founder Fernando.

                Mirabeau was acquired by Cognizant and integrated as of January 1st of this year.

                In terms of my career, I was in the middle of a journey with different changes, and Fernando suggested that I join Nordcloud in the spring of this year. His words were: “We always have room for good people”.

                I had a chat with Nordcloud’s CEO Jan, and after some discussion we agreed on interesting goals. I switched clouds from Microsoft to AWS and became the AWS partner manager at Nordcloud.


                2. What is your role and core competence?

                I was hired as Partner Manager for AWS. My responsibility was first to move from escalation management to opportunity management. Working with different AWS managers we started fixing things and recently signed a joint partner plan for 2020. We now have a joint ambition for what we are aiming to achieve together and this is actually one of my best memories since working at Nordcloud!

                My role has also evolved since I started, and I now also wear the hat of Head of Capacity. I’m commercially responsible for reselling AWS, Azure, and GCP, managing our margins, making our Sales colleagues’ lives a bit easier, and understanding cloud costs, cost optimisation and the real value of capacity management.

                I fly around a lot and get to work with different teams as we’re active in 10 countries. My daughter recently asked me if I was working at KLM.


                3. What do you like most about working at Nordcloud?

                1) Depth and broadness of skill levels: we have so many talented, amazing colleagues.

                2) The great names that we work for and all the great things we do, for example for BMW, SKF, Volvo or Red Bull.

                3) Freedom and opportunity to learn and grow. 


                5. What sets you on fire/ what’s your favourite thing with public cloud?

                Digital transformation! All the new business opportunities that our customers get by adopting cloud.

                For example, last week at the AWS Partner Summit, Konecranes presented a great case of Nordcloud helping them, in a very short timeframe, to build a serverless solution using IoT that helps them weigh containers. This solution is now fitted in new equipment and retrofitted into existing equipment.

                The payback time for Konecranes was only 3 months. Sales of their equipment were boosted.

                It’s great seeing how starting small and laying foundations sets us and our clients up for success and even bigger projects. 


                6. What do you do outside work?

                I’m a passionate golf player as well as a youth at our golf club in Rotterdam.


                8. How would you describe our culture?

                Open and flat organisation!

                There is no hierarchy at Nordcloud. We are all colleagues and together we help our customers to get cloud native. 


                9. What are your greetings/advice for someone who might be considering a job in Nordcloud?

                Somebody in a recruitment process recently asked me how I like it here at Nordcloud. I answered: “I should have done this a year ago!”

                As there is a lot of freedom and opportunity to learn and grow, you must remember to take care of yourself too. There is always something interesting to do, so it’s very much about finding the right balance. As things get exciting, I sometimes have to remind myself: there is also always tomorrow!


                  Nordcloud Achieves AWS Financial Services Competency Status


                  Press releases

                  Nordcloud has achieved Amazon Web Services (AWS) Financial Services Competency status. This designation recognizes Nordcloud for providing deep expertise to help organizations manage critical issues pertaining to the industry, such as risk management, core systems implementations, data management, navigating compliance requirements, and establishing governance models.


                  Achieving the AWS Financial Services Competency differentiates Nordcloud as an AWS Partner Network (APN) member that has demonstrated relevant technical proficiency and proven customer success, delivering solutions seamlessly on AWS. To receive the designation, APN Partners must possess deep AWS expertise and undergo an assessment of the security, performance, and reliability of their solutions. 

                  “We are excited to be recognised for our FSI achievements,  as it is our major focus area in terms of industry and solutions. A big thanks to our team and, of course, our beloved customers for trusting in Nordcloud’s ability,” said Jan Kritz, CEO of Nordcloud. “This competency will help us offer our public cloud services to an even larger group of FSI customers in all of our 10 countries.”

                  AWS is enabling scalable, flexible, and cost-effective solutions for organisations ranging from startups to global enterprises. To support the seamless integration and deployment of these solutions, AWS established the AWS Competency Program to help customers identify Consulting and Technology APN Partners with deep industry experience and expertise.

                  “The main value of Nordcloud is to power up our customer’s digital transformation enabled by public cloud,” Kritz concluded.


                    HUS Chooses Nordcloud As Partner for Amazon Web Services Development



                    Helsinki and Uusimaa Hospital District (HUS) has chosen Nordcloud as a partner to develop and manage its Amazon Web Services environments. The contract contains AWS capacity management and managed services, as well as consulting services, and it enables HUS, a Finnish pioneer in digital transformation of healthcare, to leverage Amazon Web Services as a platform for new services and data analytics development.

                    Nordcloud is proud to be an AWS Premier Consulting Partner since 2014 and an AWS Managed Service Provider since 2015. Making use of our years of experience and utilising best practices picked up along the way, Nordcloud is able to design and build cloud environments that match customer budget and demands whilst being completely elastic and scalable.

                    For HUS, Nordcloud will initiate the development of the cloud foundation, as well as setting up new data management solutions. The foundation is a vital step on the enterprise cloud journey, as it acts as an enabler for automated operations and scalable services.

                    Amazon Web Services is one of the leading cloud computing platforms providing a reliable, scalable, and low-cost set of remote computing services. The AWS cloud was formed by the people behind Amazon.com in 2006 when Amazon started to offer businesses IT infrastructure services. These were in the form of web services, now commonly known as cloud computing. Today, Amazon Web Services powers hundreds of thousands of businesses in 190 countries around the world. With data centre locations in North America, Europe, Brazil, Singapore, Japan, and Australia, customers across all industries are taking advantage of the AWS cloud.
