Data Lake best practices in AWS



Many businesses are looking into enabling analytics on many different types of data sources and gaining insights that guide them to better business decisions. A data lake is one way of doing that: a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. Data analysts can then leverage the data in the data lake with their choice of analytics and machine learning services, such as Amazon EMR for Apache Spark, Amazon Redshift, Amazon Athena, and more.


AWS Data Lake Formation

AWS Data Lake Formation is a new service that makes it easier for businesses to set up a data lake: something that previously was a big undertaking taking months can now be broken down into just a few days of work. Data Lake Formation automatically crawls, cleans and prepares the data, which you can in turn use to train machine learning models that deduplicate records based on what you want the data to look like. The most interesting functionality in the new Data Lake Formation might be the centralized dashboard for secure access at the table and column level across all tools in the data lake, something that previously was quite complicated and required third-party tooling.


Data lake best practices

Best practices for building a data lake optimized for performance, security and data processing were discussed during the AWS Data Lake Formation session at AWS re:Invent 2018. The session was split into three main categories: ingestion, organisation and preparation of data for the data lake. Your current bottleneck may lie in any or all of these three categories, as they often interlink, so make sure to look into all of them when optimizing your data.



The main takeaway from the session was that S3 should be used as a single source of truth where ingested data is preserved. No transformation of data should happen in the ingestion S3 storage; if you transform the data, write the result to another S3 bucket.

To keep the ingestion bucket from filling up with old data, you should also look into object lifecycle policies, so that data you are no longer using is moved to a cheaper storage class such as S3 Glacier. This especially makes sense for data that falls outside your analysis time window and is no longer interesting for analytics.
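As an illustrative sketch (the `raw/` prefix and the day thresholds are assumptions, not AWS recommendations), a lifecycle configuration along these lines would transition cold data to Glacier after 90 days and expire it after a year:

```json
{
  "Rules": [
    {
      "ID": "archive-cold-data",
      "Status": "Enabled",
      "Filter": {"Prefix": "raw/"},
      "Transitions": [
        {"Days": 90, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }
  ]
}
```

You can apply such a document to a bucket with `aws s3api put-bucket-lifecycle-configuration`.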

Getting data in from databases can be a pain, especially if you are trying to use replicas of on-premise databases. AWS recommends using the AWS Database Migration Service (DMS) instead of database replicas. This makes it easier to replicate the data without having to manage yet another database. If you use an AWS Glue ETL job to transform, merge and prepare the data ingested from the database, you can also optimize the resulting data for analytics and take daily snapshots to preserve the database view of the records.



Organisation of the data is a strategy that usually comes way too late in a data lake project. Already at the beginning of the project you should look into organizing the data into partitions in S3, and partition with keys that align with common query filters.

It is, for example, sometimes better to create multiple S3 buckets partitioned on year/month/day/ than to try to fit all of your data into one S3 bucket with even more granular partitions. In reality this depends on what your most common queries look like: you may need to partition on months instead of years depending on your usage.
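As a small illustration (the table name and exact layout are hypothetical, but the Hive-style `year=/month=/day=` convention is what Athena, Glue and Spark expect for partition pruning), a helper that builds such keys might look like:

```python
from datetime import date

def partition_prefix(table: str, day: date) -> str:
    """Build a Hive-style partitioned S3 key prefix for a given day."""
    return f"{table}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"

# Objects written under this prefix are visible to queries that filter on
# year/month/day, so engines can skip partitions outside the filter.
print(partition_prefix("clickstream", date(2018, 11, 26)))
# → clickstream/year=2018/month=11/day=26/
```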



For mutable data, use a database such as Amazon Redshift or Apache HBase, but make sure to offload the data to S3 once it becomes immutable. You can also append delta files to the partitions and compact them in a scheduled job, keeping the most recent version of the data and deleting the rest.

Remember to compact the data from the source before you run analytics; the optimal file size is between 256 and 1000 MB. If you need faster ingestion than periodically grabbing data from S3, you can stream data into Kinesis streams, process it with Apache Flink and push the processed data to S3.
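The compaction step can be sketched in pure Python. This is a simplified illustration (the file names and the 512 MB target are assumptions), not a production Spark job, but it shows the idea of batching small files up toward the recommended size range:

```python
def plan_compaction(files, target_mb=512):
    """Group (name, size_mb) pairs into batches close to the target size.

    Each batch would then be rewritten as a single larger object, keeping
    files inside the 256-1000 MB sweet spot for analytics engines.
    """
    batches, current, current_size = [], [], 0
    for name, size_mb in files:
        # Start a new batch when adding this file would exceed the target.
        if current and current_size + size_mb > target_mb:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size_mb
    if current:
        batches.append(current)
    return batches

small_files = [("part-0", 200), ("part-1", 200), ("part-2", 200)]
print(plan_compaction(small_files))
# → [['part-0', 'part-1'], ['part-2']]
```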


If you’d like some help with AWS Data Lake Formation, please feel free to contact us.


Get in Touch.

Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.

    Lambda layers for Python runtime



    AWS Lambda

    AWS Lambda is one of the most popular serverless compute services in the public cloud, released in November 2014. It runs your code in response to events like DynamoDB streams, SNS notifications or HTTP triggers without you provisioning or managing any infrastructure. Lambda takes care of most of the things required to run your code and provides high availability. It even allows you to execute up to 1000 functions in parallel! Using AWS Lambda you can build applications like:

    • Web APIs
    • Data processing pipelines
    • IoT applications
    • Mobile backends
    • and many many more…

    Creating an AWS Lambda function is super simple: you just create a zip file with your code and dependencies and upload it to an S3 bucket. There are also frameworks like Serverless or SAM that handle deploying AWS Lambda for you, so you don’t have to manually create and upload the zip file.

    There is, however, one problem.

    Say you have created a simple function which depends on a large number of other packages. AWS Lambda requires you to zip everything together. As a result, you have to upload a lot of code that never changes, which increases your deployment time, takes space, and costs more.

    AWS Lambda Layers

    Fast forward four years: at the 2018 re:Invent, AWS Lambda Layers were released. This feature allows you to centrally store and manage data that is shared across different functions, within a single AWS account or even across multiple accounts! It solves a number of issues:

    • You do not have to upload dependencies on every change of your code. Just create an additional layer with all required packages.
    • You can create a custom runtime that supports any programming language.
    • Adjust the default runtime by adding data required by your users. For example, say there is a team of Cloud Architects that builds CloudFormation templates using the troposphere library. However, they are not developers and do not know how to manage Python dependencies… With an AWS Lambda layer you can create a custom environment with all required data so they can code in the AWS console.

    But how does the layer work?

    When you invoke your function, all the AWS Lambda layers are mounted to the /opt directory in the Lambda container. You can add up to 5 different layers. The order is really important, because layers with a higher order can override files from the previously mounted layers. When using the Python runtime you do not need to do any additional operations in your code; just import the library in the standard way. But how will my Python code know where to find my data?

    That’s super simple: /opt/bin is added to the $PATH environment variable. To check this, let’s create a very simple Python function:

    import os
    def lambda_handler(event, context):
        path = os.popen("echo $PATH").read()
        return {'path': path}

    The response is:

        "path": "/var/lang/bin:/usr/local/bin:/usr/bin/:/bin:/opt/bin\n"
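The same trick works for Python's import path: the Python runtime also adds layer directories such as /opt/python to sys.path, which is why a plain import finds layer-packaged libraries. A minimal handler to inspect this (runnable anywhere, though the /opt entries only appear inside Lambda) could be:

```python
import sys

def lambda_handler(event, context):
    # Inside Lambda, sys.path includes layer paths like /opt/python and
    # /opt/python/lib/python3.6/site-packages alongside the runtime dirs.
    return {'sys_path': sys.path}

print(lambda_handler({}, None)['sys_path'])
```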


    Existing pre-defined layers

    AWS Lambda Layers were released together with a single, publicly accessible layer for data processing containing two libraries: NumPy and SciPy. Once you have created your lambda, click `Add a layer` in the lambda configuration. You should be able to see and select the AWSLambda-Python36-SciPy1x layer. Once you have added the layer you can use these libraries in your code. Let’s do a simple test:

    import numpy as np
    import json
    def lambda_handler(event, context):
        matrix = np.random.randint(6, size=(2, 2))
        return {
            'matrix': json.dumps(matrix.tolist())
        }

    The function response is:

      "matrix": "[[2, 1], [4, 2]]"


    As you can see it works without any effort.

    What’s inside?

    Now let’s check what is in the pre-defined layer. To inspect the mounted layer’s content I prepared a simple script:

    import os
    def lambda_handler(event, context):
        directories = os.popen("find /opt/* -type d -maxdepth 4").read().split("\n")
        return {
            'directories': directories
        }

    In the function response you will receive the list of directories that exist in the /opt directory:

      "directories": [

    Ok, so it contains python dependencies installed in the standard way and nothing else. Our custom layer should have a similar structure.

    Create Your own layer!

    Our use case is to create an environment for our Cloud Architects to easily build CloudFormation templates using the troposphere and awacs libraries. The steps comprise:

    Create virtual env and install dependencies

    To manage the python dependencies we will use pipenv.

    Let’s create a new virtual environment and install there all required libraries:

    pipenv --python 3.6
    pipenv shell
    pipenv install troposphere
    pipenv install awacs

    It should result in the following Pipfile:

    [[source]]
    url = ""
    verify_ssl = true
    name = "pypi"

    [packages]
    troposphere = "*"
    awacs = "*"

    [requires]
    python_version = "3.6"

    Build a deployment package

    All the dependent packages have been installed in the $VIRTUAL_ENV directory created by pipenv. You can check what is in this directory using the ls command:


    Now let’s prepare a simple script that creates a zipped deployment package:

    PY_DIR=build/python/lib/python3.6/site-packages               #Layer code must end up in this directory
    mkdir -p $PY_DIR                                              #Create temporary build directory
    pipenv lock -r > requirements.txt                             #Generate requirements file
    pip install -r requirements.txt --no-deps -t $PY_DIR          #Install packages into the target directory
    cd build
    zip -r ../tropo_layer.zip .                                   #Zip files (the zip file name is illustrative)
    cd ..
    rm -r build                                                   #Remove temporary directory

    When you execute this script, it creates a zipped package that you can upload as a Lambda layer.


    Create a layer and a test AWS function

    You can create a custom layer and an AWS Lambda function by clicking around in the AWS console. However, real experts use the CLI (Lambda Layers are a new feature, so you have to update your awscli to the latest version).

    To publish a new Lambda Layer you can use the following command:

    aws lambda publish-layer-version --layer-name tropo_test --zip-file fileb://

    As the response, you should receive the layer ARN and some other data:

        {
            "Content": {
                "CodeSize": 14909144,
                "CodeSha256": "qUz...",
                "Location": ""
            },
            "LayerVersionArn": "arn:aws:lambda:eu-central-1:xxxxxxxxxxxx:layer:tropo_test:1",
            "Version": 1,
            "Description": "",
            "CreatedDate": "2018-12-01T22:07:32.626+0000",
            "LayerArn": "arn:aws:lambda:eu-central-1:xxxxxxxxxxxx:layer:tropo_test"
        }

    The next step is to create the AWS Lambda function. Your lambda will be a very simple script that generates a CloudFormation template to create an EC2 instance:

    from troposphere import Ref, Template
    import troposphere.ec2 as ec2
    import json
    def lambda_handler(event, context):
        t = Template()
        instance = ec2.Instance("myinstance")
        instance.ImageId = "ami-951945d0"
        instance.InstanceType = "t1.micro"
        t.add_resource(instance)  # Register the resource in the template
        return {"data": json.loads(t.to_json())}

    Now we have to create a zipped package that contains only our function:


    Then create a new lambda function using this file (I used an IAM role that already exists in my account; if you do not have a suitable role, create one before creating the function):

    aws lambda create-function --function-name tropo_function_test --runtime python3.6 \
    --handler handler.lambda_handler \
    --role arn:aws:iam::xxxxxxxxxxxx:role/service-role/some-lambda-role \
    --zip-file fileb://

    In the response, you should get the newly created lambda details:

        {
            "TracingConfig": {
                "Mode": "PassThrough"
            },
            "CodeSha256": "l...",
            "FunctionName": "tropo_function_test",
            "CodeSize": 356,
            "RevisionId": "...",
            "MemorySize": 128,
            "FunctionArn": "arn:aws:lambda:eu-central-1:xxxxxxxxxxxx:function:tropo_function_test",
            "Version": "$LATEST",
            "Role": "arn:aws:iam::xxxxxxxxx:role/service-role/some-lambda-role",
            "Timeout": 3,
            "LastModified": "2018-12-01T22:22:43.665+0000",
            "Handler": "handler.lambda_handler",
            "Runtime": "python3.6",
            "Description": ""
        }

    Now let’s try to invoke our function:

    aws lambda invoke --function-name tropo_function_test --payload '{}' output
    cat output
    {"errorMessage": "Unable to import module 'handler'"}

    Oh no… It doesn’t work. In CloudWatch Logs you can find the detailed message: `Unable to import module ‘handler’: No module named ‘troposphere’`. This error is obvious: the default python3.6 runtime does not contain the troposphere library. Now let’s add the layer we created in the previous step to our function:

    aws lambda update-function-configuration --function-name tropo_function_test --layers arn:aws:lambda:eu-central-1:xxxxxxxxxxxx:layer:tropo_test:1

    When you invoke lambda again you should get the correct response:

      {
        "data": {
          "Resources": {
            "myinstance": {
              "Properties": {
                "ImageId": "ami-951945d0",
                "InstanceType": "t1.micro"
              },
              "Type": "AWS::EC2::Instance"
            }
          }
        }
      }
    Add a local library to your layer

    We already know how to create a custom layer with python dependencies, but what if we want to include our local code? The simplest solution is to manually copy your local files to the /python/lib/python3.6/site-packages directory.

    First, let’s prepare the test module that will be pushed to the layer:

    $ find local_module
    $ cat local_module/
    def echo_hello():
        return "hello world!"

    To manually copy your local module to the correct path, you just need to add the following line to the previously used script (before zipping the package):

    cp -r local_module 'build/python/lib/python3.6/site-packages'

    This works; however, we strongly advise turning your local library into a pip module and installing it in the standard way.

    Update Lambda layer

    To update a Lambda layer, run the same command you used to create the layer:

    aws lambda publish-layer-version --layer-name tropo_test --zip-file fileb://

    The request should return LayerVersionArn with incremented version number (arn:aws:lambda:eu-central-1:xxxxxxxxxxxx:layer:tropo_test:2 in my case).

    Now update lambda configuration with the new layer version:

    aws lambda update-function-configuration --function-name tropo_function_test --layers arn:aws:lambda:eu-central-1:xxxxxxxxxxxx:layer:tropo_test:2

    Now you should be able to import local_module in your code and use the echo_hello function.


    Serverless framework Layers support

    Serverless is a framework that helps you build applications on the AWS Lambda service. It already supports deploying and using Lambda Layers. The configuration is really simple: in the serverless.yml file, you provide the path to the layer location on your disk (it has to be a path to a directory; you cannot use a zipped package, zipping is done automatically). You can either create a separate serverless.yml configuration for deploying the Lambda Layer or deploy it together with your application.

    We’ll show the second approach here. However, if you want to benefit from all the advantages of Lambda Layers, you should deploy the layer separately.

    service: tropoLayer
    package:
      individually: true
    provider:
      name: aws
      runtime: python3.6
    layers:
      tropoLayer:
        path: build             # Build directory contains all python dependencies
        compatibleRuntimes:     # supported runtime
          - python3.6
    functions:
      tropo_test:
        handler: handler.lambda_handler
        package:
          exclude:
            - node_modules/**
            - build/**
        layers:
          - {Ref: TropoLayerLambdaLayer} # Ref to the created layer. You have to append the 'LambdaLayer' string to the end of the layer name to make it work

    I used the following script to create a build directory with all the python dependencies:

    PY_DIR=build/python/lib/python3.6/site-packages               #Layer code must end up in this directory
    mkdir -p $PY_DIR                                              #Create temporary build directory
    pipenv lock -r > requirements.txt                             #Generate requirements file
    pip install -r requirements.txt -t $PY_DIR                    #Install packages into the target directory

    This example packages the Lambda Layer and your lambda handler individually. The funny thing is that you have to convert your Lambda layer name to TitleCase and add the `LambdaLayer` suffix if you want to refer to that resource.
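The mapping is easy to do by hand, but as an illustrative helper (not part of the Serverless framework API) you can compute the CloudFormation logical ID that the framework generates for a layer like this:

```python
def layer_cfn_ref(layer_name: str) -> str:
    """Return the CloudFormation logical ID for a serverless.yml layer:
    the layer name with its first letter upper-cased, plus 'LambdaLayer'."""
    return layer_name[0].upper() + layer_name[1:] + "LambdaLayer"

print(layer_cfn_ref("tropoLayer"))
# → TropoLayerLambdaLayer
```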

    Deploy your lambda together with the layer, and test if it works:

    sls deploy -v --region eu-central-1
    sls invoke -f tropo_test --region eu-central-1


    It was a lot of fun to test Lambda Layers and investigate how they technically work. We will surely use them in our projects.

    In my opinion, AWS Lambda Layers is a really great feature that solves a lot of common issues in the serverless world. Of course, it is not suitable for every use case. If you have a simple app that does not require a huge number of dependencies, it’s easier to keep everything in a single zip file, because you do not need to manage additional layers.

    Read more on AWS Lambda in our blog!

    Notes from AWS re:Invent 2018 – Lambda@edge optimisation

    Running AWS Lambda@Edge code in edge locations

    Amazon SQS as a Lambda event source


      Notes from AWS Chalk session at AWS re:Invent 2018 – Lambda@Edge optimisations



      Lambda@Edge makes it possible to run Lambda code in edge locations to modify viewer/origin requests. This can be used to modify HTTP headers, change content based on user-agent and more. We’ve written about it previously, so feel free to read that blog post if you want an introduction.

      There are quite a few limitations for Lambda@Edge, which depend on which request event you are responding to. For example, the maximum response that can be generated by the lambda function is very different depending on whether it is a viewer or origin response (40 KB vs 1 MB). The function itself also has limits, such as a maximum of 3 GB of memory allocation and a 50 MB zipped deployment package size.

      This means that most use cases need optimisation. First things first: evaluate whether you really need Lambda@Edge. CloudFront already has a lot of functionality you can take advantage of before reinventing the wheel: caching depending on device, selecting headers to base caching on, regional blocks with WAF, etc. Even your origin can sometimes handle header rewrites and other header manipulation, which means there is no need to spend time building it yourself. So you should only use Lambda@Edge if you know that CloudFront can’t do it and there will be a benefit to rendering or serving your content at the edge.

      Optimise before the function

      If you’ve decided to use Lambda@Edge, you should first look into the optimisations you can do before the function is invoked by the event. CloudFront does a lot of optimisations for you. It groups requests, so that if the response time of the object fetch is the same it puts them together and does only one GET instead of sending all of them to the origin. Note that CloudFront is a multilayered CDN: on a miss in a specific region it will try to fetch the cache from the closest CloudFront location, so there is no need to build multi-region caching yourself. Another thing to look at in CloudFront is the origin paths that the event reacts upon; perhaps the function only needs to react on a very specific HTTP path. If possible, it is also always better to let the function react on origin events instead of viewer events, which reduces the number of events to react upon and gives you higher limits for function size, response time and resource allocation.

      Coding optimisations

      When writing the function, you should try to utilise global variables as much as possible, since they are reused between invocations and cached on the workers for a couple of hours. Small things such as keeping TCP sockets usable, or perhaps using UDP instead of TCP, can make a difference, especially since Lambda@Edge is synchronous.
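A sketch of that pattern in Python (the config-loading function and endpoint are made-up stand-ins for any expensive setup, such as opening a TCP connection): module-level state initialised lazily from the handler is reused on every warm invocation:

```python
# Module-level state survives between invocations on a warm worker,
# so expensive setup runs once instead of on every request.
_setup_calls = 0
_connection = None

def _get_connection():
    """Stand-in for expensive setup (e.g. opening a TCP connection)."""
    global _setup_calls, _connection
    if _connection is None:
        _setup_calls += 1
        _connection = {"endpoint": "origin.example.com"}  # hypothetical endpoint
    return _connection

def lambda_handler(event, context):
    conn = _get_connection()
    return {"endpoint": conn["endpoint"], "setup_calls": _setup_calls}

print(lambda_handler({}, None))  # first (cold) call performs the setup
print(lambda_handler({}, None))  # warm call reuses it: setup_calls stays 1
```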

      Deployment testing

      When deploying the function, look at minimising the code with tools such as browserify. Also note that Lambda@Edge can be deployed with different memory allocations, so make sure you test which size gives you the best bang for the buck: sometimes raising the memory allocation from 128 MB to 256 MB gives you much faster responses without costing that much more.

      S3 performance

      If you are fetching content from S3, try using S3 Select to get just what you need from a subset of data in an object using simple SQL expressions, and even better, try to use cached content in CloudFront instead of fetching it from S3 or other origins. This makes a lot of sense, especially if the data can be cached.

      Last but not least: Remove the function when not in use. Don’t use Lambda@Edge if you don’t need to anymore.

      If you’d like to learn more about moving your business to the Cloud, please contact us here.


        Nordcloud at AWS re:Invent 2018



        We are attending AWS re:Invent Nov. 26-30, 2018 in Las Vegas.

        AWS re:Invent 2018 is expecting some 40,000 attendees on 26–30 November 2018 in Las Vegas, USA. This year, re:Invent will feature sessions covering topics also seen in past years, such as databases, analytics & big data, security & compliance, enterprise, machine learning, and compute. You can cross-search these topics in more detail in the session catalog.

        We picked some interesting sessions to check from re:Invent 2018:

        1. Optimizing Costs as You Scale on AWS
        2. The Future of Enterprise Applications is Serverless
        3. Driving DevOps Transformation in Enterprises
        4. How HSBC Uses Serverless to Process Millions of Transactions in Real Time
        5. Build, Train, and Deploy Machine Learning for the Enterprise with Amazon SageMaker
        6. Managing Security of Large IoT Fleets
        7. Meeting Enterprise Security Requirements with AWS Native Security Services

        Follow our social media postings from Las Vegas

        Join the conversations on your expectations from re:Invent 2018.

        Here is our AWS guru Miguel’s video wish list:

        Check also Mikael’s (Data Driven Business Lead and AWS guru) wish list:

        Also, make sure you watch our AWS Alliance Lead Niko’s expectations for the event:


          Cloud Computing News #13: Coming soon: AWS re:Invent 2018 and two new regions



          This week we focus on latest news from our partner: AWS re:Invent and launch of two new regions.

          AWS re:Invent 2018 – 6 Days to Go

          AWS re:Invent 2018 is expecting some 40,000 attendees on 26–30 November 2018 in Las Vegas, USA. This year, re:Invent will feature sessions covering topics also seen in past years, such as databases, analytics & big data, security & compliance, enterprise, machine learning, and compute. You can cross-search these topics in more detail in the session catalog.

          We picked some interesting sessions to check from re:Invent 2018:

          1. Optimizing Costs as You Scale on AWS
          2. The Future of Enterprise Applications is Serverless
          3. Driving DevOps Transformation in Enterprises
          4. How HSBC Uses Serverless to Process Millions of Transactions in Real Time
          5. Build, Train, and Deploy Machine Learning for the Enterprise with Amazon SageMaker
          6. Managing Security of Large IoT Fleets
          7. Meeting Enterprise Security Requirements with AWS Native Security Services

          Our team is also attending the event. Follow our postings from Las Vegas, and join the conversations on your expectations from re:Invent 2018.

          Here is our AWS guru Miguel’s video wish list:

          Check also Mikael’s (Data Driven Business Lead and AWS guru) wish list:

          Coming soon: an AWS region in Milan, Italy, and a new region in Sweden set to launch later this year

          Last week AWS announced that they are building a new AWS Region in Milan, Italy, and plan to open it in early 2020. The upcoming region will have three Availability Zones and will be AWS’s sixth region in Europe, joining the existing regions in France, Germany, Ireland, the UK, and the new region in Sweden that is set to launch later this year.

          AWS currently has 57 Availability Zones in 19 geographic regions worldwide, and another 15 Availability Zones across five regions in the works for launch between now and the first half of 2020 (check out the AWS Global Infrastructure page for more info).

          Read more in AWS blog

          Nordcloud has been an AWS Premier Consulting Partner since 2014 and an AWS Managed Service Provider since 2015

          At Nordcloud we know the AWS cloud, and we can help you take advantage of all the benefits Amazon Web Services has to offer.

          How can we help you take your business to the next level? 





            Much like the last 5 years at re:Invent, we were treated on the Thursday to a keynote by Werner Vogels, speaking at the MGM Grand Garden Arena. It’s a huge space and the production values that AWS brings to their keynotes (coupled with the 16,800 capacity) made for an electric start to the morning.

            Vogels started the keynote by reflecting on the keynotes he has delivered over the last 5 years. During his first ever keynote back in 2012, Vogels discussed 21st-century architecture. He provided 4 guiding commandments: Controllable, Resilient, Adaptive, and Data Driven. He returned to this theme by calling this particular keynote ’21st Century Architectures, re:Imagined’.


            It was made clear from the start that, unlike previous years, there would be relatively few announcements. He was true to his word, and instead focussed on just a few key themes. Vogels took time to thank AWS’s customers, reflecting that in the beginning, they knew they had to be collaborative to succeed. They wanted to build a collection of ‘nimble’ tools which could be assembled to build what customers needed. AWS listen to customer feedback, launching services that are rock solid, then working with customers to set the roadmap and development priorities.

            AWS want to help you build services for the future, and a lot of the announcements this week are enabled by developments in technology that have come about in the last 2-3 years.


            Voice As A Control System

            One of the themes Vogels spoke about was IoT and allowing whole environments to become accessible. Every device has the ability to become an input or output device, but with so many out there, it’s good to consider how we interact with all of them and their systems. Vogels believes that digital interfaces of the future will be human-centric, and the things that we as humans use to communicate will become the inputs to systems. The first of these will be the voice as it’s the most natural and easiest interaction.

            Once you can use your voice to control systems, Vogels suggested, people won’t look back: from surgeons operating theatre equipment to simply controlling the lighting or heating in your house, it will unlock digital systems for everyone.

            To demonstrate this point, Vogels talked about the International Rice Research Institute who provide rice farmers advice on how much and which fertiliser to put on their crops based on their years of research. Consumption of this information was very low until they invested in a voice interface. Farmers can call, select from one of 27 dialects, and provide information on their land and crop conditions. They then use voice recognition and machine learning to read back to the farmer which fertilizer they need.

            This was building up to the announcement of Alexa for Business, a service that ‘makes it easier for you to introduce Alexa to your organization, providing the tools you need to set up and manage Alexa-enabled devices, enroll users, and assign skills at scale’.


            Ensure You Are Well Architected

            The next theme of the keynote was architecture. Typically, systems have three planes: Admin, Control, and Data. (Vogels suggested architecture that extensive was difficult to visualise on marketing slides!) The AWS Well-Architected Framework was launched two years ago and has grown from a single document to five pillars across five documents with two ‘lenses’, which guide the user on how to architect for specific use cases (currently HPC and Serverless). The framework is included in AWS certifications, and AWS regularly runs boot camps and ‘Well-Architected Reviews’ for its customers.


            Dance Like No One Is Watching, Encrypt Like Everyone Is

            This particular section had a strong focus on security and availability. On security, Vogels recapped everything you need to ensure you are doing, from implementing a strong identity foundation to automating security best practices. The need to encrypt everything was also highlighted: security has become everyone’s problem. Developers are now seen as the new security team, and everything needs to be considered, for example securing the CI/CD pipeline itself as well as ensuring security within the pipeline.

            Development has also changed over time, meaning you need to be more security aware. It’s more collaborative, there are more languages, and more services and teams are combining. To help out, AWS have launched Cloud9, a cloud-based IDE including a code editor, debugger, and a terminal pre-packaged with essential tools (JavaScript, PHP, Python), that allows you to write, run and debug your code, so you don’t need to set up development environments to start new projects.


            Everything Will Fail. All The Time

            Availability, reliability, and resilience were discussed, from the basics (hard dependencies reduce availability, redundant dependencies increase availability) to the best practices of distributed systems, through to deployment automation and testing. Nora Jones (Netflix) gave the example of Chaos Engineering and how they practise it at Netflix.

            Vogels highlighted that highly available systems cost more, so it becomes a business decision whether to simply run something in a single availability zone but only achieve 99% uptime. If you want to increase this, you need to distribute your services across multiple availability zones or even regions. DynamoDB Global Tables, for example, help you do this, becoming the ultimate tool in reliability design. Although this has little to do with AWS (and more to do with decisions made within organisations), AWS can make it much easier for you. This brings us nicely to the final part of the keynote: letting AWS do the ‘heavy lifting’ through its managed services.

            Gall's Law says, "A complex system that works is invariably found to have evolved from a simple system that worked". AWS allows you to keep your systems simple by providing nimble services which you can assemble to build what you need. If you run your own RDBMS, you have to take care of both the control and data planes; if you run on AWS, AWS manages the control plane for you. AWS Managed Services are designed so that AWS controls the complex, hard-to-manage moving parts, making things simpler for you. This was demonstrated by Abby Fuller speaking about containers on AWS, and how AWS Fargate can make your environment much simpler. AWS will continue to release managed services over the next year.



            Serverless could not possibly be missed out of this keynote, being the ultimate AWS Managed Service: there is no server management, and it offers flexible scaling, high availability, and no idle capacity. Here are the final (Lambda) product announcements:
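            To make the "no server management" point concrete, here is a minimal sketch of what a Lambda function looks like in Python. The event shape and names are our own illustration, not from the keynote:

```python
import json

def handler(event, context):
    """Minimal AWS Lambda handler sketch: echo a greeting from the event.

    AWS invokes this function per request; there are no servers to manage
    and you pay only while the code runs. `context` carries runtime
    metadata and is unused here.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local invocation for testing (context is unused, so None is fine):
response = handler({"name": "re:Invent"}, None)
print(response["body"])  # {"message": "Hello, re:Invent!"}
```

The same function runs unchanged whether invoked locally like this or wired to an API Gateway route, an S3 event, or a schedule.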

            In addition, the AWS Serverless Application Repository was also announced, allowing users to discover collections of serverless apps and deploy them into their account in a few clicks. You can also publish your own apps to share with the community, making it easy to consume third-party Lambda functions and apply them to your environments.


            If you would like to understand how Nordcloud can help you take advantage of AWS Managed Services, discuss whether your environment is well architected, or talk through any of the other releases made this week, please get in touch.


            Get in Touch.

            Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.

              Day 2 at Re:Invent – Builders & Musicians Come Together



              When Werner Vogels makes bold statements, expectations are set high. So when Vogels tweeted 15 minutes before the start of re:Invent's day 2 keynote, we had to wonder what was coming.

              And how right we were. The close to 3 hours spent in the Venetian hotel in Las Vegas was an experience in itself.

              Andy Jassy opened the keynote with a long list of customers and partners, alongside the latest business figures. AWS are currently running at an $18 billion annualised run rate with an incredible 42% YoY growth. With millions of active customers – defined as accounts that have used AWS in the last 30 days – the platform is by far the most used on the planet.

              As per Gartner’s 2016 Worldwide Market Segment Share analysis, the company (successfully led by Jassy), has achieved a 44.1% market share in 2016, up from 39% in 2015, more than everyone else combined. This became easily noticeable when AWS displayed an entire catalogue of new services throughout the keynote. The general stance Jassy took this year was that AWS are trying to serve their customers exactly what they asked for in terms of new products. The mission of AWS is nothing short of fixing the IT industry in favour of the end-users and customers.

              The first on stage was a live ‘house’ band, performing a segment of ‘Everything is Everything’ by Lauryn Hill, the chorus rhyming with ‘after winter must come spring’. Presumably, AWS was referring to the world of IT still being in a kind of eternal ‘winter’. The concept we also heard here was that AWS would not stop building their portfolio and that they want to offer all the tools their ‘builders’ and customers need.

              AWS used Jassy’s keynote for some big announcements (of course, set to music), with themes across the following areas:

              • Compute
              • Database
              • Data Analytics
              • Machine Learning and
              • IoT

              The Compute Revolution Goes On

              Starting in the compute services area, an overview of the vast number of compute instance types and families was shown, with special emphasis given to the Elastic GPU options. A few announcements were also made on the Tuesday night, including Bare Metal Instances, and Streamlined Access to Spot Capacity & Hibernation, making it easier for you to get savings of up to 90% on normal pricing. There were also M5 instances, which offer better price performance than their predecessors, and H1 instances, offering fast and dense storage for big data applications.

              However, with the arrival of Kubernetes in the industry, it was the release of the Elastic Container Service for Kubernetes (EKS) that was the most eagerly anticipated. Not only have AWS recognised that their customers want Kubernetes on AWS, they also realise there is a lot of manual labour involved in maintaining and managing the servers that run ECS & EKS clusters.

              To solve this particular problem, AWS announced AWS Fargate, a fully managed service for both ECS & EKS meaning no more server management and therefore increasing the ROI in running containers on the platform. This is available for ECS now and will be available for EKS in early 2018.

              Having started with servers and containers, Jassy then moved on to the next logical evolution of infrastructure services: Serverless. With a 300% usage growth, it’s fair to say that if you’re not running something on Lambda yet, you will be soon. Jassy reiterated that AWS are building services that integrate with the rest of the AWS platform to ensure that builders don’t have to compromise. They want to make progress and get things done fast. Ultimately, this is what AWS compute will mean to the world: faster results. Look out for a dedicated EKS blog post coming soon!

              Database Freedom

              The next section of the keynote must have had some of AWS's lawyers on the edge of their seats, and also the founder of a certain database vendor… AWS seem to have a clear goal to put an end to the historically painful 'lock-in' some customers experience, referring frequently to 'database freedom'. There's a lot of cool things happening with databases at the moment, and many of the great services and solutions shown at re:Invent are built using AWS database services. Out of all of these, Aurora is growing the fastest by far, and is in fact the fastest-growing service in the entire history of AWS.

              People love Aurora because it can scale out to millions of reads per second. It can also autoscale new read replicas and offers seamless recovery from read replica failures. Customers wanted to go further, which is why AWS launched a new Aurora feature, Multi-Master. This allows for zero application downtime due to any write node failure (previously, AWS suggested failover took around 30 seconds), and zero downtime due to an availability zone failure. During 2018 AWS will also introduce multi-region masters, allowing customers to easily scale their applications across regions with a single, consistent data source.

              Lastly, and certainly not least, was the announcement of Aurora Serverless, an on-demand, auto-scaling, serverless version of Aurora where users pay by the second – an unbelievably powerful feature for many use cases.

              Finally, Jassy turned his focus to DynamoDB, which scaled to ~12.9 million requests per second at its peak during the last Amazon Prime Day. Just let that sink in for a moment! DynamoDB is used by a huge number of major global companies, powering mission-critical workloads of all kinds; the reason for this, from our perspective, is that it is very easy to access and use as a service. What was announced today was a new feature, DynamoDB Global Tables, which enables users to build high-performance, globally distributed applications.
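              As a rough sketch of what this looks like in practice (the table and region names here are our own illustration, not from the keynote), a Global Table is created by naming an existing, streams-enabled table in each replica region:

```python
# Sketch of the request shape for DynamoDB Global Tables: the same table
# ("orders" here, purely illustrative) must already exist with streams
# enabled in every listed region; DynamoDB then replicates writes between
# the regional replicas for you.
global_table_request = {
    "GlobalTableName": "orders",
    "ReplicationGroup": [
        {"RegionName": "us-east-1"},
        {"RegionName": "eu-west-1"},
    ],
}

# With boto3, this request would be passed to the DynamoDB client:
#   boto3.client("dynamodb", region_name="us-east-1") \
#       .create_global_table(**global_table_request)
print([r["RegionName"] for r in global_table_request["ReplicationGroup"]])
# ['us-east-1', 'eu-west-1']
```

After creation, applications simply read and write the table in their nearest region, which is what makes the globally distributed, low-latency pattern possible.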

              The final database feature released for DynamoDB was managed back-up & restore, providing on-demand backups and point-in-time recovery (over the past 35 days). Backups of hundreds of TB can be taken with no interruption, whether for data archival or regulatory requirements.

              Jassy wrapped up the database section of his keynote by announcing Amazon Neptune, a fully managed graph database which will make it easy to build and run applications that work with highly connected data sets.


              Next, Jassy turned to analytics, commenting that people want to use S3 as their data lake. Athena allows for easy querying of structured data within S3; however, most analytics jobs involve processing only a subset of the data stored within S3 objects, and Athena requires the whole object to be processed. To ease the pain, AWS released S3 Select, allowing applications (including Athena) to retrieve a subset of data from an S3 object using simple SQL expressions. AWS claim drastic performance increases from this, possibly up to 400%.
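              As an illustrative sketch (the bucket, key and column names are ours, not AWS's), an S3 Select request pairs a SQL expression with a description of how the object is serialised:

```python
# Sketch of an S3 Select request: rather than downloading the whole CSV
# object, only the rows matching the SQL expression come back over the
# wire. Bucket, key and column names are illustrative.
select_request = {
    "Bucket": "my-data-lake",
    "Key": "logs/2017/11/events.csv",
    "ExpressionType": "SQL",
    "Expression": (
        "SELECT s.user_id, s.event FROM s3object s WHERE s.status = '500'"
    ),
    # Tell S3 how the stored object is laid out, and how results should
    # be serialised on the way back.
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"CSV": {}},
}

# With boto3, this would feed the S3 client, which streams back the subset:
#   boto3.client("s3").select_object_content(**select_request)
print(select_request["ExpressionType"], "query on", select_request["Key"])
```

The claimed speed-ups follow directly from this shape: filtering happens server-side, so only the selected bytes cross the network.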

              Many of our customers are required by regulation to store logs for up to 7 years, and as such ship them to Glacier to reduce the cost of storage. This becomes problematic if you need to query that data, though. "How great would it be if this could become part of your data lake?" Jassy asked, before announcing Glacier Select, which allows queries to be run directly on data stored in Glacier, extending your data lake into Glacier while reducing your storage costs.

              Machine Learning

              The house band introduced Machine Learning with 'Let it Rain' by Eric Clapton. Dr Matt Wood made an appearance and highlighted how important machine learning is to Amazon itself: the company uses a great deal of it, from personalised recommendations to fulfilment automation and inventory management in its warehouses.

              Jassy highlighted that AWS only invests in building technology that its customers need (and remember, Amazon itself is a customer!), not because it is cool or funky. Jassy described three tiers of machine learning: Frameworks and Interfaces, Platform Services, and Application Services.

              At the Frameworks and Interfaces tier, emphasis was placed on the broad range of frameworks that can be used on AWS, recognising that one shoe does not fit every foot and the best results come from using the correct tool for the job. Moving to the Platform Services tier, Jassy highlighted that most companies cannot expect to have machine learning practitioners (yet) – it is, after all, a complex beast. To make this easier for developers, Amazon SageMaker was announced: a fully managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at any scale.

              Also at the platform tier, AWS launched DeepLens, a deep-learning-enabled wireless video camera designed to help developers grow their machine learning skills. It integrates directly with SageMaker, giving developers an end-to-end solution to learn, develop and test machine learning applications. DeepLens will ship in early 2018, available for $249.

              The machine learning announcements did not stop there! As Jassy moved into the Application Services tier, AWS launched:


              Finally, Jassy turned to IoT – identifying five ‘frontiers’ each with its own release, either available now, or in early 2018:

              1. Getting into the game – IoT One Click (in Preview) will make it easy for simple devices to trigger AWS Lambda functions that execute a specific action.
              2. Device Management – AWS IoT Device Management will provide fleet management of connected devices, including onboarding, organisation, monitoring and remote management throughout a device's lifetime.
              3. IoT Security – AWS IoT Device Defender (early 2018) will provide security management to your fleet of IoT devices, including auditing to ensure your fleet meets best practice.
              4. IoT Analytics – AWS IoT Analytics, making it easy to cleanse, process, enrich, store, and analyze IoT data at scale.
              5. Smaller Devices – Amazon FreeRTOS, an operating system for microcontrollers.

              Over the coming days and weeks, the Nordcloud team will be diving deeper into these new announcements (including our first thoughts after getting hands-on with the new releases), and we'll publish our thoughts on how they can benefit you.

              It should be noted that, compared to previous years, AWS are announcing more outside the keynotes, in sessions and on their Twitch channel, so there are many new releases which are not getting the attention they might deserve. Examples include T2 Unlimited, Inter-Region VPC Peering and Launch Templates for EC2. As always, the best place to keep up to date is the AWS 'What's New' page.

              If you would like to discuss how any of today’s announcements could benefit your business, please get in touch.





                AWS re:Invent kicked off in Vegas yesterday, and a number of the Nordcloud team have travelled across the pond to attend what is probably the biggest Cloud event on the planet.

                In 2016, Jaakko documented the sheer scale involved in running re:Invent, and this year AWS have managed to scale it up again. With 44k+ attendees (30k+ in 2016), the campus now spreads across 7 hotels along the length of the strip – that's a 50-minute walk end to end (luckily there are shuttle buses!). The partner expo has doubled in size and is now across two locations, making the scale of the conference hard to comprehend.

                Sitting in the 16,800-capacity arena for the Partner Keynote on the first 'formal' day of the event was a little awe-inspiring. Werner Vogels' keynote will be in the same location, and we fully expect the atmosphere to be electric when it is filled to capacity on Thursday.

                Today's keynote was led by Terry Wise, AWS' VP of Global Alliances, Ecosystems and Channels, who highlighted both the growth of AWS (1,300 new releases so far this year, 70 on the Monday of re:Invent alone) and the growth of the AWS Partner Community. AWS have clear evidence that customers who work with companies within their partner ecosystem (such as Nordcloud) are able to adopt the cloud faster and more effectively, so AWS is committed to providing partners with the training and tools to help them do that (launching several within the keynote).

                AWS recognise that their customers want skilled, specialised partners, and support this through the AWS partner competency programme: AWS audits partners on their skill sets and ensures they have completed referenceable real-world projects. Today, AWS announced Networking and Machine Learning competencies, with Blockchain, Containers, End User Computing & Cloud Management Tools coming in 2018. Nordcloud already holds the DevOps Competency, and we are Managed Services, Lambda, DynamoDB and API Gateway Partners. We will, of course, be looking to add some of these new competencies to our line-up.

                Wise believes that “Cloud is the foundation for innovation” and to help demonstrate this, he invited a number of people to the stage:

                • Colleen Manaher, the Executive Director of U.S. Customs and Border Protection (CBP), talked about how CBP is using AWS, facial recognition and machine learning to accomplish their vision of letting you "go from reservation to destination and back again without a passport or a boarding pass".
                • Matt Wood, GM, Deep Learning and AI at AWS talked about the challenges of Machine Learning & how Amazon uses AI on a daily basis (from robots in their warehouses and Amazon Alexa to recognising music and actors in Amazon Prime Video).
                • John Nichols from PG&E talked about the importance of Mentorship and Culture when transitioning to the cloud.
                • David McCann announced updates to the AWS Marketplace & Service Catalog to make it easier for partners to deliver their services to end customers, including an Enterprise Contract for the AWS Marketplace.
                • Steve Bashada, EVP and GM of Siemens PLM talked about how Siemens is using IoT (helping them increase availability of trains from 87% to 99%) and how they have launched MindSphere on AWS, which will allow companies to develop robust IoT solutions faster on AWS.
                • Andy Jassy, CEO of AWS, talked about the importance of Partners for AWS and provided some insight into where he sees the market going over the next 12 months.

                The Nordcloud team are in Las Vegas until Saturday morning, meeting with partners, customers & attending sessions. If you would like to discuss anything in this blog post of how Nordcloud can help, please get in touch!
