Findings from AWS re:Invent 2019, Part 2

I was expecting the usual set of service and feature announcements in Werner Vogels’ Thursday keynote, but instead he focused on what happens behind the scenes at AWS, especially the EC2 Nitro architecture and S3. So instead of analyzing Werner’s keynote, I picked two announcements from Wednesday that didn’t make it into the keynotes but deserve attention because of how they will simplify building APIs and distributed applications.

Amazon API Gateway HTTP APIs

Amazon API Gateway HTTP APIs will lower the barrier of entry when starting to build that next great service or application. It is now trivial to get started with an HTTP proxy for your Lambda function(s):

% aws apigatewayv2 create-api \
    --name MyAPIname \
    --protocol-type HTTP \
    --target arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION

It is also nice that HTTP APIs have Serverless Application Model (SAM) support from day one. And when your API starts getting attention, pricing is up to 70% cheaper than API Gateway REST APIs. Compatible API Gateway definitions (i.e. HTTP and Lambda backends with OIDC/JWT-based authorization) can be exported and re-imported as HTTP APIs.
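A minimal sketch of that export/import round trip with the AWS CLI; the API ID, stage name, and file name below are hypothetical placeholders:

```shell
# Export an existing REST API as an OpenAPI 3.0 definition
# (abc123 and prod are placeholder values).
aws apigateway get-export \
    --rest-api-id abc123 \
    --stage-name prod \
    --export-type oas30 \
    api-definition.json

# Re-import the definition as an HTTP API.
aws apigatewayv2 import-api \
    --body file://api-definition.json
```

Running these requires valid AWS credentials; the second command returns the ID of the newly created HTTP API.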

Amplify DataStore

Amplify DataStore is a queryable, on-device data store for web, IoT, and mobile developers using React Native, iOS, and Android. The idea is that you don’t need to write separate code for offline and online scenarios: working with distributed, cross-user data becomes as simple as working with local data. DataStore is available with the latest Amplify JavaScript client; the iOS and Android clients are in preview.

The DataStore blog post and demo app are a good way to get your feet wet with DataStore and see how simple it can be to create applications that share state between multiple online and offline clients.

Interested in reading more about Petri’s views and insights? Follow his blog CarriageReturn.Nl

Get in Touch.

Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.

    Findings from AWS re:Invent 2019, Part 1

    ML/AI was definitely the topic of Andy Jassy’s re:Invent Tuesday keynote. Another area of major investment was service proximity to customers and end-users. With that, it was only natural that there were also some new networking features to help build multi-region connectivity.

    Machine Learning for the Masses

    ML/AI received a lot of love in Tuesday’s announcements. If there is one thing to pick from the group, it would be SageMaker Autopilot:

    “With this feature, Amazon SageMaker can use your tabular data and the target column you specify to automatically train and tune your model, while providing full visibility into the process. As the name suggests, you can use it on autopilot, deploying the model with the highest accuracy with one click in Amazon SageMaker Studio, or use it as a guide to decision making, enabling you to make tradeoffs, such as accuracy with latency or model size.”

    Together with the SageMaker Studio web-based IDE, this aims to democratize the artisan work of data analytics. There were also 3 interesting real-world applications of ML announced (all in preview):

    • Amazon CodeGuru automates code reviews and gives application performance recommendations.
    • Amazon Fraud Detector is a managed service that identifies fraudulent activities such as online payment fraud and the creation of fake accounts.
    • Amazon Detective analyzes logs from AWS resources to investigate and find the root cause of potential security issues or suspicious activities.

    As services, these are all very easy to consume and can bring a lot of value by preventing costly mistakes. They also follow the same pattern as SageMaker Autopilot: automating artisan work traditionally performed by skilled (but overloaded) individuals.

    Getting Closer to Customer

    Another theme in Tuesday’s announcements was cloud services getting physically closer to customers. This is important when you must keep your data in a certain country or need very low latencies.

    An AWS Local Zone is an extension of an AWS Region. It brings compute, storage, and a selected subset of AWS services closer to customers. The very first Local Zone was announced in Los Angeles, but I would expect these to pop up in many cities around the world that don’t yet have an AWS Region nearby.

    If a Local Zone is not close enough, there is AWS Wavelength, yet another variation of an (availability) zone. Wavelength has a similar (but not identical?) subset of AWS services as Local Zones. Wavelength Zones are co-located at 5G operators’ network edges, which helps in building ultra-low-latency services for mobile networks.

    AWS Outposts is now generally available, and support for EMR and container services like ECS, EKS, and App Mesh was added to the Outposts service mix. Pricing starts from $225k with a 3-year upfront payment, or $7,000/month on a 3-year subscription. I think many customers will want to wait and see how Local Zones expand before investing in on-prem hardware.


    AWS has had a tradition of changing networking best practices every year at re:Invent. This year it wasn’t quite as dramatic, but there were very welcome feature announcements that go nicely with the idea of different flavours of local regions.

    Transit Gateway inter-region peering allows you to build a global WAN within AWS networks. This is a great feature when you are building multi-region services, or when your services are spread across multiple regions because of differences in the local service mix. That said, note that inter-region peering is only available in certain regions at launch.
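    A sketch of requesting such a peering attachment with the AWS CLI; the gateway IDs, account ID, and region below are placeholder values:

```shell
# Request a peering attachment between two Transit Gateways in
# different regions (all IDs below are placeholders).
aws ec2 create-transit-gateway-peering-attachment \
    --transit-gateway-id tgw-0123456789abcdef0 \
    --peer-transit-gateway-id tgw-0fedcba9876543210 \
    --peer-account-id 111122223333 \
    --peer-region eu-west-1

# The peer side must then accept the attachment before traffic flows.
aws ec2 accept-transit-gateway-peering-attachment \
    --transit-gateway-attachment-id tgw-attach-0123456789abcdef0
```

    After the attachment is accepted, you still add static routes to each Transit Gateway route table pointing at the attachment.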

    Transit Gateway Network Manager enables you to centrally manage and monitor your global network, not only on AWS but also on-premises. As networking gets more complex, this global view and management will be a most welcome help. It will also help shift the balance of network management from on-premises towards the public cloud.

    Finally, support for multicast traffic was one of the last remaining blockers for moving some applications to a VPC. With the announcement of Transit Gateway multicast support, even that is now possible. The fine print says multicast is not supported over Direct Connect, site-to-site VPN, or peering connections.


      Data Lake best practices in AWS



      Many businesses are looking into enabling analytics on many different types of data sources to gain insights that guide them to better business decisions. A data lake is one way of doing that: a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. Data analysts can then leverage the data in the data lake with their choice of analytics and machine learning services, like Amazon EMR for Apache Spark, Redshift, Athena, and more.


      AWS Data Lake Formation

      AWS Data Lake Formation is a new tool that makes it easier for businesses to set up a data lake – something that previously was a big undertaking taking months can now be broken down into just a few days of work. Data Lake Formation automatically crawls, cleans, and prepares the data, which you can in turn use to train machine learning models to deduplicate records based on what you want the data to look like. The most interesting piece of functionality in the new Data Lake Formation might be the centralized dashboard for secure access control at table and column level across all tools in the data lake – something that previously was quite complicated and required third-party tooling.

      Data lake best practices

      Best practices for building a data lake optimized for performance, security, and data processing were discussed during the AWS Data Lake Formation session at AWS re:Invent 2018. The session was split into three main categories: ingestion, organisation, and preparation of data for the data lake. Your current bottleneck may lie in any or all of these categories, as they often interlink – so make sure to look into all of them to optimize your data.



      The main takeaway from the session was that S3 should be used as a single source of truth where ingested data is preserved. No transformation of data should happen in the ingestion S3 storage; if you transform the data, copy it to another S3 bucket.

      So that you don’t end up with a bucket full of old data at all times, you should also look into utilizing object lifecycle policies so that data you aren’t using gets moved to a cheaper storage class such as Glacier. This especially makes sense for data that is outside of your time scope and no longer interesting for analytics.
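      As a sketch, such a lifecycle rule can be set with the AWS CLI; the bucket name, prefix, and 90-day cutoff below are hypothetical values:

```shell
# Hypothetical lifecycle rule: transition raw objects under events/
# to Glacier after 90 days (bucket name and prefix are placeholders).
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-data-lake-raw \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "archive-old-raw-data",
        "Status": "Enabled",
        "Filter": {"Prefix": "events/"},
        "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}]
      }]
    }'
```

      You could also add an Expiration action to the same rule if the data can eventually be deleted outright.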

      Getting data in from databases can be a pain, especially if you are trying to use replicas of on-premise databases. AWS recommends that instead of using database replicas, you utilize the AWS Database Migration Service. This makes it easier to replicate the data without having to manage yet another database. If you use an AWS Glue ETL job to transform, merge, and prepare the data ingested from the database, you can also optimize the resulting data for analytics and take daily snapshots to preserve the database’s view of the records.



      Organisation of the data is a strategy that usually comes way too late in a data lake project. Already at the beginning of the project you should look into organizing the data into partitions in S3, partitioning on keys that align with common query filters.

      For example, it is sometimes better to create multiple S3 buckets and partition each bucket on year/month/day/ instead of trying to fit all of your data into one S3 bucket with even more granular partitions. In reality this depends on what your most common queries look like – maybe you need to partition on months instead of years, depending on your usage.
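      As a small sketch of that layout, the snippet below builds a Hive-style year=/month=/day= partition prefix for a given date; the bucket name and date are illustrative placeholders (GNU date syntax):

```shell
# Build a partition-style S3 prefix (year=/month=/day=) for a given date.
# Bucket name and date are placeholders; query engines like Athena can
# prune partitions when filters match these key names.
BUCKET="my-data-lake-raw"
DT="2019-12-03"
PREFIX="s3://${BUCKET}/events/year=$(date -d "$DT" +%Y)/month=$(date -d "$DT" +%m)/day=$(date -d "$DT" +%d)/"
echo "$PREFIX"   # s3://my-data-lake-raw/events/year=2019/month=12/day=03/
```

      A query filtering on year and month then only scans the matching prefixes instead of the whole bucket.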



      For mutable data, use a database such as Redshift or Apache HBase, but make sure to offload the data to S3 once it becomes immutable. You can also append delta files to the partitions and compact them with a scheduled job, keeping the most recent version of the data and deleting the rest.

      Remember to compact the data from the source before you run analytics – the optimal file size is between 256 and 1000 MB. If you need faster ingestion than grabbing the data from S3, you can stream the data to Kinesis streams, process it with Apache Flink, and push the processed data to S3.
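      A back-of-the-envelope way to size such a compaction job; the input size and target below are made-up numbers chosen inside the recommended range:

```shell
# Rough compaction sizing: given the total size of small files in a
# partition and a target file size inside the 256-1000 MB sweet spot,
# compute how many output files the job should write (ceiling division).
TOTAL_MB=10240      # hypothetical: 10 GB of small files in one partition
TARGET_MB=512       # chosen target within the 256-1000 MB range
FILES=$(( (TOTAL_MB + TARGET_MB - 1) / TARGET_MB ))
echo "$FILES output files of ~${TARGET_MB} MB each"
```

      In a Spark or Glue job this number would typically feed into a repartition/coalesce call before writing the partition back to S3.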


      If you’d like some help with AWS Data Lake Formation, please feel free to contact us.



        AWS re:Invent 2018: our recap



        Here are some updates and experiences from AWS re:Invent 2018 in Las Vegas.

        What a week it was! Those were not well-rested people I saw on the flight back home to Finland.

        The Nordcloud team mainly covered three topics last week: hosting clients, tuning in on new launches and updates, and personal skills development.

        A plethora of new things was launched during the week – so much that one week didn’t seem to fit everything. A comprehensive list of all announcements can be found online; in this post I’ll focus on sharing our experiences and highlights from the event.

        AWS has both the biggest market share and growth numbers*

        It is safe to say that AWS is the biggest player in the public cloud market (with a 51.8% market share) and has the biggest growth in absolute numbers ($2.1B). More than half of all Windows workloads in the public cloud (57.7%) run on AWS. There are a total of 86 premier tier partners globally, and Nordcloud is one of them. The number of premier tier partners keeps growing because public cloud usage is constantly increasing.

        AWS estimates that $2T is spent annually on datacenter maintenance, i.e. keeping the lights on. This explains why Jeff Bezos is interested enough to have migrations reported to him on a monthly basis. From a development point of view, the future is serverless.


        Our top 3 picks of new service launches

        New service launches were announced during re:Invent. Our CTO Ilja Summala picked his top 3 announcements: Lambda Layers, AWS Security Hub and AWS Outposts.

        Lambda Layers allows code to be packaged and deployed across multiple functions – this helps with code reuse and service management. Security Hub will enable large organisations to centralise their control in multi-account environments. The AWS Outposts launch means that AWS is entering the hybrid competition – you can have AWS services in your own on-prem datacenter. This will open new opportunities for clients who don’t yet want to migrate to the public cloud.
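        A minimal sketch of the Lambda Layers workflow with the AWS CLI; the layer name, file paths, function name, and ARN below are hypothetical placeholders:

```shell
# Package shared code and publish it as a layer
# (names and paths are placeholders).
zip -r shared-libs.zip python/

aws lambda publish-layer-version \
    --layer-name shared-libs \
    --zip-file fileb://shared-libs.zip \
    --compatible-runtimes python3.7

# Attach the published layer version to an existing function.
aws lambda update-function-configuration \
    --function-name my-function \
    --layers arn:aws:lambda:REGION:ACCOUNT_ID:layer:shared-libs:1
```

        Every function that references the layer then sees the packaged code at runtime without bundling it into its own deployment package.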

        Being a premier partner is becoming even more premium

        The requirements for the different partner tiers are going to change somewhat during the first half of 2019, and they are going to be a bit harder to achieve. As the only premier partner in the Nordics, we are going to continue serving our AWS customers in the Nordics and all across Europe. Nordcloud is also an AWS Competency Partner in DevOps, and this program expanded with a Containers Competency this year.

        Stay tuned for more news on AWS – the new AWS Stockholm Region launches this month.

        *Andy Jassy ‘Keynote’ AWS re:Invent; Las Vegas, USA: 26-30 November 2018


        At Nordcloud we know the AWS cloud, and we can help you take advantage of all the benefits Amazon Web Services has to offer.


        How can we take your business to the next level with AWS?

