Creating a business case for Reserved Instances

CATEGORIES

Blog

Elastic Compute Cloud (EC2) is one of the most used services on the AWS cloud. To give you a clear idea of how often this service is used within Nordcloud’s customer base, EC2 represents more than 60% of spend, which gives businesses many opportunities to reduce cost. Organisations that invest in Reserved Instances can see discounts of up to 75% compared with On-Demand pricing, and also receive reserved capacity when the purchase is scoped to a specific Availability Zone.

On the surface, Reserved Instances look simple enough – and who doesn’t like reducing their costs? However, getting down into the nitty-gritty of Reserved Instances (RIs) can sometimes feel overwhelming. Here is a summary of the different types you can invest in, and how you can create a solid business case.

The different types of Reserved Instances

Standard RIs: These provide the most significant discount (up to 75% off On-Demand) and are best suited for steady-state usage.

Regional RIs are simply a change in the attributes applied to the tokens you purchase: the location attribute moves from the Availability Zone to the Region, allowing an instance of a given type, size and OS to be deployed anywhere within the region and still have RI coverage. Unlike a zonal Standard RI, a Regional RI carries no capacity reservation, so you only benefit from the cost savings. Since March 2017, Regional RIs also provide instance size flexibility in addition to AZ flexibility: your Regional RI’s discounted rate automatically applies to usage of any size in the instance family, in any AZ.
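Size flexibility works through AWS’s normalization factors, where each size in a family is worth a fixed number of units (small = 1, medium = 2, large = 4, xlarge = 8, and so on). As a rough sketch – the sizes and usage below are illustrative, not from any real account – you can check how much of your usage a single regional RI covers:

```python
# Illustrative sketch of instance size flexibility. Normalization
# factors per AWS documentation: an RI's discount applies across sizes
# in the same family, proportionally to these units.
FACTORS = {"small": 1, "medium": 2, "large": 4, "xlarge": 8, "2xlarge": 16}

def covered(ri_size, usage_sizes):
    """Fraction of the listed usage (same family, same region) that a
    single regional RI of ri_size covers, in normalization units."""
    budget = FACTORS[ri_size]
    needed = sum(FACTORS[size] for size in usage_sizes)
    return min(1.0, budget / needed)

# One m4.2xlarge RI fully covers two m4.xlarge instances (16 = 8 + 8)
print(covered("2xlarge", ["xlarge", "xlarge"]))
```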

Convertible RIs require a three-year term but offer a lot more flexibility – so you need to be confident that you will be using at least the same number of EC2 instances (or more!) over the next three years. Consider this type of RI carefully: if you are planning to move workloads to PaaS, or to re-architect and go Serverless, they might not be for you.

Convertible RIs can be exchanged for any other type of RI (size, region, family, OS), so they future-proof you against changes in your infrastructure, or against new instance families which AWS has not yet announced. If you initially purchased a Convertible RI for an m4.2xlarge, after two years you could swap it for a c4.2xlarge RI for the final year. When converting, the value of your RIs has to remain at least the same – you may not get a 1:1 cost match with your new RIs, so you might need to buy a little bit more each time to ensure that you maintain the value.
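To see why you might need to top up when exchanging, here is a small arithmetic sketch – the hourly rates are made up for illustration, not real AWS prices:

```python
# Hypothetical example of preserving value in a Convertible RI exchange.
# The new RIs' total value must be at least that of the old ones, so a
# cheaper target instance can mean buying more than a 1:1 swap.
import math

def ris_needed(current_count, current_hourly_rate, target_hourly_rate):
    """How many target RIs are required to keep at least the same value."""
    current_value = current_count * current_hourly_rate
    return math.ceil(current_value / target_hourly_rate)

# e.g. exchanging 10 RIs at $0.20/h for RIs priced at $0.17/h
print(ris_needed(10, 0.20, 0.17))  # → 12, slightly more than 1:1
```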

Why should I make a business case?

Evaluating your need correctly will allow you to make the biggest savings, and you’ll also be able to drastically reduce the risk associated with purchases simply by knowing how to analyse your usage. Reserved Instances are billed hourly, and to get the best results from your usage analysis, you’ll have to analyse the data in the same way AWS bills it. Analyse at least the last six months in order to get a bigger picture of your usage, note the differences, and find trends.

If you aim for 100% coverage it is very easy to start losing money. Because EC2 usage can be unpredictable (especially if you’re using autoscaling), you would probably end up paying more than if you left some of your usage on On-Demand instances. If you keep track of utilisation, you’ll be able to see early on whether you’re not using some of your purchased Reserved Instances (you can see your coverage and utilisation data in AWS Cost Explorer), and you’ll then be able to react immediately. One way to repair this kind of situation is to change the usage of your projects to use the instances you have bought. If you have bought a zonal Reserved Instance, you might want to change the zone or modify the instance to regional scope. You might also want to change the size of the instance to cover your current usage.
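If you prefer to pull those numbers programmatically rather than from the console, the Cost Explorer API exposes the same data. A minimal sketch with boto3 (assuming AWS credentials and Cost Explorer are set up; the dates are placeholders):

```python
# Hedged sketch: pulling RI utilisation figures from the Cost Explorer
# API with boto3 (the same data the console shows). Requires AWS
# credentials and Cost Explorer enabled; dates below are placeholders.

def utilization_request(start, end):
    """Request body for ce.get_reservation_utilization: monthly
    granularity so the data lines up with how AWS bills."""
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
    }

def ri_utilization(start, end):
    import boto3  # imported lazily; the request builder above needs no SDK
    ce = boto3.client("ce")
    resp = ce.get_reservation_utilization(**utilization_request(start, end))
    return resp["Total"]["UtilizationPercentage"]
```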

To find out everything you need to know about optimising Reserved Instances, we’ve created a helpful, in-depth guide.

Blog

Starter for 10: Meet Jonna Iljin, Nordcloud’s Head of Design

When people start working with Nordcloud, they generally comment on 2 things. First, how friendly and knowledgeable everyone is. Second,...

Blog

Building better SaaS products with UX Writing (Part 3)

UX writers are not omniscient, and it’s best for them to resist the temptation to work in isolation, just as...

Blog

Building better SaaS products with UX Writing (Part 2)

The main purpose of UX writing is to ensure that the people who use any software have a positive experience.

Get in Touch

Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.








    Amazon SQS as a Lambda event source = all the fun!


    What are Amazon SQS and Lambda, and why should I care?

    Amazon Simple Queue Service (Amazon SQS) is a distributed, fully managed message queueing service which was released as one of the first AWS services. It allows you to decouple your application into components which communicate using asynchronous messages. Using a simple, programmatic API, you can get started quickly and poll for messages sent from many different sources. It acts as a buffer for your workers, greatly reducing the time spent on a synchronous call by a user – meaning you can send a response and do the work later.

    In November 2014 Amazon released AWS Lambda, which is one of the most recognisable services in Cloud Computing and, in my opinion, the best available implementation of the Serverless paradigm. It runs code in response to certain events, e.g. a file uploaded to S3 or just an HTTP request. You don’t need to provision any compute resources.

    But what if you want to connect these two services and make SQS messages trigger Lambda functions? We’ve been waiting for this feature for a very long time, and were tired of creating custom containers with pollers or using SNS as a bad alternative.

    In Nordcloud R&D, we are partial to Serverless and event-driven paradigms. However, sometimes our Lambda functions call each other asynchronously and become huge, rapidly exceeding concurrency limits and throwing exceptions all over the place. Using SQS to trigger Lambda functions acts like a buffer. We know that Lambda has a maximum time limit of 5 minutes, so we can use all the good things that come with SQS – visibility timeouts, at-least-once delivery, dead-letter queues and so on. Now it’s possible to avoid provisioning any containers or EC2 instances (just Serverless code) and let Amazon handle everything for us.

    But before you start using SQS as your event source for Lambda functions, you should know how it’s implemented and what to expect.

    How is it implemented?

    When working with SQS directly, you need to wait for messages to be received, process them and delete them from the queue. If you don’t delete a message, it will come back after the specified VisibilityTimeout, because SQS assumes the processing failed and makes it available for consuming again, so you won’t lose any messages. This process does not apply when using SQS as an event source for Lambda, as you don’t touch the SQS part!
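For comparison, the manual cycle described above looks roughly like this – a sketch, with the boto3 calls shown commented because they need credentials and a real queue URL:

```python
# Sketch of the manual SQS consume cycle: receive, process, delete.
# Messages whose processing raises are *not* deleted, so SQS makes them
# visible again after the VisibilityTimeout.
def successful_receipts(messages, process):
    """Process each message and return the receipt handles that are
    safe to delete (i.e. processing succeeded)."""
    done = []
    for msg in messages:
        try:
            process(msg["Body"])
            done.append(msg["ReceiptHandle"])
        except Exception:
            pass  # leave it in the queue; SQS will redeliver
    return done

# With boto3 the surrounding loop would look roughly like:
# sqs = boto3.client("sqs")
# resp = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=10,
#                            WaitTimeSeconds=20)  # long polling
# for handle in successful_receipts(resp.get("Messages", []), work):
#     sqs.delete_message(QueueUrl=url, ReceiptHandle=handle)
```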

    Lambda polls for messages internally then calls your function and, if it completes successfully, deletes the message on your behalf. Make sure that your code throws exceptions if you want to process the message again. Equally important is that you need to return a successful code so you won’t get into an endless loop of duplicated messages. Remember that you are billed for every API call that is made by the internal poller.
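A minimal handler following that rule might look like this – the payload shape (a JSON body with a "task" field) is just an assumption for the example:

```python
# Sketch of an SQS-triggered Lambda handler. Raising makes Lambda
# report a failure, so the whole batch returns to the queue; returning
# normally lets Lambda delete the batch on our behalf.
import json

def handler(event, context):
    for record in event["Records"]:
        payload = json.loads(record["body"])
        if "task" not in payload:  # "task" is a made-up field for this example
            # Signal failure: the messages come back after VisibilityTimeout
            raise ValueError("malformed message: " + record["messageId"])
        # ... do the actual work here ...
    return {"processed": len(event["Records"])}
```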

    Another thing is that Lambda is invoked synchronously. There are no retries, and the Dead Letter Queue on Lambda has no use. Everything is handled by Amazon SQS, so find the optimal settings for VisibilityTimeout and maxReceiveCount, and definitely configure a DLQ policy. Even though it shouldn’t be a problem, please refrain from setting the VisibilityTimeout equal to the function timeout, as the polling mechanism consumes some additional time during which the message is still counted as being processed.
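Setting those attributes might look like this with boto3 – the ARN is a placeholder, and the 360-second visibility timeout simply sits above a 300-second function timeout plus polling overhead:

```python
# Sketch: queue attributes that route repeatedly failing messages to a
# dead-letter queue. Attribute names follow the SQS API; the ARN used
# by the caller is a placeholder.
import json

def redrive_attributes(dlq_arn, max_receive_count=5):
    """Attributes for sqs.set_queue_attributes: after max_receive_count
    failed receives, the message moves to the DLQ."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        }),
        # comfortably above a 300 s function timeout, never equal to it
        "VisibilityTimeout": "360",
    }

# sqs = boto3.client("sqs")
# sqs.set_queue_attributes(QueueUrl=queue_url,
#                          Attributes=redrive_attributes(dlq_arn))
```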

    You are also limited by the function-level concurrent execution limit, which defaults to a shared pool of unreserved concurrency (1000 per region). You can lower that by specifying the reserved concurrent executions parameter as a subset of your account’s limit. However, doing so subtracts that number from your shared pool and may affect other functions! Plus, if your Lambda is VPC-enabled then Amazon EC2 limits will apply (think ENIs).

    If, like us, you enjoy taking Amazon SQS up a level, you’ll notice that the number of messages in flight begins to rise. That’s your Lambda gradually scaling out in response to the queue size, eventually hitting the concurrency limit. Messages will still be consumed, but the synchronous invocations will fail with an exception – that’s when your Amazon SQS retry policy comes in handy. Although it is not confirmed anywhere, this behaviour may lead to starvation of certain messages, but you should already be prepared for that!

    One more thing from our R&D division. What happens if you add one queue as an event source for two different functions? That’s right, it will act as a load balancer.


    Does it really work?

    We ran some tests with the following assumptions:

    • all messages were available in the queue before enabling the Lambda trigger
    • SQS visibility timeout is set to 1h
    • all test cases are in separate environments and time
    • Lambda does nothing, just sleeps for some specified amount of time

     

    This is what we got:

    Normal use case

    1000 messages, sleep for 3 seconds – nothing really interesting here, it worked as well as we expected: the messages were consumed pretty quickly, and CloudWatch didn’t even register the scaling process.


    Normal use case, heavy load

    Again, a 3-second sleep but 10000 messages. This is over our concurrency limit, but the scale-out process took longer than executing the first Lambdas, so it didn’t throttle. It took a little bit longer to consume all of our messages.


    Long-running lambdas

    Let’s get back to 1000 messages, but with 240 seconds of sleep. Now AWS handles the scale-out process for its internal workers. You’ll notice we managed to get about 550 concurrent Lambdas running. Good news!


    Hitting the concurrency limit

    Again, 240 seconds of sleep but let’s push it to the limit: 10000 messages, concurrency limit set to 1000.

    What happened? Again, AWS reacts to the number of messages available in Amazon SQS and scales internal workers up to a certain point – the concurrency limit. Of course, in the world of distributed computing and eventual consistency, there is no way it can predict how many Lambdas it can run, so we can finally see it throttle. Throttled Lambdas return exceptions to the AWS workers – that’s the signal to stop – but it still tries, because perhaps that’s not our global limit and it’s just other functions taking our pool. What is important is that AWS won’t retry the function execution; the message will come back to the queue after the defined VisibilityTimeout, which is why you’ll see some invocations after 23:30 (yes, we can’t sleep).

    The same thing happens when you set your own reserved concurrency pool. We ran the same test for a maximum of 50 concurrent executions. Based on the throttling, it was too low.

    Multiple Lambda workers

    This is simply awesome! Amazon SQS gives you the possibility to subscribe multiple functions to one queue! We sent 10000 messages to a queue set as an event source for 4 different functions. Every Lambda was executed about 2500 times, which means this setup behaves like a load balancer. However, it’s not possible to subscribe Lambdas from different regions and create a global load balancer.
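Wiring that up is just one event source mapping per function. A sketch – the queue ARN and function names below are placeholders:

```python
# Sketch: subscribing several Lambda functions to one SQS queue, which
# then behaves like a load balancer across them.
def mapping_params(queue_arn, function_name, batch_size=10):
    """Parameters for lambda.create_event_source_mapping (BatchSize for
    SQS can be at most 10 messages per invocation)."""
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
    }

# client = boto3.client("lambda")
# for fn in ["worker-1", "worker-2", "worker-3", "worker-4"]:
#     client.create_event_source_mapping(**mapping_params(queue_arn, fn))
```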

    We had so much fun trying out this feature. Amazon SQS as an event source for Lambda allows you to easily process messages without using containers or EC2 instances. When it was released, we were thinking about the design of our new project. It matched our requirements perfectly and we are already using it! But remember that these are Lambda workers and the solution is not suitable for heavy load processing, because you are limited to the 5-minute timeout, memory constraints, and concurrency limit. Do you need to queue lots of short tasks? Maybe you need some kind of a buffer to securely execute asynchronous calls? Give it a try, it’s awesome!

    If you’d like to find out more about this service, get in contact with one of our experts here.









      Update: A new generation of EC2 instances from AWS


      AWS announced yesterday that they had released a new set of EC2 GPU-powered instances specifically geared towards graphics-intensive applications. G3 instances can now be launched to acquire ‘a powerful combination of GPU, CPU, and host memory for workloads such as 3D rendering, 3D visualisations, graphics-intensive remote workstations, video encoding, and virtual reality applications.’

      This new set of instances is a clear step up from G2, with more powerful GPUs, faster processors, and larger host memory, and will be available in regions including EU (Ireland), with others opening up in the coming months. It also provides new features such as multi-monitor support, H.265 (HEVC) encoding, enhanced graphics rendering, NVIDIA GRID Virtual Workstation features, and Enhanced Networking.

      AWS offers the largest range of instances in the Cloud space and is the first to provide this type of graphics-focused instance.

      If you’d like to learn more about instances and how to start your journey with the AWS Cloud, please contact us here. 
