Problems with DynamoDB Single Table Design




DynamoDB is Amazon’s managed NoSQL database service. DynamoDB provides a simple, schemaless database structure and very high scalability based on partitioning. It also offers an online management console, which lets you query and edit data and makes the overall developer experience very convenient.

There are two main approaches to designing DynamoDB databases. Multi Table Design stores each database entity in a separate table. Single Table Design stores all entities in one big common table.

This article focuses mostly on the development experience of creating DynamoDB applications. If you’re working with a large scale project, performance and scalability may be more important aspects for you. However, you can’t completely ignore the developer experience. If you apply Single Table Design, the developer experience will be more cumbersome and less intuitive than with Multi Table Design.

Multi Table Design Overview

DynamoDB is based on individual tables that have no relationships between each other. Despite this limitation, we tend to use them in the same way as SQL database tables. We name a DynamoDB table according to a database entity, and then store instances of that entity in that table. Each entity gets its own table.

We can call this approach Multi Table Design, because an application usually requires multiple entities. It’s the default way most of us create DynamoDB applications.

Let’s say we have the entities User, Drive, Folder and File. We would typically then have four DynamoDB tables as shown in the database layout below.

The header row of each table lists the field names, and the numbers below are field values organized into table rows. For simplicity, we’re only dealing with numeric identifiers.

Drives table:

UserId(PK)  DriveId(SK)
1           1
1           2

Folders table:

UserId(PK)  FolderId(SK)  ParentDriveId
1           1             1
1           2             2

Files table:

UserId(PK)  FileId(SK)    ParentFolderId
1           1             1
1           2             2
1           3             2

Note: PK means Partition Key and SK means Sort Key. Together they are the table’s unique primary key.

It’s pretty easy to understand the structure of this database. Everything is partitioned by UserId. Underneath each User there are Drives which may contain Folders. Folders may contain Files.

The main limitation of Multi Table Design is that you can only retrieve data from one table in one query. If you want to retrieve a User and all their Drives, Folders and Files, you need to make four separate queries. This is particularly inefficient in use cases where you cannot make all the queries in parallel. You need to first look up some data in one table, so that you can find the related data in another table.

Single Table Design Overview

Single Table Design is the opposite of Multi Table Design. Amazon has advocated this design pattern in various technical presentations. For example, see DAT401 Advanced Design Patterns for DynamoDB by Rick Houlihan.

The basic idea is to store all database entities in a single table. You can do this because of DynamoDB’s schemaless design. You can then make queries that retrieve several kinds of entities at the same time, because they are all in the same table.

The primary key usually contains the entity type as part of it. The table might thus contain an entity called “User-1” and an entity called “Folder-1”. The first one is a User with identifier “1”. The second one is a Folder with identifier “1”. They are separate because of the entity prefix, and can be stored in the same table.
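As a rough illustration, prefixed keys like these can be built and taken apart with a couple of small helpers. The names make_key and parse_key are hypothetical and only meant as a sketch of the idea:

```python
from typing import NamedTuple


class EntityKey(NamedTuple):
    entity_type: str  # e.g. "User" or "Folder"
    identifier: str   # e.g. "1"


def make_key(entity_type: str, identifier: str) -> str:
    """Build a prefixed key such as "Folder-1"."""
    return f"{entity_type}-{identifier}"


def parse_key(key: str) -> EntityKey:
    """Split a prefixed key back into entity type and identifier."""
    entity_type, _, identifier = key.partition("-")
    return EntityKey(entity_type, identifier)


# The same identifier "1" yields two distinct keys, so both items
# can live in the same table without colliding.
user_key = make_key("User", "1")
folder_key = make_key("Folder", "1")
```

Because the entity type is part of the key, a User and a Folder can share the identifier “1” and still be stored side by side.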

Let’s say we have the entities User, Drive, Folder and File that make up a hierarchy. A table containing a bunch of these entities might look like this:

PK        SK         HierarchyId
User-1    User-1     User-1/
User-1    Drive-1    User-1/Drive-1/
User-1    Folder-1   User-1/Drive-1/Folder-1/
User-1    File-1     User-1/Drive-1/Folder-1/File-1/
User-1    Folder-2   User-1/Drive-1/Folder-2/
User-1    File-2     User-1/Drive-1/Folder-2/File-2/
User-1    File-3     User-1/Drive-1/Folder-2/File-3/

Note: PK means Partition Key and SK means Sort Key. Together they are the table’s unique primary key. We’ll explain HierarchyId in just a moment.

As you can see, all items are in the same table. The partition key is always User-1, so that all of User-1’s data resides in the same partition.

Advantages of Single Table Design

The main advantage that you get from Single Table Design is the ability to retrieve a hierarchy of entities with a single query. You can achieve this by using Secondary Indexes. A Secondary Index provides a way to query the items in a table in a specific order.

Let’s say we create a Secondary Index where the partition key is PK and the sort key is HierarchyId. It’s now possible to query all the items whose PK is “User-1” and that have a HierarchyId beginning with “User-1/Drive-1/”. We get all the folders and files that the user has stored on Drive-1, and also the Drive-1 entity itself, as the result.
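To make the prefix query concrete, here is a minimal in-memory sketch that mimics it in plain Python. It does not talk to DynamoDB; in a real Query the same condition would be written as a key condition along the lines of PK = :pk AND begins_with(HierarchyId, :prefix). The function name query_hierarchy is our own invention:

```python
# The example items from the table shown earlier.
ITEMS = [
    {"PK": "User-1", "SK": "User-1", "HierarchyId": "User-1/"},
    {"PK": "User-1", "SK": "Drive-1", "HierarchyId": "User-1/Drive-1/"},
    {"PK": "User-1", "SK": "Folder-1", "HierarchyId": "User-1/Drive-1/Folder-1/"},
    {"PK": "User-1", "SK": "File-1", "HierarchyId": "User-1/Drive-1/Folder-1/File-1/"},
    {"PK": "User-1", "SK": "Folder-2", "HierarchyId": "User-1/Drive-1/Folder-2/"},
    {"PK": "User-1", "SK": "File-2", "HierarchyId": "User-1/Drive-1/Folder-2/File-2/"},
    {"PK": "User-1", "SK": "File-3", "HierarchyId": "User-1/Drive-1/Folder-2/File-3/"},
]


def query_hierarchy(items, pk, prefix):
    """Mimic a query on an index keyed by (PK, HierarchyId).

    Keeps items in one partition whose HierarchyId starts with the
    prefix, and returns them ordered by HierarchyId like an index would.
    """
    matches = [
        item for item in items
        if item["PK"] == pk and item["HierarchyId"].startswith(prefix)
    ]
    return sorted(matches, key=lambda item: item["HierarchyId"])


drive_contents = query_hierarchy(ITEMS, "User-1", "User-1/Drive-1/")
drive_content_keys = [item["SK"] for item in drive_contents]
```

The result contains Drive-1 itself plus every folder and file below it, while the User-1 item is left out because its HierarchyId does not start with the prefix.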

The same would have been possible with Multi Table Design, just not as efficiently. We would have defined similar Secondary Indexes to implement the relationships. Then we would have separately queried the user’s drives from the Drives table, folders from the Folders table, and files from the Files table, and combined all the results.

Single Table Design can also handle other kinds of access patterns more efficiently than Multi Table Design. Check the YouTube video mentioned in the beginning of this article to learn more about them.

Complexity of Single Table Design

Why would we not always use Single Table Design when creating DynamoDB based applications? Do we lose something significant by applying it to every use case?

The answer is yes. We lose simplicity in database design. When using Single Table Design, the application becomes more complicated and unintuitive to develop. As we add new features and access patterns over time, the complexity keeps growing.

Just managing one huge DynamoDB table is complicated in itself. We have to remember to include the “User-” entity prefix in all queries when working with AWS Console. Simple table scans aren’t possible without specifying a prefix.

We also need to manually maintain the HierarchyId composite key whenever we create or update entities. It’s easy to cause weird bugs by forgetting to update HierarchyId in some edge case or when editing the database manually.
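One way to reduce the risk is to derive HierarchyId in a single place instead of writing it by hand. The sketch below assumes each item knows its full ancestor path; the helper names hierarchy_id, make_item and move_item are hypothetical:

```python
def hierarchy_id(path):
    """Join path segments into a HierarchyId with a trailing slash."""
    return "".join(f"{segment}/" for segment in path)


def make_item(path):
    """Create an item whose keys are all derived from its path."""
    return {"PK": path[0], "SK": path[-1], "HierarchyId": hierarchy_id(path)}


def move_item(item, new_parent_path):
    """Return a copy of the item re-parented under new_parent_path.

    HierarchyId is recomputed here, so callers cannot forget it.
    """
    new_path = list(new_parent_path) + [item["SK"]]
    return {**item, "HierarchyId": hierarchy_id(new_path)}


file_1 = make_item(["User-1", "Drive-1", "Folder-1", "File-1"])
moved_file_1 = move_item(file_1, ["User-1", "Drive-1", "Folder-2"])
```

This only helps for changes that go through the application code. Manual edits in the console still bypass it, and moving a folder would also require rewriting the HierarchyId of everything beneath it.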

As we start adding sorting and filtering capabilities to our database queries, things get even more complicated.

Things Get More Complicated

Now, let’s allow sorting files by their creation date. Extending our example, we might have a table design like this:

PK      SK        HierarchyId                      CreatedAt
User-1  User-1    User-1/                          2019-07-01
User-1  Drive-1   User-1/Drive-1/                  2019-07-02
User-1  Folder-1  User-1/Drive-1/Folder-1/         2019-07-03
User-1  File-1    User-1/Drive-1/Folder-1/File-1/  2019-07-04
User-1  Folder-2  User-1/Drive-1/Folder-2/         2019-07-05
User-1  File-2    User-1/Drive-1/Folder-2/File-2/  2019-07-06
User-1  File-3    User-1/Drive-1/Folder-2/File-3/  2019-07-07

How do we retrieve the contents of Folder-2 ordered by the CreatedAt field? We add a Global Secondary Index for this access pattern, which will consist of GSI1PK and GSI1SK:

PK      SK        HierarchyId                      CreatedAt   GSI1PK            GSI1SK
User-1  User-1    User-1/                          2019-07-01  User-1/           ~
User-1  Drive-1   User-1/Drive-1/                  2019-07-02  User-1/           2019-07-02
User-1  Folder-1  User-1/Drive-1/Folder-1/         2019-07-03  User-1/Folder-1/  ~
User-1  File-1    User-1/Drive-1/Folder-1/File-1/  2019-07-04  User-1/Folder-1/  2019-07-04
User-1  Folder-2  User-1/Drive-1/Folder-2/         2019-07-05  User-1/Folder-2/  ~
User-1  File-2    User-1/Drive-1/Folder-2/File-2/  2019-07-06  User-1/Folder-2/  2019-07-06
User-1  File-3    User-1/Drive-1/Folder-2/File-3/  2019-07-07  User-1/Folder-2/  2019-07-07

We’ll get to the semantics of GSI1PK and GSI1SK in just a moment.

But why did we call these fields GSI1PK and GSI1SK instead of something meaningful? Because they will contain different kinds of values depending on the entity stored in each database item. GSI1PK and GSI1SK will be calculated differently depending on whether the item is a User, Drive, Folder or File.
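Reading the rules off the table above, the per-entity calculation might look roughly like the following. This is our reconstruction from the example rows, not a definitive implementation, and the function name gsi1_keys and its parameters are hypothetical:

```python
PARENT_MARKER = "~"  # sorts after any date string in the example data


def gsi1_keys(entity_type, user, sk, created_at, parent_folder=None):
    """Compute GSI1PK and GSI1SK, which mean different things per entity."""
    if entity_type == "User":
        # The user is the parent of its drives.
        return {"GSI1PK": f"{user}/", "GSI1SK": PARENT_MARKER}
    if entity_type == "Drive":
        # Drives are children of the user, sorted by creation date.
        return {"GSI1PK": f"{user}/", "GSI1SK": created_at}
    if entity_type == "Folder":
        # The folder is the parent of its files.
        return {"GSI1PK": f"{user}/{sk}/", "GSI1SK": PARENT_MARKER}
    if entity_type == "File":
        # Files are children of a folder, sorted by creation date.
        if parent_folder is None:
            raise ValueError("a File needs its parent folder")
        return {"GSI1PK": f"{user}/{parent_folder}/", "GSI1SK": created_at}
    raise ValueError(f"unknown entity type: {entity_type}")


folder_2_keys = gsi1_keys("Folder", "User-1", "Folder-2", "2019-07-05")
file_3_keys = gsi1_keys("File", "User-1", "File-3", "2019-07-07", parent_folder="Folder-2")
```

Four branches for two attributes is exactly why the fields cannot carry a descriptive name.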

Overloading Names Adds Cognitive Load

Since it’s not possible to give GSI keys sensible names, we just call them GSI1PK and GSI1SK. These kinds of generic field names add cognitive load, because the fields are no longer self-explanatory. Developers need to check development documentation to find out what exactly GSI1PK and GSI1SK mean for some particular entity.

So, why is the GSI1PK field not the same as HierarchyId? Because in DynamoDB you cannot query for a range of partition key values. You have to query for one specific partition key. In this use case, we can query for GSI1PK = “User-1/” to get items under a user, and query for GSI1PK = “User-1/Folder-1/” to get items under a user’s folder.

What about the tilde (~) characters in some GSI1SK values? They implement reverse date sorting in a way that also allows pagination. Tilde is the last printable character in the ASCII character set and will sort after all other characters. It’s a nice hack, but it also adds even more cognitive load to understanding what’s happening.

When we query for GSI1PK = “User-1/Folder-2/” and sort the results by GSI1SK in descending key order, the first result is Folder-2 (because ~ comes after all other keys) and the following results are File-3 and File-2 in descending date order. Assuming there are lots of files, we could continue this query using the LastEvaluatedKey feature of DynamoDB and retrieve more pages. The parent object (Folder-2) always appears in the first page of items.
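The ordering trick can be checked with a small in-memory simulation using the Folder-2 rows from the table. This is only a sketch: the function query_page is hypothetical, and the cursor here is just the last sort key value, whereas a real LastEvaluatedKey also carries the item’s primary key. In an actual Query the descending order would come from setting ScanIndexForward to false.

```python
# Items that share GSI1PK = "User-1/Folder-2/" in the example table.
FOLDER_2_ITEMS = [
    {"SK": "Folder-2", "GSI1SK": "~"},
    {"SK": "File-2", "GSI1SK": "2019-07-06"},
    {"SK": "File-3", "GSI1SK": "2019-07-07"},
]


def query_page(items, page_size, exclusive_start=None):
    """Return one page in descending GSI1SK order plus a cursor.

    The cursor is None when there is nothing left to read.
    Assumes GSI1SK values are unique within the partition.
    """
    ordered = sorted(items, key=lambda item: item["GSI1SK"], reverse=True)
    if exclusive_start is not None:
        ordered = [item for item in ordered if item["GSI1SK"] < exclusive_start]
    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    last_evaluated = page[-1]["GSI1SK"] if has_more else None
    return page, last_evaluated


first_page, cursor = query_page(FOLDER_2_ITEMS, page_size=2)
second_page, end_cursor = query_page(FOLDER_2_ITEMS, page_size=2, exclusive_start=cursor)
```

The folder lands on the first page because “~” compares greater than any string that starts with a digit, and the second page picks up with the older file.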

Overloaded GSI Keys Can’t Overlap

You may have noticed that we can now also query a user’s drives in creation date order. The GSI1PK and GSI1SK fields apply to this relationship as well. This works because the relationship between the User and Drive entities does not overlap with the relationship between the Folder and File entities.

But what happens if we need to query all the Folders under a Drive? Let’s say the results must, again, be in creation date order.

We can’t use the GSI1 index for this query because the GSI1PK and GSI1SK fields already have different semantics. We already use those keys to retrieve items under Users or Folders.

So, we’ll create a new Global Secondary Index called GSI2, where GSI2PK and GSI2SK define a new relationship. The fields are shown in the table below:

PK      SK        HierarchyId                      CreatedAt   GSI1PK            GSI1SK      GSI2PK           GSI2SK
User-1  User-1    User-1/                          2019-07-01  User-1/           ~
User-1  Drive-1   User-1/Drive-1/                  2019-07-02  User-1/           2019-07-02  User-1/Drive-1/  ~
User-1  Folder-1  User-1/Drive-1/Folder-1/         2019-07-03  User-1/Folder-1/  ~           User-1/Drive-1/  2019-07-03
User-1  File-1    User-1/Drive-1/Folder-1/File-1/  2019-07-04  User-1/Folder-1/  2019-07-04  User-1/Drive-1/  2019-07-04
User-1  Folder-2  User-1/Drive-1/Folder-2/         2019-07-05  User-1/Folder-2/  ~           User-1/Drive-1/  2019-07-05
User-1  File-2    User-1/Drive-1/Folder-2/File-2/  2019-07-06  User-1/Folder-2/  2019-07-06
User-1  File-3    User-1/Drive-1/Folder-2/File-3/  2019-07-07  User-1/Folder-2/  2019-07-07

Note: Please scroll the table horizontally if necessary.

Using this new index we can query for GSI2PK = “User-1/Drive-1/” and sort the results by GSI2SK in descending order to get the folders in reverse creation date order. Drive-1 has a tilde (~) as the sort key to ensure it comes as the first result on the first page of the query.

Now It Gets Really Complicated

At this point it’s becoming increasingly complicated to keep track of all those GSI fields. Can you still remember what exactly GSI1PK and GSI2SK mean? The cognitive load is increasing because you’re dealing with abstract identifiers instead of meaningful field names.

The bad news is that it only gets worse. As we add more entities and access patterns, we have to add more Global Secondary Indexes. Each of them will have a different meaning in different situations. Your documentation becomes very important. Developers need to check it all the time to find out what each GSI means.

Let’s add a new Status field to Files and Folders. We will now allow querying for Files and Folders based on their Status, which may be VISIBLE, HIDDEN or DELETED. The results must be sorted by creation time.

We end up with a design that requires three new Global Secondary Indexes. GSI3 will contain files that have a VISIBLE status. GSI4 will contain files that have a HIDDEN status. GSI5 will contain files that have a DELETED status. Here’s what the table will look like:

PK      SK        HierarchyId                      CreatedAt   GSI1PK            GSI1SK      GSI2PK           GSI2SK      Status    GSI3PK                    GSI3SK      GSI4PK                   GSI4SK      GSI5PK                     GSI5SK
User-1  User-1    User-1/                          2019-07-01  User-1/           ~
User-1  Drive-1   User-1/Drive-1/                  2019-07-02  User-1/           2019-07-02  User-1/Drive-1/  ~
User-1  Folder-1  User-1/Drive-1/Folder-1/         2019-07-03  User-1/Folder-1/  ~           User-1/Drive-1/  2019-07-03  VISIBLE   User-1/Folder-1/VISIBLE/  ~           User-1/Folder-1/HIDDEN/  ~           User-1/Folder-1/DELETED/   ~
User-1  File-1    User-1/Drive-1/Folder-1/File-1/  2019-07-04  User-1/Folder-1/  2019-07-04  User-1/Drive-1/  2019-07-04  VISIBLE   User-1/Folder-1/VISIBLE/  2019-07-04  User-1/Folder-1/HIDDEN/              User-1/Folder-1/DELETED/
User-1  Folder-2  User-1/Drive-1/Folder-2/         2019-07-05  User-1/Folder-2/  ~           User-1/Drive-1/  2019-07-05  VISIBLE   User-1/Folder-2/VISIBLE/  ~           User-1/Folder-2/HIDDEN/  ~           User-1/Folder-2/DELETED/   ~
User-1  File-2    User-1/Drive-1/Folder-2/File-2/  2019-07-06  User-1/Folder-2/  2019-07-06                               HIDDEN    User-1/Folder-2/VISIBLE/              User-1/Folder-2/HIDDEN/  2019-07-06  User-1/Folder-2/DELETED/
User-1  File-3    User-1/Drive-1/Folder-2/File-3/  2019-07-07  User-1/Folder-2/  2019-07-07                               DELETED   User-1/Folder-2/VISIBLE/              User-1/Folder-2/HIDDEN/              User-1/Folder-2/DELETED/   2019-07-07

Note: Please scroll the table horizontally if necessary.

You may think this is getting a bit too complicated. It’s complicated because we still want to be able to retrieve both a parent item and its children in just one query.

For example, let’s say we want to retrieve all VISIBLE files in Folder-1. We query for GSI3PK = “User-1/Folder-1/VISIBLE/” and again sort the results in descending order as earlier. We get back Folder-1 as the first result and File-1 as the second result. Pagination will also work if there are more results. If there are no VISIBLE files under the folder, we only get a single result, the folder.

That’s nice. But can you now figure out how to retrieve all DELETED files in Folder-2? Which GSI will you use and what do you query for? You probably need to stop your development work for a while and spend some time reading the documentation.
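For the record, the answer is GSI5 with GSI5PK = “User-1/Folder-2/DELETED/”. A lookup helper can take that burden off the developer. The sketch below only builds the parameter dictionary for a DynamoDB Query call and sends nothing; the table name AppTable and the assumption that each index is named GSI3, GSI4 or GSI5 are ours:

```python
# Assumed mapping from Status value to the index that serves it.
STATUS_INDEX = {"VISIBLE": "GSI3", "HIDDEN": "GSI4", "DELETED": "GSI5"}


def status_query_params(user, folder, status, table_name="AppTable"):
    """Build Query parameters for files in a folder with a given status."""
    index = STATUS_INDEX[status]
    return {
        "TableName": table_name,
        "IndexName": index,
        # The key attribute name differs per index, e.g. GSI5PK.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": f"{index}PK"},
        "ExpressionAttributeValues": {":pk": {"S": f"{user}/{folder}/{status}/"}},
        # Descending order puts the parent folder (~) first.
        "ScanIndexForward": False,
    }


deleted_in_folder_2 = status_query_params("User-1", "Folder-2", "DELETED")
```

A helper like this is effectively executable documentation, which hints at how much knowledge the generic names hide.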

The Complexity Multiplies

Let’s say we need to add a new Status value called ARCHIVED. This will involve creating yet another GSI and adding application code in all the places where Files or Folders are created or updated. The new code needs to make sure that GSI6PK and GSI6SK are generated correctly.

That’s a lot of development and testing work. It will happen every time we add a new Status value or some other way to perform conditional queries.

Later we might also want to add new sort fields called ModifiedAt and ArchivedAt. Each new sort field will require its own set of Global Secondary Indexes. We have to create a new GSI for every possible Status value and sort key combination, so we end up with quite a lot of them. In fact, our application will now have GSI1-GSI18, and developers will need to understand what GSI1PK-GSI18PK and GSI1SK-GSI18SK mean.
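One plausible way to arrive at 18 indexes is sketched below. The breakdown is an assumption on our part: the two parent-child relationships served by GSI1 and GSI2 and the four Status values each need one index per sort field.

```python
from itertools import product

SORT_FIELDS = ["CreatedAt", "ModifiedAt", "ArchivedAt"]
STATUSES = ["VISIBLE", "HIDDEN", "DELETED", "ARCHIVED"]
# Assumed: the relationships originally covered by GSI1 and GSI2.
RELATIONS = ["children of User or Folder", "children of Drive"]

# Every relation and every status is combined with every sort field.
index_specs = list(product(RELATIONS, SORT_FIELDS)) + list(product(STATUSES, SORT_FIELDS))

# Number the indexes the way the article does: GSI1, GSI2, ...
index_names = {f"GSI{n}": spec for n, spec in enumerate(index_specs, start=1)}
index_count = len(index_names)
```

Under these assumptions the count is 2 × 3 + 4 × 3 = 18, and every additional Status value or sort field multiplies it further.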

In fairness, this complexity is not unique to Single Table Design. We would have similar challenges when applying Multi Table Design and implementing many different ways to query data.

What’s different in Multi Table Design is that each entity will live in its own table where the field names don’t have to be overloaded. If you add a feature that involves Folders, you only need to deal with the Folders table. Indexes and keys will have semantically meaningful names like “UserId-Status-CreatedAt-index”. Developers can understand them intuitively without referring to documentation all the time.

Looking for a Compromise

We can make compromises between Single Table Design and Multi Table Design to reduce complexity. Here are some suggestions.

First of all, you should think of Single Table Design as an optimization that you might be applying prematurely. If you design all new applications from scratch using Single Table Design, you’re basically optimizing before knowing the real problems and bottlenecks.

You should also consider whether the database entities will truly benefit from Single Table Design or not. If the use case involves retrieving a deep hierarchy of entities, it makes sense to combine those entities into a single table. Other entities can still live in their own tables.

In many real-life use cases the only benefit from Single Table Design is the ability to retrieve a parent entity and its children using a single DynamoDB query. In such cases the benefit is pretty small. You could just as well make two parallel requests. Retrieve the parent using GetItem and the children using a Query. In an API based web application the user interface can perform these requests in parallel and combine the results in the frontend.
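The two parallel requests could be sketched as follows. The functions get_folder and query_files are hypothetical stand-ins for a GetItem call and a Query call, backed here by in-memory data so the example runs on its own. The article suggests doing this in the frontend; the same pattern is shown in Python for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for a Folders table and a Files table.
FOLDERS = {("1", "2"): {"UserId": "1", "FolderId": "2", "ParentDriveId": "2"}}
FILES = [
    {"UserId": "1", "FileId": "1", "ParentFolderId": "1"},
    {"UserId": "1", "FileId": "2", "ParentFolderId": "2"},
    {"UserId": "1", "FileId": "3", "ParentFolderId": "2"},
]


def get_folder(user_id, folder_id):
    """Stand-in for GetItem on the Folders table."""
    return FOLDERS.get((user_id, folder_id))


def query_files(user_id, folder_id):
    """Stand-in for a Query that returns the files in one folder."""
    return [
        f for f in FILES
        if f["UserId"] == user_id and f["ParentFolderId"] == folder_id
    ]


def load_folder_with_files(user_id, folder_id):
    """Fetch the parent and its children concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        folder_future = pool.submit(get_folder, user_id, folder_id)
        files_future = pool.submit(query_files, user_id, folder_id)
        return {"folder": folder_future.result(), "files": files_future.result()}


folder_view = load_folder_with_files("1", "2")
```

Neither request depends on the other’s result, so the total latency is roughly that of the slower call rather than the sum of both.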

Many of the design patterns related to Single Table Design also apply to Multi Table Design. For instance, overloaded composite keys and secondary indexes are sometimes quite helpful in modeling hierarchies and relationships. You can use them in Multi Table Design without paying the full price of complexity that Single Table Design would add.

In summary, you should use your judgment case by case. Don’t make a blanket policy to design every application using either Single Table Design or Multi Table Design. Learn the design patterns and apply them where they make sense.

Get in Touch.

Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.

    Day 2 at Re:Invent – Builders & Musicians Come Together



    When Werner Vogels makes bold statements, expectations are set high. So when Vogels tweeted 15 minutes before the start of re:Invent’s day 2 keynote, we had to wonder what was coming.

    And how right we were. The close to 3 hours spent in the Venetian hotel in Las Vegas was an experience in itself.

    Andy Jassy opened the keynote with a long list of customers and partners, alongside the latest business figures. AWS are currently running at an $18 billion run rate with an incredible 42% YoY growth. With millions of active customers – defined as accounts that have used AWS in the last 30 days – the platform is by far the most used on the planet.

    As per Gartner’s 2016 Worldwide Market Segment Share analysis, the company (successfully led by Jassy) achieved a 44.1% market share in 2016, up from 39% in 2015, more than everyone else combined. This became easily noticeable when AWS displayed an entire catalogue of new services throughout the keynote. The general stance Jassy took this year was that AWS are trying to serve their customers exactly what they asked for in terms of new products. The mission of AWS is nothing short of fixing the IT industry in favour of the end-users and customers.

    The first on stage was a live ‘house’ band, performing a segment of ‘Everything is Everything’ by Lauryn Hill, the chorus rhyming with ‘after winter must come spring’. Presumably, AWS was referring to the world of IT still being in a kind of eternal ‘winter’. The concept we also heard here was that AWS would not stop building their portfolio and that they want to offer all the tools their ‘builders’ and customers need.

    AWS used Jassy’s keynote for some big announcements (of course, set to music), with themes across the following areas:

    • Compute
    • Database
    • Data Analytics
    • Machine Learning and
    • IoT

    The Compute Revolution Goes On

    Starting in the compute services area, an overview of the vast number of compute instance types and families was shown, with special emphasis given to the Elastic GPU options. A few announcements had also been made on the Tuesday night, including Bare Metal Instances and Streamlined Access to Spot Capacity & Hibernation, making it easier for you to get up to 90% savings on normal pricing. There were also M5 instances, which offer better price-performance than their predecessors, and H1 instances, offering fast and dense storage for Big Data applications.

    However, with the arrival of Kubernetes in the industry, it was the release of Elastic Kubernetes Service (EKS) that was the most eagerly anticipated. Not only have AWS recognised that their customers wanted Kubernetes on AWS, but they also realise that there’s a lot of manual labour involved in maintaining and managing the servers that run ECS & EKS.

    To solve this particular problem, AWS announced AWS Fargate, a fully managed service for both ECS & EKS meaning no more server management and therefore increasing the ROI in running containers on the platform. This is available for ECS now and will be available for EKS in early 2018.

    Having started with servers and containers, Jassy then moved on to the next logical evolution of infrastructure services: Serverless. With a 300% usage growth, it’s fair to say that if you’re not running something on Lambda yet, you will be soon. Jassy reiterated that AWS are building services that integrate with the rest of the AWS platform to ensure that builders don’t have to compromise. They want to make progress and get things done fast. Ultimately, this is what AWS compute will mean to the world: faster results. Look out for a dedicated EKS blog post coming soon!

    Database Freedom

    The next section of the keynote must have had some of AWS’s lawyers on the edge of their seats, and also the founder of a certain database vendor… AWS seem to have a clear goal to put an end to the historically painful ‘lock-in’ some customers experience, referring frequently to ‘database freedom’. There are a lot of cool things happening with databases at the moment, and many of the great services and solutions shown at re:Invent are built using AWS database services. Out of all of these, Aurora is growing the fastest by far, and is actually the fastest growing service in the entire history of AWS.

    People love Aurora because it can scale out for millions of reads per second. It can also autoscale new read replicas and offers seamless recovery from read replica failures. People want to be able to do this faster, which is why AWS launched a new Aurora feature, Aurora Multi-Master. This allows for zero application downtime due to any write node failure (previously, AWS suggested this took around 30 seconds), and zero downtime due to an availability zone failure. During 2018 AWS will also introduce the ability to have multi-region masters – this will allow customers to easily scale their applications across regions and have a single, consistent data source.

    Last, and certainly not least, was the announcement of Aurora Serverless, which is an on-demand, auto-scaling, serverless version of Aurora. Users pay by the second – an unbelievably powerful feature for many use cases.

    Finally, Jassy turned his focus to the DynamoDB service, which scaled to ~12.9 million requests per second at its peak during the last Amazon Prime Day. Just let that sink in for a moment! The DynamoDB service is used by a huge number of major global companies, powering mission-critical workloads of all kinds. The reason for this, from our perspective, is that it’s very easy to access and use as a service. What was announced today was the new feature DynamoDB Global Tables. This enables users to build high performance, globally distributed applications.

    The final database feature released for DynamoDB was managed backup & restore. It allows for on-demand backups and point-in-time recovery (within the past 35 days), so backups of hundreds of TB can be taken for data archival or regulatory requirements with no interruption.

    Jassy wrapped up the database section of his keynote by announcing Amazon Neptune, a fully managed graph database which will make it easy to build and run applications that work with highly connected data sets.


    Next Jassy turned to Analytics, commenting that people want to be using S3 as their data lake. Athena allows for easy querying of structured data within S3. However, most analytics jobs involve processing only a subset of the data stored within S3 objects, and Athena requires the whole object to be processed. To ease the pain, AWS released S3 Select, which allows applications (including Athena) to retrieve a subset of data from an S3 object using simple SQL expressions. AWS claim drastic performance increases, possibly up to 400%.

    Many of our customers are required by regulation to store logs for up to 7 years and as such ship them to Glacier to reduce the cost of storage. This becomes problematic if you need to query this data though. How great would it be if this could become part of your data lake? Jassy asked, before announcing Glacier Select. Glacier Select allows for queries to be run directly on data stored in Glacier, extending your data lake into Glacier while reducing your storage costs.

    Machine Learning

    The house band introduced Machine Learning with ‘Let it Rain’ by Eric Clapton. Dr Matt Wood made an appearance and highlighted how important machine learning is to Amazon itself. The company uses a lot of it, from personal recommendations to fulfilment automation & inventory in its warehouses.

    Jassy highlighted that AWS only invests in building technology that its customers need (and remember, Amazon itself is a customer!), not because it is cool or funky. Jassy described three tiers of Machine Learning: Frameworks and Interfaces, Platform Services & Application Services.

    At the Frameworks and Interfaces tier, emphasis was placed on the broad range of frameworks that could be used on AWS, recognising that one shoe does not fit every foot and the best results come when using the correct tool for the job. Moving to the Platform Services tier, Jassy highlighted that most companies do not have expert machine learning practitioners (yet) – it is, after all, a complex beast. To make this easy for developers, Amazon SageMaker was announced – a fully-managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at any scale.

    Also at the platform tier, AWS launched DeepLens, a deep learning enabled wireless video camera designed to help developers grow their machine learning skills. This integrates directly with SageMaker, giving developers an end-to-end solution to learn, develop and test machine learning applications. DeepLens will ship in early 2018, available for $249.

    The machine learning announcements did not stop there! As Jassy moved into the Application Services tier AWS launched:


    Finally, Jassy turned to IoT – identifying five ‘frontiers’ each with its own release, either available now, or in early 2018:

    1. Getting into the game – IoT One Click (in Preview) will make it easy for simple devices to trigger AWS Lambda functions that execute a specific action.
    2. Device Management – AWS IoT Device Management will provide fleet management of connected devices, including onboarding, organisation, monitoring and remote management throughout a device’s lifetime.
    3. IoT Security – AWS IoT Device Defender (early 2018) will provide security management to your fleet of IoT devices, including auditing to ensure your fleet meets best practice.
    4. IoT Analytics – AWS IoT Analytics, making it easy to cleanse, process, enrich, store, and analyze IoT data at scale.
    5. Smaller Devices – Amazon FreeRTOS, an operating system for microcontrollers.

    Over the coming days and weeks, the Nordcloud team will be diving deeper into these new announcements (including our first thoughts after getting our hands on the new releases). We’ll also publish our thoughts on how they can benefit you.

    It should be noted that, compared to previous years, AWS are announcing more outside the keynotes, in sessions and on their Twitch channel, so there are many new releases which are not gaining the attention they might deserve. Examples include T2 Unlimited, Inter-Region VPC Peering and Launch Templates for EC2. As always, the best place to keep up to date is the AWS ‘What’s New’ page.

    If you would like to discuss how any of today’s announcements could benefit your business, please get in touch.


      5 Things To Know When Starting To Use DynamoDB & NoSQL Databases



      DynamoDB is the managed NoSQL database service of AWS. I’ve used it in production on a large scale and learned my lessons on some things that you really need to know when you start using DynamoDB and NoSQL databases in general. I will cover five of the most important ones here.

      1. The Basics

      So, first of all, what is a NoSQL database? The idea of these databases is that they are key-value stores that cannot use complex SQL queries or relations. The lack of relations is important for the design, so they can also be called non-relational databases. So, why use them? The upside of such a system is that you can actually make it highly scalable and distributed, since key-value stores are easier to spread over multiple partitions. This scalability is managed by AWS when you use DynamoDB, and all you need to take care of is table and query design and making sure you have enough throughput provisioned. DynamoDB is also very durable storage because there will be three geographically distributed replicas of your tables.

      2. Table Design

      DynamoDB, like most other NoSQL databases, is (almost) schema-less. This means that you don’t define a strict schema for your tables but rather just define the primary key and indexes. At any point you can then decide to add any kind of attribute to any of your items. The items in a table don’t even need to have the same attributes. A common misconception is that because the tables don’t have a fixed schema, you wouldn’t need to focus on table design. This couldn’t be more wrong! It is extremely important to pick good primary keys for your tables. Your primary key can just be a partition key or a combination of a partition key and a sort key. If you want to do range queries on your table, they can only be done on the sort key, and this is why you need to be careful with your key schema. Also, just to emphasize it, you cannot do range queries on your partition key. A query must always name one exact partition key value, and the reason is obvious: your data is scattered across multiple partitions based on your partition key, so it wouldn’t be very effective to do queries across all partitions.
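A toy model may help to see why the key schema matters so much. The sketch below is not how DynamoDB is implemented; it only mimics the access pattern, where the partition key selects exactly one partition and conditions are evaluated on the sort key inside it. The class TinyTable and the sensor data are made up for illustration:

```python
from collections import defaultdict


class TinyTable:
    """Toy table: items are grouped by partition key, ordered by sort key."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def put(self, pk, sk, attributes):
        self._partitions[pk][sk] = attributes

    def query(self, pk, sk_from=None, sk_to=None):
        """Read one partition, optionally limited to a sort key range."""
        partition = self._partitions.get(pk, {})
        keys = sorted(
            sk for sk in partition
            if (sk_from is None or sk >= sk_from)
            and (sk_to is None or sk <= sk_to)
        )
        return [(sk, partition[sk]) for sk in keys]


table = TinyTable()
table.put("sensor-1", "2019-07-01T10:00", {"temp": 20})
table.put("sensor-1", "2019-07-01T11:00", {"temp": 22})
table.put("sensor-1", "2019-07-01T12:00", {"temp": 25})
table.put("sensor-2", "2019-07-01T11:00", {"temp": 18})

late_readings = table.query("sensor-1", sk_from="2019-07-01T11:00")
```

There is deliberately no method that searches across partitions: answering that kind of question would mean visiting every partition, which is what a good key schema is supposed to avoid.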

      3. Indexing

      In addition to making queries on your sort key, you can define additional indexes. A local secondary index is kind of like another sort key that uses your table’s partition key and your selected attribute for queries. So with a local secondary index, you can do queries on other attributes in your table. These indexes consume throughput and storage space, so you should use them quite sparingly. In addition, you can define global secondary indexes that can have a different partition key instead of your table’s main partition key. Global secondary indexes have their own throughput that you can define separately and of course pay for separately too. No matter how many indexes you add, you can only query one index at a time!

      4. Table items and throughput 

      So, you have your table now defined and you want to start adding items to it. Adding and updating items consumes your write throughput, and requesting and querying items consumes your read throughput. These throughput values can be changed at any time, also on the fly, so there’s no need for a maintenance break for your application when changing these values. DynamoDB will then automatically handle any additional partitioning needed to achieve your requested throughput.

      The maximum size for a single item in a DynamoDB table is 400 KB. You probably want to actually aim for a lot less. The bigger your items are, the more throughput you consume. DynamoDB has quite a nice set of data types from maps and lists (JSON!) to basic strings, numbers and so on. It can be tempting to put larger items in JSON format to DynamoDB but you should always consider the maximum item size when doing so. Best practice is to put larger items to S3 and have a reference in DynamoDB.
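To put numbers on how item size drives throughput, here is a small calculation based on the provisioned capacity rules: one read capacity unit covers a strongly consistent read of up to 4 KB per second (an eventually consistent read costs half), and one write capacity unit covers a write of up to 1 KB per second. The function names are ours, and 1 KB is assumed to mean 1024 bytes:

```python
import math

READ_UNIT_BYTES = 4 * 1024   # size covered by one read capacity unit
WRITE_UNIT_BYTES = 1024      # size covered by one write capacity unit
MAX_ITEM_BYTES = 400 * 1024  # maximum item size


def read_capacity_units(item_size_bytes, strongly_consistent=True):
    """Capacity units consumed by reading one item once."""
    units = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    return units if strongly_consistent else units / 2


def write_capacity_units(item_size_bytes):
    """Capacity units consumed by writing one item once."""
    return max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))


small_item_write = write_capacity_units(500)
largest_item_write = write_capacity_units(MAX_ITEM_BYTES)
largest_item_read = read_capacity_units(MAX_ITEM_BYTES)
```

A single write of a maximum-size item costs as much as 400 writes of small items, which is a good reason to keep large payloads in S3.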

      5. Use Cases

      An optimal use case for a DynamoDB table is a simple table where you have a wide range of different keys and possibly a simple sort key attached to them. For example, storing simple data about your users that doesn’t have relations to other tables (user-id as the partition key), or something like sensor data and clickstreams, are excellent use cases for DynamoDB. You can also use it together with a relational database for the parts of your data that need high throughput for reads and writes.

      Getting started is easy: just log in to your AWS account and create a table! In addition, use cases are covered in the labs and modules on our Architecting on AWS and Developing on AWS courses that we deliver at Nordcloud.
