DynamoDB is the managed NoSQL database service of AWS. I’ve used it in production at large scale and learned a few lessons that you really need to know when you start using DynamoDB, and NoSQL databases in general. I will cover five of the most important ones here.
1. The Basics
So, first of all, what is a NoSQL database? The idea is that these databases are, at their core, key-value stores that give up complex SQL queries and relations between tables. The lack of relations is central to the design, which is why they are also called non-relational databases. So why use them? The upside of such a system is that it can be made highly scalable and distributed, since key-value stores are easier to spread over multiple partitions. AWS manages this scalability when you use DynamoDB; all you need to take care of is table and query design and making sure you have enough throughput provisioned. DynamoDB is also very durable storage because it keeps three replicas of your tables in separate facilities (Availability Zones) within an AWS region.
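To illustrate why key-value data spreads easily over partitions, here is a minimal sketch of hash partitioning. It is not DynamoDB’s actual hash function or partition layout, which are internal to the service; the MD5 hash, the key format and the partition count of 4 are all assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition number by hashing it (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer and wrap it
    # into the range [0, partition_count).
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(keys, partition_count: int) -> Counter:
    """Count how many keys land on each partition."""
    return Counter(partition_for(k, partition_count) for k in keys)


if __name__ == "__main__":
    # Hypothetical user ids; many distinct keys spread roughly evenly.
    keys = [f"user-{i}" for i in range(10_000)]
    print(distribution(keys, 4))
```

Because each key is placed independently of every other key, no cross-partition coordination is needed to find an item, which is what makes this kind of store easy to scale out.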
2. Table Design
DynamoDB, like most other NoSQL databases, is (almost) schema-less. This means that you don’t define a strict schema for your tables but rather just define the primary key and indexes. At any point you can then decide to add any kind of attribute to any of your items. The items in a table don’t even need to have the same attributes. A common misconception is that because the tables don’t have a fixed schema, you don’t need to focus on table design. This couldn’t be more wrong! It is extremely important to pick good primary keys for your tables. Your primary key can be just a partition key, or a combination of a partition key and a sort key. This matters because of how queries work: a query must name exactly one partition key value, and range or comparison conditions (greater than, between, begins with) can only be applied to the sort key, so you need to be careful with your key schema. Also, just to emphasize it, you cannot run a range query over your partition key, and the reason is straightforward: your data is scattered across multiple partitions based on a hash of the partition key, so such a query would have to touch every partition. The only way to do that is a full table scan, which is slow and expensive on large tables.
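As a sketch of what such a query looks like, the following builds the parameters for a DynamoDB Query request: one partition key value pinned with equality, and the range condition placed on the sort key. The table name SensorReadings and the attributes sensor_id and recorded_at are hypothetical. The dictionary uses the parameter names of the low-level Query API, and sending it is left as a comment because that needs boto3 and AWS credentials.

```python
def build_query(table_name: str, sensor_id: str, start: str, end: str) -> dict:
    """Build Query parameters: one partition key value, a range on the sort key."""
    return {
        "TableName": table_name,
        # Equality on the partition key, BETWEEN on the sort key.
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :start AND :end",
        # Hypothetical attribute names for the partition and sort key.
        "ExpressionAttributeNames": {"#pk": "sensor_id", "#sk": "recorded_at"},
        # Low-level API values carry their type; "S" means string.
        "ExpressionAttributeValues": {
            ":pk": {"S": sensor_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }


params = build_query(
    "SensorReadings", "sensor-17", "2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z"
)
# To send the request (requires boto3 and AWS credentials):
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
```

ISO 8601 timestamps are used for the sort key here because they sort correctly as plain strings, so a BETWEEN condition selects a time window.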
In addition to making queries on your sort key, you can define additional indexes. A local secondary index is kind of like another sort key: it uses your table’s partition key together with an attribute you select. So with a local secondary index, you can query on other attributes in your table. These indexes share the table’s throughput, consume extra storage space and can only be defined when the table is created, so you should use them sparingly. In addition, you can define global secondary indexes, which can have a different partition key from your table’s main partition key. Global secondary indexes have their own throughput that you define separately, and of course pay for separately too. No matter how many indexes you add, you can only query one index at a time!
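A table definition with both kinds of index could be sketched as follows. The dictionary follows the parameter names of the CreateTable API, but the table name, attribute names, index names and capacity values are all hypothetical. Note how the local secondary index reuses the table’s partition key, while the global secondary index has its own partition key and its own throughput.

```python
def build_table_definition() -> dict:
    """Build CreateTable parameters with one local and one global secondary index."""
    return {
        "TableName": "SensorReadings",
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "sensor_id", "AttributeType": "S"},
            {"AttributeName": "recorded_at", "AttributeType": "S"},
            {"AttributeName": "reading_status", "AttributeType": "S"},
            {"AttributeName": "site_id", "AttributeType": "S"},
        ],
        # HASH is the partition key, RANGE is the sort key.
        "KeySchema": [
            {"AttributeName": "sensor_id", "KeyType": "HASH"},
            {"AttributeName": "recorded_at", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_status",
                # Same partition key as the table, alternative sort key.
                "KeySchema": [
                    {"AttributeName": "sensor_id", "KeyType": "HASH"},
                    {"AttributeName": "reading_status", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by_site",
                # Different partition key, plus separately provisioned throughput.
                "KeySchema": [
                    {"AttributeName": "site_id", "KeyType": "HASH"},
                    {"AttributeName": "recorded_at", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 5,
                    "WriteCapacityUnits": 5,
                },
            }
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }


definition = build_table_definition()
# To create the table (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**definition)
```

A query against one of these indexes would name it with the IndexName parameter, which is also why a single query can only ever use one index.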
4. Table Items and Throughput
So, you now have your table defined and you want to start adding items to it. Adding and updating items consumes your write throughput, and reading and querying items consumes your read throughput. These throughput values can be changed at any time while the table is in use, so there’s no need for a maintenance break for your application when changing them. DynamoDB will then automatically handle any additional partitioning needed to achieve your requested throughput.
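To get a feeling for how much throughput to provision, the sketch below estimates capacity units for single-item reads and writes. It relies on the documented unit sizes: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB, with item sizes rounded up. The function names are hypothetical, and this is only a rough estimate, since queries, batch operations and transactions are accounted for differently.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read unit covers up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write unit covers up to 1 KB


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate read capacity units for single-item reads of a given size."""
    units_per_read = math.ceil(item_size_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = total / 2
    return math.ceil(total)


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """Estimate write capacity units for single-item writes of a given size."""
    return math.ceil(item_size_bytes / WRITE_UNIT_BYTES) * writes_per_second


if __name__ == "__main__":
    # A 6000-byte item read 10 times per second, written 10 times per second.
    print(read_capacity_units(6000, 10))
    print(read_capacity_units(6000, 10, strongly_consistent=False))
    print(write_capacity_units(6000, 10))
```

The rounding is what makes item size matter: an item just over 4 KB costs twice as much to read as one just under it.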
The maximum size for a single item in a DynamoDB table is 400 KB. You probably want to aim for a lot less, because the bigger your items are, the more throughput you consume. DynamoDB has quite a nice set of data types, from maps and lists (JSON!) to basic strings, numbers and so on. It can be tempting to put large JSON documents into DynamoDB, but you should always consider the maximum item size when doing so. Best practice is to store larger objects in S3 and keep a reference to them in DynamoDB.
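The S3 reference pattern can be sketched as a small helper that decides, per document, whether the payload goes inline into the DynamoDB item or into S3 with only a pointer stored in the item. Everything here is an assumption made for illustration: the 32 KB cut-off, the bucket name, the key layout and the attribute names. The helper only prepares the data and performs no AWS calls.

```python
import json

# Hypothetical cut-off, chosen well below DynamoDB's 400 KB item limit.
INLINE_THRESHOLD_BYTES = 32 * 1024


def build_item(document_id: str, payload: dict, bucket: str = "example-bucket"):
    """Return (item, upload): upload is None when the payload fits inline,
    otherwise an (s3_key, body_bytes) pair to be stored in S3."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) <= INLINE_THRESHOLD_BYTES:
        # Small enough: keep the payload directly in the item.
        return {"document_id": document_id, "payload": payload}, None

    # Too large: store only a reference to the S3 object in the item.
    s3_key = f"documents/{document_id}.json"
    item = {
        "document_id": document_id,
        "payload_s3_bucket": bucket,
        "payload_s3_key": s3_key,
        "payload_bytes": len(body),
    }
    return item, (s3_key, body)


item, upload = build_item("doc-2", {"blob": "x" * 50_000})
# With boto3 and AWS credentials, the upload could then be sent with:
#   boto3.client("s3").put_object(Bucket=item["payload_s3_bucket"],
#                                 Key=upload[0], Body=upload[1])
```

Writing the S3 object before the DynamoDB item is a reasonable ordering, since a reader who finds the item can then rely on the referenced object existing.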
5. Use Cases
An optimal use case for a DynamoDB table is a simple table with a wide range of different keys and possibly a simple sort key attached to them. Excellent examples are simple user data that has no relations to other tables (with the user id as the partition key), sensor data and clickstreams. You can also use it together with a relational database for the parts of your data that need high throughput for reads and writes.
Getting started is easy: just log in to your AWS account and create a table! In addition, use cases are covered in the labs and modules of the Architecting on AWS and Developing on AWS courses that we deliver at Nordcloud.