Amazon DynamoDB
Consistency and Throughput
Author: Donatien MBADI OUM, Database Solution Architect | Oracle | AWS | Azure
1. Overview
Amazon DynamoDB is a fully managed, serverless NoSQL database service in the cloud that supports key-value and document data structures. It is designed for high-performance applications at any scale and includes built-in security, backup and restore, flexible capacity modes, multi-region replication, in-memory caching and more.
DynamoDB is:
- Fast: Provides ultra-low latency with single-digit millisecond performance. Combined with DAX (DynamoDB Accelerator), an in-memory cache, you can get latency as low as microseconds.
- Flexible: Supports a NoSQL type of structure.
- Cost-effective: You pay as you go.
- Fault tolerant: Your data is replicated across multiple Availability Zones (AZs).
- Secure: Your data is encrypted at rest and in transit.
DynamoDB provides Global Tables, a multi-region, multi-master replication feature, as well as backup and restore functions with PITR (Point-In-Time Recovery) covering the past 35 days.
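As a hedged illustration, the short sketch below enables PITR on a table with the boto3 SDK; the USERS table name is borrowed from the examples later in this article, and AWS credentials and region are assumed to be configured.

import boto3

# Assumption: the USERS table already exists and credentials are configured.
dynamodb = boto3.client("dynamodb")

# Enable point-in-time recovery (PITR) on the table.
dynamodb.update_continuous_backups(
    TableName="USERS",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)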
DynamoDB supports CRUD (Create/Read/Update/Delete) operations through its APIs, as well as transactions across multiple tables. With DynamoDB, you cannot query multiple tables at once, and you cannot run analytical operations or aggregation functions as you would on a relational database.
The most important aspect of DynamoDB is that the table access patterns must be known ahead of time for an efficient design and good performance.
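To make the CRUD operations concrete, here is a minimal sketch using the boto3 DynamoDB resource API; the USERS table and its attributes are the ones used as examples later in this article, and the table is assumed to already exist.

import boto3

# Assumption: a USERS table with partition key "user_id" already exists.
users = boto3.resource("dynamodb").Table("USERS")

# Create
users.put_item(Item={"user_id": 12035, "username": "dmbadi", "profile": "admin"})

# Read
item = users.get_item(Key={"user_id": 12035}).get("Item")

# Update
users.update_item(
    Key={"user_id": 12035},
    UpdateExpression="SET profile = :p",
    ExpressionAttributeValues={":p": "developer"},
)

# Delete
users.delete_item(Key={"user_id": 12035})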
2. DynamoDB Terminology
2.1. Tables, Items and Attributes
A table is a group of data records. This concept is similar to a table in a relational database or a collection in MongoDB. Tables are top-level entities, and there are no strict inter-table relationships: tables are always independent of each other, which allows you to control performance at the table level. For example, you might have a USERS table to store data about users and a PLAYERS table to store data about players.
An item is a single data record in a table. Items are similar in many ways to rows, records or tuples in other database systems. For example, in the USERS table, each item represents a user. Each item in a table is uniquely identified by the stated primary key of the table. Items are stored as JSON (DynamoDB-specific JSON).
Attributes are pieces of data attached to a single item. An attribute is comparable to a column in a relational database or a field in MongoDB. For example, a user in the USERS table has attributes called user_id, username and dateofbirth.
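As a small illustration of the DynamoDB-specific JSON mentioned above, a single USERS item can be written as follows; the values come from the sample table in the next section, and every attribute value carries a type descriptor such as "S" for string or "N" for number.

# One USERS item in DynamoDB-specific JSON (low-level API format).
user_item = {
    "user_id": {"N": "12035"},
    "username": {"S": "dmbadi"},
    "dateofbirth": {"S": "1971-05-24"},
    "profile": {"S": "admin"},
    "createdate": {"S": "2022-05-12 15:10:02"},
}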
2.2. Primary Key
Each item in a table is uniquely identified by a primary key. The primary key must be defined at the creation of the table. DynamoDB supports two different types of primary keys:
- Partition key. This is a simple primary key composed of one attribute. For example, user_id is the simple primary key of the USERS table. You can access any item in the USERS table directly by providing the user_id value for that item.
For example, the USERS table is defined as below:
user_id | username   | dateofbirth | profile   | createdate
--------|------------|-------------|-----------|--------------------
12035   | dmbadi     | 1971-05-24  | admin     | 2022-05-12 15:10:02
20136   | jdongmo    | 1982-07-12  | developer | 2023-01-04 09:22:50
25123   | nangouande | 1980-05-13  | analyst   | 2021-11-19 18:25:13
54217   | mhadama    | 1998-11-05  | analyst   | 2022-03-18 17:15:05
- Partition key and Sort key. This is referred to as a composite primary key. This type of key is composed of two attributes: the first attribute is the partition key and the second is the sort key. In this case, all items with the same partition key value are stored together, in sorted order by sort key value.
For example, a PLAYERS table is defined as below:
player_id | game_id | game_ts             | result | duration
----------|---------|---------------------|--------|---------
jm001     | 2564    | 2023-04-02 14:52:01 | win    | 15
rm054     | 3215    | 2023-02-15 00:45:42 | win    | 30
jm001     | 125     | 2023-03-25 12:12:34 | lose   | 47
ra0152    | 3504    | 2023-04-03 08:45:12 | win    | 12
ra0152    | 120356  | 2023-04-02 15:12:05 | lose   | 20
The PLAYERS table is an example of a table with a composite primary key, with player_id as the partition key and game_id as the sort key. You can access any item in the PLAYERS table by providing its player_id and game_id. A composite primary key also gives you additional flexibility when querying data. If you provide only the value of player_id, DynamoDB retrieves all the games for that player. To retrieve a subset of games for a particular player, you can provide a value for player_id along with a range of values for game_id.
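The sketch below shows both access patterns against the PLAYERS table with boto3; the attribute names match the sample table above, and the table, credentials and region are assumed to already be in place.

import boto3
from boto3.dynamodb.conditions import Key

players = boto3.resource("dynamodb").Table("PLAYERS")

# Direct access with the full composite key (partition key + sort key).
game = players.get_item(Key={"player_id": "jm001", "game_id": 2564}).get("Item")

# Query: all games for one player (partition key only).
all_games = players.query(
    KeyConditionExpression=Key("player_id").eq("jm001")
)["Items"]

# Query: a subset of games for one player (partition key + sort key range).
some_games = players.query(
    KeyConditionExpression=Key("player_id").eq("jm001") & Key("game_id").between(100, 3000)
)["Items"]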
2.3. Secondary Indexes
The primary key uniquely identifies an item in a table, and you can make queries against the table using the primary key. However, sometimes you have additional access patterns that would be inefficient with the primary key alone. DynamoDB introduces the notion of secondary indexes to enable these additional access patterns. There are two types of secondary indexes:
- Local Secondary Index (LSI). You can only add a local secondary index to tables with a composite primary key. A local secondary index maintains the same HASH (partition) key as the underlying table while allowing a different RANGE (sort) key.
For example, on the PLAYERS table whose primary key is player_id + game_id, an LSI can be player_id + game_ts, player_id + result or player_id + duration.
You can define up to 5 LSIs; they must be specified at table creation and cannot be deleted later (you would have to create a new table). For a given HASH key value, you can only store up to 10 GB of data.
LSIs support eventual, strong and transactional consistency, but you can only query a single partition (specified by the HASH key). However, you can query any table attribute, even if that attribute is not projected onto the index.
An LSI consumes the provisioned throughput of the base table (see the sketch after this list).
- Global Secondary Index (GSI). You can add a global secondary index to tables with either a simple or a composite primary key. A GSI can have the same or a different partition/HASH key and/or the same or a different sort/RANGE key than the table's primary key, and you can also omit the sort/RANGE key.
For example, on the PLAYERS table whose primary key is player_id + game_id, a GSI can be player_id, game_id, player_id + result, game_ts + game_id, or game_ts + duration.
You can have up to 20 GSIs (a soft limit), and there is no size restriction on indexed items. You can add or delete a GSI at any time, but you can only delete one GSI at a time.
GSIs only support eventual consistency, but you can query across partitions (over the entire table). However, you can only query projected attributes (attributes included in the index).
A GSI has its own provisioned throughput.
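As a hedged sketch of both index types, the create_table call below declares the PLAYERS table with one LSI, and update_table then adds a GSI; the key choices and index names are illustrative, not values prescribed by this article.

import boto3

dynamodb = boto3.client("dynamodb")

# Create the table with a composite primary key and one LSI
# (LSIs must be declared at table creation time).
dynamodb.create_table(
    TableName="PLAYERS",
    AttributeDefinitions=[
        {"AttributeName": "player_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "N"},
        {"AttributeName": "game_ts", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "player_id", "KeyType": "HASH"},
        {"AttributeName": "game_id", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "player_id-game_ts-index",  # hypothetical name
            "KeySchema": [
                {"AttributeName": "player_id", "KeyType": "HASH"},
                {"AttributeName": "game_ts", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)

# Later, add a GSI keyed on game_ts (GSIs can be added or deleted at any time).
dynamodb.update_table(
    TableName="PLAYERS",
    AttributeDefinitions=[{"AttributeName": "game_ts", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "game_ts-index",  # hypothetical name
                "KeySchema": [{"AttributeName": "game_ts", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        }
    ],
)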
3. DynamoDB Consistency Model
3.1. Understanding Consistency
A database consistency model determines the manner and timing in which a successful write or update is reflected in a subsequent read of that same value. DynamoDB offers both read consistency and write consistency options.
3.2. Read and Write Consistency
Amazon DynamoDB lets you specify the desired consistency characteristics for each read request within an application. You can specify that a read is eventually consistent, strongly consistent or transactional. You also have a choice between standard and transactional write consistency.
Eventual consistency is the default in Amazon DynamoDB and maximizes read throughput. However, an eventually consistent read may not always reflect the results of a recently completed write; consistency across all copies of the data is usually reached within a second.
A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read. To get a strongly consistent read result, set the ConsistentRead parameter to TRUE in your read request. It takes more resources to process a strongly consistent read than an eventually consistent read, so eventual consistency is 50% cheaper than strong consistency.
Transactional reads and writes provide ACID support across one or more tables within a single AWS account and Region. Transactions let you write to multiple tables all at once or not at all. Transactional reads cost 2x strongly consistent reads, and transactional writes cost 2x standard writes.
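The hedged sketch below shows how these consistency options map onto boto3 calls: ConsistentRead for a strongly consistent read, and transact_get_items for a transactional read across tables. Table and key names reuse the earlier examples.

import boto3

dynamodb = boto3.client("dynamodb")

# Strongly consistent read: ConsistentRead=True (eventually consistent is the default).
resp = dynamodb.get_item(
    TableName="USERS",
    Key={"user_id": {"N": "12035"}},
    ConsistentRead=True,
)

# Transactional read across tables (the whole call succeeds or fails together).
tx = dynamodb.transact_get_items(
    TransactItems=[
        {"Get": {"TableName": "USERS", "Key": {"user_id": {"N": "12035"}}}},
        {"Get": {"TableName": "PLAYERS",
                 "Key": {"player_id": {"S": "jm001"}, "game_id": {"N": "2564"}}}},
    ]
)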
4. Pricing Model
Amazon DynamoDB charges for reading, writing and storing data in your tables, along with any optional features you choose to enable. DynamoDB has two capacity modes: provisioned capacity and on-demand capacity.
4.1. Provisioned Capacity
With provisioned capacity, you specify the number of reads and writes per second that you expect your application to require, and you pay for the capacity you provision. You can use auto scaling to automatically adjust the provisioned capacity, which maintains application performance while reducing costs.
Provisioned capacity uses Read Capacity Units (RCUs) and Write Capacity Units (WCUs). These are the units in which you provision your capacity; if you consume beyond that provisioned capacity, requests may be throttled, which is why you should use auto scaling.
Along with provisioned capacity, you can also purchase Reserved Capacity for discounts over a 1- or 3-year contract term. In this case, you are charged a one-time fee, plus an hourly fee per 100 RCUs and WCUs.
Provisioned capacity mode might be best if you have predictable application traffic, you run an application whose traffic is consistent or ramps gradually, or you can forecast capacity requirements to control costs.
In provisioned capacity mode:
- A Read Capacity Unit (RCU) consists of blocks of 4 KB, with the last block always rounded up:
  - 1 strongly consistent read/sec = 1 RCU
  - 2 eventually consistent reads/sec = 1 RCU
  - 1 transactional read/sec = 2 RCUs
- A Write Capacity Unit (WCU) consists of blocks of 1 KB, with the last block always rounded up:
  - 1 standard write/sec = 1 WCU
  - 1 transactional write/sec = 2 WCUs
Note: If you are underutilizing your provisioned capacity, DynamoDB preserves about 5 minutes of unused read and write capacity for future use; this is called DynamoDB burst capacity. This capacity gets used whenever you exceed your provisioned capacity.
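As a sketch of provisioned mode, the call below creates the USERS table with explicit RCUs and WCUs; the values 15 and 10 are simply the figures reused in the throughput example at the end of this article, and auto scaling would be configured separately on top of them.

import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned capacity mode: you declare RCUs and WCUs up front.
dynamodb.create_table(
    TableName="USERS",
    AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "N"}],
    KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 15, "WriteCapacityUnits": 10},
)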
4.2. On-Demand Capacity
With on-demand capacity, DynamoDB charges you for the data reads and writes your application performs on your tables. You do not need to specify how much read and write throughput you expect your application to perform; DynamoDB instantly accommodates your workloads as they ramp up or down. Throttling can occur if you exceed double your previous peak within 30 minutes.
You pay for the number of requests (reads and writes) your application makes, so you do not need to provision capacity units.
On-demand capacity uses Read Request Units and Write Request Units, which are similar to Read Capacity Units and Write Capacity Units, and you cannot use Reserved Capacity with on-demand mode.
On-demand capacity mode might be best if you create new tables with unknown workloads, you have unpredictable application traffic, or you prefer the ease of paying for only what you use.
In on-demand capacity mode:
- A Read Request Unit (RRU) consists of blocks of 4 KB, with the last block always rounded up:
  - 1 strongly consistent read request = 1 RRU
  - 2 eventually consistent read requests = 1 RRU
  - 1 transactional read request = 2 RRUs
- A Write Request Unit (WRU) consists of blocks of 1 KB, with the last block always rounded up:
  - 1 standard write request = 1 WRU
  - 1 transactional write request = 2 WRUs
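A minimal sketch of on-demand mode with boto3 is shown below; the EVENTS table name is hypothetical, and an existing provisioned table can also be switched to on-demand with update_table.

import boto3

dynamodb = boto3.client("dynamodb")

# Create a table in on-demand (pay-per-request) mode: no RCUs/WCUs to declare.
dynamodb.create_table(
    TableName="EVENTS",  # hypothetical table name
    AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)

# Switch an existing provisioned table to on-demand mode.
dynamodb.update_table(TableName="USERS", BillingMode="PAY_PER_REQUEST")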
5. Examples of Calculating Capacity Units and Throughput
5.1. Calculating Capacity Units
How do you calculate the capacity units needed to read a 25 KB item?
As 1 RCU consists of blocks of 4 KB, 25 KB / 4 KB = 6.25, which rounds up to 7 blocks.
- With strong consistency, reading 25 KB consumes 7 RCUs.
- With eventual consistency, reading 25 KB consumes 1/2 x 7 RCUs = 3.5, rounded up to 4 RCUs.
- With transactional consistency, reading 25 KB consumes 2 x 7 RCUs = 14 RCUs.
As 1 WCU consists of blocks of 1 KB:
- A standard write of 25 KB is 25 KB / 1 KB = 25 WCUs.
- A transactional write of 25 KB is 2 x 25 WCUs = 50 WCUs.
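The arithmetic above can be captured in a small helper; this is only a sketch of the rounding rules described in this section, not an official pricing calculator.

import math

def read_capacity_units(item_kb, consistency="strong"):
    # RCUs to read one item per second: 4 KB blocks, last block rounded up.
    blocks = math.ceil(item_kb / 4)        # strongly consistent baseline
    if consistency == "eventual":
        return math.ceil(blocks / 2)       # half the cost, rounded up
    if consistency == "transactional":
        return 2 * blocks                  # double the cost
    return blocks

def write_capacity_units(item_kb, transactional=False):
    # WCUs to write one item per second: 1 KB blocks, last block rounded up.
    blocks = math.ceil(item_kb)
    return 2 * blocks if transactional else blocks

# Reproduces the 25 KB example: 7, 4, 14 RCUs and 25, 50 WCUs.
print(read_capacity_units(25), read_capacity_units(25, "eventual"),
      read_capacity_units(25, "transactional"))
print(write_capacity_units(25), write_capacity_units(25, transactional=True))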
5.2. Calculating Throughput
How do you calculate the amount of throughput your application can support?
Suppose a DynamoDB table has a provisioned capacity of 15 RCUs and 10 WCUs.
- Read throughput with strong consistency = 4 KB x 15 = 60 KB/sec
- Read throughput with eventual consistency = 2 x (60 KB/sec) = 120 KB/sec
- Transactional read throughput = 1/2 x (60 KB/sec) = 30 KB/sec
- Write throughput = 1 KB x 10 = 10 KB/sec
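The same figures can be checked with a short sketch that applies the multipliers used in this example.

def read_throughput_kb_per_sec(rcus, consistency="strong"):
    # Maximum read throughput for a provisioned RCU count (4 KB per RCU).
    base = 4 * rcus                  # strongly consistent baseline
    if consistency == "eventual":
        return 2 * base              # eventual reads cost half, so double the throughput
    if consistency == "transactional":
        return base / 2              # transactional reads cost double, so half the throughput
    return base

def write_throughput_kb_per_sec(wcus):
    # Maximum standard write throughput for a provisioned WCU count (1 KB per WCU).
    return 1 * wcus

# Reproduces the example: 60, 120, 30 KB/sec reads and 10 KB/sec writes.
print(read_throughput_kb_per_sec(15),
      read_throughput_kb_per_sec(15, "eventual"),
      read_throughput_kb_per_sec(15, "transactional"),
      write_throughput_kb_per_sec(10))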