Amazon DynamoDB
Consistency and Throughput
Author: Donatien MBADI OUM, Database Solution Architect | Oracle | AWS | Azure
1. Overview
Amazon DynamoDB is a fully managed, serverless NoSQL database service in the cloud that supports key-value and document data structures. It is designed for high-performance applications at any scale and includes built-in security, backup and restore, flexible capacity modes, multi-region replication, in-memory caching and more.
DynamoDB is:
- Fast: Provides ultra-low latency with single-digit millisecond performance. Combined with DAX (DynamoDB Accelerator), an in-memory cache, you can get latency as low as microseconds.
- Flexible: Supports a NoSQL type of structure.
- Cost-effective: You pay as you go.
- Fault tolerant: Your data is replicated across multiple Availability Zones (AZs).
- Secure: Your data is encrypted at rest and in transit.
DynamoDB provides Global Tables, a multi-region, multi-master replication feature, as well as backup and restore functions with PITR (Point-In-Time Recovery) covering the past 35 days.
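As a hedged illustration, the short sketch below enables PITR on a table with the boto3 SDK; the USERS table name is borrowed from the examples later in this article, and AWS credentials and region are assumed to be configured.

import boto3

# Assumption: the USERS table already exists and credentials are configured.
dynamodb = boto3.client("dynamodb")

# Enable point-in-time recovery (PITR) on the table.
dynamodb.update_continuous_backups(
    TableName="USERS",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)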
DynamoDB supports CRUD (Create/Read/Update/Delete) operations through its APIs, as well as transactions across multiple tables. With DynamoDB, you cannot query multiple tables at once, and you cannot run analytical operations or aggregation functions as you would on a relational database.
The most important aspect of DynamoDB is that the table access patterns must be known ahead of time for an efficient design and good performance.
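To make the CRUD operations concrete, here is a minimal sketch using the boto3 DynamoDB resource API; the USERS table and its attributes are the ones used as examples later in this article, and the table is assumed to already exist.

import boto3

# Assumption: a USERS table with partition key "user_id" already exists.
users = boto3.resource("dynamodb").Table("USERS")

# Create
users.put_item(Item={"user_id": 12035, "username": "dmbadi", "profile": "admin"})

# Read
item = users.get_item(Key={"user_id": 12035}).get("Item")

# Update
users.update_item(
    Key={"user_id": 12035},
    UpdateExpression="SET profile = :p",
    ExpressionAttributeValues={":p": "developer"},
)

# Delete
users.delete_item(Key={"user_id": 12035})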
2. DynamoDB Terminology
2.1. Tables, Items and Attributes
A table is a group of data records. This concept is similar to a table in a relational database or a collection in MongoDB. Tables are top-level entities, and there are no strict inter-table relationships: tables are always independent of each other, which allows you to control performance at the table level. For example, you might have a USERS table to store data about users and a PLAYERS table to store data about players.
An item is a single data record in a table. Items are similar in many ways to rows, records or tuples in other database systems. For example, in the USERS table, each item represents a user. Each item in a table is uniquely identified by the stated primary key of the table. Items are stored as JSON (DynamoDB-specific JSON).
Attributes are pieces of data attached to a single item. An attribute is comparable to a column in a relational database or a field in MongoDB. For example, a user in the USERS table has attributes called user_id, username and dateofbirth.
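As a small illustration of the DynamoDB-specific JSON mentioned above, a single USERS item can be written as follows; the values come from the sample table in the next section, and every attribute value carries a type descriptor such as "S" for string or "N" for number.

# One USERS item in DynamoDB-specific JSON (low-level API format).
user_item = {
    "user_id": {"N": "12035"},
    "username": {"S": "dmbadi"},
    "dateofbirth": {"S": "1971-05-24"},
    "profile": {"S": "admin"},
    "createdate": {"S": "2022-05-12 15:10:02"},
}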
2.2. Primary Key
Each item in a table is uniquely identified by a primary key. The primary key must be defined at the creation of the table. DynamoDB supports two different types of primary keys:
- Partition key. This is a simple primary key composed of one attribute. For example, user_id is the simple primary key of the USERS table. You can access any item in the USERS table directly by providing the user_id value for that item.
For example, the USERS table is defined as below:
user_id | username   | dateofbirth | profile   | createdate
--------|------------|-------------|-----------|--------------------
12035   | dmbadi     | 1971-05-24  | admin     | 2022-05-12 15:10:02
20136   | jdongmo    | 1982-07-12  | developer | 2023-01-04 09:22:50
25123   | nangouande | 1980-05-13  | analyst   | 2021-11-19 18:25:13
54217   | mhadama    | 1998-11-05  | analyst   | 2022-03-18 17:15:05
- Partition key and Sort key. This is referred to as a composite primary key. This type of key is composed of two attributes: the first attribute is the partition key and the second is the sort key. In this case, all items with the same partition key value are stored together, in sorted order by sort key value.
For example, a PLAYERS table is defined as below:
player_id | game_id | game_ts             | result | duration
----------|---------|---------------------|--------|---------
jm001     | 2564    | 2023-04-02 14:52:01 | win    | 15
rm054     | 3215    | 2023-02-15 00:45:42 | win    | 30
jm001     | 125     | 2023-03-25 12:12:34 | lose   | 47
ra0152    | 3504    | 2023-04-03 08:45:12 | win    | 12
ra0152    | 120356  | 2023-04-02 15:12:05 | lose   | 20
The PLAYERS table is an example of a table with a composite primary key, with player_id as the partition key and game_id as the sort key. You can access any item in the PLAYERS table by providing its player_id and game_id. A composite primary key also gives you additional flexibility when querying data. If you provide only the value of player_id, DynamoDB retrieves all the games for that player. To retrieve a subset of games for a particular player, you can provide a value for player_id along with a range of values for game_id.
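The sketch below shows both access patterns against the PLAYERS table with boto3; the attribute names match the sample table above, and the table, credentials and region are assumed to already be in place.

import boto3
from boto3.dynamodb.conditions import Key

players = boto3.resource("dynamodb").Table("PLAYERS")

# Direct access with the full composite key (partition key + sort key).
game = players.get_item(Key={"player_id": "jm001", "game_id": 2564}).get("Item")

# Query: all games for one player (partition key only).
all_games = players.query(
    KeyConditionExpression=Key("player_id").eq("jm001")
)["Items"]

# Query: a subset of games for one player (partition key + sort key range).
some_games = players.query(
    KeyConditionExpression=Key("player_id").eq("jm001") & Key("game_id").between(100, 3000)
)["Items"]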
2.3. Secondary Indexes
The primary key uniquely identifies an item in a table, and you can make queries against the table using the primary key. However, sometimes you have additional access patterns that would be inefficient with the primary key alone. DynamoDB introduces the notion of secondary indexes to enable these additional access patterns. There are two types of secondary indexes:
- Local Secondary Index (LSI). You can only add a local secondary index to tables with a composite primary key. A local secondary index maintains the same HASH (partition) key as the underlying table while allowing a different RANGE (sort) key.
For example, on the PLAYERS table whose primary key is player_id + game_id, an LSI can be player_id + game_ts, player_id + result or player_id + duration.
You can define up to 5 LSIs; they must be specified at table creation and cannot be deleted later (you would have to create a new table). For a given HASH key value, you can only store up to 10 GB of data.
LSIs support eventual, strong and transactional consistency, but you can only query a single partition (specified by the HASH key). However, you can query any table attribute, even if that attribute is not projected onto the index.
An LSI consumes the provisioned throughput of the base table (see the sketch after this list).
- Global Secondary Index (GSI). You can add a global secondary index to tables with either a simple or a composite primary key. A GSI can have the same or a different partition/HASH key and/or the same or a different sort/RANGE key than the table's primary key, and you can also omit the sort/RANGE key.
For example, on the PLAYERS table whose primary key is player_id + game_id, a GSI can be player_id, game_id, player_id + result, game_ts + game_id, or game_ts + duration.
You can have up to 20 GSIs (a soft limit), and there is no size restriction on indexed items. You can add or delete a GSI at any time, but you can only delete one GSI at a time.
GSIs only support eventual consistency, but you can query across partitions (over the entire table). However, you can only query projected attributes (attributes included in the index).
A GSI has its own provisioned throughput.
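As a hedged sketch of both index types, the create_table call below declares the PLAYERS table with one LSI, and update_table then adds a GSI; the key choices and index names are illustrative, not values prescribed by this article.

import boto3

dynamodb = boto3.client("dynamodb")

# Create the table with a composite primary key and one LSI
# (LSIs must be declared at table creation time).
dynamodb.create_table(
    TableName="PLAYERS",
    AttributeDefinitions=[
        {"AttributeName": "player_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "N"},
        {"AttributeName": "game_ts", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "player_id", "KeyType": "HASH"},
        {"AttributeName": "game_id", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "player_id-game_ts-index",  # hypothetical name
            "KeySchema": [
                {"AttributeName": "player_id", "KeyType": "HASH"},
                {"AttributeName": "game_ts", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)

# Later, add a GSI keyed on game_ts (GSIs can be added or deleted at any time).
dynamodb.update_table(
    TableName="PLAYERS",
    AttributeDefinitions=[{"AttributeName": "game_ts", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "game_ts-index",  # hypothetical name
                "KeySchema": [{"AttributeName": "game_ts", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        }
    ],
)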
3. DynamoDB Consistency Model
3.1. Understanding Consistency
A database consistency model determines the manner and timing in which a successful write or update is reflected in a subsequent read of that same value. DynamoDB offers both read consistency and write consistency options.
3.2. Read and Write Consistency
Amazon DynamoDB lets you specify the desired consistency characteristics for each read request within an application. You can specify that a read is eventually consistent, strongly consistent or transactional. You also have a choice between standard and transactional write consistency.
Eventual consistency is the default in Amazon DynamoDB and maximizes read throughput. However, an eventually consistent read may not always reflect the results of a recently completed write; consistency across all copies of the data is usually reached within a second.
A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read. To get a strongly consistent read result, set the ConsistentRead parameter to TRUE in your read request. It takes more resources to process a strongly consistent read than an eventually consistent read, so eventual consistency is 50% cheaper than strong consistency.
Transactional reads and writes provide ACID support across one or more tables within a single AWS account and Region. Transactions let you write to multiple tables all at once or not at all. Transactional reads cost 2x strongly consistent reads, and transactional writes cost 2x standard writes.
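The hedged sketch below shows how these consistency options map onto boto3 calls: ConsistentRead for a strongly consistent read, and transact_get_items for a transactional read across tables. Table and key names reuse the earlier examples.

import boto3

dynamodb = boto3.client("dynamodb")

# Strongly consistent read: ConsistentRead=True (eventually consistent is the default).
resp = dynamodb.get_item(
    TableName="USERS",
    Key={"user_id": {"N": "12035"}},
    ConsistentRead=True,
)

# Transactional read across tables (the whole call succeeds or fails together).
tx = dynamodb.transact_get_items(
    TransactItems=[
        {"Get": {"TableName": "USERS", "Key": {"user_id": {"N": "12035"}}}},
        {"Get": {"TableName": "PLAYERS",
                 "Key": {"player_id": {"S": "jm001"}, "game_id": {"N": "2564"}}}},
    ]
)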
4. Pricing Model
Amazon DynamoDB charges for reading, writing and storing data in your tables, along with any optional features you choose to enable. DynamoDB has two capacity modes: provisioned capacity and on-demand capacity.
4.1. Provisioned Capacity
With provisioned capacity, you specify the number of reads and writes per second that you expect your application to require, and you pay for the capacity you provision. You can use auto scaling to automatically adjust the provisioned capacity, which maintains application performance while reducing costs.
Provisioned capacity uses Read Capacity Units (RCUs) and Write Capacity Units (WCUs). These are the units in which you provision your capacity; if you consume beyond that provisioned capacity, requests may be throttled, which is why you should use auto scaling.
Along with provisioned capacity, you can also purchase Reserved Capacity for discounts over a 1- or 3-year contract term. In this case, you are charged a one-time fee, plus an hourly fee per 100 RCUs and WCUs.
Provisioned capacity mode might be best if you have predictable application traffic, you run an application whose traffic is consistent or ramps gradually, or you can forecast capacity requirements to control costs.
In provisioned capacity mode:
- A Read Capacity Unit (RCU) consists of blocks of 4 KB, with the last block always rounded up:
  - 1 strongly consistent read/sec = 1 RCU
  - 2 eventually consistent reads/sec = 1 RCU
  - 1 transactional read/sec = 2 RCUs
- A Write Capacity Unit (WCU) consists of blocks of 1 KB, with the last block always rounded up:
  - 1 standard write/sec = 1 WCU
  - 1 transactional write/sec = 2 WCUs
Note: If you are underutilizing your provisioned capacity, DynamoDB preserves about 5 minutes of unused read and write capacity for future use; this is called DynamoDB burst capacity. This capacity gets used whenever you exceed your provisioned capacity.
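As a sketch of provisioned mode, the call below creates the USERS table with explicit RCUs and WCUs; the values 15 and 10 are simply the figures reused in the throughput example at the end of this article, and auto scaling would be configured separately on top of them.

import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned capacity mode: you declare RCUs and WCUs up front.
dynamodb.create_table(
    TableName="USERS",
    AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "N"}],
    KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 15, "WriteCapacityUnits": 10},
)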
4.2. On-Demand Capacity
With on-demand capacity, DynamoDB charges you for the data reads and writes your application performs on your tables. You do not need to specify how much read and write throughput you expect your application to perform; DynamoDB instantly accommodates your workloads as they ramp up or down. Throttling can occur if you exceed double your previous peak within 30 minutes.
You pay for the number of requests (reads and writes) your application makes, so you do not need to provision capacity units.
On-demand capacity uses Read Request Units and Write Request Units, which are similar to Read Capacity Units and Write Capacity Units, and you cannot use Reserved Capacity with on-demand mode.
On-demand capacity mode might be best if you create new tables with unknown workloads, you have unpredictable application traffic, or you prefer the ease of paying for only what you use.
In on-demand capacity mode:
- A Read Request Unit (RRU) consists of blocks of 4 KB, with the last block always rounded up:
  - 1 strongly consistent read request = 1 RRU
  - 2 eventually consistent read requests = 1 RRU
  - 1 transactional read request = 2 RRUs
- A Write Request Unit (WRU) consists of blocks of 1 KB, with the last block always rounded up:
  - 1 standard write request = 1 WRU
  - 1 transactional write request = 2 WRUs
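A minimal sketch of on-demand mode with boto3 is shown below; the EVENTS table name is hypothetical, and an existing provisioned table can also be switched to on-demand with update_table.

import boto3

dynamodb = boto3.client("dynamodb")

# Create a table in on-demand (pay-per-request) mode: no RCUs/WCUs to declare.
dynamodb.create_table(
    TableName="EVENTS",  # hypothetical table name
    AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)

# Switch an existing provisioned table to on-demand mode.
dynamodb.update_table(TableName="USERS", BillingMode="PAY_PER_REQUEST")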
5. Examples of Calculating Capacity Units and Throughput
5.1. Calculating Capacity Units
How do you calculate the capacity units needed to read a 25 KB item?
As 1 RCU consists of blocks of 4 KB, 25 KB / 4 KB = 6.25, which rounds up to 7 blocks.
- With strong consistency, reading 25 KB consumes 7 RCUs.
- With eventual consistency, reading 25 KB consumes 1/2 x 7 RCUs = 3.5, rounded up to 4 RCUs.
- With transactional consistency, reading 25 KB consumes 2 x 7 RCUs = 14 RCUs.
As 1 WCU consists of blocks of 1 KB:
- A standard write of 25 KB is 25 KB / 1 KB = 25 WCUs.
- A transactional write of 25 KB is 2 x 25 WCUs = 50 WCUs.
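The arithmetic above can be captured in a small helper; this is only a sketch of the rounding rules described in this section, not an official pricing calculator.

import math

def read_capacity_units(item_kb, consistency="strong"):
    # RCUs to read one item per second: 4 KB blocks, last block rounded up.
    blocks = math.ceil(item_kb / 4)        # strongly consistent baseline
    if consistency == "eventual":
        return math.ceil(blocks / 2)       # half the cost, rounded up
    if consistency == "transactional":
        return 2 * blocks                  # double the cost
    return blocks

def write_capacity_units(item_kb, transactional=False):
    # WCUs to write one item per second: 1 KB blocks, last block rounded up.
    blocks = math.ceil(item_kb)
    return 2 * blocks if transactional else blocks

# Reproduces the 25 KB example: 7, 4, 14 RCUs and 25, 50 WCUs.
print(read_capacity_units(25), read_capacity_units(25, "eventual"),
      read_capacity_units(25, "transactional"))
print(write_capacity_units(25), write_capacity_units(25, transactional=True))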
5.2. Calculating Throughput
How do you calculate the amount of throughput your application can support?
Suppose a DynamoDB table has a provisioned capacity of 15 RCUs and 10 WCUs.
- Read throughput with strong consistency = 4 KB x 15 = 60 KB/sec
- Read throughput with eventual consistency = 2 x (60 KB/sec) = 120 KB/sec
- Transactional read throughput = 1/2 x (60 KB/sec) = 30 KB/sec
- Write throughput = 1 KB x 10 = 10 KB/sec
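The same figures can be checked with a short sketch that applies the multipliers used in this example.

def read_throughput_kb_per_sec(rcus, consistency="strong"):
    # Maximum read throughput for a provisioned RCU count (4 KB per RCU).
    base = 4 * rcus                  # strongly consistent baseline
    if consistency == "eventual":
        return 2 * base              # eventual reads cost half, so double the throughput
    if consistency == "transactional":
        return base / 2              # transactional reads cost double, so half the throughput
    return base

def write_throughput_kb_per_sec(wcus):
    # Maximum standard write throughput for a provisioned WCU count (1 KB per WCU).
    return 1 * wcus

# Reproduces the example: 60, 120, 30 KB/sec reads and 10 KB/sec writes.
print(read_throughput_kb_per_sec(15),
      read_throughput_kb_per_sec(15, "eventual"),
      read_throughput_kb_per_sec(15, "transactional"),
      write_throughput_kb_per_sec(10))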