Introduction to DynamoDB – Building DB for Review Data

TOC

Building Simple DB for Review Data with DynamoDB

This is one of the AWS DVA topics related to development with AWS services.
As an introduction to DynamoDB, we will create a simple database.
We will design the database assuming it is for a review system.

Database to be built

We will store the following data in the database.

user: User name
service: Service name
datetime: Posting date and time
rating: Rating value (1~5)
comment：rating text

In concrete terms, the following data will be stored.

We assume that the following queries will be executed against the database:

To retrieve all reviews posted by a specific user.
To retrieve the reviews posted by a specific user for a specific service.
To retrieve the reviews posted by a specific user, starting with the most recent.
To retrieve reviews for a specific service, starting with the highest rated reviews.

DynamoDB Terms

Keys in DynamoDB

First, let’s look at keys in DynamoDB. There are Partition Key and Sort Key.

Partition Key

In DynamoDB, it is necessary to define a primary key that uniquely identifies the items stored in a table. The primary key can be defined as a partition key alone or as a combination of a partition key and a sort key. A partition key is a key that defines a physical partition where data is stored.

DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored.
Primary Key

A partition key can act as a primary key on its own by specifying unique data, such as a user ID, as the partition key.

Sort Key

The sort key is a key that defines the order of data in the partition defined by the partition key.

DynamoDB uses the partition key value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored. All items with the same partition key value are stored together, in sorted order by sort key value.
Primary Key

If the data specified in the partition key and the sort key can be combined to uniquely identify an item, the combination of the two will act as the primary key. This is called a composite primary key.

Secondary Indexes

Next, let’s review secondary indexes. There is a global secondary index and a local secondary index.
The assumption is that DynamoDB is a Key – Value type database, so the basic usage is to specify a primary key and retrieve the corresponding item. In addition, it is possible to execute a query to search for items that satisfy specific conditions. Queries can only be executed against a sort key. However, a sort key can only be set for a single attribute. This means that the query cannot be executed on any attribute other than the one specified as the sort key. To solve this problem, we define a secondary index.

Global Secondary Index

The first is a global secondary index.

Global secondary index – An index with a partition key and sort key that can be different from those on the table.
Secondary Indexes

By defining a global secondary index, you are defining a new partition key and sort key for search, separate from the original partition key and sort key, and you can use it to perform queries.

Local Secondary Index

The second is a local secondary index.

Local secondary index – An index that has the same partition key as the table, but a different sort key.
Secondary Indexes

By defining a local secondary index, you are defining a second sort key for the existing partition key, which allows you to query the existing partition using a different attribute.

Points to consider when designing DynamoDB

Consider primary key first

As mentioned earlier, the primary key is used to uniquely identify an item in a table. The primary key can be defined as a partition key alone or as a composite primary key combining a partition key and a sort key.
In this scenario, there are four data to be stored in the table, and among them, the following two data can be candidates for the primary key.

user
service

However, neither of them by themselves can uniquely identify an item. For example, a user may review many services, or many users may review a single service. Therefore, we combine the two attributes into a primary key, and by combining the two data, we can uniquely identify a review.
And looking at the search requirements again, we can see that there are several queries that focus on user information. Specifically, (1), (2), and (3). So we will use user as the partition key and service as the sort key. This will allow us to deal with search patterns (1) and (2).

Use the partition key and sort key to get the item.

Define local secondary index and query with second sort key

Next, let’s consider how to deal with search pattern (3). This pattern is based on user information, and the query will be performed on the date and time the review was posted. As mentioned earlier, the query can only be executed against the attribute specified in the sort key. However, since we specified service as the sort key, we cannot execute queries using the date and time information at present.
This time, we will create a local secondary index to support this search pattern. Since it is a local secondary index, leave the partition key as user and specify datetime as the new sort key. This will allow us to execute the query of search pattern (3).

Use the partition key and local secondary index to get the item.

Define global secondary index and query with second primary key

Finally, we will consider how to deal with search pattern ④. This pattern is based on service information, and queries are to be executed on evaluated values. Currently, the index for this search is not prepared, so it is not possible to execute the query.
This time, we will create a global index to support the main search pattern. Since it is a global index, specify service as the new partition key and rating as the new sort key. Now, the query of search pattern 4) can be executed.

Summary of DB design for review data

Let’s summarize the key points in the configuration.
First, the primary key should be as follows

Partition key: user
Sort key: service

Next, the local secondary index is as follows

Partition key: user
Sort key: datetime

Finally, the global secondary index is as follows

Partition key: service
Sort key: rating

CloudFormation Template File

We will use CloudFormation to build a DynamoDB table for hands-on use.
The CloudFormation template file and the JSON file for the sample data to be fed into DynamoDB are placed at the following URL.

https://github.com/awstut-an-r/awstut-dva/tree/main/03/002

Explanation of points in template file

We will cover the key points of the template file to configure this architecture.

Specify only what is specified for keys in attributes

Check the parameters related to the table attributes.

Resources:
  Table:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: user
          AttributeType: S
        - AttributeName: service
          AttributeType: S
        - AttributeName: datetime
          AttributeType: S
        - AttributeName: rating
          AttributeType: N
        #- AttributeName: comment
        #  AttributeType: S
Code language: YAML (yaml)

The AttributeDefinitions property is used to set the attributes of the table.
The caveat of this property is that it does not specify all the attributes that are planned for the table.

AttributeDefinitions

A list of attributes that describe the key schema for the table and indexes.
AWS::DynamoDB::Table

In other words, only the attributes that are specified as keys in defining primary keys and secondary indexes should be set to this property. In this table, five attributes will be created, but only four attributes (user, service, datetime, rating) will be used as the primary key or secondary index. Therefore, comment is not specified in this property. Please note that if you specify comment as well, an error will occur and the CloudFormation stack will not be created properly.

Hash key is partition key, range key is sort key

Next, let’s check the primary key settings.

Resources:
  Table:
    Type: AWS::DynamoDB::Table
    Properties:
      KeySchema:
        - AttributeName: user
          KeyType: HASH
        - AttributeName: service
          KeyType: RANGE
Code language: YAML (yaml)

Define the primary key in the KeySchema property. Specify the partition key or sort key in the KeyType property in the same property, and specify the corresponding attribute. The specifications for both keys are as follows

Partition key: HASH (hash key)
Sort key: RANGE (range key)

The reason why the partition key is referred to as the hash key and the sort key as the range key is a remnant of how they used to be called.
In this case, we will specify user as the partition key and service as the sort key.

Specifying projection when creating secondary index

Next, let’s check the local secondary index (LSI).

Resources:
  Table:
    Type: AWS::DynamoDB::Table
    Properties:
      LocalSecondaryIndexes:
        - IndexName: lsi-user-datetime
          KeySchema:
            - AttributeName: user
              KeyType: HASH
            - AttributeName: datetime
              KeyType: RANGE
          Projection:
            NonKeyAttributes:
              - service
              - comment
            ProjectionType: INCLUDE
Code language: YAML (yaml)

Local secondary indexes can be configured with the LocalSecondaryIndexes property.
In the KeySchema property, specify the attribute that is the original partition key and the attribute that will be the sort key for LSI. In this case, the partition key is user and datetime is specified as the LSI sort key.
The key point in defining the secondary index is the projection. Projection is specified by the Projection property. Projection is a setting that determines whether or not attributes other than the primary key will be included in the query results when a search is performed using the secondary index.
In this case, if the projection is not set, only the user and datetime attributes will be available in the query results.
However, this is not enough for the attributes that are important for review data. Therefore, by specifying “INCLUDE” in the ProjectionType property and attributes in the NonKeyAttributes property, we can specify the attributes to be included in the projection. In this case, by specifying the service and comment attributes, we will be able to include both attributes in the query result.

The settings for the global secondary index (GSI) are the same as in LSI.

Resources:
  Table:
    Type: AWS::DynamoDB::Table
    Properties:
      GlobalSecondaryIndexes:
        - IndexName: gsi-service-rating
          KeySchema:
            - AttributeName: service
              KeyType: HASH
            - AttributeName: rating
              KeyType: RANGE
          Projection:
            NonKeyAttributes:
              - user
              - comment
            ProjectionType: INCLUDE
Code language: YAML (yaml)

This time, we will specify service as the GSI partition key and rating as the sort key.

Architecting

We will use CloudFormation to build this environment and check its actual behavior.

Create CloudFormation stack and check resources in stack

For information on how to create a stack and check each stack, please refer to the following page

The DynamoDB table created this time is “dva-03-002-table”, check the creation status from the AWS Management Console.

You can see that the table has been created successfully. Next, we will check the secondary index.

We can see that both indexes have been created successfully. The partition key, sort key, and projection are also set as specified.

Writing test data to the table

From the AWS CLI, submit the data to the table you created.

$ aws dynamodb batch-write-item \
--request-items file://request-items.json
Code language: Bash (bash)

Prepare the data to be written as a JSON file (request-items.json) and pass the file to the request-items option.

Preparing for table operations

To perform operations on a table from the AWS Management Console, click the “Explore table items” button.

Access the Scan/Query page of the DynamoDB table.

From this page, you can scan or query the table.

Table operations: scan

Now that we are ready, we will perform a search on the table. First, we will perform a scan to retrieve all the data in the table.

After selecting “Scan”, click on the “Run” button.

The result of scanning the DynamoDB table.

The results of the scan will be displayed at the bottom of the page. All the data (15 items) were successfully displayed. There are no settings such as projection in Scan, and all attributes can be acquired.

Table operations: query ①

Next, let’s check the operation of the query.
Let’s say we want to retrieve all the review data for user aaa.

After selecting “Query”, specify only the partition key “user”.

The result of querying with the partition key.

Looking at the result of query execution, we can see that we have retrieved the five reviews posted by user aaa. This means that the search requirement (1) “Retrieve all reviews posted by a specific user. was achieved.

Table operation: query ②

Suppose we want to retrieve the reviews of EC2 by user aaa.

Use partition keys and sort keys to query.

Specify “aaa” as the partition key (user) and “EC2” as the sort key (service).

The result of querying using the partition key and sort key.

Looking at the result of the query execution, we can see that we have obtained the review data for EC2 posted by user aaa. From the above, we have satisfied the search requirement (2) “To retrieve reviews posted by a specific user for a specific service. of the search requirement was achieved.

Table operation: query ③

Suppose we want to retrieve the reviews of user bbb, starting with the most recent one.

Use partition keys and local secondary indexes to query.

We will execute the query against the local secondary index, so select “lsi-user-datetime”, specify “bbb” for user, and specify the sort order as descending.

The result of a query using a partition key and a local secondary index.

The result of the query shows that the reviews posted by user bbb are retrieved in order from the most recent one. This satisfies the search requirement (3) “retrieve reviews posted by a specific user in the order of the most recent reviews. was achieved.

Table operation: query ④

Let’s consider retrieving the reviews with the highest rating for the service IAM.

Use partition keys and global secondary indexes to query.

We will execute the query against the global secondary index, so we will select “gsi-service-rating”, specify “IAM” for service, and specify the sort order as descending.

The result of a query using a partition key and a global secondary index.

Looking at the results of the query execution, we can see that three of the IAM reviews were retrieved from those with the highest ratings. As a result, we have satisfied the search requirement (4), “Obtain the reviews with the highest rating among the reviews for a specific service. was achieved.

Summary

We have reviewed the DynamoDB primary key and secondary index.
By setting the primary key and secondary index according to the search requirements, we confirmed that it is possible to respond to a variety of queries.