Create OpenSearch Serverless using CloudFormation

This page covers OpenSearch Serverless.

Amazon OpenSearch Serverless is an on-demand serverless configuration for Amazon OpenSearch Service. Serverless removes the operational complexities of provisioning, configuring, and tuning your OpenSearch clusters.

What is Amazon OpenSearch Serverless?

In this article, we use CloudFormation to create an OpenSearch Serverless collection and then access it as an IAM user and from Lambda functions.

Environment

Diagram of creating OpenSearch Serverless using CloudFormation.

Create OpenSearch Serverless.

Create an IAM user and access the OpenSearch Serverless dashboard.

Create two Lambda functions that access OpenSearch Serverless.
One function indexes documents into OpenSearch Serverless.
The other function searches the indexed documents.

The runtime environment for Lambda functions is Python 3.8.

CloudFormation template files

The above configuration is built with CloudFormation.
The CloudFormation templates are available at the following URL:

https://github.com/awstut-an-r/awstut-fa/tree/main/147

Explanation of key points of template files

OpenSearch Serverless

The CloudFormation template is written with reference to the following official AWS documentation.

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-cfn.html

Create the following four OpenSearch Serverless resources:

  • Collection
  • Encryption Policy
  • Data Access Control Policy
  • Network Access Policy

Collection

Collection is the main resource of OpenSearch Serverless.

A collection in Amazon OpenSearch Serverless is a logical grouping of one or more indexes that represent an analytics workload.

Creating, listing, and deleting Amazon OpenSearch Serverless collections
Resources:
  Collection:
    Type: AWS::OpenSearchServerless::Collection
    DependsOn:
      - EncryptionSecurityPolicy
    Properties:
      Name: !Ref CollectionName
      StandbyReplicas: DISABLED
      Type: TIMESERIES
Code language: YAML (yaml)

There are two points.

The first is the Type property.
You can choose from the following three options.

Time series – The log analytics segment that focuses on analyzing large volumes of semi-structured, machine-generated data in real-time for operational, security, user behavior, and business insights.

Search – Full-text search that powers applications in your internal networks (content management systems, legal documents) and internet-facing applications, such as ecommerce website search and content search.

Vector search – Semantic search on vector embeddings that simplifies vector data management and powers machine learning (ML) augmented search experiences and generative AI applications, such as chatbots, personal assistants, and fraud detection.

Choosing a collection type

The major difference between Time series and Search is the way data is cached.

Search – Full-text search that powers applications in your internal networks and internet-facing applications. All search data is stored in hot storage to ensure fast query response times.

Time series – Log analytics segment that focuses on analyzing large volumes of semi-structured, machine-generated data. At least 24 hours of data is stored on hot indexes, and the rest remains in warm storage.

Creating collections

This time, specify “TIMESERIES” to create a collection of time series type.

The second point is the order in which the resources are created.
OpenSearch Serverless collections cannot be created until after the encryption policy described below has been created.
If this order is not followed, the following error will occur.

No matching security policy of encryption type found for collection name: [collection-name]. Please create security policy of encryption type for this collection.

To comply with this specification, use the DependsOn property to configure this resource to be created after the encryption policy is created.

Encryption Policy

By creating an encryption policy, you configure settings related to the encryption of data within OpenSearch Serverless.

Each Amazon OpenSearch Serverless collection that you create is protected with encryption of data at rest, a security feature that helps prevent unauthorized access to your data. Encryption at rest uses AWS Key Management Service (AWS KMS) to store and manage your encryption keys.

Encryption in Amazon OpenSearch Serverless
Resources:
  EncryptionSecurityPolicy:
    Type: AWS::OpenSearchServerless::SecurityPolicy
    Properties:
      Name: !Sub "${Prefix}-encryption-policy"
      Policy: !Sub >-
        {"Rules":[{"ResourceType":"collection","Resource":["collection/${CollectionName}"]}],"AWSOwnedKey":true}
      Type: encryption
Code language: YAML (yaml)

The Policy property defines the contents of the encryption policy.

Please refer to the following page for the notation.

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-encryption.html

Let's look at this policy.
The Resource field specifies the OpenSearch Serverless collection described above, making it the target of this policy.
Setting AWSOwnedKey to true means that an AWS owned KMS key is used for encryption.
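The Policy property is a JSON document passed as a string, so it can be easier to generate than to write by hand. The following is a minimal sketch in Python, assuming you only need the AWS owned key variant shown above; the function name build_encryption_policy and the collection name my-collection are hypothetical and not part of the original template.

```python
import json


def build_encryption_policy(collection_name: str) -> str:
    """Return an encryption policy document as a compact JSON string.

    Mirrors the structure used in the template above: one rule that
    targets a single collection, encrypted with an AWS owned key.
    """
    policy = {
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": [f"collection/{collection_name}"],
            }
        ],
        "AWSOwnedKey": True,
    }
    # Compact separators keep the string on one line, like the template.
    return json.dumps(policy, separators=(",", ":"))


# "my-collection" is a placeholder name used only for illustration.
policy_json = build_encryption_policy("my-collection")
print(policy_json)
```

Generating the string with json.dumps guarantees valid JSON, which avoids quoting mistakes when the collection name changes.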

Data Access Policy

By creating a data access policy, you can set detailed access privileges to OpenSearch Serverless for IAM users and others.

With data access control in Amazon OpenSearch Serverless, you can allow users to access collections and indexes, regardless of their access mechanism or network source.

Data access control for Amazon OpenSearch Serverless
Resources:
  DataAccessPolicy1:
    Type: AWS::OpenSearchServerless::AccessPolicy
    Properties:
      Name: !Sub "${Prefix}-data-policy-01"
      Policy: !Sub >-
        [{"Description":"Access for cfn user","Rules":[{"ResourceType":"index","Resource":["index/*/*"],"Permission":["aoss:*"]},
        {"ResourceType":"collection","Resource":["collection/${CollectionName}"],"Permission":["aoss:*"]}],
        "Principal":["${User1Arn}"]}]
      Type: data
Code language: YAML (yaml)

The Policy property defines the contents of the data access policy.

Please refer to the following page for the notation.

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html

Let's look at this policy.
The first rule allows all actions (aoss:*) on the OpenSearch Serverless indexes.
The second rule allows all actions on the collection.
The Principal is IAM user 1, described below.
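A data access policy has the same shape regardless of who the principal is: a list of statements, each with rules and principals. As a rough sketch of that structure, the Python helper below builds a single-statement policy with one index rule. The helper name, the example role ARN, and the account ID 123456789012 are assumptions made for illustration; the permission name aoss:ReadDocument is one that the template uses later for the search function.

```python
import json
from typing import List


def build_data_access_policy(
    description: str,
    principals: List[str],
    index_permissions: List[str],
    index_resource: str = "index/*/*",
) -> str:
    """Return a one-statement data access policy as a compact JSON string."""
    statement = {
        "Description": description,
        "Rules": [
            {
                "ResourceType": "index",
                "Resource": [index_resource],
                "Permission": index_permissions,
            }
        ],
        "Principal": principals,
    }
    # A data access policy is a JSON array of statements.
    return json.dumps([statement], separators=(",", ":"))


# Placeholder ARN: replace with a real IAM user or role ARN.
read_only_policy = build_data_access_policy(
    "Read-only access (example)",
    ["arn:aws:iam::123456789012:role/example-role"],
    ["aoss:ReadDocument"],
)
print(read_only_policy)
```

Listing only the permissions a principal needs, instead of aoss:*, keeps the policy closer to least privilege.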

Network Access Policy

The official AWS description of the network access policy is as follows:

The network settings for an Amazon OpenSearch Serverless collection determine whether the collection is accessible over the internet from public networks, or whether it must be accessed through OpenSearch Serverless–managed VPC endpoints.

Network access for Amazon OpenSearch Serverless
Resources:
  NetworkSecurityPolicy:
    Type: AWS::OpenSearchServerless::SecurityPolicy
    Properties:
      Name: !Sub "${Prefix}-network-policy"
      Policy: !Sub >-
        [{"Rules":[{"ResourceType":"collection","Resource":["collection/${CollectionName}"]},
        {"ResourceType":"dashboard","Resource":["collection/${CollectionName}"]}],"AllowFromPublic":true}]
      Type: network
Code language: YAML (yaml)

The Policy property defines the contents of the network access policy.

Please refer to the following page for the notation.

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-network.html

Let's look at this policy.
Setting AllowFromPublic to true allows access to this collection and its dashboard directly from the internet.
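Because a public network policy is easy to leave in place by accident, it can help to inspect the policy document before deploying it. The sketch below parses a network policy string and reports, per statement, whether it is public and which resource types it covers. The function name and the collection name my-collection are hypothetical; the JSON layout is the one used in the template above.

```python
import json
from typing import Any, Dict, List


def summarize_network_policy(policy_json: str) -> List[Dict[str, Any]]:
    """Summarize each statement of a network policy document."""
    summary = []
    for statement in json.loads(policy_json):
        summary.append(
            {
                # A missing AllowFromPublic key is treated as not public here.
                "public": bool(statement.get("AllowFromPublic", False)),
                "resource_types": sorted(
                    {rule["ResourceType"] for rule in statement["Rules"]}
                ),
            }
        )
    return summary


# Same structure as the template's policy, with a placeholder collection name.
example_policy = (
    '[{"Rules":[{"ResourceType":"collection","Resource":["collection/my-collection"]},'
    '{"ResourceType":"dashboard","Resource":["collection/my-collection"]}],'
    '"AllowFromPublic":true}]'
)
network_summary = summarize_network_policy(example_policy)
print(network_summary)
```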

IAM user

Resources:
  User1:
    Type: AWS::IAM::User
    Properties:
      LoginProfile:
        Password: !Ref Password
      Policies:
        - PolicyName: AllAllowPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - aoss:*
                Resource: "*"
      UserName: !Sub "${Prefix}-user-01"

  User2:
    Type: AWS::IAM::User
    Properties:
      LoginProfile:
        Password: !Ref Password
      Policies:
        - PolicyName: AllAllowPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - aoss:*
                Resource: "*"
      UserName: !Sub "${Prefix}-user-02"
Code language: YAML (yaml)

Create two IAM users.
The IAM settings are the same for both users: each is allowed all aoss actions.

As mentioned earlier, however, only IAM user 1 is named in the data access policy, so only that user can access the collection and the dashboard.

Lambda Functions

Access the created OpenSearch Serverless from a Lambda function.
The code to execute in the function is based on the following page.

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-clients.html#serverless-python

https://opensearch.org/docs/latest/clients/python-low-level/

Function 1

Resources:
  Function2:
    Type: AWS::Lambda::Function
    Properties:
      Architectures:
        - !Ref Architecture
      Environment:
        Variables:
          COLLECTION_ENDPOINT: !Sub "${Collection}.${AWS::Region}.aoss.amazonaws.com"
          REGION: !Ref AWS::Region
      Code:
        ZipFile: |
          from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
          import boto3
          import os

          host = os.environ['COLLECTION_ENDPOINT']
          region = os.environ['REGION']

          service = 'aoss'
          credentials = boto3.Session().get_credentials()
          auth = AWSV4SignerAuth(credentials, region, service)

          client = OpenSearch(
            hosts=[{'host': host, 'port': 443}],
            http_auth=auth,
            use_ssl=True,
            verify_certs=True,
            connection_class=RequestsHttpConnection,
            pool_maxsize=20,
            )

          def lambda_handler(event, context):
            index_name = "python-test-index"
            create_response = client.indices.create(
              index_name
            )
            print(create_response)

            document = {
              'title': 'Moneyball',
              'director': 'Bennett Miller',
              'year': '2011'
              }

            index_response = client.index(
              index=index_name,
              body=document
              )
            print(index_response)
      FunctionName: !Sub "${Prefix}-function-02"
      Handler: !Ref Handler
      Layers:
        - !Ref LambdaLayer
      Runtime: !Ref Runtime
      Role: !GetAtt FunctionRole2.Arn
Code language: YAML (yaml)

The code that the function executes is written inline in the template.

For information on how to create a Lambda function using CloudFormation, please see the following page.

Related reading: 3 patterns to create Lambda with CloudFormation (S3/Inline/Container)

This function, defined as Function2 in the template and referred to as function 2 in the rest of this article, stores data in OpenSearch Serverless.
Specifically, after creating a client object for OpenSearch Serverless, it performs the following two actions.

  • Index creation using the client’s indices.create method
  • Indexing documents using the client’s index method

The document is sample data taken from the reference page.
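One thing to keep in mind is that creating an index that already exists raises an error, so running the function a second time would fail at the index creation step. A possible way around this is sketched below, assuming a client object that exposes indices.exists and indices.create the way the opensearch-py client does. The helper name ensure_index is hypothetical. The existence check is a read-style index operation, so the function's data access policy would likely also need a permission such as aoss:DescribeIndex, which the policy in this article does not grant; treat that as an assumption to verify.

```python
def ensure_index(client, index_name: str) -> bool:
    """Create index_name unless it already exists.

    `client` is expected to behave like an opensearch-py OpenSearch client.
    Returns True if the index was created, False if it was already there.
    """
    if client.indices.exists(index=index_name):
        return False
    client.indices.create(index=index_name)
    return True
```

Inside lambda_handler, the call to client.indices.create could then be replaced with ensure_index(client, index_name), which makes repeated invocations safe.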

Function 2

Resources:
  Function3:
    Type: AWS::Lambda::Function
    Properties:
      Architectures:
        - !Ref Architecture
      Environment:
        Variables:
          COLLECTION_ENDPOINT: !Sub "${Collection}.${AWS::Region}.aoss.amazonaws.com"
          REGION: !Ref AWS::Region
      Code:
        ZipFile: |
          from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
          import boto3
          import os

          host = os.environ['COLLECTION_ENDPOINT']
          region = os.environ['REGION']

          service = 'aoss'
          credentials = boto3.Session().get_credentials()
          auth = AWSV4SignerAuth(credentials, region, service)

          client = OpenSearch(
            hosts=[{'host': host, 'port': 443}],
            http_auth=auth,
            use_ssl=True,
            verify_certs=True,
            connection_class=RequestsHttpConnection,
            pool_maxsize=20,
            )

          def lambda_handler(event, context):
            index_name = "python-test-index"
            q = 'miller'
            query = {
              'size': 5,
              'query': {
                'multi_match': {
                  'query': q,
                  'fields': ['title^2', 'director']
                }
              }
            }

            search_response = client.search(
              body=query,
              index=index_name
              )
            print(search_response)
      FunctionName: !Sub "${Prefix}-function-03"
      Handler: !Ref Handler
      Layers:
        - !Ref LambdaLayer
      Runtime: !Ref Runtime
      Role: !GetAtt FunctionRole3.Arn
Code language: YAML (yaml)

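This function, defined as Function3 in the template and referred to as function 3 in the rest of this article, searches the data in OpenSearch Serverless.
Specifically, it uses the client’s search method with a multi_match query.
The query targets the sample data from the reference page.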
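The function prints the whole search response, which is a nested dictionary. To pull out just the matching documents, something like the following sketch could be used. It assumes the usual OpenSearch response layout, with matches under hits.hits and the stored document under _source. The helper name extract_hits is hypothetical, and the sample response is hand-written for illustration, not real output from the collection.

```python
from typing import Any, Dict, List


def extract_hits(search_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the id, score, and stored document of each search hit."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [
        {
            "id": hit.get("_id"),
            "score": hit.get("_score"),
            "source": hit.get("_source", {}),
        }
        for hit in hits
    ]


# Hand-written sample with made-up id and score values.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "python-test-index",
                "_id": "example-id",
                "_score": 0.5,
                "_source": {
                    "title": "Moneyball",
                    "director": "Bennett Miller",
                    "year": "2011",
                },
            }
        ]
    }
}
for hit in extract_hits(sample_response):
    print(hit["source"]["title"], hit["score"])
```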

IAM Role

Resources:
  FunctionRole2:
    Type: AWS::IAM::Role
    DeletionPolicy: Delete
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action: sts:AssumeRole
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: FunctionRole2Policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - aoss:APIAccessAll
                Resource:
                  - !Sub "arn:aws:aoss:${AWS::Region}:${AWS::AccountId}:collection/${Collection}"
Code language: YAML (yaml)

These are the IAM roles for the two functions.
The role shown above is for function 2; the policy for function 3’s role is identical.

The key point is “aoss:APIAccessAll” specified in the inline policy.

Starting May 10, 2023, OpenSearch Serverless requires these two new IAM permissions for collection resources. The aoss:APIAccessAll permission allows data plane access, and the aoss:DashboardsAccessAll permission allows OpenSearch Dashboards from the browser. Failure to add the two new IAM permissions results in a 403 error.

Identity and Access Management for Amazon OpenSearch Serverless

Both functions call the data plane (one indexes documents, the other searches them), so the former permission, aoss:APIAccessAll, must be allowed.

Data Access Policy

Resources:
  DataAccessPolicy2:
    Type: AWS::OpenSearchServerless::AccessPolicy
    DependsOn:
      - Function2
    Properties:
      Name: !Sub "${Prefix}-data-policy-02"
      Policy: !Sub
        - >-
          [{"Description":"Access for Function2","Rules":[{"ResourceType":"index","Resource":["index/*/*"],"Permission":["aoss:CreateIndex","aoss:WriteDocument","aoss:UpdateIndex"]}],
          "Principal":["${FunctionRole2Arn}"]}]
        - FunctionRole2Arn: !GetAtt FunctionRole2.Arn
      Type: data

  DataAccessPolicy3:
    Type: AWS::OpenSearchServerless::AccessPolicy
    DependsOn:
      - Function3
    Properties:
      Name: !Sub "${Prefix}-data-policy-03"
      Policy: !Sub
        - >-
          [{"Description":"Access for Function3","Rules":[{"ResourceType":"index","Resource":["index/*/*"],"Permission":["aoss:ReadDocument"]}],
          "Principal":["${FunctionRole3Arn}"]}]
        - FunctionRole3Arn: !GetAtt FunctionRole3.Arn
      Type: data
Code language: YAML (yaml)

As with the IAM user, create data access policies for the Lambda functions; here the principals are the functions’ IAM roles.

Function 2 will submit data to OpenSearch Serverless, so grant write permissions.
Function 3 searches for data, so give it read permissions.

(Reference) Automatically create a Lambda layer using CloudFormation custom resources

To access OpenSearch Serverless from Python, use the opensearch-py package, the Python client used in the official AWS documentation.
This package is not included in the default Lambda function runtime environment.
In this article, we create a Lambda layer that contains the package so that the functions can import it.

In this case, we will use a CloudFormation custom resource to automatically create a Lambda layer.
For more information, please see the following page.

Related reading: Preparing Lambda Layer Package with CFN Custom Resources – Python Version

On the above page, the packages to install with pip are listed in an SSM Parameter Store parameter.
This article defines the same kind of resource as follows.

Resources:
  RequirementsParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Ref Prefix
      Type: String
      Value: |
        urllib3==1.26.6
        opensearch-py
Code language: YAML (yaml)

Architecting

Use CloudFormation to build this environment and check its actual behavior.

Create CloudFormation stacks and check the resources in the stacks

Create CloudFormation stacks.
For information on how to create stacks and check each stack, please see the following page.

Related reading: CloudFormation’s nested stack

Because this template creates named IAM resources (IAM users), specify the CAPABILITY_NAMED_IAM capability as follows:

$ aws cloudformation create-stack \
--stack-name [stack-name] \
--template-url https://[bucket-name].s3.[region].amazonaws.com/[folder-name]/fa-147.yaml \
--capabilities CAPABILITY_NAMED_IAM
Code language: Bash (bash)
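The same stack creation could also be scripted from Python. The following is a minimal sketch, assuming a boto3 CloudFormation client is passed in (for example boto3.client("cloudformation")) and that the template has already been uploaded to S3. The function name create_stack_and_wait is hypothetical; create_stack and the stack_create_complete waiter are the boto3 calls it relies on.

```python
def create_stack_and_wait(cfn_client, stack_name: str, template_url: str) -> str:
    """Create a stack that contains named IAM resources and wait for it.

    `cfn_client` is expected to be a boto3 CloudFormation client.
    Returns the ID of the created stack.
    """
    response = cfn_client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        # Required because the template creates IAM users with fixed names.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until CloudFormation reports that creation has finished.
    waiter = cfn_client.get_waiter("stack_create_complete")
    waiter.wait(StackName=stack_name)
    return response["StackId"]


# Example call (placeholders left as in the CLI command above):
#   import boto3
#   create_stack_and_wait(
#       boto3.client("cloudformation"),
#       "[stack-name]",
#       "https://[bucket-name].s3.[region].amazonaws.com/[folder-name]/fa-147.yaml",
#   )
```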

Check the OpenSearch Serverless collection from the AWS Management Console.

Detail of OpenSearch Serverless 1.

The collection has been successfully created.

Check the network access policy.

Detail of OpenSearch Serverless 2.

You can see that this collection is open to the public.

Check the data access policy.
There are three, so check them in order.

Detail of OpenSearch Serverless 3.

The first policy allows all actions for IAM user 1.

Detail of OpenSearch Serverless 4.

The second policy is what gives Function 2 permission to write test data.

Detail of OpenSearch Serverless 5.

The third policy is what gives Function 3 the authority to retrieve data.

Operation Check

Now that we are ready, we will check the actual operation.

Write data

Execute function 2 to write test data.

Detail of Lambda 1.

The log shows that the data has indeed been written.
The index was created and the document was successfully indexed.

Searching for data

Next, execute function 3 to search for the written data.

Detail of Lambda 2.

The log did indeed return the written data.
The search was successfully executed.

Dashboard

Access the OpenSearch Serverless dashboard from your IAM user.

First, try to access with IAM user 2.

Detail of OpenSearch Serverless 6.

Access was denied.
The IAM policy attached to IAM user 2 is not the problem; access fails because no data access policy names this user as a principal.
As this shows, users of OpenSearch Serverless need not only appropriate IAM policies but also a matching data access policy.
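The two-layer check can be pictured with a small model. The Python sketch below is a deliberately simplified illustration, not how the service is implemented: it treats access as allowed only when an IAM action pattern matches and the principal appears in at least one data access policy statement. The function name, the account ID 123456789012, and the user names are placeholders.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def can_access(
    principal_arn: str,
    iam_allowed_actions: List[str],
    data_access_statements: List[Dict[str, Any]],
    required_iam_action: str,
) -> bool:
    """Toy model: both the IAM layer and the data access layer must agree."""
    iam_ok = any(
        fnmatchcase(required_iam_action, pattern)
        for pattern in iam_allowed_actions
    )
    data_ok = any(
        principal_arn in statement["Principal"]
        for statement in data_access_statements
    )
    return iam_ok and data_ok


# Placeholder ARNs standing in for IAM user 1 and IAM user 2.
user1 = "arn:aws:iam::123456789012:user/example-user-01"
user2 = "arn:aws:iam::123456789012:user/example-user-02"

# Both users have the same IAM policy; only user 1 is a data policy principal.
iam_actions = ["aoss:*"]
statements = [{"Principal": [user1]}]

user1_allowed = can_access(user1, iam_actions, statements, "aoss:DashboardsAccessAll")
user2_allowed = can_access(user2, iam_actions, statements, "aoss:DashboardsAccessAll")
print(user1_allowed, user2_allowed)
```

In this model, IAM user 1 passes both checks while IAM user 2 fails the second one, which matches the behavior observed on the dashboard.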

Then access with IAM user 1.

Detail of OpenSearch Serverless 7.

Successfully accessed.
IAM user 1 was able to access the dashboard because the data access policy was set properly.

Finally, search from the dashboard.

Detail of OpenSearch Serverless 8.

The written test data is returned.
In this way, OpenSearch Serverless also allows the use of dashboards.

Summary

Using CloudFormation, we created an OpenSearch Serverless collection and accessed it as an IAM user and from Lambda functions.