Create OpenSearch Serverless using CloudFormation
This page covers OpenSearch Serverless.
Amazon OpenSearch Serverless is an on-demand serverless configuration for Amazon OpenSearch Service. Serverless removes the operational complexities of provisioning, configuring, and tuning your OpenSearch clusters.
What is Amazon OpenSearch Serverless?
In this article, we will use CloudFormation to create an OpenSearch Serverless collection and access it from an IAM user and from Lambda functions.
Environment
Create OpenSearch Serverless.
Create an IAM user and access the OpenSearch Serverless dashboard.
Create two Lambda functions that access OpenSearch Serverless.
One function indexes documents into OpenSearch Serverless.
The other function searches the indexed documents in OpenSearch Serverless.
The runtime environment for Lambda functions is Python 3.8.
CloudFormation template files
The above configuration is built with CloudFormation.
The CloudFormation template is placed at the following URL
https://github.com/awstut-an-r/awstut-fa/tree/main/147
Explanation of key points of template files
OpenSearch Serverless
Create a CloudFormation template by referring to the following official AWS website.
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-cfn.html
Create the following four resources regarding OpenSearch Serverless
- Collection
- Encryption Policy
- Data Access Control Policy
- Network Access Policy
Collection
Collection is the main resource of OpenSearch Serverless.
A collection in Amazon OpenSearch Serverless is a logical grouping of one or more indexes that represent an analytics workload.
Creating, listing, and deleting Amazon OpenSearch Serverless collections
Resources:
  Collection:
    Type: AWS::OpenSearchServerless::Collection
    DependsOn:
      - EncryptionSecurityPolicy
    Properties:
      Name: !Ref CollectionName
      StandbyReplicas: DISABLED
      Type: TIMESERIES
There are two points.
The first is the Type property.
You can choose from the following three options.
Time series – The log analytics segment that focuses on analyzing large volumes of semi-structured, machine-generated data in real-time for operational, security, user behavior, and business insights.
Search – Full-text search that powers applications in your internal networks (content management systems, legal documents) and internet-facing applications, such as ecommerce website search and content search.
Vector search – Semantic search on vector embeddings that simplifies vector data management and powers machine learning (ML) augmented search experiences and generative AI applications, such as chatbots, personal assistants, and fraud detection.
Choosing a collection type
The major difference between Time series and Search is the way data is cached.
Search – Full-text search that powers applications in your internal networks and internet-facing applications. All search data is stored in hot storage to ensure fast query response times.
Time series – Log analytics segment that focuses on analyzing large volumes of semi-structured, machine-generated data. At least 24 hours of data is stored on hot indexes, and the rest remains in warm storage.
Creating collections
This time, specify “TIMESERIES” to create a collection of time series type.
The second point is the order in which the resources are created.
OpenSearch Serverless collections cannot be created until after the encryption policy described below has been created.
If this order is not followed, the following error will occur.
No matching security policy of encryption type found for collection name: [collection-name]. Please create security policy of encryption type for this collection.
To comply with this specification, use the DependsOn property to configure this resource to be created after the encryption policy is created.
Encryption Policy
By creating an encryption policy, you configure settings related to the encryption of data within OpenSearch Serverless.
Each Amazon OpenSearch Serverless collection that you create is protected with encryption of data at rest, a security feature that helps prevent unauthorized access to your data. Encryption at rest uses AWS Key Management Service (AWS KMS) to store and manage your encryption keys.
Encryption in Amazon OpenSearch Serverless
Resources:
  EncryptionSecurityPolicy:
    Type: AWS::OpenSearchServerless::SecurityPolicy
    Properties:
      Name: !Sub "${Prefix}-encryption-policy"
      Policy: !Sub >-
        {"Rules":[{"ResourceType":"collection","Resource":["collection/${CollectionName}"]}],"AWSOwnedKey":true}
      Type: encryption
The Policy property defines the contents of the encryption policy.
Please refer to the following page for the notation.
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-encryption.html
Let's look at this policy.
The Resource field specifies the OpenSearch Serverless collection described above, making it the target of this policy.
Setting AWSOwnedKey to true means that an AWS owned KMS key is used for encryption.
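As a rough illustration of the policy structure described above, the following Python sketch builds the same JSON document programmatically. The function name build_encryption_policy and the collection name my-collection are hypothetical and do not appear in the template.

```python
import json


def build_encryption_policy(collection_name, use_aws_owned_key=True):
    """Return an encryption policy document as a compact JSON string."""
    policy = {
        "Rules": [
            {
                "ResourceType": "collection",
                # The policy targets the collection by name.
                "Resource": [f"collection/{collection_name}"],
            }
        ],
        # True means an AWS owned key is used for encryption at rest.
        "AWSOwnedKey": use_aws_owned_key,
    }
    return json.dumps(policy, separators=(",", ":"))


# "my-collection" is a placeholder name used only for this example.
policy_json = build_encryption_policy("my-collection")
print(policy_json)
```

The printed string has the same shape as the value passed to the Policy property once ${CollectionName} has been substituted.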
Data Access Policy
By creating a data access policy, you can set detailed access privileges to OpenSearch Serverless for IAM users and others.
With data access control in Amazon OpenSearch Serverless, you can allow users to access collections and indexes, regardless of their access mechanism or network source.
Data access control for Amazon OpenSearch Serverless
Resources:
  DataAccessPolicy1:
    Type: AWS::OpenSearchServerless::AccessPolicy
    Properties:
      Name: !Sub "${Prefix}-data-policy-01"
      Policy: !Sub >-
        [{"Description":"Access for cfn user","Rules":[{"ResourceType":"index","Resource":["index/*/*"],"Permission":["aoss:*"]},
        {"ResourceType":"collection","Resource":["collection/${CollectionName}"],"Permission":["aoss:*"]}],
        "Principal":["${User1Arn}"]}]
      Type: data
The Policy property defines the contents of the data access policy.
Please refer to the following page for the notation.
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html
Let's look at this policy.
The first rule allows all actions on the OpenSearch Serverless indexes.
The second rule allows all actions on the collection.
IAM user 1, described below, is specified as the Principal.
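To make the layout of a data access policy easier to see, here is a minimal Python sketch that assembles one statement with an index rule and a collection rule. The helper name build_data_access_policy, the collection name, and the account ID in the example ARN are assumptions made for illustration only.

```python
import json


def build_data_access_policy(collection_name, principal_arns,
                             permissions=("aoss:*",), description=""):
    """Return a data access policy (a JSON list with one statement)."""
    statement = {
        "Description": description,
        "Rules": [
            {
                # Rule for indexes, using the same index/*/* pattern as the template.
                "ResourceType": "index",
                "Resource": ["index/*/*"],
                "Permission": list(permissions),
            },
            {
                # Rule for the collection itself.
                "ResourceType": "collection",
                "Resource": [f"collection/{collection_name}"],
                "Permission": list(permissions),
            },
        ],
        # IAM users or roles that the statement applies to.
        "Principal": list(principal_arns),
    }
    return json.dumps([statement])


# Placeholder ARN; the account ID and user name are made up for this example.
example = build_data_access_policy(
    "my-collection",
    ["arn:aws:iam::123456789012:user/example-user-01"],
    description="Access for cfn user",
)
print(example)
```

Only the principals listed in the statement receive the permissions, which is why a second user with an identical IAM policy is not automatically covered.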
Network Access Policy
The AWS official description of the network access policy is as follows
The network settings for an Amazon OpenSearch Serverless collection determine whether the collection is accessible over the internet from public networks, or whether it must be accessed through OpenSearch Serverless–managed VPC endpoints.
Network access for Amazon OpenSearch Serverless
Resources:
  NetworkSecurityPolicy:
    Type: AWS::OpenSearchServerless::SecurityPolicy
    Properties:
      Name: !Sub "${Prefix}-network-policy"
      Policy: !Sub >-
        [{"Rules":[{"ResourceType":"collection","Resource":["collection/${CollectionName}"]},
        {"ResourceType":"dashboard","Resource":["collection/${CollectionName}"]}],"AllowFromPublic":true}]
      Type: network
The Policy property defines the contents of the network access policy.
Please refer to the following page for the notation.
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-network.html
Let's look at this policy.
Specifying true for AllowFromPublic allows direct access from the Internet to both the collection and its dashboard.
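The effect of AllowFromPublic can be checked by reading the policy back. The sketch below is only an illustration: the function name public_resources is hypothetical, and the policy string is the one from the template with ${CollectionName} replaced by the placeholder my-collection.

```python
import json


def public_resources(policy_json):
    """Map each resource type to the resources a network policy opens to the public."""
    exposed = {}
    for statement in json.loads(policy_json):
        if not statement.get("AllowFromPublic", False):
            continue  # not a public statement, so nothing is exposed by it
        for rule in statement["Rules"]:
            exposed.setdefault(rule["ResourceType"], []).extend(rule["Resource"])
    return exposed


# Same shape as the template's policy, with a placeholder collection name.
network_policy = (
    '[{"Rules":[{"ResourceType":"collection","Resource":["collection/my-collection"]},'
    '{"ResourceType":"dashboard","Resource":["collection/my-collection"]}],'
    '"AllowFromPublic":true}]'
)
print(public_resources(network_policy))
```

For this policy, both the collection endpoint and the dashboard appear in the result, which matches the explanation above.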
IAM user
Resources:
  User1:
    Type: AWS::IAM::User
    Properties:
      LoginProfile:
        Password: !Ref Password
      Policies:
        - PolicyName: AllAllowPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - aoss:*
                Resource: "*"
      UserName: !Sub "${Prefix}-user-01"
  User2:
    Type: AWS::IAM::User
    Properties:
      LoginProfile:
        Password: !Ref Password
      Policies:
        - PolicyName: AllAllowPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - aoss:*
                Resource: "*"
      UserName: !Sub "${Prefix}-user-02"
Create 2 users.
The settings are the same for both users.
As mentioned earlier, the data access policy allows access to collections and dashboards only for IAM user 1.
Lambda Functions
Access the created OpenSearch Serverless from a Lambda function.
The code to execute in the function is based on the following page.
https://opensearch.org/docs/latest/clients/python-low-level/
Function 1 (resource name: Function2)
Resources:
  Function2:
    Type: AWS::Lambda::Function
    Properties:
      Architectures:
        - !Ref Architecture
      Environment:
        Variables:
          COLLECTION_ENDPOINT: !Sub "${Collection}.${AWS::Region}.aoss.amazonaws.com"
          REGION: !Ref AWS::Region
      Code:
        ZipFile: |
          from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
          import boto3
          import os
          host = os.environ['COLLECTION_ENDPOINT']
          region = os.environ['REGION']
          service = 'aoss'
          credentials = boto3.Session().get_credentials()
          auth = AWSV4SignerAuth(credentials, region, service)
          client = OpenSearch(
            hosts=[{'host': host, 'port': 443}],
            http_auth=auth,
            use_ssl=True,
            verify_certs=True,
            connection_class=RequestsHttpConnection,
            pool_maxsize=20,
          )
          def lambda_handler(event, context):
            index_name = "python-test-index"
            create_response = client.indices.create(
              index_name
            )
            print(create_response)
            document = {
              'title': 'Moneyball',
              'director': 'Bennett Miller',
              'year': '2011'
            }
            index_response = client.index(
              index=index_name,
              body=document
            )
            print(index_response)
      FunctionName: !Sub "${Prefix}-function-02"
      Handler: !Ref Handler
      Layers:
        - !Ref LambdaLayer
      Runtime: !Ref Runtime
      Role: !GetAtt FunctionRole2.Arn
The code to be executed by the function is written inline in the template.
For information on how to create a Lambda function using CloudFormation, please see the following page.
This function stores data in OpenSearch Serverless.
Specifically, after creating a client object for OpenSearch Serverless, the following two actions are performed.
- Index creation using the client’s indices.create method
- Indexing documents using the client’s index method
The document is sample data taken from the reference page above.
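Because the handler above calls indices.create on every invocation, a second run would be expected to fail once the index already exists. One possible way around this, sketched below, is to check for the index first. The helper ensure_index and the FakeClient classes are hypothetical; the fake exists only so that the sketch can run without a collection. Note that the existence check is an additional call: the data access policy in this article grants only aoss:CreateIndex, aoss:WriteDocument, and aoss:UpdateIndex, so the role would probably need a further permission such as aoss:DescribeIndex (an assumption that should be verified against the AWS documentation).

```python
def ensure_index(client, index_name):
    """Create the index only when it is missing; return True if it was created."""
    if client.indices.exists(index=index_name):
        return False
    client.indices.create(index=index_name)
    return True


class FakeIndices:
    """In-memory stand-in for client.indices, used only to run the sketch locally."""

    def __init__(self):
        self.names = set()

    def exists(self, index):
        return index in self.names

    def create(self, index, body=None):
        self.names.add(index)


class FakeClient:
    """Hypothetical test double; in Lambda, pass the real OpenSearch client instead."""

    def __init__(self):
        self.indices = FakeIndices()


fake = FakeClient()
first = ensure_index(fake, "python-test-index")   # index is created
second = ensure_index(fake, "python-test-index")  # index already exists
print(first, second)
```

In the Lambda function, the call to client.indices.create could be replaced by ensure_index(client, index_name) so that repeated invocations only add documents.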
Function 2 (resource name: Function3)
Resources:
  Function3:
    Type: AWS::Lambda::Function
    Properties:
      Architectures:
        - !Ref Architecture
      Environment:
        Variables:
          COLLECTION_ENDPOINT: !Sub "${Collection}.${AWS::Region}.aoss.amazonaws.com"
          REGION: !Ref AWS::Region
      Code:
        ZipFile: |
          from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
          import boto3
          import os
          host = os.environ['COLLECTION_ENDPOINT']
          region = os.environ['REGION']
          service = 'aoss'
          credentials = boto3.Session().get_credentials()
          auth = AWSV4SignerAuth(credentials, region, service)
          client = OpenSearch(
            hosts=[{'host': host, 'port': 443}],
            http_auth=auth,
            use_ssl=True,
            verify_certs=True,
            connection_class=RequestsHttpConnection,
            pool_maxsize=20,
          )
          def lambda_handler(event, context):
            index_name = "python-test-index"
            q = 'miller'
            query = {
              'size': 5,
              'query': {
                'multi_match': {
                  'query': q,
                  'fields': ['title^2', 'director']
                }
              }
            }
            search_response = client.search(
              body=query,
              index=index_name
            )
            print(search_response)
      FunctionName: !Sub "${Prefix}-function-03"
      Handler: !Ref Handler
      Layers:
        - !Ref LambdaLayer
      Runtime: !Ref Runtime
      Role: !GetAtt FunctionRole3.Arn
This function searches OpenSearch Serverless data.
Specifically, it uses the client’s search method with a multi_match query on the title and director fields.
This also searches the sample data from the reference site.
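The function above simply prints the whole response. As a small sketch of how the matching documents could be pulled out of it, the helper below walks the hits list of a search response. The name summarize_hits is hypothetical, and sample_response is a hand-written dictionary in the usual OpenSearch response shape, not output captured from this environment.

```python
def summarize_hits(search_response):
    """Return one flat dict per hit: document id, score, and the stored fields."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [
        {"id": hit.get("_id"), "score": hit.get("_score"), **hit.get("_source", {})}
        for hit in hits
    ]


# Hand-written example response; the id and score values are made up.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "python-test-index",
                "_id": "example-id",
                "_score": 1.0,
                "_source": {
                    "title": "Moneyball",
                    "director": "Bennett Miller",
                    "year": "2011",
                },
            }
        ]
    }
}

for row in summarize_hits(sample_response):
    print(row["title"], row["director"], row["year"])
```

Inside the Lambda handler, summarize_hits(search_response) could be printed or returned instead of the raw response.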
IAM Role
Resources:
  FunctionRole2:
    Type: AWS::IAM::Role
    DeletionPolicy: Delete
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action: sts:AssumeRole
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: FunctionRole2Policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - aoss:APIAccessAll
                Resource:
                  - !Sub "arn:aws:aoss:${AWS::Region}:${AWS::AccountId}:collection/${Collection}"
These are the IAM roles for the two functions.
The above is for function 2, but the policy is exactly the same for the IAM role for function 3.
The key point is “aoss:APIAccessAll” specified in the inline policy.
Starting May 10, 2023, OpenSearch Serverless requires these two new IAM permissions for collection resources. The aoss:APIAccessAll permission allows data plane access, and the aoss:DashboardsAccessAll permission allows OpenSearch Dashboards from the browser. Failure to add the two new IAM permissions results in a 403 error.
Identity and Access Management for Amazon OpenSearch Serverless
Both functions index or search documents through the data plane, so the former permission (aoss:APIAccessAll) must be allowed.
Data Access Policy
Resources:
  DataAccessPolicy2:
    Type: AWS::OpenSearchServerless::AccessPolicy
    DependsOn:
      - Function2
    Properties:
      Name: !Sub "${Prefix}-data-policy-02"
      Policy: !Sub
        - >-
          [{"Description":"Access for Function2","Rules":[{"ResourceType":"index","Resource":["index/*/*"],"Permission":["aoss:CreateIndex","aoss:WriteDocument","aoss:UpdateIndex"]}],
          "Principal":["${FunctionRole2Arn}"]}]
        - FunctionRole2Arn: !GetAtt FunctionRole2.Arn
      Type: data
  DataAccessPolicy3:
    Type: AWS::OpenSearchServerless::AccessPolicy
    DependsOn:
      - Function3
    Properties:
      Name: !Sub "${Prefix}-data-policy-03"
      Policy: !Sub
        - >-
          [{"Description":"Access for Function3","Rules":[{"ResourceType":"index","Resource":["index/*/*"],"Permission":["aoss:ReadDocument"]}],
          "Principal":["${FunctionRole3Arn}"]}]
        - FunctionRole3Arn: !GetAtt FunctionRole3.Arn
      Type: data
Create data access policies for Lambda functions as well as for IAM users.
Function 2 will submit data to OpenSearch Serverless, so grant write permissions.
Function 3 searches for data, so give it read permissions.
(Reference) Automatically create a Lambda layer using CloudFormation custom resources
To access OpenSearch Serverless from Python, use opensearch-py, the official Python client of the OpenSearch project.
This package is not included in the default Lambda function runtime environment.
This time we will create a Lambda layer and include it so that we can use this package.
In this case, we will use a CloudFormation custom resource to automatically create a Lambda layer.
For more information, please see the following page.
The above page specifies the package to be installed with pip in the SSM Parameter Store.
In this case, specify the same resource as follows.
Resources:
  RequirementsParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Ref Prefix
      Type: String
      Value: |
        urllib3==1.26.6
        opensearch-py
Architecting
Use CloudFormation to build this environment and check its actual behavior.
Create CloudFormation stacks and check the resources in the stacks
Create CloudFormation stacks.
For information on how to create stacks and check each stack, please see the following page.
Because this template creates named IAM resources (IAM users), specify the CAPABILITY_NAMED_IAM capability as follows:
$ aws cloudformation create-stack \
  --stack-name [stack-name] \
  --template-url https://[bucket-name].s3.[region].amazonaws.com/[folder-name]/fa-147.yaml \
  --capabilities CAPABILITY_NAMED_IAM
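For readers who prefer to create the stack from Python, the same request can be sketched with boto3. The helper names and the example stack, bucket, region, and folder values are placeholders, and running create_stack_from_s3 assumes that boto3 is installed and AWS credentials are configured.

```python
def build_create_stack_args(stack_name, bucket_name, region, folder_name):
    """Build the keyword arguments for CloudFormation's create_stack call."""
    template_url = (
        f"https://{bucket_name}.s3.{region}.amazonaws.com/{folder_name}/fa-147.yaml"
    )
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Required because the template creates IAM users with explicit names.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def create_stack_from_s3(stack_name, bucket_name, region, folder_name):
    """Send the request; needs boto3 and valid AWS credentials (not run here)."""
    import boto3  # imported lazily so the sketch can be read and tested without it

    client = boto3.client("cloudformation", region_name=region)
    return client.create_stack(
        **build_create_stack_args(stack_name, bucket_name, region, folder_name)
    )


# Placeholder values for illustration only.
args = build_create_stack_args("my-stack", "my-bucket", "ap-northeast-1", "templates")
print(args["TemplateURL"])
```

Omitting the Capabilities entry would be expected to make the request fail, for the same reason that the CLI command needs the --capabilities option.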
Check the OpenSearch Serverless collection from the AWS Management Console.
The collection has been successfully created.
Check the network access policy.
You can see that this collection is open to the public.
Check the data access policy.
There are three, so check them in order.
The first policy allows all actions for IAM user 1.
The second policy is what gives Function 2 permission to write test data.
The third policy is what gives Function 3 the authority to retrieve data.
Operation Check
Now that we are ready, we will check the actual operation.
Write data
Execute function 2 to write test data.
The log shows that the data has indeed been written.
It was successfully indexed and added.
Searching for data
The written data is then retrieved.
The log did indeed return the written data.
The search was successfully executed.
Dashboard
Access the OpenSearch Serverless dashboard from your IAM user.
First, try to access with IAM user 2.
IAM user 2 could not access the dashboard.
The IAM policy attached to IAM user 2 is not the problem; access fails because no data access policy is set for this user.
In this way, users of OpenSearch Serverless must be granted not only the appropriate IAM policies but also a data access policy.
Then access with IAM user 1.
Successfully accessed.
IAM user 1 was able to access the dashboard because the data access policy was set properly.
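The difference between the two users can be expressed as a toy model in Python. This is a deliberately simplified sketch, not how AWS evaluates permissions: it ignores resource patterns and individual data access permissions, and only checks that a principal has both an aoss IAM action and an entry in some data access policy. The function name and the ARNs (including the account ID) are made up for illustration.

```python
IAM_ACTIONS_FOR_AOSS = {"aoss:*", "aoss:APIAccessAll", "aoss:DashboardsAccessAll"}


def can_reach_data(principal_arn, iam_actions, data_policies):
    """Toy check: access needs BOTH an IAM allow and a matching data access policy."""
    iam_allows = bool(IAM_ACTIONS_FOR_AOSS & set(iam_actions))
    policy_lists_principal = any(
        principal_arn in statement.get("Principal", [])
        for policy in data_policies
        for statement in policy
    )
    return iam_allows and policy_lists_principal


# Placeholder ARNs standing in for IAM user 1 and IAM user 2.
user1 = "arn:aws:iam::123456789012:user/example-user-01"
user2 = "arn:aws:iam::123456789012:user/example-user-02"

# Only user 1 is listed as a principal, as in the first data access policy.
data_policies = [[{"Rules": [], "Principal": [user1]}]]

print(can_reach_data(user1, ["aoss:*"], data_policies))
print(can_reach_data(user2, ["aoss:*"], data_policies))
```

Both users hold the same IAM policy, yet only the first call returns True, which mirrors the behavior observed in the dashboard.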
Finally, search from the dashboard.
The written test data is returned.
In this way, OpenSearch Serverless also allows the use of dashboards.
Summary
Using CloudFormation, we created an OpenSearch Serverless collection and accessed it from an IAM user and from Lambda functions.