Iterate using the Map state in Step Functions
The Map state allows for iterative processing.
Use the Map state to run a set of workflow steps for each item in a dataset. The Map state’s iterations run in parallel, which makes it possible to process a dataset quickly.
Map
This time we will build a Step Functions state machine that uses the Map state.
Environment
Create a Step Functions state machine.
The state machine consists of two main states.
- First state: An array is created with the built-in function States.ArrayRange.
- Second state: A Map state processes each number in the array created in the previous state.
The Map state consists of the following two sub-states:
- First sub-state: Use a Lambda function to square the number received as an argument and return it.
- Second sub-state: Use a Lambda function to double the number received as an argument (the squared value from the first sub-state) and return it.
The runtime environment for both functions is Python 3.8.
CloudFormation template files
The above configuration is built with CloudFormation.
The CloudFormation templates are available at the following URL:
https://github.com/awstut-an-r/awstut-fa/tree/main/121
Explanation of key points of template files
This page focuses on the Map state of Step Functions.
For information on how to create a Step Functions state machine, please see the following page.
State Machine
Resources:
  StateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      Definition:
        Comment: !Sub "${Prefix}-StateMachine"
        StartAt: FirstState
        States:
          FirstState:
            Type: Pass
            Parameters:
              numbers.$: States.ArrayRange(0, 9, 1)
            Next: MapState
          MapState:
            Type: Map
            MaxConcurrency: 5
            InputPath: $.numbers
            ItemSelector:
              number-origin.$: $$.Map.Item.Value
            ItemProcessor:
              ProcessorConfig:
                Mode: INLINE
              StartAt: SecondState
              States:
                SecondState:
                  Type: Task
                  Resource: !Ref Function1Arn
                  Parameters:
                    number.$: $.number-origin
                  ResultPath: $.number-squared
                  Next: LastState
                LastState:
                  Type: Task
                  Resource: !Ref Function2Arn
                  Parameters:
                    number.$: $.number-squared
                  ResultPath: $.number-doubled
                  End: true
            End: true
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt LogGroup.Arn
        IncludeExecutionData: true
        Level: ALL
      RoleArn: !GetAtt StateMachineRole.Arn
      StateMachineName: !Ref Prefix
      StateMachineType: STANDARD
First State
The first state generates test data for validating the Map state.
The built-in function States.ArrayRange is used to generate the data.
In this case, “States.ArrayRange(0, 9, 1)” generates 10 numbers, because the end value is included in the range.
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
For more information on how to use this function, please refer to the following page.
By setting the Parameters property to “numbers.$: States.ArrayRange(0, 9, 1)”, the generated array is set to numbers.
Specifically, the following data will be generated.
{
  "numbers": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
}
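To make the range semantics concrete, here is a minimal Python sketch of what the first state produces. The helper name array_range is hypothetical and only mimics States.ArrayRange locally. Unlike Python's range, it includes the end value.

```python
def array_range(start, end, step):
    """Mimic States.ArrayRange: like range(), but the end value is included."""
    if step == 0:
        raise ValueError("step must not be zero")
    # Move the stop value one unit past `end` so that `end` itself is included.
    stop = end + 1 if step > 0 else end - 1
    return list(range(start, stop, step))


# Output of the Pass state: the generated array stored under the key "numbers".
first_state_output = {"numbers": array_range(0, 9, 1)}
print(first_state_output)
```

The printed dictionary has the same shape as the data shown above.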
Map State
The MaxConcurrency property sets the upper limit on the number of iterations that run in parallel.
In this case, by specifying “5”, at most five items are processed at the same time, so the 10 items are handled in roughly two rounds.
The InputPath property selects the part of the state input that the Map state receives.
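The effect of a concurrency limit can be illustrated locally with a thread pool. This is only an analogy, not how Step Functions is implemented: process_item is a hypothetical stand-in for one Map iteration, and max_workers plays the role of MaxConcurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(number):
    """Hypothetical stand-in for one Map iteration (the real work is done by Lambda)."""
    return number * 2


numbers = list(range(10))

# max_workers plays the role of MaxConcurrency: at most 5 items are in flight at once.
with ThreadPoolExecutor(max_workers=5) as executor:
    # executor.map returns results in input order, regardless of completion order.
    results = list(executor.map(process_item, numbers))

print(results)
```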
In this case, by specifying “$.numbers”, the following data will be iterated over.
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The ItemSelector property allows you to specify the format of the input passed to each iteration.
By setting “number-origin.$: $$.Map.Item.Value”, each array item is wrapped in an object. For the first item, for example, the following data is generated.
{
  "number-origin": 0
}
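As a rough local equivalent, the ItemSelector can be pictured as a function applied to every array item before the iteration starts. The function name select_item is hypothetical and used only for illustration.

```python
def select_item(value):
    """Mimic the ItemSelector: wrap the current array item in an object."""
    # "$$.Map.Item.Value" refers to the item currently being iterated over.
    return {"number-origin": value}


numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
iteration_inputs = [select_item(value) for value in numbers]
print(iteration_inputs[0])
```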
The workflow that runs for each item is defined with the ItemProcessor property.
The ProcessorConfig property allows you to set the mode of the Map state.
Details are available on the following page, but in this case we will specify the inline mode.
https://docs.aws.amazon.com/step-functions/latest/dg/concepts-inline-vs-distributed-map.html
The States property allows you to define sub-states within a Map state.
First sub-state
By setting “number.$: $.number-origin” in the Parameters property, the value of number-origin is passed to the function as number, as follows.
{
  "number": 0
}
The Lambda function specified in the Resource property calculates the square of the number passed in.
Resources:
  Function1:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          def lambda_handler(event, context):
              num = event['number']
              num_squared = num ** 2
              return num_squared
      FunctionName: !Sub "${Prefix}-function-01"
      Handler: !Ref Handler
      Runtime: !Ref Runtime
      Role: !GetAtt FunctionRole.Arn
The code to be executed by the Lambda function is written inline in the template.
For more information, please refer to the following page.
The function reads number from the event object, which holds the argument set earlier in the Parameters property.
The value is squared and returned.
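Because the handler is a plain Python function, it can be tried locally before deployment. The sketch below copies the handler from the template and calls it with a hand-written event. The sample value 3 is an assumption chosen for illustration.

```python
def lambda_handler(event, context):
    num = event['number']
    num_squared = num ** 2
    return num_squared


# The event mirrors what the Parameters property passes to the function.
# This handler never uses the context argument, so None is enough locally.
result = lambda_handler({"number": 3}, None)
print(result)  # 9
```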
By setting “$.number-squared” in the ResultPath property, the return value of the function is added to the state input under number-squared, as shown below.
{
  "number-origin": 0,
  "number-squared": 0
}
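The behavior of ResultPath can be sketched as a small merge step: the state input is kept, and the task result is stored under the given key. The helper apply_result_path is hypothetical and only handles a top-level key such as “$.number-squared”.

```python
import copy


def apply_result_path(state_input, key, result):
    """Mimic a ResultPath of the form "$.<key>": keep the input, add the result."""
    output = copy.deepcopy(state_input)  # leave the caller's input untouched
    output[key] = result
    return output


state_input = {"number-origin": 3}
state_output = apply_result_path(state_input, "number-squared", 3 ** 2)
print(state_output)
```

Since the original input is preserved, the next sub-state can still read both number-origin and number-squared.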
Second sub-state
This is basically the same as the first sub-state.
The Lambda function specified in the Resource property doubles the number passed in.
Resources:
  Function2:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          def lambda_handler(event, context):
              num = event['number']
              num_doubled = num * 2
              return num_doubled
      FunctionName: !Sub "${Prefix}-function-02"
      Handler: !Ref Handler
      Runtime: !Ref Runtime
      Role: !GetAtt FunctionRole.Arn
By setting “$.number-doubled” in the ResultPath property, the return value of the function is added to the state input under number-doubled, as shown below.
{
  "number-origin": 0,
  "number-squared": 0,
  "number-doubled": 0
}
Architecting
Use CloudFormation to build this environment and check its actual behavior.
Create CloudFormation stacks and check the resources in the stacks
Create CloudFormation stacks.
For information on how to create stacks and check each stack, please refer to the following pages.
The main resources created in this case are as follows:
- Step Functions state machine: fa-121
- Lambda function 1: fa-121-function-01
- Lambda function 2: fa-121-function-02
Check the resources created from the AWS Management Console.
Check the state machine.
It has been created successfully.
You can see that the state machine containing the Map state has been created.
Within the Map state, there are two sub-states, each configured to call a Lambda function.
Check the Lambda function.
Indeed, two functions are created.
Operation Check
Now that you are ready, run the state machine.
The state machine begins to operate.
Parallel processing is taking place in the Map state.
After a short wait, the state machine execution completes successfully.
Check the output after execution.
The output shows the numbers 0-9 after they have been processed by the two functions.
Thus, the Map state can be used to process the items of an array in parallel.
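To get a feel for the shape of the result, the whole Map state can be simulated locally. This is a sketch based on the template above, not the actual console output: square and double are local stand-ins for the two Lambda functions, and run_iteration is a hypothetical helper that mimics ItemSelector, Parameters, and ResultPath for one item.

```python
def square(event):
    """Local stand-in for Function1."""
    return event["number"] ** 2


def double(event):
    """Local stand-in for Function2."""
    return event["number"] * 2


def run_iteration(value):
    """Run both sub-states for one array item."""
    state = {"number-origin": value}  # ItemSelector
    # First sub-state: pass number-origin as number, store the result via ResultPath.
    state["number-squared"] = square({"number": state["number-origin"]})
    # Second sub-state: pass number-squared as number, store the result via ResultPath.
    state["number-doubled"] = double({"number": state["number-squared"]})
    return state


output = [run_iteration(value) for value in range(10)]
for item in output:
    print(item)
```

Note that the second function receives the squared value, so number-doubled is twice the square of the original number.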
Summary
We built a Step Functions state machine that uses the Map state.