Iteration using Map in Step Functions

Iterate using the Map state in Step Functions

The Map state allows for iterative processing.

Use the Map state to run a set of workflow steps for each item in a dataset. The Map state’s iterations run in parallel, which makes it possible to process a dataset quickly.

Map

In this article, we will build a Step Functions state machine that uses the Map state.

Environment

Diagram of iteration using Map in Step Functions

Create a Step Functions state machine.
The state machine consists of two main states:

  • First state: Array is created with the built-in function States.ArrayRange.
  • Second state: Map state that processes each number in the array created in the previous state.

The Map state consists of the following two sub-states:

  • First sub-state: A Lambda function returns the square of the number it receives as an argument.
  • Second sub-state: A Lambda function doubles the number it receives as an argument and returns it.

The runtime environment for the function is Python 3.8.

CloudFormation template files

The above configuration is built with CloudFormation.
The CloudFormation templates are available at the following URL:

https://github.com/awstut-an-r/awstut-fa/tree/main/121

Explanation of key points of template files

This page focuses on the Map state of Step Functions.

For information on how to create a Step Functions state machine, please see the following page.

https://awstut.com/en/2022/06/18/introduction-to-step-functions-with-cfn-en

State Machine

Resources:
  StateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      Definition:
        Comment: !Sub "${Prefix}-StateMachine"
        StartAt: FirstState
        States:
          FirstState:
            Type: Pass
            Parameters:
              numbers.$: States.ArrayRange(0, 9, 1)
            Next: MapState
          MapState:
            Type: Map
            MaxConcurrency: 5
            InputPath: $.numbers
            ItemSelector:
              number-origin.$: $$.Map.Item.Value
            ItemProcessor:
              ProcessorConfig:
                Mode: INLINE
              StartAt: SecondState
              States:
                SecondState:
                  Type: Task
                  Resource: !Ref Function1Arn
                  Parameters:
                    number.$: $.number-origin
                  ResultPath: $.number-squared
                  Next: LastState
                LastState:
                  Type: Task
                  Resource: !Ref Function2Arn
                  Parameters:
                    number.$: $.number-squared
                  ResultPath: $.number-doubled
                  End: true
            End: true
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt LogGroup.Arn
        IncludeExecutionData: true
        Level: ALL
      RoleArn: !GetAtt StateMachineRole.Arn
      StateMachineName: !Ref Prefix
      StateMachineType: STANDARD

First State

The first state generates test data for exercising the Map state.

The built-in function States.ArrayRange is used to generate the test data.
In this case, “States.ArrayRange(0, 9, 1)” generates 10 numbers, because the end value is included in the range.

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

For more information on how to use this function, please refer to the following page.

https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-intrinsic-functions.html#asl-intrsc-func-arrays

By setting the Parameters property to “numbers.$: States.ArrayRange(0, 9, 1)”, the generated array is stored under the key numbers.
Specifically, the first state outputs the following data.

{
"numbers": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
}
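For readers who find it easier to reason in Python, the first state can be imitated locally. The following is only a sketch and is not part of the template: array_range is a hypothetical helper that mimics States.ArrayRange, assuming a positive step and an inclusive end value.

```python
def array_range(start, end, step):
    """Imitate States.ArrayRange for a positive step; `end` is inclusive."""
    if step <= 0:
        raise ValueError("this sketch only handles a positive step")
    return list(range(start, end + 1, step))


# Equivalent of the Pass state's Parameters: store the array under "numbers".
first_state_output = {"numbers": array_range(0, 9, 1)}
print(first_state_output)
```

Because the end value is inclusive, array_range(0, 9, 1) yields ten elements, not nine.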

Map State

The MaxConcurrency property sets the upper limit on the number of iterations that run concurrently.
In this case, specifying “5” means that at most 5 of the 10 items are processed at the same time.
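The effect of a concurrency limit can be illustrated with a local analogy. The sketch below uses a thread pool capped at five workers; it is only an analogy for MaxConcurrency, not how Step Functions is implemented, and process_item is a hypothetical stand-in for the work done in one iteration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(number):
    """Hypothetical stand-in for one Map iteration; here it squares the number."""
    return number ** 2


numbers = list(range(10))

# max_workers=5 caps how many items are in flight at once,
# loosely analogous to MaxConcurrency: 5 in the template.
with ThreadPoolExecutor(max_workers=5) as executor:
    # executor.map returns results in the order of the inputs.
    results = list(executor.map(process_item, numbers))

print(results)
```

The limit is an upper bound: a new item starts as soon as a worker becomes free, so the ten items are not necessarily handled as two fixed batches of five.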

The InputPath property selects the part of the state input that the Map state receives.
In this case, by specifying “$.numbers”, the following array will be iterated over.

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The ItemSelector property allows you to specify the format of the input passed to each iteration.
By setting “number-origin.$: $$.Map.Item.Value”, the current array element is stored under the key number-origin. For the first element, for example, the following data will be generated.

{
"number-origin": 0
}
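The way ItemSelector reshapes each array element can be sketched in Python as follows. This is only an illustration of the data shape; select_item is a hypothetical helper, not a Step Functions API.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def select_item(value):
    """Build the input for one iteration, like the ItemSelector in the template."""
    return {"number-origin": value}


# One input document per array element.
iteration_inputs = [select_item(value) for value in numbers]
print(iteration_inputs[0])
```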

The details of the Map state can be set with the ItemProcessor property.

The ProcessorConfig property allows you to set the mode of the Map state.
Details are available on the following page, but in this case we will specify the inline mode.

https://docs.aws.amazon.com/step-functions/latest/dg/concepts-inline-vs-distributed-map.html

The States property allows you to define sub-states within a Map state.

First sub-state

By setting “number.$: $.number-origin” in the Parameters property, the value of number-origin is passed to the function under the key number, as follows.

{
"number": 0
}

By specifying a Lambda function in the Resource property, the square of the passed number is calculated.

Resources:
  Function1:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          def lambda_handler(event, context):
            num = event['number']
            num_squared = num ** 2

            return num_squared
      FunctionName: !Sub "${Prefix}-function-01"
      Handler: !Ref Handler
      Runtime: !Ref Runtime
      Role: !GetAtt FunctionRole.Arn

The code executed by the Lambda function is written inline in the template.
For more information, please refer to the following page.

https://awstut.com/en/2022/02/02/3-parterns-to-create-lambda-with-cloudformation-s3-inline-container

The function reads the key number from the event object, which holds the argument set earlier in the Parameters property.
The value received as an argument is squared and returned.

By setting “$.number-squared” in the ResultPath property, the return value of the function is added to the state's data under the key number-squared, as shown below.

{
"number-squared": 0
}
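The interplay of Parameters and ResultPath in this sub-state can be sketched locally. In the sketch below, run_task is a hypothetical helper written only for illustration; it assumes that ResultPath adds the result to the state's input rather than replacing it, so the number-origin key is expected to remain next to number-squared.

```python
def lambda_handler(event, context):
    # Same logic as Function1 in the template.
    return event["number"] ** 2


def run_task(state_input, handler, source_key, result_key):
    """Hypothetical helper: apply Parameters, call the handler, apply ResultPath."""
    payload = {"number": state_input[source_key]}  # Parameters
    result = handler(payload, None)                # Task (context is unused here)
    return {**state_input, result_key: result}     # ResultPath


state_output = run_task(
    {"number-origin": 3}, lambda_handler, "number-origin", "number-squared"
)
print(state_output)
```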

Second sub-state

Basically the same as the first sub-state.

The Lambda function specified in the Resource property will calculate twice the number passed in.

Resources:
  Function2:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          def lambda_handler(event, context):
            num = event['number']
            num_doubled = num * 2

            return num_doubled
      FunctionName: !Sub "${Prefix}-function-02"
      Handler: !Ref Handler
      Runtime: !Ref Runtime
      Role: !GetAtt FunctionRole.Arn

By setting “$.number-doubled” in the ResultPath property, the return value of the function is added to the state's data under the key number-doubled, as shown below.

{
"number-doubled": 0
}

Architecting

Use CloudFormation to build this environment and check its actual behavior.

Create CloudFormation stacks and check the resources in the stacks

Create CloudFormation stacks.
For information on how to create stacks and check each stack, please refer to the following pages.

https://awstut.com/en/2021/12/11/cloudformations-nested-stack

After reviewing the resources in each stack, the main resources created in this case are as follows:

  • Step Functions state machine: fa-121
  • Lambda function 1: fa-121-function-01
  • Lambda function 2: fa-121-function-02

Check the resources created from the AWS Management Console.

Check the state machine.

Detail of Step Functions 1.

It has been created successfully.
You can see that the state machine containing the Map state has been created.
Within the Map state, there are two sub-states, each configured to call a Lambda function.

Check the Lambda function.

Detail of Lambda 1.
Detail of Lambda 2.

Indeed, two functions have been created.

Operation Check

Now that you are ready, run the state machine.

Detail of Step Functions 2.
Detail of Step Functions 3.

The state machine begins to operate.

Detail of Step Functions 4.

Parallel processing is taking place in the Map state.

After a short wait, the state machine execution completes successfully.

Detail of Step Functions 5.

Check the output after execution.

Detail of Step Functions 6.

You can see the result of the numbers 0-9 processed by the two functions.
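To cross-check the output against the template, the whole Map state can be imitated locally. This is a sketch under assumptions, not the console output itself: it assumes each iteration keeps its input and adds the two results under the keys set by ResultPath, and the helper names square, double, and run_iteration are hypothetical.

```python
def square(event, context):
    # Same logic as Function1 in the template.
    return event["number"] ** 2


def double(event, context):
    # Same logic as Function2 in the template.
    return event["number"] * 2


def run_iteration(value):
    """Imitate one Map iteration: ItemSelector, then the two Task states."""
    item = {"number-origin": value}
    item["number-squared"] = square({"number": item["number-origin"]}, None)
    item["number-doubled"] = double({"number": item["number-squared"]}, None)
    return item


expected_output = [run_iteration(value) for value in range(10)]
for item in expected_output:
    print(item)
```

Note that the second function receives the squared value, so number-doubled is twice the square of the original number.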

As shown, the Map state can be used to process the items of an array in parallel.

Summary

We built a Step Functions state machine that uses the Map state.