Introduction to AWS Step Functions

About St. Louis Serverless

About Jack Frosch

1st Career

About Jack Frosch

2nd Career

Announcements

  • October 1st
    • St. Louis Serverless Meetup
    • Deep Dive into AWS Cloud Development Kit (CDK)
  • October 10th
  • November 5th
    • St. Louis Serverless Meetup
    • Lessons Learned with Lambda (Guest Speaker)
  • December 3rd
    • St. Louis Serverless Meetup
    • Serverless on Azure (Guest Speaker)

Overview

  • What is AWS Step Functions
  • Why it's important
  • Step Functions By The Numbers
  • Core Concepts
  • Step Functions Example
  • Service Integrations
  • Developing Step Functions

What is AWS Step Functions?

AWS Step Functions

  • An Amazon hosted service for orchestrating workflows
  • Highly available
  • Security using Identity and Access Management (IAM)
  • Works with other AWS services as well as on-premises services
  • Built-in error and retry handling
  • Execution event history
  • Use via console, CLI, API, and the CDK 
  • It's a serverless offering
    • No servers to configure and maintain
    • Automatically scales up and down, all the way to zero
    • Pay nothing when not in use

AWS Step Functions is based on the concepts of tasks and state machines

State Machine

  • An abstract machine
  • Can be in exactly one of a finite number of states
  • State transitions occur as a result of some event
    • Internal event; e.g. task completion
    • External event; e.g. message received
  • Great for modeling workflows and processes
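
As a rough illustration of the bullets above, here is a minimal sketch of a finite-state machine in Python. The state and event names are hypothetical and chosen only for illustration; this is not how Step Functions is implemented.

```python
# A minimal finite-state machine: exactly one current state at a time,
# and transitions fire only in response to events.
# All state and event names here are hypothetical.
TRANSITIONS = {
    ("Submitted", "job_started"): "Running",
    ("Running", "task_completed"): "Succeeded",   # internal event
    ("Running", "cancel_received"): "Failed",     # external event
}


def next_state(current, event):
    """Return the state reached from `current` on `event`.

    Unknown (state, event) pairs leave the machine where it is.
    """
    return TRANSITIONS.get((current, event), current)


def run(start, events):
    """Feed a sequence of events to the machine and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

Calling run("Submitted", ["job_started", "task_completed"]) walks the machine through Running to Succeeded.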

Example - Batch Job Workflow

[Diagram: workflow with Task, Choice, Task, Task (end=true), Wait, and Fail states]

Why it's important

Meet the Monolith

Pros

  • Already exists
  • Easy to debug from beginning to end
  • One thing to manage
  • Works most of the time

Cons

  • Hard to fully understand 
  • Increasing accidental complexity
  • High coupling / low cohesion
  • To scale one, must scale all
  • Expensive regression testing
  • Bugs harder to find
  • Old technology lock-in
  • Inflexible architecture
  • Developer recruitment and retention
  • Slower release cadence
  • Desire for risk avoidance overcomes desire to innovate
  • Unconstrained competitors

Microservices

[Diagram: microservices built on AWS services: Aurora, DynamoDB, S3, ElastiCache, SNS, SQS, API Gateway, SES, EventBridge, Lambda, ECS, Batch, Fargate]

We want to spend our time solving business problems, but we write code to

  • Handle service invocations
  • Handle scheduling
  • Handle branching
  • Handle errors
  • Handle retries
  • Handle failures
  • Chain to other workflows
  • This leads to
    • Increased complexity
    • Increased coupling

We need to externalize the workflow orchestration.

Step f() By The Numbers

  • Price
    • 4,000 state transitions per month free
    • $0.025 per 1,000 state transitions
  • State Machine Execution Limits
    • 1,000,000 open executions
    • 1 year max execution length / idle time
    • 25,000 execution events in history
    • 90 days execution history retention
  • Task Execution Limits
    • 1,000 pollers calling GetActivityTask
    • 32,768 characters input/output
  • Account Limits
    • 10,000 registered activities
    • 10,000 registered state machines
    • 1MB per API request
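
To make the pricing bullets concrete, here is a small sketch that estimates a monthly bill. It uses only the two figures quoted on this slide (4,000 free transitions, $0.025 per 1,000 after that), which may have changed since; the function name is hypothetical.

```python
from decimal import Decimal

# Figures taken from the slide above; check current AWS pricing before relying on them.
FREE_TRANSITIONS = 4_000                 # free state transitions per month
PRICE_PER_THOUSAND = Decimal("0.025")    # USD per 1,000 state transitions


def monthly_cost(transitions):
    """Estimate the monthly cost in USD for a number of state transitions."""
    billable = max(0, transitions - FREE_TRANSITIONS)
    return PRICE_PER_THOUSAND * billable / 1000
```

For example, 104,000 transitions in a month leave 100,000 billable transitions, which this sketch prices at $2.50.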

Core Concepts

  • Amazon States Language
  • Defining a State Machine
  • States
  • State Transitions
  • Data handling in states
  • Handling Errors
  • State Types

Amazon States Language

  • A DSL for specifying a state machine in JSON format
  • Copyright Amazon.com
  • Code examples in DSL are Apache 2.0

Defining a State Machine

  • Must have a "States" field
  • Must have one and only one "StartAt" field referencing one of the states
  • May have a "Comment" field
  • May have a "Version" field specifying the States Language version (if omitted, 1.0 is assumed)

{
  "Comment": "My Batch Job workflow",
  "StartAt": "Submit Job",
  "States": {
    "Submit Job": {
      ...
    }
  }
}

States

  • States are defined in the top-level "States" object
  • States describe tasks ("units of work") or flow control
  • State name must be unique in scope of state machine
  • State name must be <= 128 Unicode characters
  • Each state must have a "Type" field
  • Each state may have a "Comment" field
  • Most state types have additional requirements
  • Any state other than types Choice, Succeed, and Fail may have a boolean field named "End"
  • Terminal states have {"End": true} or are of type Succeed or Fail
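
The structural rules listed so far can be checked mechanically. The following is a minimal sketch of such a check, covering only the rules named on these slides; the function name and messages are hypothetical, and it is not a complete Amazon States Language validator.

```python
# State types that may not carry an "End" field, per the rules above.
NO_END_TYPES = {"Choice", "Succeed", "Fail"}


def validate(machine):
    """Return a list of problems found in a state machine definition (a dict)."""
    states = machine.get("States")
    if not isinstance(states, dict) or not states:
        return ['a state machine must have a non-empty "States" field']
    problems = []
    if machine.get("StartAt") not in states:
        problems.append('"StartAt" must reference one of the states')
    for name, state in states.items():
        if len(name) > 128:
            problems.append(f"state name exceeds 128 characters: {name[:20]}...")
        if "Type" not in state:
            problems.append(f'state "{name}" has no "Type" field')
        elif state["Type"] in NO_END_TYPES and "End" in state:
            problems.append(f'state "{name}" of type {state["Type"]} may not have "End"')
    return problems
```

An empty list means the definition passed these particular checks, nothing more.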

Simple State Example

"HelloWorld": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:HelloWorld",
  "Next": "NextState",
  "Comment": "Executes the HelloWorld Lambda function"
}

State Transitions

  • After executing the action in a non-terminal state, the state machine transitions to the state specified in the "Next" field
  • The state transitioned to from a Choice state type is determined by the logic in the Choice state
  • A state can have multiple incoming transitions from other states

States Data

  • Interpreter passes data between states
  • All data must be in JSON format
  • Initial data may be provided to the start state
  • If no data is provided, an empty JSON object is passed; i.e. { }
  • A state can create output data which must be JSON
  • Numbers generally conform to JavaScript double precision, IEEE 754 values
  • Strings, booleans, and numbers are valid JSON texts

States Data - Path Expressions

  • When states need to access specific fields, they can use JsonPath expressions
  • A Path expression starts with a $
  • $$ path expression means path is taken from context object
  • A Reference Path is a Path that resolves to a single node in the JSON data
  • The operators “@”, “,”, “:”, and “?” are not supported

States Data - Timestamps

  • Timestamps used in data must conform to RFC3339 profile of ISO 8601
  • A capital 'T' character must be used to separate date and time
  • If a numeric timezone offset is not used, a capital 'Z' must terminate the string; e.g.
2016-03-14T01:59:00Z
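
A quick way to check the shape described above is a regular expression. This is a sketch only: the pattern and function name are hypothetical, and it checks the layout (capital 'T', then 'Z' or a numeric offset) without verifying that the date and time values are in range.

```python
import re

# Date, capital 'T', time with optional fractional seconds, then either a
# capital 'Z' or a numeric offset such as +05:30.
TIMESTAMP_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})"
)


def is_valid_timestamp(text):
    """Return True if `text` has the timestamp layout described above."""
    return TIMESTAMP_RE.fullmatch(text) is not None
```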

States Data - Example

{
    "foo": 123,
    "bar": ["a", "b", "c"],
    "car": {
        "cdr": true
    }
}
$.foo => 123
$.bar => ["a", "b", "c"]
$.car.cdr => true
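
To show how the paths above resolve, here is a minimal sketch of a resolver that handles only plain field names and integer indexes. It is an illustration under those assumptions, not a JsonPath implementation, and the function name is hypothetical.

```python
import re

# One path step: either .name or [index]
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(path, data):
    """Resolve a simple path such as $.car.cdr or $.bar[1] against `data`."""
    if not path.startswith("$"):
        raise ValueError("a path must start with $")
    node = data
    pos = 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            raise ValueError(f"unsupported path syntax at position {pos}: {path!r}")
        if step.group(1) is not None:
            node = node[step.group(1)]        # field access
        else:
            node = node[int(step.group(2))]   # array index
        pos = step.end()
    return node
```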

More Valid Examples

$.store.book
$.store\.book
$.\stor\e.boo\k
$.store.book.title
$.foo.\.bar
$.foo\@bar.baz\[\[.\?pretty
$.&Ж中.\uD800\uDF46
$.ledgers.branch[0].pending.count
$.ledgers.branch[0]
$.ledgers[0][22][315].foo
$['store']['book']
$['store'][0]['book']

States Data - Input/Output

  • By default, a state receives its full input, and its result replaces that input to form the output data
  • However, a state may only be interested in a subset of the input data, or even restructure it differently than the input
  • Four optional fields exist for this...
  • InputPath
    • A path selecting some or all of the state's input
    • Default is $
  • Parameters
    • Any value constructed from the input
    • Becomes the effective input
    • Has no default

States Data - Input/Output (cont'd)

  • ResultPath
    • Specifies where in the input data to place the state's result
    • Must be a reference path
    • Default is $ (the result overwrites and replaces the input)
  • OutputPath
    • A path applied to the data produced by ResultPath
    • Yields the effective output that is input to the next state
    • Defaults to $; i.e. everything produced by ResultPath

States Data - Input/Output Example 1

{
  "StartAt": "Add",
  "States": {
    "Add": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Add",
      "InputPath": "$.numbers",
      "ResultPath": "$.sum",
      "End": true
    }
  }
}

States Data - Input/Output Example 1

Input:

{
  "title": "Numbers to add",
  "numbers": { "val1": 3, "val2": 4 }
}

Output:

{
  "title": "Numbers to add",
  "numbers": { "val1": 3, "val2": 4 },
  "sum": 7
}
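
The flow in Example 1 can be traced in a few lines of Python. This is a minimal sketch assuming paths are simple dotted names such as $ or $.numbers; the helper names are hypothetical, and add stands in for the Add Lambda function.

```python
import copy


def get_path(data, path):
    """Resolve a simple dotted path such as "$" or "$.numbers"."""
    node = data
    for key in path.split(".")[1:]:
        node = node[key]
    return node


def set_path(data, path, value):
    """Return a copy of `data` with `value` stored at a simple dotted path."""
    keys = path.split(".")[1:]
    if not keys:                       # "$" replaces the whole document
        return value
    result = copy.deepcopy(data)
    node = result
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result


def run_state(raw_input, task, input_path="$", result_path="$", output_path="$"):
    """Apply InputPath, run the task, then apply ResultPath and OutputPath."""
    effective_input = get_path(raw_input, input_path)
    result = task(effective_input)
    combined = set_path(raw_input, result_path, result)
    return get_path(combined, output_path)


def add(numbers):
    # Stand-in for the Add Lambda function from the example
    return numbers["val1"] + numbers["val2"]
```

With input_path="$.numbers" and result_path="$.sum", the task sees only the numbers object and its result lands in a new "sum" field next to the original input. Leaving result_path at its default of $ would return just the result.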

States Data - Input/Output Example 2

"Prepare Data": {
  "Type": "Task",
  "Resource": "arn:aws:swf:us-east-1:123456789012:task:X",
  "Next": "Y",
  "Parameters": {
    "flagged": true,
    "parts": {
      "first.$": "$.vals[0]",
      "last3.$": "$.vals[3:]"
    }
  }
}

States Data - Input/Output Example 2

Input:

{
  "flagged": 7,
  "vals": [0, 10, 20, 30, 40, 50]
}

Effective input passed to the task (after Parameters):

{
  "flagged": true,
  "parts": {
    "first": 0,
    "last3": [30, 40, 50]
  }
}

Handling Errors

  • Runtime errors are identified by case-sensitive strings, called Error Names
  • The States Language has some reserved Error Names that all begin with "States."
  • Custom error names must not start with "States."
  • Error handling has two flavors, which can be used together
    • Retriers
    • Catchers

States - Predefined Errors

States - Built-in Error Names

States - Retriers

  • Task and Parallel State types may include a "Retry" field that defines an array of Retriers
  • Required Retrier fields:
    • "ErrorEquals" array - specifies error names handled
  • Optional Retrier fields...

States - Optional Retrier Fields

  • "MaxAttempts"
    • Non-negative integer (0 means never retry)
    • Default = 3
  • "IntervalSeconds"
    • Positive integer: the number of seconds to wait after the error before the first retry
    • Default = 1
  • "BackoffRate"
    • Multiplier of retry interval applied after every retry
    • Default = 2.0

States - Retrier Example

"Retry" : [ {
      "ErrorEquals": [ "States.Timeout" ],
      "IntervalSeconds": 3,
      "MaxAttempts": 3,
      "BackoffRate": 2.0
    } ]

Interpretation:

- If a timeout error occurs, wait 3 seconds, then retry

- If a timeout occurs again, wait 3.0s x 2.0 (6.0 seconds), then retry

- If a timeout occurs again, wait 6.0s x 2.0 (12.0 seconds), then retry
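
The same arithmetic, as a small sketch: each wait is the interval multiplied by the backoff rate once per previous retry. The function name is hypothetical; the defaults mirror the ones listed on the Optional Retrier Fields slide.

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0):
    """Return the wait (in seconds) before each retry attempt, in order."""
    return [interval_seconds * backoff_rate ** attempt
            for attempt in range(max_attempts)]
```

For the retrier above, retry_delays(3, 3, 2.0) gives waits of 3, 6, and 12 seconds; after the third retry fails, the error is no longer retried.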

State Errors - Catchers

  • Task States and Parallel States may define a "Catch" field
  • The "Catch" field defines an array of Catchers
  • A Catcher must specify an "ErrorEquals" field specifying an array of error names
  • A Catcher must specify a "Next" field specifying an existing State name
  • Retriers, if specified, execute first
  • If error still exists, Catchers are evaluated and processed

State Errors - Catchers (cont'd)

  • When a Catcher causes a transition to the "Next" state, the output must contain a string field named "Error" containing the error name
  • The error output should contain a string field named "Cause" containing human-readable error information
  • A Catcher may have a "ResultPath" field so the error output is appended to the input data

State Errors - Catchers Example

"Catch": [
  {
    "ErrorEquals": [ "java.lang.Exception" ],
    "ResultPath": "$.error-info",
    "Next": "RecoveryState"
  },
  {
    "ErrorEquals": [ "States.ALL" ],
    "Next": "EndMachine"
  }
]
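
How the catchers above are selected can be sketched as follows. This is a simplified illustration, not the service's implementation: the function names are hypothetical, and ResultPath handling is limited to $ and $.field.

```python
CATCHERS = [
    {
        "ErrorEquals": ["java.lang.Exception"],
        "ResultPath": "$.error-info",
        "Next": "RecoveryState",
    },
    {"ErrorEquals": ["States.ALL"], "Next": "EndMachine"},
]


def find_catcher(catchers, error_name):
    """Return the first catcher that names the error (or States.ALL), else None."""
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        if error_name in names or "States.ALL" in names:
            return catcher
    return None


def apply_catcher(catcher, state_input, error_name, cause):
    """Return (next state name, data passed to it) for a matched catcher."""
    error_output = {"Error": error_name, "Cause": cause}
    result_path = catcher.get("ResultPath", "$")
    if result_path == "$":
        return catcher["Next"], error_output
    field = result_path[2:]            # assumes the "$.field" form
    return catcher["Next"], {**state_input, field: error_output}
```

A java.lang.Exception goes to RecoveryState with the error details added under "error-info"; anything else falls through to the States.ALL catcher and goes to EndMachine with the error details as its whole input.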

State Types and Fields

State Type: Pass

  • A no-op state used for development and testing
  • Merely passes the input data to the output
  • Optional "Result" field representing the stub data

State Type: Pass Example

"No-op": {
  "Type": "Pass",         
  "Result": {
    "x-datum": 0.381018,
    "y-datum": 622.226993
  },
  "ResultPath": "$.coords",
  "Next": "End"
}

Input

{ "geo-ref": "Home" }

Output

{
  "geo-ref": "Home",
  "coords": {
    "x-datum": 0.381018,
    "y-datum": 622.226993
  }
}

State Type: Task

  • A state that does work through a specified resource
  • The "Resource" field specifying a URI is required
  • The States Language does not restrict the URI, but in AWS the value will be an Amazon Resource Name (ARN)
  • Optional timeouts, in positive integer seconds
    • "TimeoutSeconds" (default = 60)
    • "HeartbeatSeconds"
      • Must be smaller than TimeoutSeconds
      • Workers can call back via SendTaskHeartbeat
    • If either timeout is exceeded, a "States.Timeout" error occurs

State Type: Task Example

"TaskState": {
  "Comment": "Task State example",
  "Type": "Task",
  "Resource": "arn:aws:swf:us-east-1:123456789012:task:HelloWorld",
  "Next": "NextState",
  "TimeoutSeconds": 300,
  "HeartbeatSeconds": 60
}

State Type: Choice

  • A state that adds branching logic to a state machine
  • Must have a "Choices" field with non-empty array
  • Each element of the array is called a Choice Rule
  • A Choice Rule has
    • A comparison operation
    • A "Next" field which must match an existing state
  • The first Choice Rule in the array with an exact match is the choice taken
  • Choice states may have a "Default" field used if no Choice Rule matches

Choice State Comparison Operators

  • StringEquals
  • StringLessThan
  • StringGreaterThan
  • StringLessThanEquals
  • StringGreaterThanEquals
  • NumericEquals
  • NumericLessThan
  • NumericGreaterThan
  • NumericLessThanEquals
  • NumericGreaterThanEquals
  • BooleanEquals
  • TimestampEquals
  • TimestampLessThan
  • TimestampGreaterThan
  • TimestampLessThanEquals
  • TimestampGreaterThanEquals
  • And
  • Or
  • Not

State Type: Choice Example

"ChoiceStateX": {
  "Type" : "Choice",
  "Choices": [
    {
        "Not": {
          "Variable": "$.type",
          "StringEquals": "Private"
        },
        "Next": "Public"
    },
    {
      "And": [
        {
          "Variable": "$.value",
          "NumericGreaterThanEquals": 20
        },
        {
          "Variable": "$.value",
          "NumericLessThan": 30
        }
      ],
      "Next": "ValueInTwenties"
    }
  ],
  "Default": "DefaultState"
},
...
...

"Public": {
  "Type" : "Task",
  "Resource": 
  "arn:aws:lambda:...:function:Foo",
  "Next": "NextState"
},

"ValueInTwenties": {
  "Type" : "Task",
  "Resource": 
  "arn:aws:lambda:...:function:Bar",
  "Next": "NextState"
},

"DefaultState": {
  "Type": "Fail",
  "Cause": "No Matches!"
}
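
The rule evaluation in the example above can be sketched in Python. This is a simplified illustration: it supports only a few of the comparison operators, assumes "Variable" has the $.field form, and raises KeyError when the field is missing from the input. The function names are hypothetical.

```python
import operator

COMPARATORS = {
    "StringEquals": operator.eq,
    "NumericEquals": operator.eq,
    "BooleanEquals": operator.eq,
    "NumericLessThan": operator.lt,
    "NumericGreaterThan": operator.gt,
    "NumericLessThanEquals": operator.le,
    "NumericGreaterThanEquals": operator.ge,
}

CHOICE_STATE = {
    "Type": "Choice",
    "Choices": [
        {"Not": {"Variable": "$.type", "StringEquals": "Private"},
         "Next": "Public"},
        {"And": [
            {"Variable": "$.value", "NumericGreaterThanEquals": 20},
            {"Variable": "$.value", "NumericLessThan": 30},
        ], "Next": "ValueInTwenties"},
    ],
    "Default": "DefaultState",
}


def matches(rule, data):
    """Evaluate one Choice Rule (possibly nested) against the input data."""
    if "And" in rule:
        return all(matches(sub, data) for sub in rule["And"])
    if "Or" in rule:
        return any(matches(sub, data) for sub in rule["Or"])
    if "Not" in rule:
        return not matches(rule["Not"], data)
    value = data[rule["Variable"][2:]]     # assumes the "$.field" form
    for name, compare in COMPARATORS.items():
        if name in rule:
            return compare(value, rule[name])
    raise ValueError(f"unsupported rule: {rule}")


def choose(state, data):
    """Return the name of the next state: first matching rule, else Default."""
    for rule in state["Choices"]:
        if matches(rule, data):
            return rule["Next"]
    if "Default" in state:
        return state["Default"]
    raise RuntimeError("States.NoChoiceMatched")
```

Because rules are tried in order, an input whose type is not "Private" goes to Public even when its value is in the twenties.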

State Type: Wait

  • A state that adds a delay to a state machine
  • The delay time can be specified using different fields
  • "Seconds" - wait duration (in seconds)
  • "SecondsPath" - Delay seconds from input path
  • "Timestamp" - Datetime expiry value in ISO-8601 format
  • "TimestampPath" - Datetime expiry value from input

State Type: Wait Examples

"wait_ten_seconds" : {
  "Type" : "Wait",
  "Seconds" : 10,
  "Next": "NextState"
}
"wait_until" : {
  "Type": "Wait",
  "Timestamp": "2016-03-14T01:59:00Z",
  "Next": "NextState"
}
"wait_some_seconds" : {
  "Type" : "Wait",
  "SecondsPath" : "$.secondsDelay",
  "Next": "NextState"
}
"wait_until" : {
    "Type": "Wait",
    "TimestampPath": "$.expirydate",
    "Next": "NextState"
}

State Type: Parallel

  • A state that causes parallel execution of "branches"
  • Must have a "Branches" field that is an array of branch objects
  • Each branch must have a "StartAt" field
  • Each branch must have a "States" field
  • A branch's State may have a "Next" field pointing to a State in that branch's "States" object, but not outside it
  • Parallel State's "Next" field not processed until all branches complete

State Type: Parallel (cont'd)

  • Any uncaught error or transitioning to a Fail state in a branch fails the whole Parallel State
  • If the Parallel State does not handle the error, the entire state machine will be marked as failed
  • Parallel State passes the input (or InputPath) to each branch's "StartAt" state
  • Parallel State aggregates each branch's output into an output array, without requiring each branch's output to be the same shape

Parallel State Example 1

"LookupCustomerInfo": {
  "Type": "Parallel",
  "Branches": [
    {
      "StartAt": "LookupAddress",
      "States": {
        "LookupAddress": {
          "Type": "Task",
          "Resource": 
            "arn:aws:lambda:us-east-1:123456789012:function:AddressFinder",
          "End": true
        }
      }
    },
    {
      "StartAt": "LookupPhone",
      "States": {
        "LookupPhone": {
          "Type": "Task",
          "Resource": 
            "arn:aws:lambda:us-east-1:123456789012:function:PhoneFinder",
          "End": true
        }
      }
    }
  ],
  "Next": "NextState"
}

Parallel State Example 2

"FunWithMath": {
  "Type": "Parallel",
  "Branches": [
    {
      "StartAt": "Add",
      "States": {
        "Add": {
          "Type": "Task",
          "Resource": "arn:aws:swf:::task:Add",
          "End": true
        }
      }
    },
    {
      "StartAt": "Subtract",
      "States": {
        "Subtract": {
          "Type": "Task",
          "Resource": "arn:aws:swf:::task:Subtract",
          "End": true
        }
      }
    }
  ],
  "Next": "NextState"
}

Input:

[3, 2]

Output:

[5, 1]
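
The fan-out and aggregation in Example 2 can be imitated locally with a thread pool. This is only an analogy under stated assumptions: add and subtract stand in for the two tasks, and run_parallel is a hypothetical helper, not a Step Functions API.

```python
from concurrent.futures import ThreadPoolExecutor


def add(numbers):
    # Stand-in for the Add task
    return numbers[0] + numbers[1]


def subtract(numbers):
    # Stand-in for the Subtract task
    return numbers[0] - numbers[1]


def run_parallel(branches, state_input):
    """Give every branch the same input and collect outputs in branch order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, state_input) for branch in branches]
        # result() re-raises a branch's exception, so one failing branch
        # fails the whole call, much like an uncaught error in a branch.
        return [future.result() for future in futures]
```

Each branch receives the full input [3, 2], and the outputs are collected into one array in the order the branches were declared.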

State Type: Succeed

  • A state that terminates a state machine successfully
  • A terminal state; i.e. has no "Next" field
  • Useful as candidates for "Next" states in Choice States
"Completed": {
  "Type": "Succeed"
}

State Type: Fail

  • A state that terminates a state machine as failed
  • A terminal state; i.e. has no "Next" field
  • Must have a field named "Error" that specifies error name
  • Must have a field named "Cause" that provides a human-readable message

State Type: Fail Example

"FailState": {
    "Type": "Fail",
    "Error": "States.Timeout",
    "Cause": "Report generation timed out"
}

A Batch Job Example

State machine example

{
  "Comment": "An example of the Amazon States Language.",
  "StartAt": "Submit Job",
  "States": {
    "Submit Job": {
      "Type": "Task",
      "Resource":
      "arn:<PARTITION>:lambda:::function:SubmitJob",
      "ResultPath": "$.guid",
      "Next": "Wait X Seconds",
      "Retry": [
        {
          "ErrorEquals": ["States.ALL"],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 2
        }
      ]
    },
    "Wait X Seconds": {
      "Type": "Wait",
      "SecondsPath": "$.wait_time",
      "Next": "Get Job Status"
    },
    ...

State machine example cont'd

...
"Get Job Status": {
      "Type": "Task",
      "Resource":
      "arn:<PARTITION>:lambda:::function:CheckJob",
      "Next": "Job Complete?",
      "InputPath": "$.guid",
      "ResultPath": "$.status",
      "Retry": [
        {
          "ErrorEquals": ["States.ALL"],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 2
        }
      ]
    },
    "Job Complete?": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.status",
          "StringEquals": "FAILED",
          "Next": "Job Failed"
        },
        {
          "Variable": "$.status",
          "StringEquals": "SUCCEEDED",
          "Next": "Get Final Job Status"
        }
      ],
      "Default": "Wait X Seconds"
    },
...    

State machine example cont'd

...
"Job Failed": {
      "Type": "Fail",
      "Cause": "AWS Batch Job Failed",
      "Error": "DescribeJob returned FAILED"
    },
    "Get Final Job Status": {
      "Type": "Task",
      "Resource":
      "arn:<PARTITION>:lambda:::function:CheckJob",
      "InputPath": "$.guid",
      "End": true,
      "Retry": [
        {
          "ErrorEquals": ["States.ALL"],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 2
        }
      ]
    }
  }
}

Service Integrations

  • Step Functions works directly with some AWS Services

  • No Lambda required!

  • Examples

    • Launch a Batch Job and consume its results

    • Insert or get a record from DynamoDB

    • Publish to an SNS topic or an SQS queue

    • Even launch another Step Functions State Machine

Service Integration Patterns

  • Request / Response

  • Run a job

  • Wait for a callback with the Task Token

Request / Response Pattern

  • Request sent to resource
  • When HTTP response received, transition to next state
  • Will not wait for job to complete

"Send message to SNS": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sns:publish",
  "Parameters": {
    "TopicArn": "arn:aws:sns:us-east-1:123456789012:myTopic",
    "Message": "Hello from Step Functions!"
  },
  "Next": "NEXT_STATE"
}

"Run a Job" Pattern

  • Request sent to resource identified by an ARN ending in .sync
  • Wait for job to complete
"Manage Batch task": {
  "Type": "Task",
  "Resource": "arn:aws:states:::batch:submitJob.sync",
  "Parameters": {
    "JobDefinition": "arn:aws:batch:us-east-2:123456789012:job-definition/testJobDefinition",
    "JobName": "testJob",
    "JobQueue": "arn:aws:batch:us-east-2:123456789012:job-queue/testQueue"
  },
  "Next": "NEXT_STATE"
}

"Wait for callback with Task Token" Pattern

  • Request sent to resource with ARN ending in .waitForTaskToken
  • Pass a TaskToken parameter
  • Wait for a callback to the SendTaskSuccess or SendTaskFailure API endpoint
  • The callback must contain the TaskToken

"Wait for callback with Task Token" Pattern Example

[Diagram: Step Functions posts a message with the Task Token (TT) to SQS; the message is pulled from the queue; the callback to Step Functions carries the TT]

"Send message to SQS": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
  "Parameters": {
    "QueueUrl": "https://sqs.us-east-2.../myQueue",
    "MessageBody": {
        "Message": "Hello from Step Functions!",
        "TaskToken.$": "$$.Task.Token"
     }
  },
  "Next": "NEXT_STATE"
}
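
On the receiving side, a worker that pulls the message has to hand the task token back. Here is a minimal sketch of that step, assuming the message body arrives as the JSON object defined above. The function handle_message and the RecordingClient stub are hypothetical; in real use the client would be a boto3 Step Functions client, whose send_task_success and send_task_failure methods take these keyword arguments.

```python
import json


class RecordingClient:
    """Stand-in for boto3.client("stepfunctions") that records calls."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, **kwargs):
        self.calls.append(("success", kwargs))

    def send_task_failure(self, **kwargs):
        self.calls.append(("failure", kwargs))


def handle_message(sfn_client, message_body, do_work):
    """Process one queue message and report the outcome with its task token."""
    payload = json.loads(message_body)
    token = payload["TaskToken"]
    try:
        result = do_work(payload["Message"])
    except Exception as exc:
        # Report the failure so the waiting state does not hang until timeout
        sfn_client.send_task_failure(
            taskToken=token, error=type(exc).__name__, cause=str(exc)
        )
        return False
    # The output must be a JSON string
    sfn_client.send_task_success(taskToken=token, output=json.dumps(result))
    return True
```

Passing the client in as a parameter keeps the sketch testable without AWS credentials.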

Developing Step Functions

  • AWS Console (Demo)
  • Step Functions Local
  • Cloud Development Kit (CDK)
  • SDK, HTTPS API, CLI

Console Demos

  • stls_HelloWorldExample

  • WaitForCallbackStateMachine

  • DynamoDBToSQS

  • NestingPatternMainStateMachine

Step Functions Local 

  • Download and run a Java JAR
    • https://s3.amazonaws.com/stepfunctionslocal/StepFunctionsLocal.tar.gz
      • https://s3.amazonaws.com/stepfunctionslocal/StepFunctionsLocal.tar.gz.md5
    • https://s3.amazonaws.com/stepfunctionslocal/StepFunctionsLocal.zip
      • https://s3.amazonaws.com/stepfunctionslocal/StepFunctionsLocal.zip.md5
    • java -jar StepFunctionsLocal.jar -v
  • Download and Run a Docker image
    • docker pull amazon/aws-stepfunctions-local
    • docker run -p 8083:8083 amazon/aws-stepfunctions-local

Step Functions Local - JAR

Step Functions Local - Docker

docker run -p 8083:8083 \
  --env-file aws-stepfunctions-local-credentials.txt \
  amazon/aws-stepfunctions-local

AWS Cloud Development Kit (CDK) - (Deep Dive in October!)

import lambda = require('@aws-cdk/aws-lambda');
import sfn = require('@aws-cdk/aws-stepfunctions');
import tasks = require('@aws-cdk/aws-stepfunctions-tasks');
import { Duration } from '@aws-cdk/core';

const submitLambda = new lambda.Function(this, 'SubmitLambda', { ... });
const getStatusLambda = new lambda.Function(this, 'CheckLambda', { ... });

const submitJob = new sfn.Task(this, 'Submit Job', {
    task: new tasks.InvokeFunction(submitLambda),
    // Put Lambda's result here in the execution's state object
    resultPath: '$.guid',
});

const waitX = new sfn.Wait(this, 'Wait X Seconds', {
    duration: sfn.WaitDuration.secondsPath('$.wait_time'),
});

const getStatus = new sfn.Task(this, 'Get Job Status', {
    task: new tasks.InvokeFunction(getStatusLambda),
    // Pass just the field named "guid" into the Lambda, put the
    // Lambda's result in a field called "status"
    inputPath: '$.guid',
    resultPath: '$.status',
});

const jobFailed = new sfn.Fail(this, 'Job Failed', {
    cause: 'AWS Batch Job Failed',
    error: 'DescribeJob returned FAILED',
});

const finalStatus = new sfn.Task(this, 'Get Final Job Status', {
    task: new tasks.InvokeFunction(getStatusLambda),
    // Use "guid" field as input, output of the Lambda becomes the
    // entire state machine output.
    inputPath: '$.guid',
});

const definition = submitJob
    .next(waitX)
    .next(getStatus)
    .next(new sfn.Choice(this, 'Job Complete?')
        // Look at the "status" field
        .when(sfn.Condition.stringEquals('$.status', 'FAILED'), jobFailed)
        .when(sfn.Condition.stringEquals('$.status', 'SUCCEEDED'), finalStatus)
        .otherwise(waitX));

new sfn.StateMachine(this, 'StateMachine', {
    definition,
    timeout: Duration.minutes(5)
});

Summary

  • AWS Step Functions represent a shared canvas for discussions with business
  • They help us decouple workflow management from core business logic
  • They have robust error and retry support
  • They have very limited programming logic support

Resources

Questions?

Introduction to AWS Step Functions

By Jack Frosch
