This document describes a domain specific language (DSL) for AWS Step Function (SFN) state machines. Using the Python stepfunctions library, the DSL can be compiled down to the AWS States Language.
For more information on the Python stepfunctions library or its use, visit the library's page.
Copyright 2016 The Johns Hopkins University Applied Physics Laboratory
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
When Amazon released AWS Step Functions they provided a language definition for writing state machines. While functional, the JSON format is cumbersome to write and maintain. This DSL is designed to make it easier to read, write, and maintain step function state machines.
The biggest benefit of using the DSL for writing a state machine is that, when it is compiled to the AWS JSON format by a library like stepfunctions, the states can be automatically linked together instead of the next state having to be specified manually for each state.
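The automatic linking described above can be pictured with a small Python sketch. This is only an illustration under assumed names (`link_states` is hypothetical and is not the library's API): every state gets a `Next` field pointing at the state that follows it in the file, and the last state is marked with `End`.

```python
def link_states(states):
    """Return a dict of state definitions with Next/End filled in.

    `states` is a list of (name, definition) pairs in file order.
    Terminal states (Succeed / Fail) are left untouched.
    """
    linked = {}
    for index, (name, definition) in enumerate(states):
        definition = dict(definition)  # copy so the caller's data is not mutated
        if definition["Type"] not in ("Succeed", "Fail"):
            if index + 1 < len(states):
                # Link to the state that follows in the file.
                definition["Next"] = states[index + 1][0]
            else:
                # The last state in the file ends the execution.
                definition["End"] = True
        linked[name] = definition
    return linked


machine = link_states([
    ("First", {"Type": "Pass"}),
    ("Second", {"Type": "Pass"}),
])
```

Writing the same machine by hand in JSON would require adding the `Next` and `End` fields to each state manually.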
The single flow control state has been translated into two of the basic flow control operations used in programming (if/elif/else and while loop).
The DSL's style is influenced by Python code style.
- It is an indent based language where the level of indent is used to specify a block of code.
- Strings can be defined using ', ", ''', or """
- Doc String style comments for states
The SFN DSL format is an optional top level comment followed by a list of states.
"""Simple Example of the SFN DSL"""
Lambda('HelloWorld')
Execution of the state machine starts at the first state in the file and proceeds until the state at the end of the file is reached or until a state terminates execution.
In this example there is one state. The full ARN for the Lambda will be determined when the DSL is compiled into the AWS JSON format. The full ARN can be passed if the desired Lambda doesn't reside in the same account or region as the connection used to compile and create the state machine.
There is a predefined set of basic errors that can happen. State machine errors reference.
The SFN DSL supports comparison against timestamp values. A value is treated as a timestamp, rather than a regular string, if it can be parsed as a timestamp according to RFC3339. This format often looks like yyyy-mm-ddThh:mm:ssZ. If a timestamp is not in the correct format, the comparison will be performed as a string comparison.
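As a rough illustration of that rule, the following Python sketch decides whether a value would be compared as a timestamp or as a string. The function names are hypothetical and the regular expression is a simplified reading of the RFC 3339 date-time grammar; the library's own check may differ.

```python
import re
from datetime import datetime

# RFC 3339 date-time: date, "T", time, optional fraction, then "Z" or a numeric offset.
_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[Tt](\d{2}:\d{2}:\d{2})(\.\d+)?([Zz]|[+-]\d{2}:\d{2})$"
)


def is_timestamp(value):
    """Return True if `value` looks like a valid RFC 3339 timestamp."""
    match = _RFC3339.match(value)
    if match is None:
        return False
    try:
        # Reject impossible dates and times such as month 13 or hour 25.
        datetime.strptime(match.group(1) + "T" + match.group(2), "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True


def comparison_kind(value):
    """Pick the comparison family: timestamps when parseable, strings otherwise."""
    return "Timestamp" if is_timestamp(value) else "String"
```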
State machines use a version of JsonPath for referencing data that is being processed. State machine path reference.
A SFN DSL file consists of three optional fields followed by a list of States.
"""State machine comment"""
version: "1.0"
timeout: int
The top level comment is the comment for the state machine that is created.
- version: A string containing the version number of the State Machine language to compile down to. Currently only "1.0" is supported.
- timeout: The overall timeout in seconds for the whole state machine execution. If the state machine has not finished execution within this time, the execution fails with a States.Timeout error.
The different types of state machine states are divided into two categories. Basic states are those that perform a single action and, potentially, link to another state. Flow control states are those that apply some flow control logic.
A terminal state, Success() will cause the state machine to terminate execution successfully and return a result value.
Success()
    """State Name
    State Comment"""
    input: JsonPath
    output: JsonPath
States can have a Python style doc string. If given, the first line of the doc string is the state's name and the rest is the state's comment. If no name is given (or the name is empty), the state's name is built from the line number.
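The naming rule can be sketched as follows. `split_doc_string` is a hypothetical helper, and the `Line<number>` fallback format is an assumption made for illustration; the text only says that the name is built from the line number.

```python
def split_doc_string(doc_string, line_number):
    """Split a state's doc string into (name, comment)."""
    lines = (doc_string or "").split("\n")
    name = lines[0].strip()
    comment = "\n".join(line.strip() for line in lines[1:]).strip()
    if not name:
        # Assumed fallback format; the real generated name may differ.
        name = "Line{}".format(line_number)
    return name, comment
```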
Modifiers:
- input: JsonPath selecting a value from the input object to be passed to the current state (Default: "$")
- output: JsonPath selecting a value from the output object to be passed to the next state (Default: "$")
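The effect of the two modifiers can be approximated in a few lines of Python. This is a simplified sketch, not the Step Functions implementation: `select` and `run_state` are hypothetical names, and only the `$` and `$.a.b` forms of JsonPath are handled.

```python
def select(path, document):
    """Resolve a simple JsonPath ("$" or "$.a.b") against nested dicts."""
    if path == "$":
        return document
    if not path.startswith("$."):
        raise ValueError("unsupported path: " + path)
    value = document
    for key in path[2:].split("."):
        value = value[key]
    return value


def run_state(state_input, work, input_path="$", output_path="$"):
    """Apply the input modifier, run the state, then apply the output modifier."""
    selected = select(input_path, state_input)  # what the state actually sees
    result = work(selected)                     # what the state produces
    return select(output_path, result)          # what the next state receives
```

For example, `run_state({"a": {"b": 1}}, lambda x: {"out": x}, "$.a", "$.out")` passes only `{"b": 1}` into the state and hands only the `out` value to the next state.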
A terminal state, Fail() will cause the state machine to terminate execution unsuccessfully with the given error and cause values.
Fail(error, cause)
    """State Name
    State Comment"""
Arguments:
- error: String containing the error value
- cause: String containing the error's cause, a more readable value
A state that does nothing, Pass() can be used to modify the data being passed around or inject new data into the results.
Pass()
    """State Name
    State Comment"""
    input: JsonPath
    result: JsonPath
    output: JsonPath
    parameters:
        Key1: {"JSON": "Text"}
        Key2: "string-value"
    data:
        Json
Modifiers:
- result: JsonPath of where to place the results of the state, relative to the raw input (before the input modifier was applied) (Default: "$")
- parameters: Keyword arguments to be passed in the API call. The value is JSON text. The parameters will override the incoming input, but JsonPaths may be used to select from existing inputs. NOTE: If the value contains a JsonPath, the key must end with .$
  AWS documentation: https://docs.aws.amazon.com/step-functions/latest/dg/input-output-inputpath-params.html
- data: A block of JSON data that will be used as the result of the state
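How the result modifier combines a state's result with its raw input can be sketched as below. `place_result` is a hypothetical helper that supports only `$` and dotted `$.a.b` paths; it is meant as a reading aid rather than a description of the service's internals.

```python
import copy


def place_result(raw_input, result, result_path="$"):
    """Return the state's output after inserting `result` at `result_path`."""
    if result_path == "$":
        # The default: the result replaces the whole input.
        return result
    if not result_path.startswith("$."):
        raise ValueError("unsupported path: " + result_path)
    output = copy.deepcopy(raw_input)  # leave the caller's input untouched
    keys = result_path[2:].split(".")
    target = output
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = result
    return output
```

With a result path of `$.transformed`, the raw input is kept and the result is added under the `transformed` key; with the default `$`, only the result survives.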
A task is a unit of work to be performed by the state machine and it is broken into two different categories. The first is AWS maintained APIs that can be used to execute a limited number of functions from other AWS services. The second is user created and/or maintained workers that perform whatever custom logic is coded.
AWS maintained APIs are in the format _Type_._Function_() and use the parameters block to pass arguments. Some of the APIs are for asynchronous functions. By default Heaviside is configured to make these API calls as synchronous calls, so that the Step Function waits for the function's return. To disable this default behavior, the sync: False parameter can be used in the parameters block to convert the API call back to an asynchronous call.
User maintained workers are either Lambda() or Activity(), with the difference being where the code that will be executed is located. For Lambda() the code is an AWS Lambda function. For Activity() the code can be running anywhere, and is responsible for polling AWS to see if there is new work for it to perform.
Activity ARNs are created in the Step Functions section of AWS (console or API). Once defined multiple workers can start polling for work and state machines can send data to the worker(s) for processing.
_Type_('name')
_Type_._Function_()
Arn('arn')
    """State Name
    State Comment"""
    timeout: int
    heartbeat: int
    input: JsonPath
    result: JsonPath
    output: JsonPath
    parameters:
        Key1: {"JSON": "Text"}
        Key2.$: "$.json_path.to.input.data"
    retry error(s) retry interval (seconds), max attempts, backoff rate
    catch error(s): JsonPath
        State(s)
Tasks:
- _Type_('name'): Either Lambda() or Activity()
- _Type_._Function_(): One of the following (follow the link for details about what arguments are valid and required for the parameters block).
- Arn('arn'): Raw access for specifying the full ARN of the task to execute
Arguments:
- name: The name of the Lambda or Activity. The full ARN will be created at compile time using the given AWS region and account information.
- arn: The full ARN of an Activity, Lambda, or other AWS API to be executed. This allows calling Activities hosted in different regions / accounts or calling new AWS APIs before this library is updated to handle them.
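The expansion of a name into a full ARN follows the standard AWS ARN layouts for Lambda functions and Step Functions activities. The sketch below shows the idea; `build_arn` is a hypothetical helper, and the region and account values are placeholders.

```python
def build_arn(kind, name, region, account_id):
    """Build the full ARN for a Lambda function or an Activity."""
    if kind == "Lambda":
        return "arn:aws:lambda:{}:{}:function:{}".format(region, account_id, name)
    if kind == "Activity":
        return "arn:aws:states:{}:{}:activity:{}".format(region, account_id, name)
    raise ValueError("unknown task type: " + kind)


# Placeholder region and account ID, for illustration only.
hello_arn = build_arn("Lambda", "HelloWorld", "us-east-1", "123456789012")
```

Passing a complete ARN through Arn('arn') skips this expansion, which is what makes cross-account and cross-region tasks possible.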
Modifiers:
- timeout: Number of seconds before the task times out (Default: 60 seconds)
- heartbeat: Number of seconds before the task times out if no heartbeat has been received from the task. Needs to be less than the timeout value.
- parameters: Keyword arguments to be passed in the API call. The value is JSON text, which may be a JsonPath referencing data from the state's input data object. NOTE: If the value contains a JsonPath, the key must end with .$
- retry: If the given error(s) were encountered, rerun the state
  - error(s): A single string, array of strings, or empty array of errors to match against. An empty array matches against all errors.
  - retry interval: Number of seconds to wait before the first retry
  - max attempts: Number of retries to attempt before passing errors to catch modifiers. Zero (0) is a valid value, meaning don't retry.
  - backoff rate: The multiplier that increases the retry interval on each attempt
- catch: If the given error(s) were encountered and not handled by a retry, then execute the given states. If the states in the catch block don't terminate, then execution will continue on the next valid state.
  - error(s): A single string, array of strings, or empty array of errors to match against. An empty array matches against all errors.
  - JsonPath: An optional JsonPath string specifying where to place the error information relative to the errored state's input. By default the error information will replace the errored state's input. Error information is a dictionary containing the key 'Error' and possibly the key 'Cause'.
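To make the retry arguments concrete, the sketch below computes the wait before each retry and applies the error matching rule described above. Both function names are hypothetical; the arithmetic assumes each wait is the previous wait multiplied by the backoff rate.

```python
def retry_delays(retry_interval, max_attempts, backoff_rate):
    """Seconds to wait before each retry, in order."""
    return [retry_interval * backoff_rate ** attempt for attempt in range(max_attempts)]


def matches(errors, raised):
    """Return True if `raised` is covered by a retry or catch error list."""
    if isinstance(errors, str):
        errors = [errors]  # a single string behaves like a one-item array
    # An empty array matches against all errors.
    return len(errors) == 0 or raised in errors
```

For example, `retry_delays(1, 3, 2.0)` yields waits of 1, 2, and 4 seconds, and a max attempts value of zero yields no retries at all.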
Note: Ordering of everything besides retry and catch is currently fixed. There can be multiple retry and catch statements and there is no ordering of those modifiers.
There are four different versions of the Wait() state, but each pauses execution for a given amount of time.
Wait(seconds=int)
Wait(timestamp='yyyy-mm-ddThh:mm:ssZ')
Wait(seconds_path=JsonPath)
Wait(timestamp_path=JsonPath)
    """State Name
    State Comment"""
    input: JsonPath
    output: JsonPath
Arguments:
- seconds: Number of seconds to wait
- timestamp: Wait until the specified time
- seconds_path: Read the number of seconds to wait from the given JsonPath
- timestamp_path: Read the timestamp to wait until from the given JsonPath
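Each of the four arguments corresponds to one field of a Wait state in the states language (`Seconds`, `Timestamp`, `SecondsPath`, `TimestampPath`). The following sketch shows that mapping; `wait_state` is a hypothetical helper and the exact output of the library's compiler is not confirmed here.

```python
_WAIT_FIELDS = {
    "seconds": "Seconds",
    "timestamp": "Timestamp",
    "seconds_path": "SecondsPath",
    "timestamp_path": "TimestampPath",
}


def wait_state(**kwargs):
    """Build a Wait state definition from exactly one Wait() argument."""
    if len(kwargs) != 1:
        raise ValueError("Wait() takes exactly one argument")
    (name, value), = kwargs.items()
    if name not in _WAIT_FIELDS:
        raise ValueError("unknown Wait() argument: " + name)
    return {"Type": "Wait", _WAIT_FIELDS[name]: value}
```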
This state defines an independent state machine that runs on each element of the input array.
Modifiers:
- iterator: Required. The independent state machine is defined here.
- max_concurrency: The maximum number of instances of the iterator that may run in parallel.
- items_path: JsonPath to the array to run the map on. Defaults to "$".
- parameters: Keyword arguments to be passed in the API call. The value is JSON text, which may be a JsonPath referencing data from the state's input data object. NOTE: If the value contains a JsonPath, the key must end with .$
- result: JsonPath of where to place the results of the state, relative to the raw input (before the input modifier was applied) (Default: "$")
- output: JsonPath selecting a value from the output object to be passed to the next state (Default: "$")
- retry: If the given error(s) were encountered, rerun the state
  - error(s): A single string, array of strings, or empty array of errors to match against. An empty array matches against all errors.
  - retry interval: Number of seconds to wait before the first retry
  - max attempts: Number of retries to attempt before passing errors to catch modifiers. Zero (0) is a valid value, meaning don't retry.
  - backoff rate: The multiplier that increases the retry interval on each attempt
- catch: If the given error(s) were encountered and not handled by a retry, then execute the given states. If the states in the catch block don't terminate, then execution will continue on the next valid state.
  - error(s): A single string, array of strings, or empty array of errors to match against. An empty array matches against all errors.
  - JsonPath: An optional JsonPath string specifying where to place the error information relative to the errored state's input. By default the error information will replace the errored state's input. Error information is a dictionary containing the key 'Error' and possibly the key 'Cause'.
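The way the iterator, items_path, and max_concurrency modifiers interact can be pictured with plain Python. The snippet is only a rough simulation under assumed semantics, not how Step Functions executes a map; `run_map` and the sample data are hypothetical, and this sketch requires max_concurrency to be at least 1.

```python
from concurrent.futures import ThreadPoolExecutor


def run_map(items, iterator, max_concurrency):
    """Apply `iterator` to every item, at most `max_concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map yields results in the order of the input items,
        # regardless of which call finishes first.
        return list(pool.map(iterator, items))


state_input = {"the_array": [1, 2, 3, 4, 5], "foo": "bar"}

# Stand-in for the iterator's states: combine a shared value with the current item.
transformed = run_map(
    state_input["the_array"],
    lambda element: {"foo": state_input["foo"], "element": element},
    max_concurrency=4,
)
```

The resulting list has one entry per array element, in the original order.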
Example showing map used with modifiers:
version: "1.0"
timeout: 60
Pass()
    """CreateSomeInputs"""
    result: '$'
    data:
        {
            "the_array": [1, 2, 3, 4, 5],
            "foo": "bar"
        }
map:
    """TransformInputsWithMap"""
    iterator:
        Pass()
            """MapPassState"""
        Wait(seconds=2)
            """WaitState"""
    parameters:
        foo.$: "$.foo"
        element.$: "$$.Map.Item.Value"
    items_path: "$.the_array"
    result: "$.transformed"
    output: "$.transformed"
    max_concurrency: 4
    retry [] 1 0 1.0
    catch []:
        Pass()
            """SomeErrorHandler"""
Value Type | Supported Operators |
---|---|
Boolean | ==, != |
Integer | ==, !=, <, >, <=, >= |
Float | ==, !=, <, >, <=, >= |
String | ==, !=, <, >, <=, >= |
Timestamp | ==, !=, <, >, <=, >= |
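In the states language, comparisons in a Choice rule are named by value type and operator, for example `NumericGreaterThanEquals` or `StringEquals`, and there is no not-equals comparison, so `!=` has to be expressed by wrapping an equality rule in `Not`. The sketch below shows one way the table could map onto those names. `choice_rule` and `value_prefix` are hypothetical helpers, and whether the library compiles comparisons exactly this way is an assumption.

```python
_SUFFIXES = {
    "==": "Equals",
    "<": "LessThan",
    ">": "GreaterThan",
    "<=": "LessThanEquals",
    ">=": "GreaterThanEquals",
}


def value_prefix(value, is_timestamp=False):
    """Pick the comparison family for a value (Integer and Float are both Numeric)."""
    if is_timestamp:
        return "Timestamp"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Numeric"
    if isinstance(value, str):
        return "String"
    raise TypeError("unsupported value type: " + type(value).__name__)


def choice_rule(variable, operator, value, is_timestamp=False):
    """Build a Choice rule for `variable operator value`."""
    negate = operator == "!="
    if negate:
        operator = "=="  # express != as Not(==)
    prefix = value_prefix(value, is_timestamp)
    if prefix == "Boolean" and operator != "==":
        raise ValueError("Boolean values only support == and !=")
    rule = {"Variable": variable, prefix + _SUFFIXES[operator]: value}
    return {"Not": rule} if negate else rule
```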
Comparison operators can be composed using (order of list is order of precedence):
- ()
- not
- and
- or
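Python's boolean operators follow the same precedence order, so the grouping rules can be checked directly in Python. The snippet is only an analogy using plain booleans in place of DSL comparisons.

```python
from itertools import product

# For every combination of truth values, the unparenthesized expression
# groups as ((not a) and b) or c.
for a, b, c in product([True, False], repeat=3):
    implicit = not a and b or c
    explicit = ((not a) and b) or c
    assert implicit == explicit

# Parentheses take precedence over everything else and can change the result.
a, b, c = True, False, True
default_grouping = not a and b or c      # evaluates to True
forced_grouping = not (a and (b or c))   # evaluates to False
```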
The basic if statement. Multiple (or no) elif statements can be included. The else statement is also optional.
if JsonPath operator value:
    """State Name
    State Comment"""
    State(s)
elif JsonPath operator value:
    State(s)
else:
    State(s)
transform:
    input: JsonPath
    output: JsonPath
The transform block contains the same input and output modifiers that the simple states use.
Modifiers:
- input: JsonPath selecting a value from the input object to be passed to the current state, applied before performing the comparison (Default: "$")
- output: JsonPath selecting a value from the output object to be passed to the next state, applied before executing the first state in the selected branch (Default: "$")
The basic switch statement. Multiple case statements are compared against the JsonPath variable in the switch statement. The default statement is optional.
switch JsonPath:
    """State Name
    State Comment"""
    case value:
        State(s)
    default:
        State(s)
transform:
    input: JsonPath
    output: JsonPath
The basic while loop that continues to execute the given states until the condition is no longer true.
while JsonPath operator value:
    """State Name
    State Comment"""
    State(s)
transform:
    input: JsonPath
    output: JsonPath
The parallel control structure allows running multiple branches of execution in parallel. The parallel state waits until all branches have finished before moving to the next state. If there is any unhandled error in any branch, the whole state is considered to have failed.
The state's input is passed to the first state in each branch and the result of the parallel state is an array of the outputs from each branch.
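These semantics can be mimicked with a thread pool. The sketch is a simulation with hypothetical names (`run_parallel`, and branches modeled as plain functions), and unlike the real service it hands every branch the same input object rather than a copy.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(branches, state_input):
    """Run every branch on the same input and collect the outputs in branch order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, state_input) for branch in branches]
        # result() waits for the branch and re-raises any unhandled error,
        # so one failing branch fails the whole state.
        return [future.result() for future in futures]


outputs = run_parallel(
    [lambda data: data["n"] + 1, lambda data: data["n"] * 10],
    {"n": 2},
)
```

The outputs list has one entry per branch, in the order the branches were declared.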
parallel:
    """State Name
    State Comment"""
    State(s)
parallel:
    State(s)
transform:
    input: JsonPath
    result: JsonPath
    output: JsonPath
error:
    retry error(s) retry interval (seconds), max attempts, backoff rate
    catch error(s): JsonPath
        State(s)
The transform block contains the same input, result, and output modifiers that the simple states use.
Modifiers:
- input: JsonPath selecting a value from the input object to be passed to the first state in each parallel branch (Default: "$")
- result: JsonPath of where to place the array of results from each of the parallel branches, relative to the raw input (before the input modifier was applied) (Default: "$")
- output: JsonPath selecting a value from the output object to be passed to the next state (Default: "$")
The error block contains the same retry and catch modifiers as the task state.
The goto control statement allows jumping to another state. This can be used to create common error handling routines (among other uses).
The state can only target states within the current branch of execution. This means either the main body of the Step Function or within a branch of a Parallel state.
goto "State Name"