Pipeline Design
NOTICE: This page is out-of-date! The general principles are correct but some of the details (exact directory structure) are inaccurate.
Pipelines (or workflows) in DeepForge allow users to diagram batches of operations to perform on data. This may include data normalization, transformation, ensembling models, etc.
Pipelines are composed of three main components:
- Start Operations (DataRetrievers)
- Operations
- Data
Start operations (or data retrievers) are operations which retrieve data. That is, start operations start the workflow and are the only nodes which do not require input data.
Operations are a generic concept in the DeepForge pipeline. Operations are simply Lua scripts which, given the input data (and any specified attributes/references), perform some operation on the data and return a number of Lua objects (and, potentially, resource files).
Data is visualized as connections in the pipeline. Data can have associated attributes, such as dimensionality. When a pipeline is executed, the data connection will also reference the output data of its source operation.
When DeepForge executes a pipeline, a snapshot of the pipeline is taken and executed.
The associated data of a data node is expected to contain an init.lua file and any other files required by the init.lua file. The init.lua file should not define any globals.
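The "no globals" rule can be checked mechanically. The following is a minimal sketch, not part of DeepForge: the chunk text stands in for a data node's init.lua, its payload is hypothetical, and the check simply compares the set of global names before and after the chunk runs.

```lua
-- Sketch only: the chunk text below stands in for a data node's init.lua.
local loadchunk = loadstring or load  -- loadstring in Lua 5.1/LuaJIT, load in 5.2+

local init_source = [[
local data = { dimensions = {3, 32, 32} }  -- hypothetical payload
return data
]]

-- Record which global names exist before the chunk runs.
local before = {}
for name in pairs(_G) do
    before[name] = true
end

-- Run the chunk; its return value is the data node's value.
local value = assert(loadchunk(init_source))()

-- Any name that appears afterwards was leaked by the chunk.
local leaked = {}
for name in pairs(_G) do
    if not before[name] then
        leaked[#leaked + 1] = name
    end
end

print(#leaked, value.dimensions[1])
```

Because the chunk declares everything with local and hands its value back through return, the leaked list stays empty.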
Each operation produces a known number of outputs, although that number varies from operation to operation. If an operation has multiple return types, each return type must be named.
Operations execute in the following environment:
init.lua
attributes.lua
references.lua
res/
input/
output/
The input/ directory is populated with all incoming data, organized by connection name (which matches the incoming argument name). For example, an operation with two arguments, say a and b, would have the following structure:
input/a/init.lua
input/a/res.yml
input/a/res2.yml
input/b/init.lua
input/b/res.xml
In the above example, the values of a and b are retrieved by loading the respective init.lua files.
a = require('./input/a') -- requiring './input/a/init.lua'
b = require('./input/b')
-- rest of the file...
The output/ directory is the target directory containing the files returned to the output connections displayed in the DeepForge UI (combined with the respective init.lua files associated with each return value). For example, if an operation is an image classifier that groups images by class, where the classes are dog, cat, and fish, then we may have the following structure for output/:
output/dog/dogs.t7
output/cat/cats.t7
output/fish/fish.t7
The operation will also specify init files for each of the outputs. Following the completion of the operation, these init files and the contents of each directory are zipped and associated with the respective outgoing connection(s) in the DeepForge UI. When any subsequent operation uses these values, these zip files are unzipped to input/<ARG_NAME>, where <ARG_NAME> is the name of the argument expected by the subsequent operation.
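The naming convention for this hand-off can be sketched as follows. The helper functions and the connection table are hypothetical and exist only to illustrate how an output name and an argument name map onto directories. The zipping and unzipping themselves are not shown.

```lua
-- Hypothetical helpers; the names are illustrative, not DeepForge APIs.
local function output_dir(output_name)
    return 'output/' .. output_name
end

local function input_dir(arg_name)
    return 'input/' .. arg_name
end

-- Hypothetical connection: the output `dog` feeds an argument named `images`.
local connection = { output = 'dog', arg = 'images' }

-- Directory archived after the producing operation completes.
local archived_from = output_dir(connection.output)
-- Directory the archive is unpacked into for the consuming operation.
local unpacked_to = input_dir(connection.arg)
-- File the consuming operation loads to obtain the value.
local init_file = unpacked_to .. '/init.lua'

print(archived_from, unpacked_to, init_file)
```

The consuming operation never sees the producer's output name, only the name of its own argument.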
The attributes.lua file returns a table associating the given operation node's attributes with their values. If the attribute is an asset, then the value for the key is a path to the asset in res/.
The references.lua file contains a table associating the operation node's pointer names with generated artifacts of the target of the pointer name. For example, a training node may have a reference, say network, to the architecture which it is using for the training. In this case, the references.lua will have a key network associated with the path to the Lua code that creates the given network.
As mentioned above, the res/ directory contains any resources acquired from the resolution of an attribute or reference.
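To make the shape of these two tables concrete, here is a minimal sketch. It uses package.preload to stand in for the generated files so that it runs without a DeepForge environment. All keys, values, and file names are hypothetical, as are the module names passed to require and the is_resource helper. Placing the network code under res/ is also an assumption.

```lua
-- Stand-ins for the generated files; keys and values are hypothetical.
package.preload['attributes'] = function()
    return {
        epochs = 10,                 -- plain attribute
        weights = 'res/weights.t7',  -- asset attribute: a path into res/
    }
end
package.preload['references'] = function()
    return {
        network = 'res/network.lua', -- pointer `network`: path to generated Lua code
    }
end

local attributes = require('attributes')
local references = require('references')

-- Hypothetical helper: treat any string value under res/ as a resource path.
local function is_resource(value)
    return type(value) == 'string' and value:sub(1, 4) == 'res/'
end

print(attributes.epochs, is_resource(attributes.weights), references.network)
```

An operation would read plain attributes directly and resolve asset or reference values as paths before loading them.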
Combining all these parts, we have operations that are executed by running the init.lua, retrieve input args from input/<ARG_NAME> (autogenerated from the model), serialize outputs to output/<NAME>, and can access operation attributes and references using attributes.lua, references.lua, and res/.
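Put together, the body of an operation might look roughly like the sketch below. This is an illustration under stated assumptions, not DeepForge's generated code: package.preload stands in for the files on disk, the input values, the scale attribute, the way attributes.lua is required, and the output name sum are all hypothetical, and the output is only assigned a target path rather than serialized.

```lua
-- Stand-ins for files DeepForge would generate (contents are hypothetical).
package.preload['./input/a'] = function() return {1, 2, 3} end
package.preload['./input/b'] = function() return {10, 20, 30} end
package.preload['attributes'] = function() return { scale = 2 } end

-- Body of a hypothetical operation init.lua.
local a = require('./input/a')
local b = require('./input/b')
local attributes = require('attributes')

-- Perform the operation: an element-wise sum scaled by an attribute.
local result = {}
for i = 1, #a do
    result[i] = (a[i] + b[i]) * attributes.scale
end

-- A real operation would now serialize `result` under output/<NAME>;
-- here the target path is only computed, not written.
local output_name = 'sum'  -- hypothetical output name
local target = 'output/' .. output_name

print(target, table.concat(result, ','))
```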
When defining custom operations, the following must be specified:
- input args (and types!)
- output(s) (and types!)
  - Each output will use a standard init.lua for each return type. Therefore, the operation must serialize the output results in a standard way.
- init.lua script
  - The actual Lua code for the operation
Special Operations allow for the saving of data to DeepForge libraries for future reuse.
- When data is saved, a 'data' node can be created with the required content stored. This node can then be referenced by various data retrievers for other pipelines.
  - These data retrievers would simply return the value stored by their target reference.
  - This would give us a directory for all the models that we have been training.
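The requirements for a custom operation definition (typed inputs, typed outputs that are named when there is more than one, and an init.lua script) can be expressed as a small check. The spec table layout, its field names, the type names, and the validate function are all hypothetical and are not DeepForge's actual format.

```lua
-- Hypothetical specification table; field and type names are illustrative.
local spec = {
    inputs = {
        { name = 'a', type = 'Tensor' },
        { name = 'b', type = 'Tensor' },
    },
    outputs = {
        { name = 'dog', type = 'Images' },
        { name = 'cat', type = 'Images' },
    },
    script = 'init.lua',
}

-- Return a list of human-readable problems found in an operation spec.
local function validate(op)
    local problems = {}

    local function check(kind, need_name)
        for i, item in ipairs(op[kind] or {}) do
            if not item.type then
                problems[#problems + 1] = kind .. '[' .. i .. '] has no type'
            end
            if need_name and not item.name then
                problems[#problems + 1] = kind .. '[' .. i .. '] has no name'
            end
        end
    end

    -- Input args are always named; outputs must be named when there are several.
    check('inputs', true)
    check('outputs', #(op.outputs or {}) > 1)

    if not op.script then
        problems[#problems + 1] = 'no init.lua script given'
    end
    return problems
end

print(#validate(spec))
```

Returning a list of problems rather than failing on the first one lets an editor report every missing field at once.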