Task

A Task is a runnable Python function provided by the developer. It represents one of the steps that the developer wants to implement in their pipeline.

For example, a task could be a pre-processing function that cleans the initial dataset. It could also be a more complex function that trains a model using machine learning algorithms.

Since a task represents a function, it can take a set of parameters as input and return a set of results as output. Each input parameter and each output result is modeled as a data node.
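As a minimal sketch of this idea, the snippet below uses a plain dictionary as a stand-in for data nodes. The function name `clean_sales_history`, the key names, and the sample values are hypothetical and only illustrate how each parameter maps to an input data node and each returned value to an output data node.

```python
def clean_sales_history(sales_history):
    """Pre-processing step: drop missing values from the raw sales history."""
    return [value for value in sales_history if value is not None]


# Hypothetical stand-in for data nodes: a mapping from a name to a value.
data_nodes = {"sales_history": [120, None, 135, 150, None]}

# The parameter is read from the input data node,
# and the returned value is stored in the output data node.
data_nodes["cleaned_sales_history"] = clean_sales_history(data_nodes["sales_history"])
```

The function itself knows nothing about data nodes. It only receives values and returns values, which is what allows an ordinary Python function to be used as a task.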

The attributes of a task (the input data nodes, the output data nodes, and the Python function) are populated from the task configuration, TaskConfig, which must be provided when instantiating a new task. Refer to the configuration details documentation for more information on configuration.
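To make the three attributes concrete, here is a rough sketch of a configuration object that bundles a function with the names of its input and output data nodes. `TaskConfigSketch` and `run_task` are hypothetical helpers written for this illustration. They are not the library's TaskConfig, whose actual fields and construction are described in the configuration documentation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskConfigSketch:
    """Hypothetical stand-in for a task configuration (not the real TaskConfig)."""
    id: str
    function: Callable
    input_ids: List[str]
    output_ids: List[str]


def double(value):
    return value * 2


def run_task(cfg, data):
    """Read the inputs from `data`, call the function, and store the outputs."""
    args = [data[name] for name in cfg.input_ids]
    results = cfg.function(*args)
    # A single output is returned bare, so wrap it to iterate uniformly.
    if len(cfg.output_ids) == 1:
        results = (results,)
    for name, value in zip(cfg.output_ids, results):
        data[name] = value
    return data


double_cfg = TaskConfigSketch(
    id="double", function=double, input_ids=["number"], output_ids=["doubled"]
)
data = run_task(double_cfg, {"number": 21})
```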

In our example

We create three tasks:

Figure: tasks and data nodes

The first is the training task that takes the sales history as the input data node and returns the trained model as the output data node.

The second is the predict task that takes the trained model and the current month as input and returns the sales predictions.

The third is the production planning task, which takes the capacity and the sales predictions as input data nodes and returns the production orders as output.
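The three tasks described above could be sketched as plain Python functions chained by hand. The function bodies, sample values, and the toy "model" (a simple mean) are assumptions made for illustration. Only the inputs and outputs of each task come from the example.

```python
def train(sales_history):
    """Training task: a toy 'model' that stores the mean of past sales."""
    return {"mean": sum(sales_history) / len(sales_history)}


def predict(trained_model, current_month):
    """Predict task: forecast the sales for the current month."""
    return {"month": current_month, "sales": trained_model["mean"]}


def plan(capacity, sales_predictions):
    """Production planning task: order the predicted sales, capped by capacity."""
    return min(capacity, sales_predictions["sales"])


# Input data nodes (hypothetical values).
sales_history = [100, 120, 140]
current_month = 4
capacity = 110

# Each output feeds the next task, mirroring the flow between data nodes.
trained_model = train(sales_history)
sales_predictions = predict(trained_model, current_month)
production_orders = plan(capacity, sales_predictions)
```

In a real pipeline the chaining is not written by hand. It follows from which data nodes each task declares as input and output.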

Important

The sales history, current month, and capacity data nodes are considered input data nodes since no task computes them.
The trained model and sales predictions data nodes are considered intermediate data nodes since one task computes them and another reads them. The production orders data node is considered an output data node since no task reads it.
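This classification can be derived mechanically from the tasks' inputs and outputs. The sketch below assumes a hypothetical dictionary layout for describing the three tasks and uses set operations to separate the data nodes into the three categories.

```python
# Hypothetical description of the three tasks from the example.
tasks = {
    "training": {"inputs": ["sales_history"], "outputs": ["trained_model"]},
    "predict": {
        "inputs": ["trained_model", "current_month"],
        "outputs": ["sales_predictions"],
    },
    "planning": {
        "inputs": ["capacity", "sales_predictions"],
        "outputs": ["production_orders"],
    },
}

# Data nodes read by at least one task, and data nodes computed by at least one task.
read = {name for task in tasks.values() for name in task["inputs"]}
written = {name for task in tasks.values() for name in task["outputs"]}

input_nodes = read - written          # read but never computed
intermediate_nodes = read & written   # computed by one task, read by another
output_nodes = written - read         # computed but never read
```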

The next section introduces the Pipeline concept.