Data node

A data node is one of the most important concepts in Taipy Core. It does not contain the data itself but holds all the necessary information to read and write the actual data. It can be seen as a dataset descriptor or data reference.

A data node can reference any data:

  • a text,
  • a numeric value,
  • a list of parameters,
  • a custom python object,
  • the content of a JSON file, a CSV file, a Pickle file, etc.
  • the content of one or multiple database table(s),
  • any other data.

It is designed to model any type of data: input, intermediate, or output data, internal or external data, local or remote data, historical data, a set of parameters, a trained model, etc.

The data node information depends on the data itself, its exposed format, and its storage type.

First example: If the data is stored in an SQL database, the corresponding data node should contain the username, password, host, and port, the queries to read and write the data, as well as the Python class used for deserialization.

Second example: If the data is stored in a CSV file, the corresponding data node should contain, for instance, the path to the file and the Python class used for deserialization.
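As a rough illustration of these two cases, the corresponding Taipy configurations could look like the sketch below. The identifiers, path, credentials, queries, and the SaleRow class are made-up placeholders, and parameter names may differ slightly between Taipy versions:

```python
from taipy import Config


# Hypothetical class used to deserialize each record (the "exposed type").
class SaleRow:
    def __init__(self, date, nb_sales):
        self.date = date
        self.nb_sales = nb_sales


# CSV case: the configuration mainly holds the path to the file
# and the Python class used for deserialization.
csv_cfg = Config.configure_csv_data_node(
    id="sales_history",
    default_path="sales.csv",
    exposed_type=SaleRow,
)

# SQL case: the configuration holds the connection information
# and the queries used to read and write the data.
sql_cfg = Config.configure_sql_data_node(
    id="sales_history_db",
    db_username="admin",
    db_password="secret",
    db_host="localhost",
    db_port=1433,
    db_name="sales",
    db_engine="mssql",
    read_query="SELECT date, nb_sales FROM sales",
    # A real builder would also generate the INSERT statements from `data`.
    write_query_builder=lambda data: ["DELETE FROM sales"],
)
```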

Let's take a realistic example.

Let's assume we want to build an application to predict the monthly sales demand in order to adjust production planning, constrained by some capacity.

The flowchart below represents the various data nodes we want the tasks (in orange) to process.

[Figure: tasks and data nodes]

For that purpose, we have six data nodes modeling the data (the dark blue boxes): one each for the sales history, the trained model, the current month, the sales predictions, the production capacity, and the production orders.

Note

Taipy provides various predefined data nodes corresponding to the most popular storage types. More details are available on the Data node configuration page.

In our example, the sales history comes from our company record system, so we do not control its storage type: the data is provided as a CSV file. We can use a predefined CSV data node to model the sales history.

As for the production orders data node, we want to write the data into a database shared by other systems. We can use the SQL data node to model the production orders.

We have no particular specification for the other data nodes. We can use the default storage type: Pickle.
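Putting it all together, a possible configuration for the six data nodes of this example is sketched below. The identifiers, file path, and database credentials are illustrative assumptions rather than values from the original example, and parameter names may vary with the Taipy version:

```python
from taipy import Config

# Sales history: a predefined CSV data node.
sales_history_cfg = Config.configure_csv_data_node(
    id="sales_history", default_path="sales.csv"
)

# Production orders: a predefined SQL table data node writing into a
# database shared with other systems.
production_orders_cfg = Config.configure_sql_table_data_node(
    id="production_orders",
    db_username="admin",
    db_password="secret",
    db_name="production",
    db_engine="mssql",
    table_name="orders",
)

# The remaining data nodes use the default storage type: Pickle.
trained_model_cfg = Config.configure_data_node(id="trained_model")
current_month_cfg = Config.configure_data_node(id="current_month")
sales_predictions_cfg = Config.configure_data_node(id="sales_predictions")
capacity_cfg = Config.configure_data_node(id="capacity")
```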

The data node's attributes are populated based on the data node configuration (DataNodeConfig) that must be provided when instantiating a new data node. (Please refer to the documentation on data node configuration for more details.)
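For illustration only, the sketch below shows what reading and writing through a data node looks like once it has been instantiated from its configuration. It relies on taipy.create_global_data_node(), available in recent Taipy releases, and uses a GLOBAL-scoped Pickle data node with made-up content; in earlier releases, data nodes are instead instantiated when a scenario is created, as described in the next sections:

```python
import taipy as tp
from taipy import Config, Scope

# A GLOBAL-scoped data node using the default Pickle storage type.
current_month_cfg = Config.configure_data_node(
    id="current_month", scope=Scope.GLOBAL, default_data="2023-07"
)

if __name__ == "__main__":
    tp.Core().run()  # Start the Taipy Core service.

    # Instantiate the data node from its configuration.
    current_month = tp.create_global_data_node(current_month_cfg)

    # The data node reads and writes the actual data using the
    # information held in its configuration.
    print(current_month.read())   # "2023-07"
    current_month.write("2023-08")
```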

The next section introduces the Task concept.