
Data node configs

For Taipy to instantiate a data node, a data node configuration must be provided. taipy.core.data.data_node.DataNodeConfig is used to configure the various data nodes that Taipy will manipulate. To configure a new taipy.core.data.data_node.DataNodeConfig, one can use the function Config.configure_data_node().

from taipy import Config

data_node_cfg = Config.configure_data_node(id="data_node_cfg")

In the previous code, we configure a simple data node by providing just its identifier, the string "data_node_cfg".

More attributes are available on data node configurations, including:

  • id is the identifier of the data node config.
    It is a mandatory parameter that must be unique. It must be a valid Python identifier.

  • scope is a taipy.core.Scope.
    It corresponds to the scope of the data node that will be instantiated from the data node configuration. The default value is Scope.SCENARIO.

  • storage_type is an attribute that indicates the type of storage of the data node.
    The possible values are "pickle" (the default value), "csv", "excel", "sql", "in_memory", or "generic".
    As explained in the following subsections, depending on the storage_type, other configuration attributes must be provided through the properties parameter.

  • Any other custom attribute can be provided through the properties parameter, which is a dictionary (a description, a tag, etc.).
    This properties dictionary is also used to configure the parameters specific to each storage type. Note that the whole properties dictionary is copied into the properties of every data node instantiated from this data node configuration.

Below are two examples of data node configurations.

from taipy import Config, Scope

date_cfg = Config.configure_data_node(id="date_cfg", description="The current date of the scenario")

model_cfg = Config.configure_data_node(id="model_cfg",
                                       scope=Scope.CYCLE,
                                       storage_type="pickle",
                                       description="The trained model shared by all scenarios",
                                       code=54)

In line 3, we configure a simple data node with the id "date_cfg". Its scope is SCENARIO (default value), and its storage_type also has the default value "pickle".
An optional custom property called description is also added: this property is propagated to the data nodes instantiated from this config.

In lines 5-9, we add another data node configuration with the id "model_cfg". scope is set to CYCLE, so the corresponding data nodes will be shared by all the scenarios from the same cycle. storage_type is "pickle" as well, and two optional custom properties are added: a description string and an integer code. These two properties are propagated to the data nodes instantiated from this config.

Storage type

Taipy proposes various predefined data nodes corresponding to the most popular storage types. Thanks to predefined data nodes, the Python developer does not need to spend much time configuring the storage types or the query system. Most of the time, a predefined data node corresponding to a basic and standard use case (pickle file, csv file, sql table, Excel sheet, etc.) satisfies the user's needs.

The various predefined storage types are mainly used for input data. Indeed, the input data is usually provided by an external component, and the Python developer does not control its format.

However, in many cases, particularly for intermediate or output data nodes, there is no reason to prefer one storage type over another: the end-user wants to manipulate the corresponding data within the Taipy application but has no particular requirements regarding how it is stored. In such a case, the Python developer is advised to use the default storage type pickle, which does not require any configuration.

In case a more specific method to store, read, and write the data is needed, Taipy proposes a Generic data node that can be used for any storage type or any kind of query system. The user only needs to provide two Python functions, one for reading and one for writing the data.

Each predefined data node is described in a subsequent section.

Pickle

A taipy.core.data.PickleDataNode is a specific data node used to model pickle data. To add a new pickle data node configuration, the Config.configure_pickle_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, two optional parameters can be provided.

  • The default_path parameter represents the default file path used by Taipy to read and write the data. If the pickle file already exists (in the case of a shared input data node, for instance), it is necessary to provide the default file path as the default_path parameter. If no value is provided, Taipy will use an internal path in the Taipy storage folder (more details on the Taipy storage folder configuration available on the Global configuration documentation).

  • If the default_data is provided, the data node is automatically written with the corresponding value. Any serializable Python object can be used.

from taipy import Config
from datetime import datetime

date_cfg = Config.configure_pickle_data_node(id="date_cfg", default_data=datetime(2022, 1, 25))

model_cfg = Config.configure_pickle_data_node(id="model_cfg", default_path="path/to/my/model.p", description="The trained model")

In line 4, we configure a simple pickle data node with the id "date_cfg". The scope is SCENARIO (default value), and a default data is provided.

In line 6, we add another pickle data node configuration with the id "model_cfg". The default SCENARIO scope is used. Since the data node config corresponds to a pre-existing pickle file, a default path "path/to/my/model.p" is provided. We also added an optional custom description.
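
To make this concrete, here is a minimal sketch using only the standard pickle module. It illustrates what providing default_data amounts to; it is not Taipy's internal code, and the file name date_cfg.p is hypothetical (without a default_path, Taipy picks a path inside its storage folder).

import pickle
from datetime import datetime

# Writing the default data amounts to pickling the value to a file.
with open("date_cfg.p", "wb") as f:
    pickle.dump(datetime(2022, 1, 25), f)

# Reading the data node back amounts to unpickling the same file.
with open("date_cfg.p", "rb") as f:
    date = pickle.load(f)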

Note

To configure a pickle data node, it is equivalent to use the method Config.configure_pickle_data_node() or the method Config.configure_data_node() with parameter storage_type="pickle".

Csv

A taipy.core.data.CSVDataNode is a specific data node used to model csv file data. To add a new csv data node configuration, the Config.configure_csv_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, one mandatory and two optional parameters can be provided.

  • The default_path parameter is a mandatory parameter and represents the default csv file path used by Taipy to read and write the data.

  • The has_header parameter indicates whether the file has a header or not. By default, has_header is True and Taipy will use the first row in the CSV file as the header.

  • When the exposed_type parameter is given, if its value is "numpy", the data node will read the csv file into a numpy array. If the provided value is a custom class, the data node will create a list of objects of that class, each object representing a row of the csv file. If exposed_type is not provided, the data node will read the csv file as a pandas DataFrame.

from taipy import Config

class SaleRow:
    date: str
    nb_sales: int

temp_cfg = Config.configure_csv_data_node(id="historical_temperature",
                                          default_path="path/hist_temp.csv",
                                          has_header=True,
                                          exposed_type="numpy")

sales_cfg = Config.configure_csv_data_node(id="sale_history",
                                           default_path="path/sale_history.csv",
                                           exposed_type=SaleRow)

In lines 3-5, we define a custom class SaleRow representing a row of the CSV file.

In lines 7-10, we configure a basic csv data node with the id "historical_temperature". Its scope is SCENARIO by default. The default path corresponds to the file path/hist_temp.csv. Since has_header is True, the csv file is expected to have a header row, and with exposed_type set to "numpy", the data is read as a numpy array.

In lines 12-14, we add another csv data node configuration with the id "sale_history". The default SCENARIO scope is used again. Since we have a custom class pre-defined for this csv file (SaleRow), we provide it as the exposed_type parameter.
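
To illustrate what exposed_type=SaleRow implies, the following sketch maps each csv row onto a SaleRow instance using only the standard csv module. It is an approximation of the behavior, not Taipy's internal code, and it assumes the csv columns are named date and nb_sales, matching the class attributes.

import csv

class SaleRow:
    date: str
    nb_sales: int

def read_as_sale_rows(path: str) -> list:
    # The header row names the columns, as has_header defaults to True.
    with open(path) as f:
        rows = []
        for record in csv.DictReader(f):
            row = SaleRow()
            row.date = record["date"]               # assumed column name
            row.nb_sales = int(record["nb_sales"])  # assumed column name
            rows.append(row)
        return rows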

Note

To configure a csv data node, it is equivalent to use the method Config.configure_csv_data_node() or the method Config.configure_data_node() with parameter storage_type="csv".

Excel

A taipy.core.data.ExcelDataNode is a specific data node used to model xlsx file data. To add a new Excel data node configuration, the Config.configure_excel_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, one mandatory and three optional parameters can be provided.

  • The default_path is a mandatory parameter that represents the default Excel file path used by Taipy to read and write the data.

  • The has_header parameter specifies whether the file has a header or not. If has_header is True (the default), Taipy will use the first row in the Excel file as the header.

  • The sheet_name parameter specifies which sheets of the Excel file to read. If sheet_name is provided as a list of sheet names, the data node will return a dictionary whose keys are the sheet names and whose values are the data of the corresponding sheets. If a string is provided, the data node will read only the data of the corresponding sheet. The default value of sheet_name is None, in which case the data node returns all the sheets of the Excel file when reading it.

  • When the exposed_type parameter is given, if its value is "numpy", the data node will read the Excel file into a numpy array. If the provided value is a custom class, the data node will create a list of objects of that class, each object representing a row of the Excel file. If exposed_type is not provided, the data node will read the Excel file as a pandas DataFrame.

from taipy import Config

class SaleRow:
    date: str
    nb_sales: int

hist_temp_cfg = Config.configure_excel_data_node(id="historical_temperature",
                                                 default_path="path/hist_temp.xlsx",
                                                 exposed_type="numpy")

sales_cfg = Config.configure_excel_data_node(id="sale_history",
                                             default_path="path/sale_history.xlsx",
                                             sheet_name=["January", "February"],
                                             exposed_type=SaleRow)

In lines 3-5, we define a custom class SaleRow, representing a row of the Excel file.

In lines 7-9, we configure an Excel data node with the id "historical_temperature". Its scope is SCENARIO (default value), and the default path is the file path/hist_temp.xlsx. Since has_header is not set, its default value True is used, so the Excel file is expected to have a header row. With exposed_type set to "numpy", the data is read as numpy arrays. The sheet_name parameter is not provided either, so its default value None is used and all the sheets are returned when reading.

In lines 11-14, we add another Excel data node configuration with the id "sale_history"; the default SCENARIO scope is used. Since we have a custom class pre-defined for this Excel file (SaleRow), we provide it as the exposed_type parameter. We also provide the list of the specific sheets we want to use as the sheet_name parameter.
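
Because sheet_name is a list here, reading the corresponding data node is expected to return a dictionary keyed by sheet name, as described above. A sketch of the assumed shape of the result:

# Assumed shape of the value read from the "sale_history" data node, given
# sheet_name=["January", "February"] and exposed_type=SaleRow: one entry per
# sheet, each value being a list of SaleRow objects (one per row).
sales_data = {
    "January": [],   # list of SaleRow instances for the "January" sheet
    "February": [],  # list of SaleRow instances for the "February" sheet
}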

Note

To configure an Excel data node, it is equivalent to use the method Config.configure_excel_data_node() or the method Config.configure_data_node() with parameter storage_type="excel".

Sql

A taipy.core.data.SQLDataNode is a specific data node used to model SQL data. To add a new sql data node configuration, the Config.configure_sql_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, multiple parameters can be provided.

  • The db_username parameter represents the database username that will be used by Taipy to access the database.
  • The db_password parameter represents the database user's password that will be used by Taipy to access the database.
  • The db_name parameter represents the name of the database.
  • The db_engine parameter represents the engine of the database.
  • The read_query parameter represents the SQL query that will be used by Taipy to read the data from the database.
  • The write_table parameter represents the name of the table in the database that Taipy will be writing the data to.
  • The db_port parameter represents the database port that will be used by Taipy to access the database. The default value of db_port is 1433.
  • The db_host parameter represents the database host that will be used by Taipy to access the database. The default value of db_host is "localhost".
  • The db_driver parameter represents the database driver that will be used by Taipy. The default value of db_driver is "ODBC Driver 17 for SQL Server".

from taipy import Config

forecasts_cfg = Config.configure_sql_data_node(id="forecasts",
                                               db_username="admin",
                                               db_password="password",
                                               db_name="taipy",
                                               db_engine="mssql",
                                               read_query="SELECT * from forecast_table",
                                               write_table="forecast_table")

In the previous example, we configure a sql data node with the id "forecasts". Its scope is the default value SCENARIO. The database username is "admin", the user's password is "password", the database name is "taipy", and the database engine is mssql (short for Microsoft SQL). The read query will be "SELECT * from forecast_table", and the table the data will be written to is "forecast_table".
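
To relate these parameters to plain database access, the sketch below assembles an equivalent connection by hand with pyodbc, using the default db_host ("localhost"), db_port (1433), and db_driver ("ODBC Driver 17 for SQL Server") values, and runs the same read query. This is an illustration, not Taipy's internal code.

import pyodbc

# Connection string built from the same configuration values.
connection = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost,1433;"
    "DATABASE=taipy;UID=admin;PWD=password"
)

# The read query configured above, executed directly.
rows = connection.cursor().execute("SELECT * from forecast_table").fetchall()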

Note

To configure a sql data node, it is equivalent to use the method Config.configure_sql_data_node() or the method Config.configure_data_node() with parameter storage_type="sql".

Generic

A taipy.core.data.GenericDataNode is a specific data node used to model generic data types where the read and the write functions are defined by the user. To add a new generic data node configuration, the Config.configure_generic_data_node() method can be used. In addition to the parameters described in the previous section Data node configuration, two mandatory and two optional parameters can be provided.

  • The read_fct is a mandatory parameter that represents a Python function provided by the user. It will be used to read the data. More optional parameters can be passed through the read_fct_params parameter.

  • The write_fct is a mandatory parameter representing a Python function provided by the user. It will be used to write/serialize the data. The provided function must have at least one parameter dedicated to receiving data to be written. More optional parameters can be passed through the write_fct_params parameter.

  • The parameter read_fct_params represents the parameters that are passed to the read_fct to read/de-serialize the data. It must be a list.

  • The parameter write_fct_params represents the parameters that are passed to the write_fct to write the data. It must be a list.

from taipy import Config
from typing import Any

def read_data():
    pass

def write_data(data: Any, path: str):
    pass

historical_data_cfg = Config.configure_generic_data_node(id="historical_data",
                                                         read_fct=read_data,
                                                         write_fct=write_data,
                                                         write_fct_params=['../path/'])

In this small example, we configure a generic data node with the id "historical_data". We provide two Python functions (previously defined) as the read_fct and write_fct parameters to read and write the data. We also provide write_fct_params with a list containing a path, to let write_data know where to write the data.
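
As a more complete illustration, here is a hypothetical generic data node backed by a JSON file. The function names, the file name data.json, and the config id are illustrative, not taken from the documentation above.

import json
from typing import Any

from taipy import Config

def read_json(path: str) -> Any:
    # Called by Taipy with read_fct_params to read/de-serialize the data.
    with open(path) as f:
        return json.load(f)

def write_json(data: Any, path: str):
    # Called by Taipy with the data to write, followed by write_fct_params.
    with open(path, "w") as f:
        json.dump(data, f)

json_data_cfg = Config.configure_generic_data_node(id="json_data",
                                                   read_fct=read_json,
                                                   read_fct_params=["data.json"],
                                                   write_fct=write_json,
                                                   write_fct_params=["data.json"])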

Note

To configure a generic data node, it is equivalent to use the method Config.configure_generic_data_node() or the method Config.configure_data_node() with parameter storage_type="generic".

In memory

A taipy.core.data.InMemoryDataNode is a specific data node used to model any data in RAM. The Config.configure_in_memory_data_node() method can be used to add a new in_memory data node configuration. In addition to the generic parameters described in the previous section Data node configuration, an optional parameter can be provided.

  • If the default_data is given as a parameter, the data node is automatically written with the corresponding value (note that any Python object can be used).

from taipy import Config
from datetime import datetime

date_cfg = Config.configure_in_memory_data_node(id="date", default_data=datetime(2022, 1, 25))

In this example, we configure an in_memory data node with the id "date", the scope is SCENARIO (default value), and a default data is provided.

Warning

Since the data is stored in memory, it cannot be used in a multiprocess environment (see Job configuration for more details).

Note

To configure an in_memory data node, it is equivalent to use the method Config.configure_in_memory_data_node() or the method Config.configure_data_node() with parameter storage_type="in_memory".

The next section introduces the task configuration.