
Data node configs

For Taipy to instantiate a Data node, a data node configuration must be provided. DataNodeConfig is used to configure the various data nodes that Taipy will manipulate. To configure a new DataNodeConfig, one can use the function Config.configure_data_node().

from taipy import Config

data_node_cfg = Config.configure_data_node(id="data_node_cfg")

In the previous code, we configured a simple data node, providing only its identifier, the string "data_node_cfg".

More optional attributes are available on data nodes, including:

  • id is the identifier of the data node config.
    It is a mandatory parameter that must be unique. It must be a valid Python identifier.

  • scope is a Scope.
    It corresponds to the scope of the data node that will be instantiated from the data node configuration. The default value is Scope.SCENARIO.

  • storage_type is an attribute that indicates the type of storage of the data node.
    The possible values are "pickle" (the default value), "csv", "excel", "json", "sql", "sql_table", "in_memory", or "generic".
    As explained in the following subsections, depending on the storage_type, other configuration attributes must be provided through the properties parameter.

  • cacheable is an attribute that indicates whether the data node can be cached during the execution of the tasks it is connected to (a minimal sketch is shown after the examples below).

  • Any other custom attribute can be provided through the properties parameter, which is a dictionary (a description, a tag, etc.).
    This properties dictionary is also used to configure the parameters specific to each storage type. Note that the whole properties dictionary is copied into the properties dictionary of every data node instantiated from this data node configuration.

Below are two examples of data node configurations.

from taipy import Config, Scope

date_cfg = Config.configure_data_node(id="date_cfg",
                                      description="The current date of the scenario")

model_cfg = Config.configure_data_node(
    id="model_cfg",
    scope=Scope.CYCLE,
    storage_type="pickle",
    description="Trained model shared by all scenarios",
    code=54)

In lines 3-4, we configured a simple data node with the id "date_cfg". The default value for scope is SCENARIO. The storage_type also has the default value "pickle".
An optional custom property called description is also added: this property is propagated to the data nodes instantiated from this config.

In lines 6-11, we add another data node configuration with the id "model_cfg". scope is set to CYCLE, so the corresponding data nodes will be shared by all the scenarios from the same cycle. storage_type is "pickle" as well, and two optional custom properties are added: a description string and an integer code. These two properties are propagated to the data nodes instantiated from this config.
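The cacheable attribute mentioned above does not appear in these examples. Below is a minimal sketch of a configuration using it, assuming it is passed like any other configuration attribute; the identifier and the tag property are illustrative only.

from taipy import Config, Scope

# Illustrative sketch (not from the examples above): a cycle-scoped data
# node flagged as cacheable, with a custom "tag" property propagated to the
# data nodes instantiated from this configuration.
training_data_cfg = Config.configure_data_node(
    id="training_data_cfg",
    scope=Scope.CYCLE,
    cacheable=True,
    tag="ml")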

Storage type

Taipy proposes various predefined data nodes corresponding to the most popular storage types. Thanks to predefined data nodes, the Python developer does not need to spend much time configuring the storage types or the query system. Most of the time, a predefined data node corresponding to a basic and standard use case satisfies the user's needs like pickle file, CSV file, SQL table, Excel sheet, etc.

The various predefined storage types are mainly used for input data. Indeed, the input data is usually provided by an external component, and the Python developer does not control its format.

However, in many cases, particularly for intermediate or output data nodes, there is no strong reason to prefer one storage type over another: the end-user wants to manipulate the corresponding data within the Taipy application but has no particular requirements regarding how it is stored. In such a case, the Python developer is advised to use the default storage type, pickle, which does not require any configuration.

If a more specific way to store, read, and write the data is needed, Taipy proposes a Generic data node that can be used for any storage type or any kind of query system. The user only needs to provide two Python functions, one for reading and one for writing the data.

Each predefined data node is described in a subsequent section.

Pickle

A PickleDataNode is a specific data node used to model pickle data. To add a new pickle data node configuration, the Config.configure_pickle_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, two optional parameters can be provided.

  • default_path represents the default file path used by Taipy to read and write the data.
    If the pickle file already exists (in the case of a shared input data node, for instance), it is necessary to provide the default file path as the default_path parameter.
    If no value is provided, Taipy will use an internal path in the Taipy storage folder (more details on the Taipy storage folder configuration available on the Global configuration documentation).

  • default_data indicates data that is automatically written to the data node upon creation.
    Any serializable Python object can be used. The default value is None.

from taipy import Config
from datetime import datetime

date_cfg = Config.configure_pickle_data_node(
    id="date_cfg",
    default_data=datetime(2022, 1, 25))

model_cfg = Config.configure_pickle_data_node(
    id="model_cfg",
    default_path="path/to/my/model.p",
    description="The trained model")

In lines 4-6, we configure a simple pickle data node with the id "date_cfg". The scope is SCENARIO (default value), and a default data is provided.

In lines 8-11, we add another pickle data node configuration with the id "model_cfg". The default SCENARIO scope is used. Since the data node config corresponds to a pre-existing pickle file, a default path "path/to/my/model.p" is provided. We also added an optional custom description.

Note

To configure a pickle data node, it is equivalent to use the method Config.configure_pickle_data_node() or the method Config.configure_data_node() with parameter storage_type="pickle".
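For illustration, the following minimal sketch shows the two equivalent forms side by side (the identifiers are arbitrary):

from taipy import Config

dataset_cfg = Config.configure_pickle_data_node(id="dataset_cfg")

# Equivalent configuration through the generic method:
dataset_alt_cfg = Config.configure_data_node(id="dataset_alt_cfg", storage_type="pickle")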

CSV

A CSVDataNode is a specific data node used to model CSV file data. To add a new CSV data node configuration, the Config.configure_csv_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, the following parameters can be provided:

  • default_path is a mandatory parameter and represents the default CSV file path used by Taipy to read and write the data.

  • has_header indicates whether the file has a header or not.
    By default, has_header is True and Taipy will use the first row in the CSV file as the header.

  • exposed_type indicates the data type returned when reading the data node (more examples of reading from a CSV data node with different exposed_type values are available in the Read / Write a data node documentation):

    • By default, exposed_type is "pandas", and the data node will read the CSV file as a pandas.DataFrame.
    • If the exposed_type value provided is "modin", the data node will read the CSV file as a modin.pandas.DataFrame.
    • If the exposed_type value provided is "numpy", the data node will read the CSV file as a numpy array.
    • If the provided exposed_type value is a custom Python class, the data node will create a list of custom objects with the given custom class, each object will represent a row in the CSV file.
from taipy import Config

class SaleRow:
    date: str
    nb_sales: int

temp_cfg = Config.configure_csv_data_node(
    id="historical_temperature",
    default_path="path/hist_temp.csv",
    has_header=True,
    exposed_type="numpy")

sales_cfg = Config.configure_csv_data_node(
    id="sale_history",
    default_path="path/sale_history.csv",
    exposed_type=SaleRow)

In lines 3-5, we define a custom class SaleRow representing a row of the CSV file.

In lines 7-11, we configure a basic CSV data node with the identifier "historical_temperature". Its scope is SCENARIO by default. The default path corresponds to the file "path/hist_temp.csv". The has_header property is set to True, indicating that the CSV file has a header, and the exposed_type is "numpy".

In lines 13-16, we add another CSV data node configuration with the id "sale_history". The default SCENARIO scope is used again. Since we have a custom class (SaleRow) pre-defined for this CSV file, we provide it as the exposed_type parameter.

Note

To configure a CSV data node, it is equivalent to use the method Config.configure_csv_data_node() or the method Config.configure_data_node() with parameter storage_type="csv".

Excel

An ExcelDataNode is a specific data node used to model xlsx file data. To add a new Excel data node configuration, the Config.configure_excel_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, a mandatory and three optional parameters can be provided.

  • default_path is a mandatory parameter that represents the default Excel file path used by Taipy to read and write the data.

  • has_header indicates whether the file has a header or not.
    By default, has_header is True and Taipy will use the first row in the Excel file as the header.

  • sheet_name represents which specific sheet in the Excel file to read:

    • By default, sheet_name is None and the data node will return all sheets in the Excel file when reading it.
    • If sheet_name is provided as a string, the data node will read only the data of the corresponding sheet.
    • If sheet_name is provided with a list of sheet names, the data node will return a dictionary with the key being the sheet name and the value being the data of the corresponding sheet.
  • exposed_type indicates the data type returned when reading the data node (more examples of reading from an Excel data node with different exposed_type values are available in the Read / Write a data node documentation):

    • By default, exposed_type is "pandas", and the data node will read the Excel file as a pandas.DataFrame.
    • If the exposed_type value provided is "modin", the data node will read the Excel file as a modin.pandas.DataFrame.
    • If the exposed_type value provided is "numpy", the data node will read the Excel file as a numpy array.
    • If the provided exposed_type value is a custom Python class, the data node will create a list of custom objects with the given custom class, each object will represent a row in the Excel file.
from taipy import Config

class SaleRow:
    date: str
    nb_sales: int

hist_temp_cfg = Config.configure_excel_data_node(
    id="historical_temperature",
    default_path="path/hist_temp.xlsx",
    exposed_type="numpy")

sales_cfg = Config.configure_excel_data_node(id="sale_history",
                                             default_path="path/sale_history.xlsx",
                                             sheet_name=["January", "February"],
                                             exposed_type=SaleRow)

In lines 3-5, we define a custom class SaleRow, representing a row in the Excel file.

In lines 7-10, we configure an Excel data node. The identifier is "historical_temperature". Its scope is SCENARIO (default value), and the default path is the file hist_temp.xlsx. Since has_header defaults to True, Taipy will use the first row of the Excel file as the header. The sheet_name is not provided, so the default value None is used, and all the sheets will be returned when reading the data node.

In lines 12-15, we add another Excel data node configuration. The identifier is "sale_history", and the default SCENARIO scope is used. Since we have a custom class pre-defined for this Excel file, we provide it as the exposed_type. We also provide the list of specific sheets to use as the sheet_name parameter.

Note

To configure an Excel data node, it is equivalent to use the method Config.configure_excel_data_node() or the method Config.configure_data_node() with parameter storage_type="excel".

SQL Table

Important

To be able to use a SQLTableDataNode with Microsoft SQL Server, you need to install the optional dependencies with pip install taipy[mssql] and install your corresponding Microsoft ODBC Driver for SQL Server.

A SQLTableDataNode is a specific data node that models data stored in a single SQL table. To add a new SQL table data node configuration, the Config.configure_sql_table_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, the following parameters can be provided:

  • db_username represents the database username that will be used by Taipy to access the database.
  • db_password represents the database user's password that will be used by Taipy to access the database.
  • db_name represents the name of the database.
  • db_engine represents the engine of the database.
    Possible values are "sqlite" or "mssql".
  • table_name represents the name of the table to read from and write into.
  • db_port represents the database port that will be used by Taipy to access the database.
    The default value of db_port is 1433.
  • db_host represents the database host that will be used by Taipy to access the database.
    The default value of db_host is "localhost".
  • db_driver represents the database driver that will be used by Taipy.
    The default value of db_driver is "ODBC Driver 17 for SQL Server".
  • exposed_type indicates the data type returned when reading the data node (more examples of reading from a SQL table data node with different exposed_type values are available in the Read / Write a data node documentation):
    • By default, exposed_type is "pandas", and the data node will read the SQL table as a pandas.DataFrame.
    • If the exposed_type value provided is "numpy", the data node will read the SQL table as a numpy array.
    • If the provided exposed_type value is a custom Python class, the data node will create a list of custom objects with the given custom class, each object will represent a row in the SQL table.
from taipy import Config

forecasts_cfg = Config.configure_sql_table_data_node(
    id="forecasts",
    db_username="admin",
    db_password="password",
    db_name="taipy",
    db_engine="mssql",
    table_name="forecast_table")

In the previous example, we configure a SQL table data node with the id "forecasts". Its scope is the default value SCENARIO. The database username is "admin", the user's password is "password", the database name is "taipy", and the database engine is mssql (short for Microsoft SQL). The table name is "forecast_table". When the data node is read, it reads all the rows of the table "forecast_table"; when the data node is written, it deletes all the data in the table and inserts the new data.
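To illustrate this read/write behavior, here is a minimal, hypothetical sketch; it assumes forecasts_node is a data node instantiated from the forecasts_cfg configuration above.

import pandas as pd

def refresh_forecasts(forecasts_node, new_forecasts: pd.DataFrame) -> pd.DataFrame:
    # Writing replaces the content of "forecast_table": existing rows are
    # deleted, then the rows of new_forecasts are inserted.
    forecasts_node.write(new_forecasts)
    # Reading returns all the rows of the table as a pandas.DataFrame
    # (the default exposed_type).
    return forecasts_node.read()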

Note

To configure a SQL table data node, it is equivalent to use the method Config.configure_sql_table_data_node() or the method Config.configure_data_node() with parameter storage_type="sql_table".

SQL

Important

To be able to use a SQLDataNode with Microsoft SQL Server, you need to install the optional dependencies with pip install taipy[mssql] and install your corresponding Microsoft ODBC Driver for SQL Server.

A SQLDataNode is a specific data node used to model data stored in a SQL Database. To add a new SQL data node configuration, the Config.configure_sql_data_node() method can be used. In addition to the generic parameters described in the previous section Data node configuration, the following parameters can be provided:

  • db_username represents the database username that will be used by Taipy to access the database.
  • db_password represents the database user's password that will be used by Taipy to access the database.
  • db_name represents the name of the database.
  • db_engine represents the engine of the database.
    Possible values are "sqlite" or "mssql".
  • read_query represents the SQL query that will be used by Taipy to read the data from the database.
  • write_query_builder is a callable function that takes in the data as an input parameter and returns a list of SQL queries to be executed when the write data node method is called.
  • db_port represents the database port that will be used by Taipy to access the database.
    The default value of db_port is 1433.
  • db_host represents the database host that will be used by Taipy to access the database.
    The default value of db_host is "localhost".
  • db_driver represents the database driver that will be used by Taipy.
    The default value of db_driver is "ODBC Driver 17 for SQL Server".
  • exposed_type indicates the data type returned when reading the data node:
    • By default, exposed_type is "pandas", and the data node will return a pandas.DataFrame when executing the read_query.
    • If the exposed_type value provided is "numpy", the data node will return a numpy array.
    • If the provided exposed_type value is a custom Python class, the data node will create a list of custom objects with the given custom class, each object will represent a record in the table returned by the read_query.
from taipy import Config
import pandas as pd

def write_query_builder(data: pd.DataFrame):
    insert_data = list(
        data[["date", "nb_sales"]].itertuples(index=False, name=None))
    return [
        "DELETE FROM forecast_table",
        ("INSERT INTO forecast_table VALUES (?, ?)", insert_data)
    ]

forecasts_cfg = Config.configure_sql_data_node(
    id="forecasts",
    db_username="admin",
    db_password="password",
    db_name="taipy",
    db_engine="mssql",
    read_query="SELECT * from forecast_table",
    write_query_builder=write_query_builder)

In the previous example, we configure a SQL data node with the id "forecasts". Its scope is the default value SCENARIO. The database username is "admin", the user's password is "password", the database name is "taipy", and the database engine is mssql (short for Microsoft SQL). The read query is "SELECT * from forecast_table".

The write query builder in this example is a callable function that takes in a pandas.DataFrame and returns a list of queries. The first query deletes all the data in the table "forecast_table", and the second query is a prepared statement that takes two values, the data from the "date" and "nb_sales" columns of the pandas.DataFrame. Since it is a prepared statement, it must be passed as a tuple, with the first element being the query and the second element being the data.

The data parameter of the write query builder is expected to have the same data type as the return type of the task function whose output is the data node. In this example, the task function returns a pandas.DataFrame, so the data parameter of the write query builder is also expected to be a pandas.DataFrame.
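For instance, a task function whose output feeds the forecasts data node could look like the following sketch; the function name and the naive forecasting logic are purely illustrative, only the pandas.DataFrame return type matters here.

import pandas as pd

def forecast(sales_history: pd.DataFrame) -> pd.DataFrame:
    # Illustrative logic: reuse the last observed number of sales.
    last_nb_sales = sales_history["nb_sales"].iloc[-1]
    return pd.DataFrame({"date": ["2021-01-26"], "nb_sales": [last_nb_sales]})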

Note

To configure an SQL data node, it is equivalent to use the method Config.configure_sql_data_node() or the method Config.configure_data_node() with parameter storage_type="sql".

Generic

A GenericDataNode is a specific data node used to model generic data types where the read and the write functions are defined by the user. To add a new generic data node configuration, the Config.configure_generic_data_node() method can be used. In addition to the parameters described in the previous section Data node configuration, two mandatory and two optional parameters can be provided.

  • The read_fct is a mandatory parameter that represents a Python function provided by the user. It will be used to read the data. More optional parameters can be passed through the read_fct_params parameter.

  • The write_fct is a mandatory parameter representing a Python function provided by the user. It will be used to write/serialize the data. The provided function must have at least one parameter dedicated to receiving data to be written. More optional parameters can be passed through the write_fct_params parameter.

  • The parameter read_fct_params represents the parameters that are passed to the read_fct to read/de-serialize the data. It must be a List type object.

  • The parameter write_fct_params represents the parameters that are passed to the write_fct to write the data. It must be a List type object.

from typing import Any

from taipy import Config

def read_data():
    pass

def write_data(data: Any, path: str):
    pass

historical_data_cfg = Config.configure_generic_data_node(id="historical_data",
                                                         read_fct=read_data,
                                                         write_fct=write_data,
                                                         write_fct_params=['../path/'])

In this small example, we configure a generic data node with the id "historical_data". We provide two Python functions (defined above) as the read_fct and write_fct parameters to read and write the data. We also provide a list for write_fct_params containing a path, to let write_fct know where to write the data.
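As a complement, here is a minimal sketch (the file path and function names are illustrative) that uses both read_fct_params and write_fct_params to pass the same path to the two user-provided functions:

from taipy import Config

def read_text(path: str) -> str:
    # Read the whole text file and return its content.
    with open(path) as f:
        return f.read()

def write_text(data: str, path: str):
    # The data to write is always the first parameter; the values from
    # write_fct_params are passed after it.
    with open(path, "w") as f:
        f.write(data)

notes_cfg = Config.configure_generic_data_node(
    id="notes",
    read_fct=read_text,
    read_fct_params=["data/notes.txt"],
    write_fct=write_text,
    write_fct_params=["data/notes.txt"])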

Note

To configure a generic data node, it is equivalent to use the method Config.configure_generic_data_node() or the method Config.configure_data_node() with parameter storage_type="generic".

In memory

An InMemoryDataNode is a specific data node used to model any data in the RAM. The Config.configure_in_memory_data_node() method can be used to add a new in_memory data node configuration. In addition to the generic parameters described in the previous section Data node configuration, an optional parameter can be provided.

  • If the default_data is given as a parameter, the data node is automatically written with the corresponding value (note that any Python object can be used).
from taipy import Config
from datetime import datetime

date_cfg = Config.configure_in_memory_data_node(id="date", default_data=datetime(2022, 1, 25))

In this example, we configure an in_memory data node with the id "date", the scope is SCENARIO (default value), and a default data is provided.

Warning

Since the data is stored in memory, it cannot be used in a multiprocess environment. (See Job configuration for more details).

Note

To configure an in_memory data node, it is equivalent to use the method Config.configure_in_memory_data_node() or the method Config.configure_data_node() with parameter storage_type="in_memory".

JSON

A JSONDataNode is a type of data node used to model JSON file data. To add a new JSON data node configuration, the Config.configure_json_data_node() method can be used. In addition to the generic parameters described in the Data node configuration section, the following parameters can be provided:

  • default_path is a mandatory parameter that represents the JSON file path used by Taipy to read and write data.

  • encoder and decoder parameters are optional parameters that represent the encoder (json.JSONEncoder) and decoder (json.JSONDecoder) used to serialize and deserialize JSON data.
    Check out JSON Encoders and Decoders documentation for more details.

from taipy import Config

hist_temp_cfg = Config.configure_json_data_node(
    id="historical_temperature",
    default_path="path/hist_temp.json",
)

In this example, we configure a JSON data node. The id argument is "historical_temperature". Its scope is SCENARIO (default value), and the path is the file hist_temp.json.

Without specific encoder and decoder parameters, hist_temp_cfg will use the default encoder and decoder provided by Taipy, which can encode and decode Python enum.Enum, datetime.datetime, and dataclass objects.
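For example, with the default encoder and decoder, a datetime value can be written directly. The following hypothetical sketch assumes hist_temp_node is a data node instantiated from hist_temp_cfg.

from datetime import datetime

def record_temperature(hist_temp_node):
    # The default encoder serializes the datetime to the JSON file; the
    # default decoder restores it as a datetime object when reading.
    hist_temp_node.write({"as_of": datetime(2022, 1, 25), "temperature": 23.5})
    return hist_temp_node.read()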

from dataclasses import dataclass
from taipy import Config
import json

@dataclass
class SaleRow:
    date: str
    nb_sales: int

class SaleRowEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, SaleRow):
            return {
                '__type__': "SaleRow",
                'date': obj.date,
                'nb_sales': obj.nb_sales}
        return json.JSONEncoder.default(self, obj)

class SaleRowDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        json.JSONDecoder.__init__(self,
                                  object_hook=self.object_hook,
                                  *args,
                                  **kwargs)

    def object_hook(self, d):
        if d.get('__type__') == "SaleRow":
            return SaleRow(date=d['date'], nb_sales=d['nb_sales'])
        return d

sales_cfg = Config.configure_json_data_node(
    id="sale_history",
    default_path="path/sale_history.json",
    encoder=SaleRowEncoder,
    decoder=SaleRowDecoder)

In this next example, we configure a JSONDataNode with a custom JSON encoder and decoder:

  • In lines 5-8, we define a custom class SaleRow (declared as a dataclass so that instances can be created from its attributes), representing data in a JSON object.

  • In lines 10-29, we define a custom encoder and decoder for the SaleRow class.

    • When writing to the JSONDataNode, the SaleRowEncoder encodes a SaleRow object into JSON format. For example, after creating a scenario,
      scenario.sale_history.write(SaleRow("12/24/2018", 1550))
      
      will write
      {
          "__type__": "SaleRow",
          "date": "12/24/2018",
          "nb_sales": 1550,
      }
      
      to the file path/sale_history.json.
    • When reading a JSONDataNode, the SaleRowDecoder is used to convert a JSON object with the attribute __type__ into a Python object corresponding to the value of that attribute. In this example, the SaleRow class.
  • In lines 31-35, we create a JSON data node configuration. The identifier is "sale_history", and the default SCENARIO scope is used. The encoder and decoder are the custom encoder and decoder defined above.

Note

To configure a JSON data node, it is equivalent to use the method Config.configure_json_data_node() or the method Config.configure_data_node() with parameter storage_type="json".

The next section introduces the task configuration.