
Data node configs

New data node config

To create an instance of a Data node, a data node configuration must first be provided. DataNodeConfig is used to configure data nodes. To configure a new DataNodeConfig, one can use the method Config.configure_data_node().

from taipy import Config

data_node_cfg = Config.configure_data_node(id="data_node_cfg")

We configured a simple data node in the previous code by providing an identifier as the string "data_node_cfg". The Config.configure_data_node() method actually creates a data node configuration, and registers it in the Config singleton.

The attributes available on data nodes are:

  • id is the string identifier of the data node config.
    It is the only mandatory parameter and must be a unique and valid Python identifier.
  • scope is a Scope.
    It corresponds to the scope of the data node that will be instantiated from the data node configuration. The default value is Scope.SCENARIO.
  • storage_type is an attribute that indicates the storage type of the data node.
    The possible values are "pickle" (the default value), "csv", "excel", "json", "mongo_collection", "parquet", "sql", "sql_table", "in_memory", or "generic".
    As explained in the following subsections, depending on the storage_type, other configuration attributes must be provided in the properties parameter.
  • Any other custom attribute can be provided through the parameter properties, which is a kwargs dictionary accepting any number of custom parameters (a description, a label, a tag, etc.). (It is recommended to read the Python documentation on **kwargs arguments if you are not familiar with them.)
    This properties dictionary is used to configure the parameters specific to each storage type. It is copied in the dictionary properties of all the data nodes instantiated from this data node configuration.

Below are two examples of data node configurations.

from taipy import Config, Scope

date_cfg = Config.configure_data_node(id="date_cfg",
                                      description="The current date of the scenario")

model_cfg = Config.configure_data_node(id="model_cfg",
                                       scope=Scope.CYCLE,
                                       storage_type="pickle",
                                       description="Trained model shared by all scenarios",
                                       code=54)

In lines 3-4, we configured a simple data node with the id "date_cfg". The default value for scope is SCENARIO. The storage_type is set to the default value "pickle".
An optional custom property called description is also added: this property is propagated to the data nodes instantiated from this config.

In lines 6-10, we add another data node configuration with the id "model_cfg". The scope is set to CYCLE so that all the scenarios from the same cycle will share the corresponding data nodes. The storage_type is "pickle", and two optional custom properties are added: a description string and an integer code. These two properties are propagated to the data nodes instantiated from this config.
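
At runtime, these custom properties can be read back from the properties dictionary of the data nodes instantiated from the configuration. Below is a minimal sketch, assuming a scenario configuration using model_cfg has been declared and a scenario created from it (scenario_cfg is hypothetical here):

import taipy as tp

scenario = tp.create_scenario(scenario_cfg)  # scenario_cfg is a hypothetical scenario configuration

# Custom properties set on the config are copied to the instantiated data node
scenario.model_cfg.properties["description"]  # "Trained model shared by all scenarios"
scenario.model_cfg.properties["code"]         # 54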

Storage type

Taipy proposes predefined data nodes corresponding to the most popular storage types. Thanks to predefined data nodes, the Python developer does not need to spend much time configuring the storage types or the query system. A predefined data node will often satisfy the user's required format: pickle, CSV, SQL table, MongoDB collection, Excel sheet, etc.

The various predefined storage types are typically used for input data. Indeed, the input data is usually provided by external sources, where the Python developer does not control the format.

For intermediate or output data nodes, the developer often does not have any particular specifications regarding the storage type. In such a case, using the default storage type pickle that does not require any configuration is recommended.

If a more specific method to store, read, and write the data is needed, Taipy provides a Generic data node that can be used for any storage type (or any kind of query system). The developer only needs to provide two Python functions, one for reading and one for writing the data. Please refer to the generic data node config section for more details on generic data nodes.

All predefined data nodes are described in the subsequent sections.

Pickle

A PickleDataNode is a specific data node used to model pickle data. The Config.configure_pickle_data_node() method configures a new pickle data node configuration. In addition to the generic parameters described in the Data node configuration section, two optional parameters can be provided.

  • default_path represents the default file path used to read and write the data of the data nodes instantiated from the pickle configuration.
    It is used to populate the path property of the entities (pickle data nodes) instantiated from the pickle data node configuration. That means by default all the entities (pickle data nodes) instantiated from the same pickle configuration will inherit/share the same pickle file provided in the default_path. To avoid this, the path property of a pickle data node entity can be changed at runtime right after its instantiation.
    If no value is provided, Taipy will use an internal path in the Taipy storage folder (more details on the Taipy storage folder configuration are available in the Global configuration documentation).

  • default_data indicates the data automatically written to the pickle data node upon its creation.
    Any serializable Python object can be used. The default value is None.

from taipy import Config
from datetime import datetime

date_cfg = Config.configure_pickle_data_node(
    id="date_cfg",
    default_data=datetime(2022, 1, 25))

model_cfg = Config.configure_pickle_data_node(
    id="model_cfg",
    default_path="path/to/my/model.p",
    description="The trained model")

In lines 4-6, we configure a simple pickle data node with the id "date_cfg". The scope is SCENARIO (default value), and default data is provided.

In lines 8-11, we add another pickle data node configuration with the id "model_cfg". The default SCENARIO scope is used. Since the data node config corresponds to a pre-existing pickle file, a default path "path/to/my/model.p" is provided. We also added an optional custom description.
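
As mentioned above, the path property of a pickle data node entity can be changed at runtime right after its instantiation. Below is a minimal sketch, assuming a scenario using model_cfg has been created (scenario_cfg is hypothetical):

import taipy as tp

scenario = tp.create_scenario(scenario_cfg)  # scenario_cfg is a hypothetical scenario configuration

# This particular entity now points to its own pickle file instead of the shared default_path
scenario.model_cfg.path = "path/to/another/model.p"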

Note

Configuring a pickle data node with the method Config.configure_pickle_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="pickle".

CSV

A CSVDataNode is a specific data node used to model CSV file data. To add a new CSV data node configuration, the Config.configure_csv_data_node() method can be used. In addition to the generic parameters described in the Data node configuration section, the following parameters can be provided:

  • default_path represents the default file path used to read and write data pointed by the data nodes instantiated from the csv configuration.
    It is used to populate the path property of the entities (csv data nodes) instantiated from the csv data node configuration. That means by default all the entities (csv data nodes) instantiated from the same csv configuration will inherit/share the same csv file provided in the default_path. To avoid this, the path property of a csv data node entity can be changed at runtime right after its instantiation.

  • has_header indicates if the file has a header or not.
    By default, has_header is True and Taipy will use the 1st row in the CSV file as the header.

  • exposed_type indicates the data type returned when reading the data node (more examples of reading from a CSV data node with different exposed_type are available in the Read / Write a data node documentation):

    • By default, exposed_type is "pandas", and the data node reads the CSV file as a Pandas DataFrame (pandas.DataFrame) when executing the read method.
    • If the exposed_type provided is "modin", the data node reads the CSV file as a Modin DataFrame (modin.pandas.DataFrame) when executing the read method.
    • If the exposed_type provided is "numpy", the data node reads the CSV file as a NumPy array (numpy.ndarray) when executing the read method.
    • If the provided exposed_type is a custom Python class, the data node creates a list of custom objects with the given custom class. Each object represents a row in the CSV file.
from taipy import Config

class SaleRow:
    date: str
    nb_sales: int

temp_cfg = Config.configure_csv_data_node(
    id="historical_temperature",
    default_path="path/hist_temp.csv",
    has_header=True,
    exposed_type="numpy")

log_cfg = Config.configure_csv_data_node(
    id="log_history",
    default_path="path/hist_log.csv",
    exposed_type="modin")

sales_history_cfg = Config.configure_csv_data_node(
    id="sales_history",
    default_path="path/sales.csv",
    exposed_type=SaleRow)

In lines 3-5, we define a custom class SaleRow representing a row of the CSV file.

In lines 7-11, we configure a basic CSV data node with the identifier "historical_temperature". Its scope is SCENARIO by default. The default path points to the file "path/hist_temp.csv". The property has_header is set to True, and the exposed_type is "numpy".

In lines 13-16, we configure another CSV data node with the identifier "log_history". It uses the default SCENARIO scope again. The default path points to "path/hist_log.csv". The exposed_type provided is "modin".

In lines 18-21, we add another CSV data node configuration with the identifier "sales_history". The default SCENARIO scope is used again. Since we have a custom class called SaleRow that is defined for this CSV file, we provide it as the exposed_type parameter.
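
At runtime, the exposed_type determines what the read method returns. Below is a minimal sketch, assuming a scenario using the three configurations above has been created (scenario_cfg is hypothetical):

import taipy as tp

scenario = tp.create_scenario(scenario_cfg)  # scenario_cfg is a hypothetical scenario configuration

temperatures = scenario.historical_temperature.read()  # numpy.ndarray
logs = scenario.log_history.read()                     # modin.pandas.DataFrame
sales = scenario.sales_history.read()                  # list of SaleRow objects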

Note

Configuring a CSV data node with the method Config.configure_csv_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="csv".

Excel

An ExcelDataNode is a specific data node used to model xlsx file data. To add a new Excel data node configuration, the Config.configure_excel_data_node() method can be used. In addition to the generic parameters described in the Data node configuration section, the following parameters can be provided:

  • default_path represents the default file path used to read and write data pointed by the data nodes instantiated from the Excel configuration.
    It is used to populate the path property of the entities (Excel data nodes) instantiated from the Excel data node configuration. That means by default all the entities (Excel data nodes) instantiated from the same Excel configuration will inherit/share the same Excel file provided in the default_path. To avoid this, the path property of an Excel data node entity can be changed at runtime right after its instantiation.

  • has_header indicates if the file has a header or not.
    By default, has_header is True and Taipy will use the 1st row in the Excel file as the header.

  • sheet_name represents which specific sheet in the Excel file to read:

    • By default, sheet_name is None and the data node will return all sheets in the Excel file when reading it.
    • If sheet_name is provided as a string, the data node will read only the data of the corresponding sheet.
    • If sheet_name is provided with a list of sheet names, the data node will return a dictionary with the key being the sheet name and the value being the data of the corresponding sheet.
  • exposed_type indicates the data type returned when reading the data node (more examples of reading from an Excel data node with different exposed_type are available in the Read / Write a data node documentation):

    • By default, exposed_type is "pandas", and the data node reads the Excel file as a Pandas DataFrame (pandas.DataFrame) when executing the read method.
    • If the exposed_type provided is "modin", the data node reads the Excel file as a Modin DataFrame (modin.pandas.DataFrame) when executing the read method.
    • If the exposed_type provided is "numpy", the data node reads the Excel file as a NumPy array (numpy.ndarray) when executing the read method.
    • If the provided exposed_type is a custom Python class, the data node creates a list of custom objects with the given custom class. Each object represents a row in the Excel file.
from taipy import Config

class SaleRow:
    date: str
    nb_sales: int

hist_temp_cfg = Config.configure_excel_data_node(
    id="historical_temperature",
    default_path="path/hist_temp.xlsx",
    exposed_type="numpy")

hist_log_cfg = Config.configure_excel_data_node(
    id="log_history",
    default_path="path/hist_log.xlsx",
    exposed_type="modin")

sales_history_cfg = Config.configure_excel_data_node(
    id="sales_history",
    default_path="path/sales.xlsx",
    sheet_name=["January", "February"],
    exposed_type=SaleRow)

In lines 3-5, we define a custom class SaleRow, representing a row in the Excel file.

In lines 7-10, we configure an Excel data node. The identifier is "historical_temperature". Its scope is SCENARIO (default value), and the default path is the file hist_temp.xlsx. has_header keeps its default value True, so the Excel file must have a header. The sheet_name is not provided, so the default value None is used and all the sheets are read. The exposed_type is "numpy".

In lines 12-15, we configure a new Excel data node. The identifier is "log_history", the default SCENARIO scope is used, and the default path is "path/hist_log.xlsx". "modin" is used as the exposed_type.

In lines 17-21, we add another Excel data node configuration. The identifier is "sales_history", the default SCENARIO scope is used. Since we have a custom class pre-defined for this Excel file, we provide it in the exposed_type. We also provide the list of specific sheets we want to use as the sheet_name parameter.
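
Because sheet_name is a list, reading the sales_history data node returns a dictionary keyed by sheet name, as described above. Below is a minimal sketch, assuming a scenario using sales_history_cfg has been created (scenario_cfg is hypothetical):

import taipy as tp

scenario = tp.create_scenario(scenario_cfg)  # scenario_cfg is a hypothetical scenario configuration

sales = scenario.sales_history.read()
january_rows = sales["January"]    # list of SaleRow objects from the "January" sheet
february_rows = sales["February"]  # list of SaleRow objects from the "February" sheet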

Note

Configuring an Excel data node with the method Config.configure_excel_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="excel".

SQL Table

A SQLTableDataNode is a specific data node that models data stored in a single SQL table. To add a new SQL table data node configuration, the Config.configure_sql_table_data_node() method can be used. In addition to the generic parameters described in the Data node configuration section, the following parameters can be provided:

  • db_username represents the database username that will be used by Taipy to access the database.
  • db_password represents the database user's password that will be used by Taipy to access the database.
  • db_name represents the name of the database.
  • db_engine represents the engine of the database.
    Possible values are "sqlite", "mssql", "mysql", or "postgresql".
  • table_name represents the name of the table to read from and write into.
  • db_port represents the database port that will be used by Taipy to access the database.
    The default value of db_port is 1433.
  • db_host represents the database host that will be used by Taipy to access the database.
    The default value of db_host is "localhost".
  • db_driver represents the database driver that will be used by Taipy.
    The default value of db_driver is "ODBC Driver 17 for SQL Server".
  • exposed_type indicates the data type returned when reading the data node (more examples of reading from a SQL table data node with different exposed_type are available in the Read / Write a data node documentation):
    • By default, exposed_type is "pandas", and the data node reads the SQL table as a Pandas DataFrame (pandas.DataFrame) when executing the read method.
    • If the exposed_type provided is "modin", the data node reads the SQL table as a Modin DataFrame (modin.pandas.DataFrame) when executing the read method.
    • If the exposed_type provided is "numpy", the data node reads the SQL table as a NumPy array (numpy.ndarray) when executing the read method.
    • If the provided exposed_type is a custom Python class, the data node creates a list of custom objects with the given custom class. Each object represents a record in the SQL table.
from taipy import Config

sales_history_cfg = Config.configure_sql_table_data_node(
    id="sales_history",
    db_username="admin",
    db_password="password",
    db_name="taipy",
    db_engine="mssql",
    table_name="sales"
)

In the previous example, we configure a SQL table data node with the id "sales_history". Its scope is the default value SCENARIO. The database username is "admin", the user's password is "password" (refer to the advanced configuration to pass the password as an environment variable), the database name is "taipy", and the database engine is mssql (short for Microsoft SQL). The table name is "sales".

When the data node is read, it reads all the rows from the table "sales", and when the data node is written, it deletes all the data in the table and inserts the new data.
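
Below is a minimal sketch of this read/write behavior, assuming a scenario using sales_history_cfg has been created (scenario_cfg and the column names are hypothetical):

import taipy as tp
import pandas as pd

scenario = tp.create_scenario(scenario_cfg)  # scenario_cfg is a hypothetical scenario configuration

sales = scenario.sales_history.read()  # all rows of the "sales" table as a pandas.DataFrame

new_sales = pd.DataFrame({"date": ["12/24/2018"], "nb_sales": [1550]})
scenario.sales_history.write(new_sales)  # replaces the whole content of the "sales" table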

Note

Configuring a SQL table data node with the method Config.configure_sql_table_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="sql_table".

SQL

A SQLDataNode is a specific data node used to model data stored in a SQL Database. To add a new SQL data node configuration, the Config.configure_sql_data_node() method can be used. In addition to the generic parameters described in the Data node configuration section, the following parameters can be provided:

  • db_username represents the database username that will be used by Taipy to access the database.
  • db_password represents the database user's password that will be used by Taipy to access the database.
  • db_name represents the name of the database.
  • db_engine represents the engine of the database.
    Possible values are "sqlite", "mssql", "mysql", or "postgresql".
  • read_query represents the SQL query that will be used by Taipy to read the data from the database.
  • write_query_builder is a callable function that takes in the data as an input parameter and returns a list of SQL queries to be executed when the write method is called.
  • db_port represents the database port that will be used by Taipy to access the database.
    The default value of db_port is 1433.
  • db_host represents the database host that will be used by Taipy to access the database.
    The default value of db_host is "localhost".
  • db_driver represents the database driver that will be used by Taipy.
    The default value of db_driver is "ODBC Driver 17 for SQL Server".
  • exposed_type indicates the data type returned when reading the data node:
    • By default, exposed_type is "pandas", and the data node reads the data as a Pandas DataFrame (pandas.DataFrame) when executing the read_query.
    • If the exposed_type provided is "modin", the data node reads the result of the read_query as a Modin DataFrame (modin.pandas.DataFrame).
    • If the exposed_type provided is "numpy", the data node reads the result of the read_query as a NumPy array (numpy.ndarray).
    • If the provided exposed_type is a custom Python class, the data node creates a list of custom objects with the given custom class. Each object represents a record in the table returned by the read_query.
from taipy import Config
import pandas as pd


def write_query_builder(data: pd.DataFrame):
    insert_data = list(
        data[["date", "nb_sales"]].itertuples(index=False, name=None))
    return [
        "DELETE FROM sales",
        ("INSERT INTO sales VALUES (?, ?)", insert_data)
    ]


sales_history_cfg = Config.configure_sql_data_node(
    id="sales_history",
    db_username="admin",
    db_password="password",
    db_name="taipy",
    db_engine="mssql",
    read_query="SELECT * from sales",
    write_query_builder=write_query_builder
)

In the previous example, we configure a SQL data node with the id "sales_history". Its scope is the default value SCENARIO. The database username is "admin", the user's password is "password", the database name is "taipy", and the database engine is mssql (short for Microsoft SQL). The read query will be "SELECT * from sales".

The write_query_builder is a callable function that takes in a pandas.DataFrame and returns a list of queries. The first query deletes all the data in the table "sales", and the second query is a prepared statement that takes two values, which are the values from the two columns "date" and "nb_sales" of the pandas.DataFrame. Since this is a prepared statement, it must be passed as a tuple with the first element being the query and the second element being the data.

The very first parameter of write_query_builder (i.e. data) is expected to have the same type as the return type of the task function whose output is the data node. In this example, the task function must return a pandas.DataFrame, since the data parameter of the write_query_builder is a pandas.DataFrame.
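
To illustrate, the write_query_builder defined above can be called directly with a pandas.DataFrame to inspect the queries it produces (a sketch that is independent of Taipy itself):

import pandas as pd

data = pd.DataFrame({"date": ["12/24/2018", "12/25/2018"], "nb_sales": [1550, 2315]})

write_query_builder(data)
# Returns two elements: "DELETE FROM sales", then
# ("INSERT INTO sales VALUES (?, ?)", [("12/24/2018", 1550), ("12/25/2018", 2315)])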

Note

Configuring a SQL data node with the method Config.configure_sql_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="sql".

JSON

A JSONDataNode is a predefined data node that models JSON file data. The Config.configure_json_data_node() method adds a new JSON data node configuration. In addition to the generic parameters described in the Data node configuration section, the following parameters can be provided:

  • default_path represents the default file path used to read and write data pointed by the data nodes instantiated from the json configuration.
    It is used to populate the path property of the entities (json data nodes) instantiated from the json data node configuration. That means by default all the entities (json data nodes) instantiated from the same json configuration will inherit/share the same json file provided in the default_path. To avoid this, the path property of a json data node entity can be changed at runtime right after its instantiation.

  • encoder and decoder parameters are optional parameters representing the encoder (json.JSONEncoder) and decoder (json.JSONDecoder) used to serialize and deserialize JSON data.
    Check out JSON Encoders and Decoders documentation for more details.

from taipy import Config

hist_temp_cfg = Config.configure_json_data_node(
    id="historical_temperature",
    default_path="path/hist_temp.json",
)

In this example, we configure a JSON data node. The id argument is "historical_temperature". Its scope is SCENARIO (default value), and the path points to the hist_temp.json file.

Without specific encoder and decoder parameters, hist_temp_cfg will use the default encoder and decoder provided by Taipy, which can encode and decode Python enum.Enum, datetime.datetime, datetime.timedelta, and dataclass objects.

from taipy import Config
import json
from dataclasses import dataclass


@dataclass
class SaleRow:
    date: str
    nb_sales: int


class SaleRowEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, SaleRow):
            return {
                '__type__': "SaleRow",
                'date': obj.date,
                'nb_sales': obj.nb_sales}
        return json.JSONEncoder.default(self, obj)


class SaleRowDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        json.JSONDecoder.__init__(self,
                                  object_hook=self.object_hook,
                                  *args,
                                  **kwargs)

    def object_hook(self, d):
        if d.get('__type__') == "SaleRow":
            return SaleRow(date=d['date'], nb_sales=d['nb_sales'])
        return d


sales_history_cfg = Config.configure_json_data_node(
    id="sales_history",
    default_path="path/sales.json",
    encoder=SaleRowEncoder,
    decoder=SaleRowDecoder)

In this next example, we configure a JSONDataNode with a custom JSON encoder and decoder:

  • In lines 6-9, we define a custom SaleRow dataclass, representing data in a JSON object.

  • In lines 12-32, we define a custom encoder and decoder for the SaleRow class.

    • When writing a JSONDataNode, the SaleRowEncoder encodes a SaleRow object in JSON format. For example, after the creation of the scenario scenario,
      scenario.sales_history.write(SaleRow("12/24/2018", 1550))
      
      the previous code writes the following object
      {
          "__type__": "SaleRow",
          "date": "12/24/2018",
          "nb_sales": 1550,
      }
      
      to the file path/sales.json.
    • When reading a JSONDataNode, the SaleRowDecoder converts a JSON object with the attribute __type__ into a Python object corresponding to the attribute's value. In this example, the "SaleRow" data class.
  • In lines 35-39, we create a JSON data node configuration. Its id is "sales_history". The default SCENARIO scope is used. The encoder and decoder are the custom encoder and decoder defined above.

Note

Configuring a JSON data node with the method Config.configure_json_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="json".

Parquet

A ParquetDataNode is a specific data node used to model Parquet file data. The Config.configure_parquet_data_node() method adds a new Parquet data node configuration. In addition to the generic parameters described in the Data node configuration section, the following parameters can be provided:

  • default_path represents the default file path used to read and write data pointed by the data nodes instantiated from the Parquet configuration.
    It is used to populate the path property of the entities (Parquet data nodes) instantiated from the Parquet data node configuration. That means by default all the entities (Parquet data nodes) instantiated from the same Parquet configuration will inherit/share the same Parquet file provided in the default_path. To avoid this, the path property of a Parquet data node entity can be changed at runtime right after its instantiation.

  • engine represents the Parquet library to use.
    Possible values are "fastparquet" or "pyarrow". The default value is "pyarrow".
    Using the "fastparquet" engine requires installation with pip install taipy[fastparquet].

  • compression is the name of the compression to use.
    Possible values are "snappy", "gzip", "brotli" and None. The default value is "snappy". Use None for no compression.

  • read_kwargs is a dictionary of additional parameters passed to the pandas.read_parquet method.

  • write_kwargs is a dictionary of additional parameters passed to the pandas.DataFrame.to_parquet method.
    The parameters in read_kwargs and write_kwargs have a higher precedence than the top-level parameters (engine and compression), which are also passed to Pandas. For example, passing read_kwargs={"engine": "fastparquet", "compression": "gzip"} will override the engine and compression properties of the data node.

Tip

The ParquetDataNode.read_with_kwargs and ParquetDataNode.write_with_kwargs methods provide an alternative for specifying keyword arguments at runtime. See examples of these methods on the Data Node Management page.

  • exposed_type indicates the data type returned when reading the data node (more examples of reading from Parquet data node with different exposed_type are available on Read / Write a data node documentation):
    • By default, exposed_type is "pandas", and the data node reads the Parquet file as a Pandas DataFrame (pandas.DataFrame) when executing the read method.
    • If the exposed_type provided is "modin", the data node reads the Parquet file as a Modin DataFrame (modin.pandas.DataFrame) when executing the read method.
    • If the exposed_type provided is "numpy", the data node reads the Parquet file as a NumPy array (numpy.ndarray) when executing the read method.
    • If the provided exposed_type is a Callable, the data node creates a list of objects as returned by the Callable. Each object represents a record in the Parquet file. The Parquet file is read as a pandas.DataFrame and each row of the DataFrame is passed to the Callable as keyword arguments where the key is the column name, and the value is the corresponding value for that row.
from taipy import Config

temp_cfg = Config.configure_parquet_data_node(
    id="historical_temperature",
    default_path="path/hist_temp.parquet")

In lines 3-5, we configure a basic Parquet data node. The only two required parameters are id and default_path.

from taipy import Config

read_kwargs = {"filters": [("log_level", "in", {"ERROR", "CRITICAL"})]}
write_kwargs = {"partition_cols": ["log_level"], "compression": None}

log_cfg = Config.configure_parquet_data_node(
    id="log_history",
    default_path="path/hist_log.parquet",
    engine="pyarrow", # default
    compression="snappy", # default, but overridden by the key in write_kwargs
    exposed_type="modin",
    read_kwargs=read_kwargs,
    write_kwargs=write_kwargs)

In this larger example, we illustrate some specific benefits of using ParquetDataNode for storing tabular data. This time, we provide the read_kwargs and write_kwargs dictionary parameters to be passed as keyword arguments to pandas.read_parquet and pandas.DataFrame.to_parquet respectively.

Here, the dataset is partitioned (using partition_cols on line 4) by the "log_level" column when written to disk. Also, filtering is performed (using filters on line 3) to read only the rows where the "log_level" column value is either "ERROR" or "CRITICAL", speeding up the read, especially when dealing with a large amount of data.

Note that even though line 10 specifies the compression as "snappy", since the "compression" key was also provided in the write_kwargs dictionary on line 4, the last value is used, hence the compression is None.
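
The runtime alternative mentioned in the tip above can be sketched as follows, assuming a scenario using log_cfg has been created (scenario_cfg and the column names are hypothetical); the keyword arguments are forwarded to pandas:

import taipy as tp

scenario = tp.create_scenario(scenario_cfg)  # scenario_cfg is a hypothetical scenario configuration

# Read only two columns instead of the whole dataset
subset = scenario.log_history.read_with_kwargs(columns=["timestamp", "log_level"])

# Write without compression, overriding the configured "snappy" compression
scenario.log_history.write_with_kwargs(subset, compression=None)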

Note

Configuring a Parquet data node with the method Config.configure_parquet_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="parquet".

Info

Taipy ParquetDataNode wraps pandas.read_parquet and pandas.DataFrame.to_parquet methods for reading and writing Parquet data, respectively.

Mongo Collection

A MongoCollectionDataNode is a specific data node used to model data stored in a Mongo collection. To add a new mongo_collection data node configuration, the Config.configure_mongo_collection_data_node() method can be used. In addition to the generic parameters described in the Data node configuration section, multiple parameters can be provided.

  • db_name represents the name of the database in MongoDB.
  • collection_name represents the name of the data collection in the database.
  • custom_document represents the custom class used to store, encode, and decode data when reading and writing to a Mongo collection. The data returned by the read method is a list of custom_document object(s), and the data passed as a parameter of the write method is a (list of) custom_document object(s). The custom_document can have:
    • An optional decoder() method to decode data in the Mongo collection to a custom object when reading.
    • An optional encoder() method to encode the object's properties to the Mongo collection format when writing.
  • db_username represents the username to be used to access MongoDB.
  • db_password represents the user's password to be used by Taipy to access MongoDB.
  • db_port represents the database port to be used to access MongoDB.
    The default value of db_port is 27017.
  • db_host represents the database host to be used to access MongoDB.
    The default value of db_host is "localhost".
from taipy import Config

historical_data_cfg = Config.configure_mongo_collection_data_node(
    id="historical_data",
    db_username="admin",
    db_password="pa$$w0rd",
    db_name="taipy",
    collection_name="historical_data_set",
)

In this example, we configure a mongo_collection data node with the id "historical_data":

  • Its scope is the default value SCENARIO.
  • The database username is "admin" and the user's password is "pa$$w0rd".
  • The database name is "taipy".
  • The collection name is "historical_data_set".
  • Since custom_document is not specified, the custom document class defaults to taipy.core.DefaultCustomDocument.
from taipy import Config
from datetime import datetime

class DailyMinTemp:
    def __init__(self, Date: datetime = None, Temp: float = None):
        self.Date = Date
        self.Temp = Temp

    def encode(self):
        return {
            "date": self.Date.isoformat(),
            "temperature": self.Temp,
        }

    @classmethod
    def decode(cls, data):
        return cls(
            datetime.fromisoformat(data["date"]),
            data["temperature"],
        )

historical_data_cfg = Config.configure_mongo_collection_data_node(
    id="historical_data",
    db_username="admin",
    db_password="pa$$w0rd",
    db_name="taipy",
    collection_name="historical_data_set",
    custom_document=DailyMinTemp,
)

In this next example, we configure another mongo_collection data node, with the custom document defined as the DailyMinTemp class.

  • The custom encode method encodes datetime.datetime to the ISO 8601 string format.
  • The corresponding decode method decodes an ISO 8601 string to datetime.datetime.
  • The _id of the Mongo document is discarded.

Without these two methods, the default decoder will map the key of each document to the corresponding property of a DailyMinTemp object, and the default encoder will convert DailyMinTemp object's properties to a dictionary without any special formatting.
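
Calling the methods defined above directly illustrates the stored format (a sketch that is independent of Taipy):

from datetime import datetime

doc = DailyMinTemp(datetime(2022, 1, 25), 2.3).encode()
# {'date': '2022-01-25T00:00:00', 'temperature': 2.3}

restored = DailyMinTemp.decode(doc)  # back to a DailyMinTemp object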

Note

Configuring a Mongo collection data node with the method Config.configure_mongo_collection_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="mongo_collection".

Generic

A GenericDataNode is a specific data node used to model generic data types where the user defines the read and the write functions. The Config.configure_generic_data_node() method adds a new generic data node configuration. In addition to the parameters described in the Data node configuration section, the following parameters can be provided:

  • read_fct is a mandatory parameter representing a Python function provided by the user. It is used to read the data. More optional parameters can be passed through the read_fct_params parameter.

  • write_fct is a mandatory parameter representing a Python function provided by the user. It is used to write/serialize the data. The provided function must have at least one parameter to receive data to be written. It must be the first parameter. More optional parameters can be passed through the write_fct_params parameter.

  • read_fct_params represents the parameters passed to the read_fct to read/de-serialize the data. It must be a List type object.

  • write_fct_params represents the parameters passed to the write_fct to write the data. It must be a List type object.

from taipy import Config


def read_text(path: str) -> str:
    with open(path, 'r') as text_reader:
        data = text_reader.read()
    return data


def write_text(data: str, path: str):
    with open(path, 'w') as text_writer:
        text_writer.write(data)


historical_data_cfg = Config.configure_generic_data_node(
    id="historical_data",
    read_fct=read_text,
    write_fct=write_text,
    read_fct_params=["../path/data.txt"],
    write_fct_params=["../path/data.txt"])

In this small example, we configure a generic data node with the id "historical_data".

In lines 17-18, we provide two Python functions (previously defined) as read_fct and write_fct parameters to read and write the data in a text file. Note that the first parameter of write_fct is mandatory and is used to pass the data on writing.

In line 19, we provide read_fct_params with a path to let the read_fct know where to read the data.

In line 20, we provide a list of parameters to write_fct_params with a path to let the write_fct know where to write the data. Note that the data parameter will be automatically passed at runtime when writing the data.
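
Conceptually, reading and writing this data node at runtime boils down to calls like the following (a sketch of the idea, not Taipy's actual internals):

# Reading the data node calls read_fct with read_fct_params:
data = read_text("../path/data.txt")

# Writing the data node calls write_fct with the data as first argument, followed by write_fct_params:
write_text(data, "../path/data.txt")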

The generic data node can also be used in situations requiring specific business logic for reading or writing the data, which the user can easily provide. Below is an example using a custom delimiter when writing and reading a CSV file.

import csv
from typing import List
from taipy import Config


def read_csv(path: str, delimiter: str = ",") -> List[List[str]]:
    with open(path, newline='') as csvfile:
        # Materialize the rows before the file is closed
        data = list(csv.reader(csvfile, delimiter=delimiter))
    return data


def write_csv(data: List[List[str]], path: str, delimiter: str = ","):
    headers = ["country_code", "country"]
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=delimiter)
        writer.writerow(headers)
        for row in data:
            writer.writerow(row)


csv_country_data_cfg = Config.configure_generic_data_node(
    id="csv_country_data",
    read_fct=read_csv,
    write_fct=write_csv,
    read_fct_params=["../path/data.csv", ";"],
    write_fct_params=["../path/data.csv", ";"])

It is also possible to use the generic data node's custom functions to perform some data preparation:

from datetime import datetime as dt
import pandas as pd
from taipy import Config


def read_csv(path: str) -> pd.DataFrame:
    # reading a csv file, define some column types and parse a string into datetime
    custom_parser = lambda x: dt.strptime(x, "%Y %m %d %H:%M:%S")
    data = pd.read_csv(
        path,
        parse_dates=['date'],
        date_parser=custom_parser,
        dtype={
            "name": str,
            "grade": int
        }
    )
    return data


def write_csv(data: pd.DataFrame, path: str):
    # dropping not a number values before writing
    data.dropna().to_csv(path)


student_data = Config.configure_generic_data_node(
    id="student_data",
    read_fct=read_csv,
    write_fct=write_csv,
    read_fct_params=["../path/data.csv"],
    write_fct_params=["../path/data.csv"])

Note

Configuring a generic data node with the method Config.configure_generic_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="generic".

In memory

An InMemoryDataNode is a specific data node used to model any data in the RAM. The Config.configure_in_memory_data_node() method is used to add a new in_memory data node configuration. In addition to the generic parameters described in the Data node configuration section, an optional parameter can be provided:

  • If the default_data is given as a parameter of the data node configuration, the data node entity is automatically written with the corresponding value (note that any serializable Python object can be used) upon its instantiation.
from taipy import Config
from datetime import datetime

date_cfg = Config.configure_in_memory_data_node(
    id="date",
    default_data=datetime(2022, 1, 25))

In this example, we configure an in_memory data node with the id "date". The scope is SCENARIO (default value), and default data is provided.

Warning

Since the data is stored in memory, it cannot be used in a multi-process environment. (See Job configuration for more details).

Note

Configuring an in_memory data node with the method Config.configure_in_memory_data_node() is equivalent to using the method Config.configure_data_node() with the parameter storage_type="in_memory".

The next section introduces the task configuration.