Data node management
Data nodes get created when scenarios or pipelines are created. Please refer to the Entities' creation section for more details.
In this section, it is assumed that the my_config.py module contains an already implemented Taipy configuration.
Data node attributes¶
A DataNode entity is identified by a unique identifier id that Taipy generates.
A data node also holds various properties and attributes accessible through the entity:
- config_id: The id of the data node config.
- scope: The scope of this data node (scenario, pipeline, etc.).
- id: The unique identifier of this data node.
- name: The user-readable name of the data node.
- owner_id: The identifier of the owner (pipeline_id, scenario_id, cycle_id) or None.
- last_edit_date: The date and time of the last data modification made through Taipy. Note that for file-based data nodes (CSV, Excel, pickle, JSON, Parquet, ...), the file's last modification date is used to compute the last_edit_date value. This means that if the file is modified manually or by an external process, the last_edit_date value is automatically updated within Taipy.
- edits: The ordered list of Edits, representing the successive modifications of the data node.
- version: The string indicating the application version of the data node to instantiate. If not provided, the current version is used.
- validity_period: The validity period of a data node. If validity_period is set to None, the data node is always up-to-date.
- edit_in_progress: The Boolean flag that signals whether the data node is locked for modification.
- properties: The dictionary of additional arguments.
Get data node¶
The first method to access a data node is by calling the get() method, passing the data node id as a parameter:
Example
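The code below is a minimal sketch of this, assuming my_config.py defines a scenario configuration named monthly_scenario_cfg that uses a sales_history data node (both names are assumptions for illustration):

```python
import taipy as tp
import my_config

# Creating a scenario also creates its data nodes (assumed config names).
scenario = tp.create_scenario(my_config.monthly_scenario_cfg)
data_node_id = scenario.sales_history.id

# Retrieve the data node entity from its id.
data_node = tp.get(data_node_id)
```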
The data nodes that are part of a scenario, pipeline or task can be directly accessed as attributes:
Example
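For instance, a sketch assuming the same hypothetical monthly_scenario_cfg, with a pipeline named sales and a task named training that both use the sales_history data node:

```python
import taipy as tp
import my_config

scenario = tp.create_scenario(my_config.monthly_scenario_cfg)

# From the scenario (assumed data node config id):
data_node = scenario.sales_history

# From one of its pipelines (assumed pipeline config id):
pipeline = scenario.sales
data_node = pipeline.sales_history

# From one of the pipeline's tasks (assumed task config id):
task = pipeline.training
data_node = task.sales_history
```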
Get all data nodes¶
All data nodes that are part of a scenario or a pipeline can be directly accessed as attributes:
Example
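A sketch under the same assumptions as above, where the data_nodes attribute exposes all the data nodes of the entity:

```python
import taipy as tp
import my_config

scenario = tp.create_scenario(my_config.monthly_scenario_cfg)

# All the data nodes of the scenario:
scenario_data_nodes = scenario.data_nodes

# All the data nodes of one of its pipelines (assumed pipeline config id):
pipeline_data_nodes = scenario.sales.data_nodes
```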
All the data nodes can be retrieved using the method get_data_nodes(), which returns a list of all existing data nodes.
Example
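For example (a sketch; the scenario configuration name is an assumption):

```python
import taipy as tp
import my_config

# Creating scenarios instantiates their data nodes.
tp.create_scenario(my_config.monthly_scenario_cfg)
tp.create_scenario(my_config.monthly_scenario_cfg)

# Retrieve every data node that exists in the application.
all_data_nodes = tp.get_data_nodes()
```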
Read / Write a data node¶
To access the content of a data node, you can use the DataNode.read() method. The read method returns the data stored in the data node according to the type of data node:
Example
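A minimal sketch (again assuming the hypothetical monthly_scenario_cfg and the sales_history data node, and that the data node has already been written, for example by a task):

```python
import taipy as tp
import my_config

scenario = tp.create_scenario(my_config.monthly_scenario_cfg)
data_node = scenario.sales_history

# Returns the stored data in a format that depends on the data node type
# and its exposed_type (e.g., a pandas.DataFrame for a CSV data node).
data = data_node.read()
```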
To write some data to the data node, such as the output of a task, you can use the DataNode.write() method. The method takes a data object (string, dictionary, list, NumPy array, pandas DataFrame, etc., depending on the data node type and its exposed type) as a parameter and writes it to the data node:
Example
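A minimal sketch, assuming sales_history is a tabular data node with a "pandas" exposed_type:

```python
import pandas as pd
import taipy as tp
import my_config

scenario = tp.create_scenario(my_config.monthly_scenario_cfg)
data_node = scenario.sales_history

# Write a pandas DataFrame to the data node.
data = pd.DataFrame(
    [
        {"date": "12/24/2018", "nb_sales": 1550},
        {"date": "12/25/2018", "nb_sales": 2315},
    ]
)
data_node.write(data)

# Reading the data node now returns the newly written data.
print(data_node.read())
```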
Pickle¶
When reading from a Pickle data node, Taipy returns whatever data is stored in the pickle file.
A Pickle data node can write any data object that can be pickled, including but not limited to:
- integers, floating-point numbers.
- strings, bytes, or bytearrays.
- tuples, lists, sets, and dictionaries containing only picklable objects.
- functions, classes.
- instances of classes with picklable properties.
Check out What can be pickled and unpickled? for more details.
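For instance, here is a short sketch of writing and reading an arbitrary picklable object (assuming data_node is a Pickle data node entity):

```python
# Any picklable object can be written to a Pickle data node.
data = {"dates": ["12/24/2018", "12/25/2018"], "nb_sales": [1550, 2315]}
data_node.write(data)

# read() returns the object exactly as it was pickled.
assert data_node.read() == data
```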
CSV¶
When reading from a CSV data node, Taipy returns the data of the CSV file based on the exposed_type parameter. Check out CSV Data Node configuration for more details on exposed_type.
Assume that the content of the sales.csv file is the following.
path/sales.csv
date,nb_sales
12/24/2018,1550
12/25/2018,2315
12/26/2018,1832
The following examples represent the results when reading from a CSV data node with different exposed_type:
data_node.read() returns, depending on the exposed_type of the data node:

With exposed_type="pandas" (the default):

pandas.DataFrame(
         date  nb_sales
0  12/24/2018      1550
1  12/25/2018      2315
2  12/26/2018      1832
)

With exposed_type="modin":

modin.pandas.DataFrame(
         date  nb_sales
0  12/24/2018      1550
1  12/25/2018      2315
2  12/26/2018      1832
)

With exposed_type="numpy":

numpy.array(
    [
        ["12/24/2018", "1550"],
        ["12/25/2018", "2315"],
        ["12/26/2018", "1832"]
    ]
)

With a custom class as exposed_type (here, SaleRow):

[
    SaleRow("12/24/2018", 1550),
    SaleRow("12/25/2018", 2315),
    SaleRow("12/26/2018", 1832),
]
When writing data to a CSV data node, the CSVDataNode.write() method can take several data types as input:
- list, numpy array
- dictionary, or list of dictionaries
- pandas dataframes
The following examples will write to the path of the CSV data node:
data_node.write()
examples
When writing a list to a CSV data node, each element of the list is written as one row of data.
# write a list
data_node.write(
["12/24/2018", "12/25/2018", "12/26/2018"]
)
# or write a list of lists
data_node.write(
[
["12/24/2018", 1550],
["12/25/2018", 2315],
["12/26/2018", 1832],
]
)
# write a numpy array
data_node.write(
    np.array([
        ["12/24/2018", 1550],
        ["12/25/2018", 2315],
        ["12/26/2018", 1832],
    ])
)
# "list" form
data_node.write(
{
"date": ["12/24/2018", "12/25/2018", "12/26/2018"],
"nb_sales": [1550, 2315, 1832]
}
)
# "records" form
data_node.write(
    [
        {"date": "12/24/2018", "nb_sales": 1550},
        {"date": "12/25/2018", "nb_sales": 2315},
        {"date": "12/26/2018", "nb_sales": 1832},
    ]
)
# write a pandas DataFrame
data = pandas.DataFrame(
    [
        {"date": "12/24/2018", "nb_sales": 1550},
        {"date": "12/25/2018", "nb_sales": 2315},
        {"date": "12/26/2018", "nb_sales": 1832},
    ]
)
data_node.write(data)
When writing a list or a NumPy array to a CSV data node, the columns are numbered starting from 1.
To write with custom column names, use the CSVDataNode.write_with_column_names() method.
CSVDataNode.write_with_column_names()
examples
data_node.write_with_column_names(
[
["12/24/2018", 1550],
["12/25/2018", 2315],
["12/26/2018", 1832],
],
columns=["date", "nb_sales"]
)
Excel¶
When reading from an Excel data node, Taipy returns the data of the Excel file based on the exposed_type parameter. Check out Excel Data Node configuration for more details on exposed_type.
For the example in this section, assume that sales_history_cfg in my_config.py is an Excel data node configuration with default_path="path/sales.xlsx".
Assume that the content of the sales.xlsx file is the following.
path/sales.xlsx
| date       | nb_sales |
|------------|----------|
| 12/24/2018 | 1550     |
| 12/25/2018 | 2315     |
| 12/26/2018 | 1832     |
The following examples represent the results when reading from an Excel data node with different exposed_type:
data_node.read() returns, depending on the exposed_type of the data node:

With exposed_type="pandas" (the default):

pandas.DataFrame(
         date  nb_sales
0  12/24/2018      1550
1  12/25/2018      2315
2  12/26/2018      1832
)

With exposed_type="modin":

modin.pandas.DataFrame(
         date  nb_sales
0  12/24/2018      1550
1  12/25/2018      2315
2  12/26/2018      1832
)

With exposed_type="numpy":

numpy.array(
    [
        ["12/24/2018", "1550"],
        ["12/25/2018", "2315"],
        ["12/26/2018", "1832"]
    ]
)

With a custom class as exposed_type (here, SaleRow):

[
    SaleRow("12/24/2018", 1550),
    SaleRow("12/25/2018", 2315),
    SaleRow("12/26/2018", 1832),
]
When writing data to an Excel data node, the ExcelDataNode.write() method can take several data types as input:
- list, numpy array
- dictionary, or list of dictionaries
- pandas dataframes
The following examples will write to the path of the Excel data node:
data_node.write()
examples
When writing a list to an Excel data node, each element of the list is written as one row of data.
# write a list
data_node.write(
["12/24/2018", "12/25/2018", "12/26/2018"]
)
# or write a list of lists
data_node.write(
[
["12/24/2018", 1550],
["12/25/2018", 2315],
["12/26/2018", 1832],
]
)
data_node.write(
np.array([
["12/24/2018", 1550],
["12/25/2018", 2315],
["12/26/2018", 1832],
])
)
# "list" form
data_node.write(
{
"date": ["12/24/2018", "12/25/2018", "12/26/2018"],
"nb_sales": [1550, 2315, 1832]
}
)
# "records" form
data_node.write(
[
{"date": "12/24/2018", "nb_sales": 1550},
{"date": "12/25/2018", "nb_sales": 2315},
{"date": "12/26/2018", "nb_sales": 1832},
]
)
data = pandas.DataFrame(
[
{"date": "12/24/2018", "nb_sales": 1550},
{"date": "12/25/2018", "nb_sales": 2315},
{"date": "12/26/2018", "nb_sales": 1832},
]
)
data_node.write(data)
When writing a list or a NumPy array to an Excel data node, the columns are numbered starting from 1.
To write with custom column names, use the ExcelDataNode.write_with_column_names() method.
ExcelDataNode.write_with_column_names()
examples
data_node.write_with_column_names(
[
["12/24/2018", 1550],
["12/25/2018", 2315],
["12/26/2018", 1832],
],
columns=["date", "nb_sales"]
)
SQL Table¶
When reading from a SQL Table data node, Taipy returns the data of the SQL table based on the exposed_type parameter. Check out SQL Table Data Node configuration for more details on exposed_type.
For the example in this section, assume that sales_history_cfg in my_config.py is a SQL Table data node configuration with table_name="sales".
Assume that the content of the "sales" table is the following.
A selection from the "sales" table
| ID | date       | nb_sales |
|----|------------|----------|
| 1  | 12/24/2018 | 1550     |
| 2  | 12/25/2018 | 2315     |
| 3  | 12/26/2018 | 1832     |
The following examples represent the results when reading from a SQL Table data node with different exposed_type:
data_node.read() returns, depending on the exposed_type of the data node:

With exposed_type="pandas" (the default):

pandas.DataFrame(
   ID        date  nb_sales
0   1  12/24/2018      1550
1   2  12/25/2018      2315
2   3  12/26/2018      1832
)

With exposed_type="modin":

modin.pandas.DataFrame(
   ID        date  nb_sales
0   1  12/24/2018      1550
1   2  12/25/2018      2315
2   3  12/26/2018      1832
)

With exposed_type="numpy":

numpy.array(
    [
        ["1", "12/24/2018", "1550"],
        ["2", "12/25/2018", "2315"],
        ["3", "12/26/2018", "1832"]
    ]
)

With a custom class as exposed_type (here, SaleRow):

[
    SaleRow("12/24/2018", 1550),
    SaleRow("12/25/2018", 2315),
    SaleRow("12/26/2018", 1832),
]
When writing data to a SQL Table data node, the SQLTableDataNode.write() method can take several data types as input:
- list of lists or list of tuples
- numpy array
- dictionary, or list of dictionaries
- pandas dataframes
Assume that the "ID" column is the auto-increment primary key. The following examples will write to the SQL Table data node:
data_node.write()
examples
# write a list of lists
data_node.write(
[
["12/24/2018", 1550],
["12/25/2018", 2315],
["12/26/2018", 1832],
]
)
# or write a list of tuples
data_node.write(
[
("12/24/2018", 1550),
("12/25/2018", 2315),
("12/26/2018", 1832),
]
)
data = np.array(
[
["12/24/2018", 1550],
["12/25/2018", 2315],
["12/26/2018", 1832],
]
)
data_node.write(data)
# write 1 record to the SQL table
data_node.write(
{"date": "12/24/2018", "nb_sales": 1550}
)
# write multiple records using a list of dictionaries
data_node.write(
[
{"date": "12/24/2018", "nb_sales": 1550},
{"date": "12/25/2018", "nb_sales": 2315},
{"date": "12/26/2018", "nb_sales": 1832},
]
)
data = pandas.DataFrame(
[
{"date": "12/24/2018", "nb_sales": 1550},
{"date": "12/25/2018", "nb_sales": 2315},
{"date": "12/26/2018", "nb_sales": 1832},
]
)
data_node.write(data)
SQL¶
A SQL data node is designed to give the user more flexibility in how to read from and write to a SQL table (or multiple SQL tables).
Let's consider the orders_cfg in my_config.py, which configures a SQL data node.
When reading from a SQL data node, Taipy executes the read query and returns the data based on the exposed_type parameter:
- The exposed_type parameter of orders_cfg is undefined, so it takes the default value "pandas". Check out SQL Data Node configuration for more details on exposed_type.
- The read_query of orders_cfg is SELECT orders.ID, orders.date, products.price, orders.number_of_products FROM orders INNER JOIN products ON orders.product_id=products.ID.
- When reading from the SQL data node using the data_node.read() method, Taipy executes the above query and returns a pandas.DataFrame representing the "orders" table inner joined with the "products" table.
A selection from the "orders" table
| ID | date       | product_id | number_of_products |
|----|------------|------------|--------------------|
| 1  | 01/05/2019 | 2          | 200                |
| 2  | 01/05/2019 | 3          | 450                |
| 3  | 01/05/2019 | 5          | 350                |
| 4  | 01/06/2019 | 1          | 520                |
| 5  | 01/06/2019 | 3          | 250                |
| 6  | 01/07/2019 | 2          | 630                |
| 7  | 01/07/2019 | 4          | 480                |
A selection from the "products" table
| ID | price | description |
|----|-------|-------------|
| 1  | 30    | foo product |
| 2  | 50    | bar product |
| 3  | 25    | foo product |
| 4  | 60    | bar product |
| 5  | 40    | foo product |
data_node.read() returns:

pandas.DataFrame(
ID date price number_of_products
0 1 01/05/2019 50 200
1 2 01/05/2019 25 450
2 3 01/05/2019 40 350
3 4 01/06/2019 30 520
4 5 01/06/2019 25 250
5 6 01/07/2019 50 630
6 7 01/07/2019 60 480
)
When writing to a SQL data node, Taipy first passes the data to the write_query_builder and then executes the list of queries returned by the query builder:
- The write_query_builder parameter of orders_cfg in this example is defined as the write_orders_plan() function.
- After being called with the data to write as a pd.DataFrame, the write_orders_plan() function returns a list of SQL queries.
- The first query deletes all records from the "orders" table.
- The following query inserts a list of records into the "orders" table according to the data, assuming that the "ID" column of the "orders" table is the auto-increment primary key.
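As an illustration, write_orders_plan() could look like the following sketch (this is an assumption about its implementation in my_config.py, not the actual code; a returned query is assumed to be either a plain string or a tuple of (query, parameters)):

```python
import pandas as pd

def write_orders_plan(data: pd.DataFrame):
    # Build the list of SQL queries executed when writing to the data node.
    insert_records = data[["date", "product_id", "number_of_products"]].to_dict("records")
    return [
        # First, delete every existing record of the "orders" table.
        "DELETE FROM orders",
        # Then, insert the new records; "ID" is auto-incremented by the database.
        (
            "INSERT INTO orders (date, product_id, number_of_products) "
            "VALUES (:date, :product_id, :number_of_products)",
            insert_records,
        ),
    ]
```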
data_node.write()
data = pandas.DataFrame(
    [
        {"date": "01/08/2019", "product_id": 1, "number_of_products": 450},
        {"date": "01/08/2019", "product_id": 3, "number_of_products": 320},
        {"date": "01/08/2019", "product_id": 4, "number_of_products": 350},
    ]
)
data_node.write(data)
The "orders" table after being written:
| ID | date       | product_id | number_of_products |
|----|------------|------------|--------------------|
| 8  | 01/08/2019 | 1          | 450                |
| 9  | 01/08/2019 | 3          | 320                |
| 10 | 01/08/2019 | 4          | 350                |
JSON¶
When reading from a JSON data node, Taipy will return a dictionary or a list based on the format of the JSON file.
When writing data to a JSON data node, the JSONDataNode.write() method can take a list, dictionary, or list of dictionaries as the input.
In JSON, values must be one of the following data types:
- A string
- A number
- An object (embedded JSON object)
- An array
- A boolean
- null
However, the content of a JSON data node can vary. By default, the JSON data node provided by Taipy can also encode and decode:
- Python enum.Enum.
- A datetime.datetime object.
- A dataclass object.
For the example in this section, assume that sales_history_cfg in my_config.py is a JSON data node configuration with default_path="path/sales.json".
Read and write from a JSON data node using default encoder and decoder
data = [
{"date": "12/24/2018", "nb_sales": 1550},
{"date": "12/25/2018", "nb_sales": 2315},
{"date": "12/26/2018", "nb_sales": 1832},
]
data_node.write(data)
results in:
[
{"date": "12/24/2018", "nb_sales": 1550},
{"date": "12/25/2018", "nb_sales": 2315},
{"date": "12/26/2018", "nb_sales": 1832},
]
import datetime
data = [
{"date": datetime.datetime(2018, 12, 24), "nb_sales": 1550},
{"date": datetime.datetime(2018, 12, 25), "nb_sales": 2315},
{"date": datetime.datetime(2018, 12, 26), "nb_sales": 1832},
]
data_node.write(data)
results in:
[
    {"date": {"__type__": "Datetime", "__value__": "2018-12-24T00:00:00"}, "nb_sales": 1550},
    {"date": {"__type__": "Datetime", "__value__": "2018-12-25T00:00:00"}, "nb_sales": 2315},
    {"date": {"__type__": "Datetime", "__value__": "2018-12-26T00:00:00"}, "nb_sales": 1832},
]
The read method returns a list of dictionaries, with the "date" values restored as datetime.datetime objects, as the data was when written.
from enum import Enum
class SaleRank(Enum):
A = 2000
B = 1800
C = 1500
D = 1200
F = 1000
data = [
{"date": "12/24/2018", "nb_sales": SaleRank.C},
{"date": "12/25/2018", "nb_sales": SaleRank.A},
{"date": "12/26/2018", "nb_sales": SaleRank.B},
]
data_node.write(data)
results in:
[
{"date": "12/24/2018", "nb_sales": {"__type__": "Enum-SaleRank-C", "__value__": 1500}},
{"date": "12/25/2018", "nb_sales": {"__type__": "Enum-SaleRank-A", "__value__": 2000}},
{"date": "12/26/2018", "nb_sales": {"__type__": "Enum-SaleRank-B", "__value__": 1800}},
]
The read method returns a list of dictionaries, with the "nb_sales" values restored as SaleRank Enum members, as the data was when written.
from dataclasses import dataclass
@dataclass
class SaleRow:
date: str
nb_sales: int
data = [
SaleRow("12/24/2018", 1550),
SaleRow("12/25/2018", 2315),
SaleRow("12/26/2018", 1832),
]
data_node.write(data)
results in:
[
{"__type__": "dataclass-SaleRow", "__value__": {"date": "12/24/2018", "nb_sales": 1550}},
{"__type__": "dataclass-SaleRow", "__value__": {"date": "12/25/2018", "nb_sales": 2315}},
{"__type__": "dataclass-SaleRow", "__value__": {"date": "12/26/2018", "nb_sales": 1832}},
]
The read method returns a list of SaleRow objects, as the data was when written.
You can also specify a custom JSON encoder and decoder to handle different data types. Check out JSON Data Node configuration for more details on how to configure a custom JSON encoder and decoder.
Parquet¶
When reading from a Parquet data node, Taipy returns the data of the Parquet file based on the exposed_type parameter. Check out Parquet Data Node configuration for more details on exposed_type.
Assume that the content of the sales.parquet file populates the following table.
path/sales.parquet
| date       | nb_sales |
|------------|----------|
| 12/24/2018 | 1550     |
| 12/25/2018 | 2315     |
| 12/26/2018 | 1832     |
The following examples represent the results when reading from a Parquet data node with different exposed_type:
data_node.read() returns, depending on the exposed_type of the data node:

With exposed_type="pandas" (the default):

pandas.DataFrame(
         date  nb_sales
0  12/24/2018      1550
1  12/25/2018      2315
2  12/26/2018      1832
)

With exposed_type="modin":

modin.pandas.DataFrame(
         date  nb_sales
0  12/24/2018      1550
1  12/25/2018      2315
2  12/26/2018      1832
)

With exposed_type="numpy":

numpy.array(
    [
        ["12/24/2018", "1550"],
        ["12/25/2018", "2315"],
        ["12/26/2018", "1832"]
    ]
)

With a custom class as exposed_type (here, SaleRow):

[
    SaleRow("12/24/2018", 1550),
    SaleRow("12/25/2018", 2315),
    SaleRow("12/26/2018", 1832),
]
When writing data to a Parquet data node, the ParquetDataNode.write() method can take several data types as input, depending on the exposed type:
- pandas dataframes
- modin dataframes
- numpy arrays
- any object that can be passed to the pd.DataFrame constructor (e.g., a list of dictionaries)
The following examples will write to the path of the Parquet data node:
data_node.write()
examples
# write a pandas DataFrame
data = pandas.DataFrame(
    [
        {"date": "12/24/2018", "nb_sales": 1550},
        {"date": "12/25/2018", "nb_sales": 2315},
        {"date": "12/26/2018", "nb_sales": 1832},
    ]
)
data_node.write(data)
# "list" form
data_node.write(
{
"date": ["12/24/2018", "12/25/2018", "12/26/2018"],
"nb_sales": [1550, 2315, 1832]
}
)
# "records" form
data_node.write(
    [
        {"date": "12/24/2018", "nb_sales": 1550},
        {"date": "12/25/2018", "nb_sales": 2315},
        {"date": "12/26/2018", "nb_sales": 1832},
    ]
)
# write a modin DataFrame
data = modin.pandas.DataFrame(
    [
        {"date": "12/24/2018", "nb_sales": 1550},
        {"date": "12/25/2018", "nb_sales": 2315},
        {"date": "12/26/2018", "nb_sales": 1832},
    ]
)
data_node.write(data)
Additionally, Parquet data node entities expose two extra methods: ParquetDataNode.read_with_kwargs and ParquetDataNode.write_with_kwargs. These two methods may be used to pass additional keyword arguments to the pandas pandas.read_parquet and pandas.DataFrame.to_parquet methods, on top of the arguments defined in the ParquetDataNode configuration.
The following examples demonstrate reading and writing to a Parquet data node with additional keyword arguments:
Reading data with ParquetDataNode.read_with_kwargs
columns = ["nb_sales"]
data_node.read_with_kwargs(columns=columns)
Here, the ParquetDataNode.read_with_kwargs
method is used to specify a keyword parameter, "columns",
which is the list of column names to be read from the Parquet dataset. In this case, only the "nb_sales"
column will be read.
Writing data with ParquetDataNode.write_with_kwargs
data_node.write_with_kwargs(index=False)
Here, the ParquetDataNode.write_with_kwargs method is used to specify a keyword parameter, "index", which is a boolean value determining whether the index of the DataFrame should be written. In this case, the index will not be written.
Mongo collection¶
When reading from a Mongo collection data node, Taipy will return a list of objects as instances of a document class defined by custom_document.
When writing data to a Mongo collection data node, the MongoCollectionDataNode.write() method takes as input a list of objects that are instances of the document class defined by custom_document.
By default, the Mongo collection data node uses taipy.core.DefaultCustomDocument as the document class. A DefaultCustomDocument can have any attribute; however, the type of each value should be supported by MongoDB, including but not limited to:
- Boolean, integers, and floating-point numbers.
- String.
- Object (embedded document object).
- Arrays (arrays or lists of multiple values).
For the example in this section, assume that sales_history_cfg in my_config.py is a Mongo collection data node configuration.
Check out MongoDB supported data types for more details.
Read and write from a Mongo collection data node using default document class
from taipy.core import DefaultCustomDocument
data = [
DefaultCustomDocument(date="12/24/2018", nb_sales=1550),
DefaultCustomDocument(date="12/25/2018", nb_sales=2315),
DefaultCustomDocument(date="12/26/2018", nb_sales=1832),
]
data_node.write(data)
will write 3 documents to MongoDB:
[
{"_id": ObjectId("634cd1b3383279c68cee1c21"), "date": "12/24/2018", "nb_sales": 1550},
{"_id": ObjectId("634cd1b3383279c68cee1c22"), "date": "12/25/2018", "nb_sales": 2315},
{"_id": ObjectId("634cd1b3383279c68cee1c23"), "date": "12/26/2018", "nb_sales": 1832},
]
The read method returns a list of DefaultCustomDocument objects, including the "_id" attribute.
You can also specify a custom document class to handle specific attributes, and to encode and decode data when reading from and writing to the Mongo collection. Check out Mongo collection Data Node configuration for more details on how to configure a custom document class.
Generic¶
A Generic data node has the read and the write functions defined by the user:
- When reading from a generic data node, Taipy runs the function defined by read_fct with parameters defined by read_fct_params.
- When writing to a generic data node, Taipy runs the function defined by write_fct with parameters defined by write_fct_params.
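For illustration, here is a minimal sketch of such user-defined functions and how the corresponding Generic data node could be configured (the function names, the file path, and the my_generic id are hypothetical, not from my_config.py):

```python
from taipy import Config

# Hypothetical read function: called with read_fct_params as arguments.
def read_text(path: str):
    with open(path) as f:
        return f.read()

# Hypothetical write function: called with the data first, then write_fct_params.
def write_text(data: str, path: str):
    with open(path, "w") as f:
        f.write(data)

generic_cfg = Config.configure_generic_data_node(
    id="my_generic",
    read_fct=read_text,
    read_fct_params=["path/data.txt"],
    write_fct=write_text,
    write_fct_params=["path/data.txt"],
)
```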
In memory¶
Since an In memory data node stores data in RAM as a Python variable, the read / write methods are rather straightforward.
When reading from an In memory data node, Taipy returns whatever data is stored in RAM for that data node.
Correspondingly, an In memory data node can write any data object that can be stored in a Python variable.
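For example (a sketch, assuming data_node is an In memory data node entity):

```python
# Any Python object can be written and read back as-is.
data_node.write(1550)
assert data_node.read() == 1550
```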
Warning
Since the data is stored in memory, it cannot be used in a multiprocess environment. (See Job configuration for more details).
Filter read results¶
It is also possible to partially read the contents of data nodes, which comes in handy when dealing
with large amounts of data.
This can be achieved by providing an operator, a Tuple of (field_name, value, comparison_operator),
or a list of operators to the DataNode.filter()
method:
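A sketch of such calls, assuming the sales data shown earlier and that Operator and JoinOperator come from taipy.core.data.operator (an assumption about the import path):

```python
from taipy.core.data.operator import JoinOperator, Operator

# Keep only the rows where nb_sales equals 1550.
data_node.filter(("nb_sales", 1550, Operator.EQUAL))

# Keep the rows where nb_sales is between 1000 (inclusive) and 2000 (exclusive).
data_node.filter(
    [
        ("nb_sales", 1000, Operator.GREATER_OR_EQUAL),
        ("nb_sales", 2000, Operator.LESS_THAN),
    ],
    JoinOperator.AND,
)
```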
If a list of operators is provided, it is necessary to provide a join operator that will be used to combine the filtered results from the operators.
It is also possible to use pandas style filtering:
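For instance (a sketch; with a "pandas" exposed_type, indexing the data node returns the corresponding column of the underlying DataFrame):

```python
# Read only the "nb_sales" column.
nb_sales = data_node["nb_sales"]

# Then apply regular pandas filtering on the result.
high_sales = nb_sales[nb_sales > 1500]
```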
Warning
For now, the DataNode.filter() method is only implemented for CSVDataNode, ExcelDataNode, SQLTableDataNode, and SQLDataNode with "pandas" as the exposed_type value.
Get parent scenarios, pipelines and tasks¶
To get the parent entities of a data node (scenarios, pipelines, or tasks), you can use either the DataNode.get_parents() method or the get_parents() function. Both return the parents of the data node.
Example
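A minimal sketch (assuming the hypothetical monthly_scenario_cfg again; both calls return the same parent entities, grouped by type):

```python
import taipy as tp
import my_config

scenario = tp.create_scenario(my_config.monthly_scenario_cfg)
data_node = scenario.sales_history

# As a method of the data node entity:
parents = data_node.get_parents()

# Or as a Taipy function:
parents = tp.get_parents(data_node)
```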