
Data integration

In this section, we explore how to integrate data into your Taipy application using data nodes. A DataNode is the cornerstone of Taipy's data management capabilities, providing a flexible and consistent way to handle data from various sources. Whether your data resides in files, in databases, in custom data stores, or on local or remote environments, data nodes simplify the process of accessing, processing, and managing your data.

What is a Data Node?

A data node in Taipy is an abstraction that represents some data. It provides a uniform interface for reading and writing data, regardless of the underlying storage mechanism. This abstraction allows you to focus on your application's logic without worrying about the intricacies of data management.

A data node does not contain the data itself but holds all the necessary information to read and write the actual data. It can be seen as a dataset descriptor or data reference. It is designed to model data:

  • For any format: a built-in Python object (e.g. an integer, a string, a dictionary or list of parameters, etc.) or a more complex object (e.g. a file, a machine learning model, a list of custom objects, the result of a database query, etc.).

  • For any type: internal or external data, local or remote data, historical data, a parameter or a parameter set, a trained model, etc.

  • For any usage: independent data or data related to others through data processing pipelines or scenarios.
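The descriptor idea above can be pictured with a minimal pure-Python sketch. It is not Taipy's implementation: the hypothetical `CsvRef` and `JsonRef` classes store only where the data lives (a path), never the data itself, and both expose the same `read()`/`write()` pair, so the hypothetical `copy_data()` function works without knowing the storage format. Taipy's predefined data nodes play this role for real, with many more storage types.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Protocol


class DataRef(Protocol):
    """Uniform interface shared by the hypothetical descriptors below."""

    def read(self) -> Any: ...
    def write(self, data: Any) -> None: ...


class CsvRef:
    """Describes a CSV file: it stores the path, never the rows themselves."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def read(self) -> list[dict[str, str]]:
        # Rows are loaded from disk only when read() is called.
        with self.path.open(newline="") as f:
            return list(csv.DictReader(f))

    def write(self, rows: list[dict[str, Any]]) -> None:
        # Assumes a non-empty list of dicts that share the same keys.
        with self.path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)


class JsonRef:
    """Describes a JSON file with exactly the same read/write interface."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def read(self) -> Any:
        return json.loads(self.path.read_text())

    def write(self, data: Any) -> None:
        self.path.write_text(json.dumps(data))


def copy_data(src: DataRef, dst: DataRef) -> None:
    # Application logic only sees read()/write(), not the storage format.
    dst.write(src.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sales = CsvRef(Path(tmp) / "sales.csv")
        sales.write([{"A": 1, "B": 4}, {"A": 2, "B": 5}])
        backup = JsonRef(Path(tmp) / "sales.json")
        copy_data(sales, backup)
        print(backup.read())  # prints [{'A': '1', 'B': '4'}, {'A': '2', 'B': '5'}]
```

Note that the CSV values come back as strings: the descriptor only knows how to read the file, so any type conversion is a property of the storage format, which is the kind of detail a data node configuration captures.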

To create a data node, you first need to define a data node configuration using a DataNodeConfig object. This configuration is used to instantiate one (or multiple) data node(s) with the desired properties.
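The split between a configuration and the data nodes built from it can be seen as a template and its instances. The following sketch is hypothetical (`ToyDataNodeConfig`, `ToyDataNode`, `instantiate()`, and the id format are illustrations, not Taipy classes or functions): one configuration is instantiated twice, and each instance gets its own id and its own copy of the default data while sharing the configured properties.

```python
import copy
import itertools
from dataclasses import dataclass, field
from typing import Any

# Hypothetical counter used to build unique instance ids.
_next_id = itertools.count(1)


@dataclass(frozen=True)
class ToyDataNodeConfig:
    """Hypothetical configuration: it describes data nodes but holds no data."""

    config_id: str
    storage_type: str = "pickle"
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyDataNode:
    """Hypothetical data node created from a configuration."""

    id: str
    config: ToyDataNodeConfig
    data: Any = None


def instantiate(config: ToyDataNodeConfig) -> ToyDataNode:
    # Each instance gets its own id, derived from the configuration id.
    node_id = f"DATANODE_{config.config_id}_{next(_next_id)}"
    # Each instance starts from its own copy of the configured default data.
    default = copy.deepcopy(config.properties.get("default_data"))
    return ToyDataNode(id=node_id, config=config, data=default)


if __name__ == "__main__":
    cfg = ToyDataNodeConfig("forecast", properties={"default_data": [0, 0, 0]})
    first, second = instantiate(cfg), instantiate(cfg)
    first.data[0] = 42
    print(first.id, first.data)  # prints DATANODE_forecast_1 [42, 0, 0]
    print(second.id, second.data)  # prints DATANODE_forecast_2 [0, 0, 0]
```

Copying the default data per instance is a design choice of this sketch: it keeps instances independent, so editing one does not silently change another or the configuration.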

Why use Data Nodes?

The main advantages of using data nodes in a Taipy project are:

  1. Easy to configure: Thanks to the various predefined data node types (CSV, Excel, JSON, Parquet, SQL, pickle, and more), many kinds of data can be easily integrated. For more details, see the data node configuration page.

  2. Easy to use: Taipy already implements the utility methods needed to create and retrieve data nodes, and to read, write, filter, or append their data. For more details, see the data node usage page.

  3. Taipy visual elements: Empower end users with smart visual elements that take just one line of code each. Manage, display, and edit data nodes in a user-friendly graphical interface. For more details, see the data node selector or the data node viewer pages.

  4. Data history and validity period: Keep track of the data edit history and monitor data validity. For more information, see the data node history page.

  5. Seamless integration with Task orchestration and Scenario management: In Taipy, data pipelines are modeled as execution graphs within scenarios, where tasks connect data nodes. Task orchestration and scenario management are key features of Taipy. For more information, see the task orchestration or scenario and data management pages.

  6. Support for multiple alternative datasets for What-if analysis: Easily manage alternative data nodes as different versions or variations of your dataset within the same application. This is particularly useful for What-if analysis. For more information, see the what-if analysis page.
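The history and validity idea (item 4) can be pictured with a small sketch: every write appends a timestamped edit record, and the data counts as valid while the time elapsed since the last edit stays below a validity period. Taipy's data node configuration has its own `validity_period` setting; the `Edit` and `TrackedNode` classes below are hypothetical illustrations, not Taipy's implementation. The optional `now` argument is an assumption added so the behavior can be demonstrated with fixed times.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Optional


@dataclass
class Edit:
    """One entry of the (hypothetical) edit history."""

    timestamp: datetime
    writer: str


@dataclass
class TrackedNode:
    """Hypothetical data node that records each write and checks validity."""

    validity_period: Optional[timedelta] = None  # None: the data never expires
    data: Any = None
    edits: list[Edit] = field(default_factory=list)

    def write(self, data: Any, writer: str = "user", now: Optional[datetime] = None) -> None:
        # Replace the data and append an entry to the edit history.
        self.data = data
        self.edits.append(Edit(now if now is not None else datetime.now(), writer))

    @property
    def last_edit_date(self) -> Optional[datetime]:
        return self.edits[-1].timestamp if self.edits else None

    def is_valid(self, now: Optional[datetime] = None) -> bool:
        # Never-written data is not valid; otherwise compare its age to the period.
        if self.last_edit_date is None:
            return False
        if self.validity_period is None:
            return True
        current = now if now is not None else datetime.now()
        return current - self.last_edit_date < self.validity_period


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 8, 0)
    prices = TrackedNode(validity_period=timedelta(hours=1))
    prices.write([10.5, 11.0], writer="import_job", now=t0)
    print(len(prices.edits), prices.last_edit_date)  # prints 1 2024-01-01 08:00:00
    print(prices.is_valid(now=t0 + timedelta(minutes=30)))  # prints True
    print(prices.is_valid(now=t0 + timedelta(hours=2)))  # prints False
```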

How to use Data Nodes?

A DataNode is instantiated from a DataNodeConfig object. The configuration encapsulates the information needed to create the data node (e.g. the data source, the data format, the data type, and the way to read and write the data).

To integrate a data node into your Taipy application, you need to follow these steps:

  1. Define a DataNodeConfig: Create a DataNodeConfig object with a global scope (scope=Scope.GLOBAL) using one of the predefined methods available in Taipy, such as Config.configure_data_node(), Config.configure_csv_data_node(), Config.configure_json_data_node(), etc.
    For more details, see the data node configuration page.

  2. Instantiate a DataNode: Once you have defined the data node configuration, instantiate your DataNode with the tp.create_global_data_node() function.
    For more details, see the data node usage page.

  3. Access or visualize your data: You can now retrieve your DataNode and read, write, filter, or append data as needed. For more details, see the data node usage page.
    You can also use the Taipy visual elements to manage, display, and edit your data nodes. For more details, see the data node visual elements page.
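The read, write, filter, and append operations of step 3 can be pictured on a node that holds a list of records. The `RecordNode` class and its method signatures below are hypothetical and only mirror the operation names; the data node usage page documents the actual Taipy methods and their parameters.

```python
from typing import Any, Callable

Record = dict[str, Any]


class RecordNode:
    """Hypothetical in-memory data node holding a list of dict records."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def read(self) -> list[Record]:
        # Return a new list so callers cannot reorder or resize the stored data.
        return list(self._records)

    def write(self, records: list[Record]) -> None:
        # write() replaces the whole content.
        self._records = list(records)

    def append(self, records: list[Record]) -> None:
        # append() keeps the existing content and adds the new records at the end.
        self._records.extend(records)

    def filter(self, predicate: Callable[[Record], bool]) -> list[Record]:
        # filter() returns the matching records and leaves the node unchanged.
        return [r for r in self._records if predicate(r)]


if __name__ == "__main__":
    sales = RecordNode()
    sales.write([{"A": 1, "B": 4}, {"A": 2, "B": 5}])
    sales.append([{"A": 3, "B": 6}])
    print(sales.filter(lambda r: r["A"] > 1))  # prints [{'A': 2, 'B': 5}, {'A': 3, 'B': 6}]
    print(len(sales.read()))  # prints 3
```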

Examples

Here is an example of how to integrate some data and use a global data node:

```python
import taipy as tp
from taipy import Config

if __name__ == "__main__":
    # Configure a global data node
    dataset_cfg = Config.configure_data_node("my_dataset", scope=tp.Scope.GLOBAL)

    # Instantiate a global data node
    dataset = tp.create_global_data_node(dataset_cfg)

    # Retrieve the list of all data nodes
    all_data_nodes = tp.get_data_nodes()

    # Write the data
    dataset.write("Hello, World!")

    # Read the data
    print(dataset.read())
```

The previous code snippet shows how to configure a data node, instantiate it, retrieve it, write some data, and read it back.

Here is another example of how to integrate some data and visualize it using the data node visual elements:

```python
from datetime import datetime

import pandas as pd

import taipy as tp
from taipy import Config, Gui, Orchestrator, Scope

# Variable bound to the visual elements: it holds the selected data node
data_node = None

if __name__ == "__main__":
    # Create three data sources: a CSV file (out.csv), an integer parameter, and a datetime object
    # index=False avoids writing the DataFrame index as an extra unnamed column
    pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}).to_csv("out.csv", index=False)
    parameter, date = 15, datetime.now()

    # Configure global data nodes to integrate the previous data
    ds_cfg = Config.configure_csv_data_node(id="dataset", scope=Scope.GLOBAL, default_path="out.csv")
    parameter_cfg = Config.configure_data_node(id="parameter", scope=Scope.GLOBAL, default_data=parameter)
    date_cfg = Config.configure_data_node(id="date", scope=Scope.GLOBAL, default_data=date)

    # Start the orchestrator service, then instantiate the three data nodes
    Orchestrator().run()
    tp.create_global_data_node(ds_cfg)
    tp.create_global_data_node(parameter_cfg)
    tp.create_global_data_node(date_cfg)

    # Run the GUI service with a data node selector and a data node viewer
    page = ("<|{data_node}|data_node_selector|>"
            "<|{data_node}|data_node|>")
    Gui(page=page).run()
```

In the previous code snippet, three global data node configurations are created: a CSV data node that points to out.csv, and two data nodes that receive the integer parameter and the datetime as default data. Then, the orchestrator service is started and the data nodes are instantiated. Finally, a GUI service is started with two visual elements to select, visualize, and edit the data nodes through a user-friendly interface.