Step 3: Data Node types

  • Pickle (default): Taipy can read and write any data that can be serialized.

  • CSV: Taipy can read and write any data frame as a CSV.

  • JSON: Taipy can read and write any JSONable data as a JSON file.

  • SQL: Taipy can read and write from/to a SQL table or a SQL database.

  • Mongo: Taipy can read and write from/to a Mongo collection.

  • Parquet: Taipy can read and write data frames from/to Parquet files.

  • Generic: Taipy provides a generic Data Node that can read and store any data based on a custom reading and writing function created by the user.
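A Generic Data Node relies on a pair of user-written functions, one for reading and one for writing. The sketch below shows what such a pair might look like for a plain text file holding one value per line. The function names and the file layout are hypothetical, and the block does not call Taipy: how the functions are passed to the generic Data Node configuration depends on the Taipy version, so check the Config reference before wiring them in.

```python
import tempfile
from pathlib import Path


def read_text_lines(path):
    """Hypothetical read function: return the file's lines as a list of strings."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def write_text_lines(data, path):
    """Hypothetical write function: store each item of `data` on its own line."""
    Path(path).write_text("\n".join(str(item) for item in data), encoding="utf-8")


# Round trip outside of Taipy, to check that the two functions agree.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "values.txt"
    write_text_lines(["2022-10-07", 896], target)
    round_trip = read_text_lines(target)

print(round_trip)
```

Note that everything comes back as a string: a custom reader is responsible for any type conversion, which the Pickle type would otherwise handle for you.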

This section uses the simple DAG/execution configuration described below. The configuration consists of the following:

  1. Three Data Nodes:

    • historical_data: a CSV Data Node. It reads a CSV file into the initial data frame. You can find the dataset used in the Getting Started here.

    • month_data: a pickle Data Node. It stores, in pickle format, the data frame generated by the 'filter_current' task (the initial data frame filtered on the current month).

    • nb_of_values: also a pickle Data Node. It stores the integer generated by the 'count_values' task.

  2. Two tasks linking these Data Nodes:

    • filter_current: keeps only the rows of the data frame that belong to the current month

    • count_values: counts the number of rows for this month

  3. A single pipeline that groups these two tasks in this scenario configuration.

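import datetime as dt
import pandas as pd

def filter_current(df):
    # Keep only the rows whose 'Date' falls in the current calendar month
    current_month = dt.datetime.now().month
    df['Date'] = pd.to_datetime(df['Date'])
    df = df[df['Date'].dt.month == current_month]
    return df

def count_values(df):
    # Number of rows in the filtered data frame
    return len(df)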
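Before wiring the two functions into tasks, it can help to call them by hand in the order the pipeline would. The following is a minimal sketch on a made-up three-row data frame. It assumes that the current month is taken from `dt.datetime.now().month` and that the column is named 'Date', as in the functions above. The 'Value' column and the sample dates are purely illustrative.

```python
import datetime as dt

import pandas as pd


def filter_current(df):
    # Assumption: "current month" means the month of the system clock.
    current_month = dt.datetime.now().month
    df['Date'] = pd.to_datetime(df['Date'])
    return df[df['Date'].dt.month == current_month]


def count_values(df):
    return len(df)


# Build sample dates relative to today so the example works in any month.
today = dt.date.today()
first_of_month = today.replace(day=1)
last_of_previous_month = first_of_month - dt.timedelta(days=1)

sample = pd.DataFrame({
    "Date": [today.isoformat(),
             first_of_month.isoformat(),
             last_of_previous_month.isoformat()],
    "Value": [10, 20, 30],  # illustrative values only
})

# Same chaining as the pipeline: historical_data -> month_data -> nb_of_values
month_data = filter_current(sample)
nb_of_values = count_values(month_data)
print("Rows in the current month:", nb_of_values)
```

Two of the three rows fall in the current month, so the count is 2. Note that the filter compares month numbers only, so rows from the same month of other years are kept as well.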


  • Create the beginning of the Config with Data Nodes following the graph.

  • Change the details of historical_data in the 'Details' section of Taipy Studio

    • name: historical_data

    • Details: default_path='xxxx/yyyy.csv', storage_type=csv

  • Add tasks: filter_current and count_values

  • Finish the Config by connecting tasks and Data Nodes and creating the pipeline and scenario

To use this configuration in our code, we must load it and retrieve the scenario_cfg. This scenario_cfg is the basis for instantiating our scenarios.


import taipy as tp
from taipy import Config
# my_scenario is the id of the scenario configured
scenario_cfg = Config.scenarios['my_scenario']
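
The same configuration can also be written directly in Python instead of being loaded:

# here is a CSV Data Node (placeholder path, to be replaced by the path of your CSV file)
historical_data_cfg = Config.configure_csv_data_node(id="historical_data",
                                                     default_path="xxxx/yyyy.csv")
month_values_cfg = Config.configure_data_node(id="month_data")
nb_of_values_cfg = Config.configure_data_node(id="nb_of_values")

task_filter_cfg = Config.configure_task(id="filter_current", function=filter_current,
                                        input=historical_data_cfg, output=month_values_cfg)
task_count_values_cfg = Config.configure_task(id="count_values", function=count_values,
                                              input=month_values_cfg, output=nb_of_values_cfg)

pipeline_cfg = Config.configure_pipeline(id="my_pipeline",
                                         task_configs=[task_filter_cfg, task_count_values_cfg])
scenario_cfg = Config.configure_scenario(id="my_scenario",
                                         pipeline_configs=[pipeline_cfg])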

scenario_1 = tp.create_scenario(scenario_cfg, creation_date=dt.datetime(2022,10,7), name="Scenario 2022/10/7")
scenario_1.submit()

print("Nb of values of scenario 1:", scenario_1.nb_of_values.read())


[2022-12-22 16:20:03,424][Taipy][INFO] job JOB_filter_current_257edf8d-3ca3-46f5-aec6-c8a413c86c43 is completed.
[2022-12-22 16:20:03,510][Taipy][INFO] job JOB_count_values_90c9b3c7-91e7-49ef-9064-69963d60f52a is completed.

Nb of values of scenario 1: 896