
taipy.core.data.ParquetDataNode

Bases: DataNode, _AbstractFileDataNode, _AbstractTabularDataNode

Data Node stored as a Parquet file.

Attributes:

  • config_id (str): Identifier of the data node configuration. This string must be a valid Python identifier.
  • scope (Scope): The scope of this data node.
  • id (str): The unique identifier of this data node.
  • owner_id (str): The identifier of the owner (sequence_id, scenario_id, cycle_id) or None.
  • parent_ids (Optional[Set[str]]): The identifiers of the parent tasks or None.
  • last_edit_date (datetime): The date and time of the last modification.
  • edits (List[Edit]): The ordered list of edits of this data node.
  • version (str): The string indicating the application version of the data node to instantiate. If not provided, the current version is used.
  • validity_period (Optional[timedelta]): The duration, implemented as a timedelta since the last edit date, for which the data node can be considered up-to-date. Once the validity period has passed, the data node is considered stale and relevant tasks will run even if they are skippable (see the Task management page for more details). If validity_period is set to None, the data node is always up-to-date.
  • edit_in_progress (bool): True if a task computing the data node has been submitted and is not completed yet. False otherwise.
  • editor_id (Optional[str]): The identifier of the user who is currently editing the data node.
  • editor_expiration_date (Optional[datetime]): The expiration date of the editor lock.
  • path (str): The path to the Parquet file.
  • properties (dict[str, Any]): A dictionary of additional properties. properties must have a "default_path" or "path" entry with the path of the Parquet file (see the configuration sketch after this list):
      • "default_path" (str): The default path of the Parquet file.
      • "exposed_type": The exposed type of the data read from the Parquet file. The default value is "pandas".
      • "engine" (Optional[str]): The Parquet library to use. Possible values are "fastparquet" or "pyarrow". The default value is "pyarrow".
      • "compression" (Optional[str]): Name of the compression to use. Possible values are "snappy", "gzip", "brotli", or "none" (no compression). The default value is "snappy".
      • "read_kwargs" (Optional[dict]): Additional parameters passed to the pandas.read_parquet() function.
      • "write_kwargs" (Optional[dict]): Additional parameters passed to the pandas.DataFrame.to_parquet() function. The parameters in "read_kwargs" and "write_kwargs" take precedence over the top-level parameters that are also passed to pandas.

read_with_kwargs(**read_kwargs)

Read data from this data node.

Keyword arguments passed here override the matching keyword arguments defined in the data node configuration.

Parameters:

  • **read_kwargs (dict[str, any]): The keyword arguments passed to the pandas.read_parquet() function. Defaults to {}.
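For example, assuming sales_node is an instantiated ParquetDataNode (the variable name and column names are hypothetical), extra pandas.read_parquet() arguments can be supplied for a single read:

```python
# Hypothetical usage: "sales_node" is a ParquetDataNode instance.
# "columns" and "filters" are forwarded to pandas.read_parquet() and
# override any "read_kwargs" set in the data node configuration.
recent_sales = sales_node.read_with_kwargs(
    columns=["date", "amount"],
    filters=[("date", ">=", "2024-01-01")],
)
```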

write_with_kwargs(data, job_id=None, **write_kwargs)

Write the data referenced by this data node.

Keyword arguments passed here override the matching keyword arguments defined in the data node configuration.

Parameters:

  • data (Any): The data to write. Required.
  • job_id (JobId): An optional identifier of the writer. Defaults to None.
  • **write_kwargs (dict[str, any]): The keyword arguments passed to the pandas.DataFrame.to_parquet() function. Defaults to {}.
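For example, assuming sales_node is an instantiated ParquetDataNode (the variable name and data are hypothetical), per-write pandas.DataFrame.to_parquet() arguments can override the configured ones:

```python
import pandas as pd

# Hypothetical usage: override the configured compression for one write.
# Keyword arguments are forwarded to pandas.DataFrame.to_parquet().
df = pd.DataFrame({"date": ["2024-01-01"], "amount": [42.0]})
sales_node.write_with_kwargs(df, compression="gzip", index=False)
```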