Hybrid pipeline
Introduction
Hybrid Pipeline is a Toucan feature that optimizes the execution of data transformation pipelines by combining two modes: NativeSQL execution, where the pipeline steps are translated into SQL and executed by the data source, and in-memory processing, where the steps are executed in memory by the Toucan engine.
Execution Engines
Toucan supports several execution modes for YouPrep steps.
NativeSQL
Steps are translated into SQL queries and executed directly by the connected database, which can be:
PostgreSQL
Google BigQuery
Snowflake
Amazon Redshift
Amazon Athena
Toucan engine
Toucan engine refers to data transformation steps that are executed in the Toucan backend:
In-memory: Toucan executes transformations in RAM.
Toucan Storage data store: For data loaded in "load" mode.
Data transformation pipeline
A data transformation pipeline is a sequence of YouPrep steps that transforms an input, which can be a single dataset or a combination of datasets, into an output dataset.
How It Works
The data transformation pipeline is executed in NativeSQL mode as long as the steps are compatible. See here for more information.
When an incompatible step is encountered, execution switches to in-memory mode for the rest of the pipeline.
Execution can be shared between the data source and Toucan.
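The switching rule above can be sketched in a few lines of Python. This is only an illustration, not Toucan's implementation: the Step class, its sql_compatible flag, the split_pipeline function, and the step names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a YouPrep step."""
    name: str
    sql_compatible: bool  # assumption: compatibility is known per step


def split_pipeline(steps):
    """Split a pipeline into (native_sql_steps, in_memory_steps).

    Everything before the first incompatible step runs in NativeSQL.
    From that step onward, the rest of the pipeline runs in memory,
    even if a later step would be SQL-compatible on its own.
    """
    for index, step in enumerate(steps):
        if not step.sql_compatible:
            return steps[:index], steps[index:]
    return list(steps), []


pipeline = [
    Step("filter", True),
    Step("aggregate", True),
    Step("custom", False),  # first incompatible step: switch happens here
    Step("sort", True),     # compatible, but runs in memory after the switch
]
sql_part, memory_part = split_pipeline(pipeline)
print([s.name for s in sql_part])     # ['filter', 'aggregate']
print([s.name for s in memory_part])  # ['custom', 'sort']
```

Note that the split is a single cut point: in this sketch execution never returns to NativeSQL once it has switched to in-memory mode.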

Specific Rules
Append and Join Operations
If both source pipelines are in NativeSQL, the operation is performed at the data source level.
Otherwise, the operation is performed in-memory.
The rules above replace the rules stated here.
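As a rough sketch, the append and join rule reduces to a check on the execution mode of the two source pipelines. The Engine enum and the combine_engine function below are hypothetical names used for illustration only.

```python
from enum import Enum


class Engine(Enum):
    """Hypothetical labels for where a pipeline is executed."""
    NATIVE_SQL = "native_sql"
    IN_MEMORY = "in_memory"


def combine_engine(left: Engine, right: Engine) -> Engine:
    """Decide where an append or join of two source pipelines runs.

    The operation stays at the data source level only when both
    source pipelines are in NativeSQL; otherwise it runs in memory.
    """
    if left is Engine.NATIVE_SQL and right is Engine.NATIVE_SQL:
        return Engine.NATIVE_SQL
    return Engine.IN_MEMORY


print(combine_engine(Engine.NATIVE_SQL, Engine.NATIVE_SQL).value)  # native_sql
print(combine_engine(Engine.NATIVE_SQL, Engine.IN_MEMORY).value)   # in_memory
```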

Child Datasets
For a child dataset that comes from a NativeSQL data source, execution is done on Toucan's side if any of the following is true:
The parent was NativeSQL-compatible but no longer is (for example, because of an incompatible dataset in its pipeline).
The parent is fully NativeSQL (all of its steps can be executed in NativeSQL) but is stored.
The child dataset has a step that is not compatible with NativeSQL.
In all other cases, the child dataset is compatible with NativeSQL pipeline execution.
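The child dataset conditions can be read as a simple decision function. The sketch below assumes each condition is available as a boolean; the ParentDataset class, its fields, and child_runs_in_native_sql are hypothetical and do not correspond to a Toucan API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ParentDataset:
    """Hypothetical summary of the parent dataset's state."""
    native_sql_compatible: bool  # False once its pipeline holds something incompatible
    stored: bool                 # True when the parent is stored


def child_runs_in_native_sql(parent: ParentDataset,
                             child_steps_compatible: Iterable[bool]) -> bool:
    """Return True when the child dataset can be executed in NativeSQL.

    Any one of the three conditions is enough to move execution
    to Toucan's side.
    """
    if not parent.native_sql_compatible:
        return False  # the parent is no longer NativeSQL-compatible
    if parent.stored:
        return False  # the parent is fully NativeSQL but stored
    if not all(child_steps_compatible):
        return False  # the child has an incompatible step
    return True


live_parent = ParentDataset(native_sql_compatible=True, stored=False)
stored_parent = ParentDataset(native_sql_compatible=True, stored=True)

print(child_runs_in_native_sql(live_parent, [True, True]))    # True
print(child_runs_in_native_sql(stored_parent, [True, True]))  # False
print(child_runs_in_native_sql(live_parent, [True, False]))   # False
```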
Benefits of using hybrid pipelines
The hybrid pipeline feature:
Increases flexibility in creating complex pipelines.
Automates performance optimization.
Allows combining various data sources and steps.
Limitations of in-memory processing
For some steps, such as JOIN or APPEND, RAM consumption can be significant, and performance depends on the underlying engine (the database or the Toucan workspace).