🧑🍳Hybrid pipeline
Introduction
Hybrid Pipeline is an advanced feature of Toucan that optimizes the execution of data transformation pipelines by intelligently combining Native SQL execution and in-memory processing.
Key Concepts
Execution Engines
Toucan supports several execution modes for YouPrep steps:
NativeSQL: Steps are translated into SQL queries and executed directly by the connected database.
On Toucan's side:
In-memory: Toucan executes transformations in RAM.
Toucan Storage Database: Used for data loaded in "load" mode.
Pipeline
A pipeline is a sequence of YouPrep steps that transforms an input, which can be a single dataset or a combination of datasets, into an output dataset.
Hybrid Pipeline
This feature allows Toucan to automatically determine the best engine to execute each step of a pipeline, prioritizing NativeSQL execution when possible.
How It Works
The pipeline is executed in NativeSQL mode as long as the steps are compatible. See here for more information.
When an incompatible step is encountered, execution switches to in-memory mode for the rest of the pipeline.
Execution can be shared between the data source and Toucan.
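The switching behavior described above can be illustrated with a minimal Python sketch. It is not Toucan's implementation: the Step class, the sql_compatible flag, the plan_engines function, and the step names are all hypothetical and only model the rule that execution stays in NativeSQL until the first incompatible step, then continues in-memory for the rest of the pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    # Hypothetical model of a YouPrep step; not a Toucan class.
    name: str
    sql_compatible: bool  # assumed flag: can this step be translated to SQL?


def plan_engines(steps):
    """Assign an engine to each step, in pipeline order.

    Execution stays in NativeSQL while steps are compatible. After the
    first incompatible step, every remaining step runs in-memory, even
    if it would have been SQL compatible on its own.
    """
    plan = []
    engine = "NativeSQL"
    for step in steps:
        if engine == "NativeSQL" and not step.sql_compatible:
            engine = "in-memory"  # switch once, never switch back
        plan.append((step.name, engine))
    return plan


# Illustrative pipeline; the step names are placeholders.
pipeline = [
    Step("step_a", True),
    Step("step_b", True),
    Step("step_c", False),
    Step("step_d", True),
]

for name, engine in plan_engines(pipeline):
    print(f"{name}: {engine}")
```

In this sketch, step_d runs in-memory even though it is flagged as compatible, because it comes after the incompatible step_c.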
Specific Rules
Append and Join Operations
If both source pipelines are in NativeSQL, the operation is performed at the data source level.
Otherwise, the operation is performed in-memory.
The rules above replace the rules stated here.
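As a rough illustration of the Append and Join rule, the sketch below decides where a combining operation runs from the engines of its two source pipelines. The function name combine_engine and the engine labels are assumptions made for this example, not part of Toucan's API.

```python
def combine_engine(left_engine, right_engine):
    """Return where an Append or Join would run under the rule above.

    Hypothetical helper: the operation runs at the data source level
    only when both source pipelines are in NativeSQL. In every other
    case it runs in-memory.
    """
    if left_engine == "NativeSQL" and right_engine == "NativeSQL":
        return "NativeSQL"
    return "in-memory"


print(combine_engine("NativeSQL", "NativeSQL"))
print(combine_engine("NativeSQL", "in-memory"))
```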
Child Datasets
Consider a child dataset coming from a NativeSQL data source. Execution is done on Toucan's side if:
The parent was NativeSQL compatible but no longer is for some reason (for example, an incompatible dataset in its pipeline).
The parent is fully NativeSQL (all its steps can be executed in NativeSQL) but is stored.
A step of the child dataset is not compatible with NativeSQL.
In all other cases, the child dataset is compatible with NativeSQL pipeline execution.
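The child dataset conditions can be read as a simple decision function. The following Python sketch is only an interpretation of the three conditions listed above: the Parent class, its fields, and child_runs_in_native_sql are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parent:
    # Hypothetical description of a parent dataset; not a Toucan class.
    fully_native_sql: bool  # all parent steps can currently run in NativeSQL
    stored: bool            # the parent's result is stored


def child_runs_in_native_sql(parent, child_steps_compatible):
    """Return True if the child can be executed in NativeSQL.

    child_steps_compatible is a list of booleans, one per child step
    (an assumed representation of per-step compatibility).
    """
    if not parent.fully_native_sql:
        return False  # parent is no longer NativeSQL compatible
    if parent.stored:
        return False  # parent is fully NativeSQL but stored
    if not all(child_steps_compatible):
        return False  # at least one child step is incompatible
    return True


live_parent = Parent(fully_native_sql=True, stored=False)
stored_parent = Parent(fully_native_sql=True, stored=True)

print(child_runs_in_native_sql(live_parent, [True, True]))
print(child_runs_in_native_sql(stored_parent, [True, True]))
```

When the function returns False, execution happens on Toucan's side, as described in the conditions above.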
Conclusion
The hybrid pipeline feature:
Increases flexibility in creating complex pipelines.
Automates performance optimization.
Allows combining various data sources and steps.
Limitations
For some steps (JOIN or APPEND), RAM consumption can be significant, and performance depends on the underlying engine (the database or the Toucan workspace).