# Hybrid pipeline

## Introduction

[Hybrid Pipeline](https://www.toucantoco.com/en/blog/product-release-hybrid-pipelines) is a Toucan feature that optimizes the execution of data transformation pipelines by intelligently combining two modes: [Native SQL](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/youprep-tm-native-sql) execution, where the pipeline is translated into SQL and executed by the data source, and in-memory processing, where the steps are executed by the Toucan engine.

## Execution Engines

Toucan supports several execution modes for YouPrep steps.

### NativeSQL

Steps are translated into SQL queries and executed directly by the connected database, which can be:

* `PostgreSQL`
* `GoogleBigQuery`
* `Snowflake`
* `Amazon Redshift`
* `Amazon Athena`
* `MySQL` (with the new execution system)
* `MsSQL` and `Azure SQL` (with the new execution system)
* `ClickHouse` (with the new execution system)

### Toucan engine

The Toucan engine refers to the execution of data transformation steps in the Toucan backend:

* **In-memory**: Toucan executes transformations in RAM.
* **Toucan Storage data store**: For data loaded in "load" mode.

### Data transformation pipeline

A data transformation pipeline is a sequence of YouPrep steps that transforms an input, which can be a single dataset or a combination of datasets, into an output dataset.

## How It Works

1. The data transformation pipeline is executed in NativeSQL mode as long as the steps are compatible. See [here](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/youprep-tm-native-sql) for more information.
2. When an incompatible step is encountered, execution switches to in-memory mode for the rest of the pipeline.
3. Execution can be shared between the data source and Toucan.

{% hint style="info" %}
For example, suppose you insert a `statistics` step, which is not NativeSQL compatible because it cannot be translated into SQL, into a data transformation pipeline whose other steps are all NativeSQL compatible. From that step onwards, all steps are executed in Toucan's in-memory engine.
{% endhint %}
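
The switching rule described above can be sketched in a few lines of Python. This is only an illustration, not Toucan's actual implementation: the function name `split_pipeline`, the `SQL_INCOMPATIBLE_STEPS` set, and the `filter` and `sort` placeholder step names are assumptions made for the example. Only `statistics` is named on this page as a step that is not NativeSQL compatible.

```python
from typing import List, Set, Tuple

# Assumption: illustrative set only. This page names `statistics` as not
# NativeSQL compatible; the real compatibility list is in the NativeSQL docs.
SQL_INCOMPATIBLE_STEPS: Set[str] = {"statistics"}


def split_pipeline(
    steps: List[str],
    incompatible: Set[str] = SQL_INCOMPATIBLE_STEPS,
) -> Tuple[List[str], List[str]]:
    """Split a pipeline into (steps run as SQL, steps run in memory).

    Execution stays in NativeSQL until the first incompatible step, then
    switches to in-memory for that step and everything after it.
    """
    for index, step in enumerate(steps):
        if step in incompatible:
            return steps[:index], steps[index:]
    # No incompatible step found: the whole pipeline runs in NativeSQL.
    return list(steps), []


# Hypothetical pipeline: the step names are placeholders.
sql_steps, memory_steps = split_pipeline(["filter", "statistics", "sort"])
print("NativeSQL:", sql_steps)
print("In-memory:", memory_steps)
```

Note that `sort` ends up in the in-memory part even though it could be translated into SQL on its own: once execution has switched, it does not switch back.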

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-9fe4121d6dbc9bcaab5c18d089f05b7e8c7720b2%2FHybrid%20pipeline%20-%20Hybrid%20pipeline.jpg?alt=media" alt=""><figcaption><p>Decision process for pipeline execution</p></figcaption></figure>

## Specific Rules

### Append and Join Operations

* If both source pipelines are in NativeSQL, the operation is performed at the data source level.
* Otherwise, the operation is performed in-memory.

{% hint style="warning" %}
The rules above replace the rules stated [here](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/overview-of-youprep-tm/combine/join-dataset-with-youprep-tm).
{% endhint %}
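
As a minimal sketch of the Append and Join rule, the decision can be modeled as a function of the engines of the two source pipelines. The `Engine` enum and the `combine_engine` function are hypothetical names introduced for this example and are not part of any Toucan API.

```python
from enum import Enum


class Engine(Enum):
    # Hypothetical labels for the two execution modes described on this page.
    NATIVE_SQL = "native_sql"
    IN_MEMORY = "in_memory"


def combine_engine(left: Engine, right: Engine) -> Engine:
    """Return where a JOIN or APPEND of two source pipelines is executed."""
    if left is Engine.NATIVE_SQL and right is Engine.NATIVE_SQL:
        # Both sources are in NativeSQL: run at the data source level.
        return Engine.NATIVE_SQL
    # Any other combination is performed in memory.
    return Engine.IN_MEMORY


print(combine_engine(Engine.NATIVE_SQL, Engine.NATIVE_SQL))
print(combine_engine(Engine.NATIVE_SQL, Engine.IN_MEMORY))
```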

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-6e1b01aafa5f62ce3dcb3e69a5a9c71fff54c8b6%2FHybrid%20pipeline%20-%20Join_append%20step.jpg?alt=media" alt=""><figcaption><p>JOIN/APPEND decision process for execution</p></figcaption></figure>

### Child Datasets

Consider a child dataset that comes from a NativeSQL data source. It is executed on Toucan's side if any of the following is true:

* The parent was NativeSQL compatible but no longer is (for example, because of an incompatible dataset in its pipeline).
* The parent is fully NativeSQL (all its steps can be executed in NativeSQL) but is stored.
* A step of the child dataset is not compatible with NativeSQL.

In all other cases, the child dataset is compatible with NativeSQL pipeline execution.
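
The child dataset conditions can be summarized as a small predicate. This is a hedged sketch under stated assumptions: the `ParentDataset` class, its fields, and the `child_runs_in_native_sql` function are hypothetical names chosen for the example, not Toucan objects.

```python
from dataclasses import dataclass


@dataclass
class ParentDataset:
    # Hypothetical model of the parent, for illustration only.
    fully_native_sql: bool  # every step of the parent can run in NativeSQL
    stored: bool            # the parent's result is stored in Toucan


def child_runs_in_native_sql(parent: ParentDataset, child_steps_compatible: bool) -> bool:
    """Return True if the child can run in NativeSQL, False if it runs on Toucan's side."""
    if not parent.fully_native_sql:
        # The parent is not (or is no longer) NativeSQL compatible.
        return False
    if parent.stored:
        # The parent is fully NativeSQL but stored.
        return False
    if not child_steps_compatible:
        # The child itself contains a step that is not NativeSQL compatible.
        return False
    return True


live_parent = ParentDataset(fully_native_sql=True, stored=False)
stored_parent = ParentDataset(fully_native_sql=True, stored=True)
print(child_runs_in_native_sql(live_parent, child_steps_compatible=True))
print(child_runs_in_native_sql(stored_parent, child_steps_compatible=True))
```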

### Benefits of using hybrid pipelines

The hybrid pipeline feature:

* Increases flexibility in creating complex pipelines.
* Automates performance optimization.
* Allows you to combine various data sources and steps.

### Limitations of in-memory processing

For some steps (JOIN and APPEND), RAM consumption can be significant, and performance depends on the underlying engine (the database or the Toucan workspace).
