# Hybrid pipeline

## Introduction

[Hybrid Pipeline](https://www.toucantoco.com/en/blog/product-release-hybrid-pipelines) is a Toucan feature that optimizes the execution of data transformation pipelines by combining two modes: [Native SQL](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/youprep-tm-native-sql) execution, where the pipeline is translated into SQL and executed by the data source, and in-memory processing, where the steps are executed by the Toucan engine.

## Execution Engines

Toucan supports several execution modes for YouPrep steps.

### NativeSQL

Steps are translated into SQL queries and executed directly by the connected database, which can be:

* `PostgreSQL`
* `GoogleBigQuery`
* `Snowflake`
* `Amazon Redshift`
* `Amazon Athena`
* `MySQL` (with the new execution system)
* `MsSQL` and `Azure SQL` (with the new execution system)
* `ClickHouse` (with the new execution system)

### Toucan engine

Toucan engine refers to the execution of data transformation steps in the Toucan backend:

* **In-memory**: Toucan executes transformations in RAM.
* **Toucan Storage data store**: For data loaded in "load" mode.

### Data transformation pipeline

A data transformation pipeline is a sequence of YouPrep steps that transforms an input, which can be a single dataset or a combination of datasets, into an output dataset.

## How It Works

1. The data transformation pipeline is executed in NativeSQL mode as long as the steps are compatible. See [the Native SQL documentation](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/youprep-tm-native-sql) for more information.
2. When an incompatible step is encountered, execution switches to in-memory mode for the rest of the pipeline.
3. Execution can be shared between the data source and Toucan.

{% hint style="info" %}
For example, suppose you insert a `statistics` step, which is not NativeSQL compatible because it cannot be translated into SQL, into a pipeline whose other steps are all NativeSQL compatible. From the `statistics` step onwards, every step is executed in Toucan's in-memory engine.
{% endhint %}
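
The switching rule above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Toucan's implementation: the function name `split_pipeline`, the set `NON_SQL_STEPS`, and the step names other than `statistics` are assumptions made for the example.

```python
# Hypothetical set of step names that cannot be translated to SQL.
# Only `statistics` is named as incompatible on this page.
NON_SQL_STEPS = {"statistics"}


def split_pipeline(steps: list[str]) -> tuple[list[str], list[str]]:
    """Return (steps run in NativeSQL, steps run in-memory).

    Execution stays in NativeSQL until the first incompatible step,
    then switches to in-memory for the rest of the pipeline.
    """
    for index, step in enumerate(steps):
        if step in NON_SQL_STEPS:
            return steps[:index], steps[index:]
    # No incompatible step: the whole pipeline runs in NativeSQL.
    return steps, []


# Illustrative step names; only `statistics` comes from this page.
sql_part, memory_part = split_pipeline(["filter", "aggregate", "statistics", "sort"])
print(sql_part)     # ['filter', 'aggregate']
print(memory_part)  # ['statistics', 'sort']
```

Note that `sort` ends up in the in-memory part even though it could be expressed in SQL: once execution has switched, it does not switch back.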

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-9fe4121d6dbc9bcaab5c18d089f05b7e8c7720b2%2FHybrid%20pipeline%20-%20Hybrid%20pipeline.jpg?alt=media" alt=""><figcaption><p>Decision process for pipeline execution</p></figcaption></figure>

## Specific Rules

### Append and Join Operations

* If both source pipelines are in NativeSQL, the operation is performed at the data source level.
* Otherwise, the operation is performed in-memory.

{% hint style="warning" %}
The rules above replace the rules stated [here](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/overview-of-youprep-tm/combine/join-dataset-with-youprep-tm).
{% endhint %}
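
As a rough sketch, the Append/Join rule amounts to a single check on the two source pipelines. The `Engine` enum and the `combine_engine` function are hypothetical names used for illustration only, and the sketch does not model anything beyond the two rules stated in this section.

```python
from enum import Enum


class Engine(Enum):
    """Hypothetical labels for where a pipeline is executed."""
    NATIVE_SQL = "native_sql"
    IN_MEMORY = "in_memory"


def combine_engine(left: Engine, right: Engine) -> Engine:
    """Decide where an Append or Join step runs.

    The step runs at the data source level only when both source
    pipelines are in NativeSQL; otherwise it runs in-memory.
    """
    if left is Engine.NATIVE_SQL and right is Engine.NATIVE_SQL:
        return Engine.NATIVE_SQL
    return Engine.IN_MEMORY


print(combine_engine(Engine.NATIVE_SQL, Engine.NATIVE_SQL).value)  # native_sql
print(combine_engine(Engine.NATIVE_SQL, Engine.IN_MEMORY).value)   # in_memory
```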

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-6e1b01aafa5f62ce3dcb3e69a5a9c71fff54c8b6%2FHybrid%20pipeline%20-%20Join_append%20step.jpg?alt=media" alt=""><figcaption><p>JOIN/APPEND decision process for execution</p></figcaption></figure>

### Child Datasets

Consider a child dataset that comes from a NativeSQL data source. Its execution is done on Toucan's side if any of the following applies:

* The parent was NativeSQL compatible but no longer is (for example, because of an incompatible dataset in its pipeline).
* The parent is fully NativeSQL (all its steps can be executed in NativeSQL) but is stored.
* A step of the child dataset is not compatible with NativeSQL.

In all other cases, the child dataset is compatible with NativeSQL pipeline execution.
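
The three conditions can be read as a short decision function. The sketch below is an assumption-laden illustration: the `Parent` dataclass, its field names, and `child_runs_in_native_sql` are hypothetical and do not correspond to a Toucan API.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    """Hypothetical description of a parent dataset."""
    fully_native_sql: bool  # every step of the parent can run in NativeSQL
    stored: bool            # the parent dataset is stored


def child_runs_in_native_sql(parent: Parent, child_steps_compatible: bool) -> bool:
    """Return True if the child dataset can be executed in NativeSQL."""
    if not parent.fully_native_sql:
        # The parent is no longer NativeSQL compatible.
        return False
    if parent.stored:
        # The parent is fully NativeSQL but stored.
        return False
    # The child's own steps must all be NativeSQL compatible.
    return child_steps_compatible


print(child_runs_in_native_sql(Parent(fully_native_sql=True, stored=False), True))  # True
print(child_runs_in_native_sql(Parent(fully_native_sql=True, stored=True), True))   # False
```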

### Benefits of using hybrid pipelines

The hybrid pipeline feature:

* Increases flexibility in creating complex pipelines.
* Automates performance optimization.
* Lets you combine various data sources and steps.

### Limitations of in-memory processing

For some steps (such as JOIN or APPEND), RAM consumption can be significant, and performance depends on the underlying engine (the database or the Toucan workspace).

