# Stored and Live Datasets

This page explains the difference between stored and live datasets and the benefits of each.

There are two ways to handle dataset storage in Toucan:

* **Stored dataset**: The dataset is stored in the Toucan data store.
* **Live dataset**: The dataset is queried by Toucan every time it is needed to fuel a tile or story.

## Stored Datasets

A **stored dataset** is similar to a [materialized view](https://en.wikipedia.org/wiki/Materialized_view). It stores the pre-computed result of a query (in this case, a **YouPrep pipeline**). The resulting data is saved in the **Toucan data store**.

When using a stored dataset, you rely entirely on Toucan for both storage and computation. The computation can be triggered:

* **manually**, after building the dataset or on demand
* **automatically**, at a scheduled time through an automation

### How a stored dataset is refreshed

A stored dataset represents a snapshot of your data at a given moment. To update it, you must perform a *refresh*. See [refresh datasets](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/managing-datasets/stored-datasets/refreshing-and-publishing-datasets) for more information.

A **refresh** is a Toucan feature that recomputes the dataset by:

* fetching its datasource (e.g., a flat file or a SQL query to a database)
* transforming the data according to the steps defined in the YouPrep pipeline
* storing the updated result in the Toucan data store

A refresh only affects **staging** data. To apply these updates to **production**, you must publish the app.\
Once the refresh is complete, the previous dataset result is replaced with the updated data.
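
The refresh steps described above can be pictured with a minimal Python sketch. It is only an illustration: `SOURCE_CSV`, `data_store`, `fetch_source`, `transform`, and `refresh` are hypothetical names, not part of Toucan's API, and the aggregation stands in for whatever steps a YouPrep pipeline would define.

```python
import csv
import io

# Hypothetical stand-ins: none of these names come from Toucan.
SOURCE_CSV = "country,sales\nFR,10\nDE,20\nFR,5\n"
data_store = {}  # plays the role of the Toucan data store


def fetch_source(raw_csv):
    """Step 1: fetch the datasource (here, a flat file held in memory)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Step 2: apply the pipeline steps (here, total sales per country)."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0) + int(row["sales"])
    return [{"country": c, "sales": s} for c, s in sorted(totals.items())]


def refresh(dataset_name, raw_csv):
    """Step 3: store the result, replacing the previous snapshot."""
    data_store[dataset_name] = transform(fetch_source(raw_csv))
    return data_store[dataset_name]


snapshot = refresh("sales_by_country", SOURCE_CSV)
```

Calling `refresh` again with newer source data overwrites the entry in `data_store`, which mirrors how the previous dataset result is replaced once a refresh completes.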

{% hint style="warning" %}
A stored dataset is computed ahead of time, so it cannot include any variables.
{% endhint %}

## Live Datasets

A **live dataset** is a dataset whose result is computed on the fly each time it is needed; the result is not kept as a pre-computed result in the Toucan data store. It is computed either directly at the [datasource level](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/youprep-tm-native-sql), in Toucan’s in-memory engine (RAM), or through a [hybrid computation](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/hybrid-pipeline) combining both. Starting with a live dataset ensures that the displayed data is always up to date, reflecting the latest state of the source.
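
The contrast with a stored dataset can be sketched in a few lines of Python. This is a conceptual illustration only: `source`, `run_query`, `stored_result`, and `live_result` are hypothetical names and do not reflect how Toucan is implemented.

```python
# Hypothetical in-memory "source"; not a Toucan API.
source = [{"id": 1, "amount": 100}]


def run_query():
    """Stands in for the query or pipeline executed against the source."""
    return [dict(row) for row in source]


# Stored: computed once and kept until the next refresh.
stored_result = run_query()


def live_result():
    """Live: recomputed at every read, nothing is kept."""
    return run_query()


# The source changes after the stored result was computed.
source.append({"id": 2, "amount": 250})

stored_count = len(stored_result)  # still reflects the old snapshot
live_count = len(live_result())    # reflects the current source
```

Here the stored result keeps describing the source as it was when it was computed, while the live result picks up the new row on the next read.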

### From which datasets can a live dataset be built?

Live Datasets can be built on top of:

* Other live datasets: if every dataset in the lineage is live, Toucan does not store data at any point of the data preparation process, so no data is replicated outside your own systems, and the data displayed to users is as fresh as the external datasource.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-625f0433c2b25c615c44b63f3fe6dcaa3baacc56%2Fimage.png?alt=media" alt="Data lineage with only live datasets in Toucan"><figcaption><p>When all datasets in Toucan are live, data is as fresh as the source</p></figcaption></figure>

* Stored datasets: in this case, the data is as fresh as its parent dataset in Toucan. Building a live dataset on top of a stored dataset is useful if you want to use [user context related variables](#user-content-fn-1)[^1] in the dataset.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-0d619a7241af6e450590b414f91c1e6390e54aaa%2Fimage.png?alt=media" alt="Data lineage with a stored parent dataset"><figcaption><p>When a live dataset is built on top of a stored dataset, it is as fresh as the parent dataset</p></figcaption></figure>
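
To illustrate why a live dataset on top of a stored one is useful, here is a minimal Python sketch of a user-context variable applied at read time. The names `stored_parent`, `live_dataset`, and the `region` attribute are assumptions made for the example; they are not Toucan identifiers or syntax.

```python
# Hypothetical stored parent: a snapshot computed at the last refresh.
stored_parent = [
    {"region": "EMEA", "revenue": 120},
    {"region": "APAC", "revenue": 80},
    {"region": "EMEA", "revenue": 40},
]


def live_dataset(user_context):
    """Computed at every request, so it can depend on the current user.

    `user_context` is an assumed dict of user attributes.
    """
    allowed_region = user_context["region"]
    return [row for row in stored_parent if row["region"] == allowed_region]


emea_rows = live_dataset({"region": "EMEA"})
apac_rows = live_dataset({"region": "APAC"})
```

The parent snapshot is the same for everyone, but each user sees only the rows matching their own context, which a pre-computed stored dataset could not express.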

## Stored datasets vs. Live Datasets benefits

|                                        Stored datasets                                        |                                                              Live datasets                                                             |
| :-------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------: |
|  Useful if you don't have any data warehousing solution in your data stack to build analytics | <p>If all the parent datasets are also live:<br>- No data replication outside of your system<br>- Data as fresh as the data source</p> |
| Already computed: can be faster than live datasets (depending on the data source performance) |                                                Can use variables in the computation step                                               |

[^1]: Variables related to a user context depend on the specific user interacting with the app: that user's attributes (roles, permissions) or the filter values they select in their browser.
