🔢Stored and Live Datasets

This page explains the difference between stored and live datasets and the benefits of each.

There are two ways to handle dataset storage in Toucan:

  • Stored dataset: The dataset is stored in the Toucan data store.

  • Live dataset: The dataset is queried by Toucan every time it is needed to fuel a tile or story.

Stored Datasets

A stored dataset is similar to a materialized view. It stores the pre-computed result of a query (in this case, a YouPrep pipeline). The resulting data is saved in the Toucan data store.

When using a stored dataset, you rely entirely on Toucan for both storage and computation. The computation can be triggered:

  • manually, after building the dataset or on demand

  • automatically, at a scheduled time through an automation

How a stored dataset is refreshed

A stored dataset represents a snapshot of your data at a given moment. To update it, you must perform a refresh. See refresh datasets for more information.

A refresh is a Toucan feature that recomputes the dataset by:

  • fetching its datasource (e.g., a flat file or a SQL query to a database)

  • transforming the data according to the steps defined in the YouPrep pipeline

  • storing the updated result in the Toucan data store

Once the refresh is complete, the previous dataset result is replaced with the updated data. A refresh only affects staging data: to apply these updates to production, you must publish the app.
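The refresh steps and the staging/production distinction described above can be sketched as a small toy model. This is not Toucan's API: `ToyDataStore`, `refresh`, `publish`, and the sample data are hypothetical names chosen for illustration, and the pipeline is reduced to a list of plain Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]


@dataclass
class ToyDataStore:
    """Hypothetical stand-in for a data store with staging and production areas."""
    staging: dict[str, list[Row]] = field(default_factory=dict)
    production: dict[str, list[Row]] = field(default_factory=dict)

    def publish(self) -> None:
        # Assumption of this sketch: publishing copies staging results to production.
        self.production = {name: list(rows) for name, rows in self.staging.items()}


def refresh(store: ToyDataStore, name: str,
            fetch: Callable[[], list[Row]], steps: list[Step]) -> None:
    rows = fetch()                 # 1. fetch the datasource
    for step in steps:             # 2. apply the pipeline steps in order
        rows = step(rows)
    store.staging[name] = rows     # 3. replace the previous staging result


source = [{"city": "Paris", "sales": 10}, {"city": "Lyon", "sales": 0}]


def fetch_source() -> list[Row]:
    return list(source)


def keep_sold(rows: list[Row]) -> list[Row]:
    return [r for r in rows if r["sales"] > 0]


store = ToyDataStore()
refresh(store, "sales", fetch_source, [keep_sold])
store.publish()

source.append({"city": "Nice", "sales": 5})          # the source changes
refresh(store, "sales", fetch_source, [keep_sold])   # staging is updated
# Production still holds the previously published result until publish() is called.
```

In this model, a second call to `store.publish()` is what makes the refreshed rows visible in production.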

Live Datasets

A live dataset is a dataset whose result is computed on the fly each time it is needed; the result is not kept as a pre-computed result in the Toucan data store. It is computed either directly at the datasource level, in Toucan’s in-memory engine (RAM), or through a hybrid computation combining both. Starting with a live dataset ensures that the displayed data is always up to date, reflecting the latest state of the source.
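The contrast between the two behaviours can be illustrated with a minimal sketch. The classes below are hypothetical and only model the idea (snapshot versus recomputation on every read); they say nothing about how Toucan implements either mode. As a simplification, the stored dataset takes its first snapshot when it is created.

```python
from typing import Callable

Row = dict
Query = Callable[[], list[Row]]


class StoredDataset:
    """Keeps a snapshot; reads return it unchanged until refresh() is called."""

    def __init__(self, query: Query) -> None:
        self._query = query
        self._snapshot = query()

    def refresh(self) -> None:
        self._snapshot = self._query()

    def read(self) -> list[Row]:
        return self._snapshot


class LiveDataset:
    """Keeps nothing; every read runs the query again."""

    def __init__(self, query: Query) -> None:
        self._query = query

    def read(self) -> list[Row]:
        return self._query()


source = [{"id": 1}]


def query() -> list[Row]:
    return list(source)


stored, live = StoredDataset(query), LiveDataset(query)
source.append({"id": 2})   # the source changes after the snapshot was taken
```

After the source changes, `live.read()` sees both rows, while `stored.read()` keeps returning the single-row snapshot until `stored.refresh()` runs.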

From which datasets can a live dataset be built?

Live Datasets can be built on top of:

  • Other live datasets: if all datasets in the lineage are live, Toucan does not store data at any point of the data preparation process, so no data is replicated outside your own systems, and the data displayed to users is as fresh as the data in the external datasource.

[Figure: Data lineage with only live datasets in Toucan. When all datasets in Toucan are live, data is as fresh as the source.]

  • Stored datasets: in this case the data will be as fresh as its parent dataset in Toucan. Using a live dataset on top of a stored dataset is useful if you want to use variables in the dataset.

[Figure: Data lineage with a stored parent dataset. When a live dataset is built on top of a stored dataset, it is as fresh as the parent dataset.]
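The freshness rule for a lineage can be expressed as a short recursive function. This is a sketch under stated assumptions: `Dataset`, `kind`, `last_refresh`, and `as_of` are hypothetical names, and a stored dataset is simply treated as being as fresh as its last refresh.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Dataset:
    name: str
    kind: str                                   # "live" or "stored"
    parents: list[Dataset] = field(default_factory=list)
    last_refresh: datetime | None = None        # only meaningful for stored datasets


def as_of(ds: Dataset, now: datetime) -> datetime:
    """Return the moment that the data served by `ds` reflects."""
    if ds.kind == "stored":
        if ds.last_refresh is None:
            raise ValueError(f"stored dataset {ds.name!r} has never been refreshed")
        return ds.last_refresh                  # a snapshot taken at the last refresh
    if not ds.parents:
        return now                              # live dataset reading the source directly
    # A live dataset is only as fresh as its least fresh parent.
    return min(as_of(parent, now) for parent in ds.parents)


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
snapshot_time = datetime(2024, 1, 1, tzinfo=timezone.utc)

stored = Dataset("stored_sales", "stored", last_refresh=snapshot_time)
live_on_source = Dataset("live_sales", "live")
live_on_live = Dataset("live_report", "live", parents=[live_on_source])
live_on_stored = Dataset("live_view", "live", parents=[stored])
```

Here `as_of(live_on_live, now)` returns `now`, because the whole lineage is live, while `as_of(live_on_stored, now)` returns `snapshot_time`, the last refresh of the stored parent.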

Stored datasets vs. Live Datasets benefits

Stored datasets:

  • Useful if you don't have any data warehousing solution in your data stack to build analytics

  • Already computed: can be faster than live datasets (depending on the data source performance)

Live datasets:

  • If all the parent datasets are also live: no data replication outside of your system, and data as fresh as the data source

  • Can use variables in the computation step
