# Troubleshoot: DataHub

This page helps you troubleshoot issues that can occur when you try to access or transform data.

### Introduction

This document provides a comprehensive overview of data sources within Toucan, focusing on connectors, datasets, and query configurations. It aims to guide users through the process of connecting to various data sources, understanding dataset structures, and effectively configuring queries for optimal data retrieval and transformation. By understanding these concepts, users can leverage Toucan to gain valuable insights from their data efficiently.

## Data Sources in Toucan

### Connectors

Regarding connections, we have two main types of connectors:

* Connectors where we perform a connection checklist. These connectors have a "Test connection" button.
* Other connectors where we do not perform a connection checklist. For example, the HTTP API connector, where it's not always possible to reach a first endpoint to ensure that your configuration is correct.

We will focus on connectors with a "Test connection" button.

The checklist is different for each connector, but let's take PostgreSQL as an example.

#### PostgreSQL Connection Checklist

| Test Step                   | What it means                                                                                                                                                        |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Host Resolved               | Toucan can reach the IP or host you have set in the input field.                                                                                                     |
| Port Opened                 | The port filled in the connection form is an open port on your PostgreSQL cluster.                                                                                   |
| Connected to PostgreSQL     | The host:port corresponds to a PostgreSQL instance, not a MySQL cluster, for example.                                                                                |
| Authenticated               | The login/password provided allows us to connect to the database you want to reach on your PostgreSQL cluster. The user has sufficient rights to access the cluster. |
| Default Database Connection | The default database field refers to an existing, accessible database associated with the user configured in the connection form.                                    |

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-f13e9a074a7fd6463ec0efd9d02bdcdead2e302a%2Fdatasource%20-%20mission%20connection%20success.png?alt=media" alt=""><figcaption><p>Datasource mission: connection succeeded</p></figcaption></figure>

**Example:**

Let's consider that you have the following settings to access your database:

* host: `***-postgre-*******.net`
* port: `2******1`
* user: `username`
* password: `******`
* default database: `postgres`
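Outside Toucan, you can reproduce the first two checklist steps (host resolution and port reachability) yourself. Below is a minimal sketch using only the Python standard library; pass in your own host and port values, as the masked ones above are placeholders:

```python
import socket

def host_resolves(host: str) -> bool:
    """Checklist step 1: can the hostname be resolved to an IP address?"""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Checklist step 2: does anything accept a TCP connection on host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `host_resolves` returns `False`, double-check the spelling of the host; if `port_open` returns `False`, check the port number and your firewall rules.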

**Host is not resolved**

If your host is unreachable or the value you entered contains a spelling mistake, the test connection will stop at its first step.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-28d5c087bf02a19e689646f9ec3f6e17c133e0a5%2Fhost%20not%20reachable.png?alt=media" alt=""><figcaption><p>host cannot be resolved</p></figcaption></figure>

**Incorrect port**

**Port:** `2***0` instead of `2***1`

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-d08b7a9de7057d59a38a1c0b8db60f6f2a355ea8%2Fport_not_reachable.png?alt=media" alt=""><figcaption><p>port is not opened</p></figcaption></figure>

**Incorrect user/password**

**Username:** You filled `my_dear` instead of `username`

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-b1ad9f82cf2afdb4ffa62bcd6411b8d1024be0b0%2Fwrong_login_passwordassication.png?alt=media" alt=""><figcaption><p>user password combination does not correspond to an existing one</p></figcaption></figure>

**Incorrect default database**

**Default database:** You filled in an incorrect database name, `postdre` instead of `postgres`.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-98bd5fbafbe3a37ed9c66695ec57f746662a2c8f%2Fdefaultdatabasedoesnotexist.png?alt=media" alt=""><figcaption><p>default database does not exist</p></figcaption></figure>

## Datasets

The Dataset service manages all objects related to datasets and some of their business logic. It handles datasets and data providers.

To get your data, it executes a plan, calls a data provider, and applies a series of data transformation steps to output a data table.

The execution can be done:

* On your database (see native SQL for more explanation). In this case, an SQL request is sent to your database at the end of the pipeline.
* On our in-memory engine.
* In a combination of the two, depending on the position and compatibility of the steps.

The execution is mainly done through YouPrep and the query-configuration part.

Query configuration is the step after you configure your data source before you reach YouPrep.

Whatever the execution mode, the data source must be available for the plan to be executed.

Each step must be executable, so its output should be a data table.

### Query Configuration

When you set up a data source of a connector type and try to fetch data, Toucan interprets your settings and queries the data source. The returned data may be modified by Toucan: we limit SQL queries for performance reasons.

This part can vary depending on the connector you're using.

#### Code Mode

In this [specific query configuration setup](#code-mode), you can create an SQL query with placeholders. We'll insert the needed variables and send it to your database.

If your response is valid, a data table will appear on the right. If there's an issue, an error message from your database, reformulated by Toucan, will be displayed in the data table panel.

When using this mode, knowing how to [craft SQL](https://sql.sh) can be helpful. Alternatively, for some connectors, you can use our [simplified query mode](https://docs-v3.toucantoco.com/data-management-in-datahub/datasources-in-toucan/managing-connectors/create-a-dataset-from-a-connector/code-mode-and-single-mode#single-mode).
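To illustrate how placeholder insertion works, here is a simplified sketch (not Toucan's actual implementation) that substitutes `{{ variable }}` placeholders into an SQL template:

```python
import re

def render_query(template: str, variables: dict) -> str:
    """Replace {{ name }} placeholders with values from `variables`.

    Illustration only: a real query engine should parameterize queries
    rather than splice raw values into SQL strings.
    """
    def substitute(match):
        name = match.group(1).strip()
        if name not in variables:
            raise KeyError(f"missing variable: {name}")
        return str(variables[name])

    return re.sub(r"\{\{(.*?)\}\}", substitute, template)
```

For example, `render_query("SELECT * FROM sales WHERE year = {{ year }}", {"year": 2024})` fills in the `year` placeholder before the query is sent to the database.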

**Example:**

Here, we are trying to send an SQL query to a table containing the following columns: `date`, `identity_id`, `value_2`, `value_1`. But the SQL statement we wrote is `SELECT value_3, identity_id FROM "public".sample_data`, and the query fails.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-56cc833e831391666185ed367f87b2884e65e797%2Fquery_configuration_error%20copie.png?alt=media" alt=""><figcaption><p>SQL query error</p></figcaption></figure>

**Explanation**

The PostgreSQL database has no column named `value_3` in the `sample_data` table and returns this information to us.

### YouPrep

In YouPrep, you can configure a series of steps to be applied to your data table. In this mode, each step is applied when you click on it.

Each step is applied to a preview computed on `10 000` rows by default, allowing you to see what your data table will look like after each step.

The data computation can be done:

* On the data source you have connected (for [Native SQL connectors](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/youprep-tm-native-sql)).
* In our in-memory engine (for non-Native SQL connectors or flat files).
* In a combination of Native SQL execution then in-memory ([hybrid pipeline](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/hybrid-pipeline)).

{% hint style="info" %}
A step takes a data table as input, applies a transformation based on its configuration, and outputs a data table.
{% endhint %}
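The hint above can be sketched in code: a step is simply a function from a data table (plus a configuration) to a data table. In this minimal Python illustration, a data table is a list of row dictionaries; this is not Toucan's internal representation:

```python
def filter_step(table: list[dict], config: dict) -> list[dict]:
    """A YouPrep-style step: data table in, data table out.

    Keeps only rows where config["column"] equals config["value"].
    """
    column = config["column"]
    if any(column not in row for row in table):
        # Raising here is analogous to a step erroring on a missing column
        raise ValueError(f"column '{column}' does not exist in the dataset")
    return [row for row in table if row[column] == config["value"]]
```

Because each step's output is itself a data table, steps chain naturally: the output of one becomes the input of the next.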

If a step fails, the corresponding error is displayed in the right-hand tab.

If an error occurs, check that:

* You are applying the step to a column of the right type.
* The column you're trying to transform already exists at that point in the dataset.
* You have read the helper text in the configuration panel.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-5404b51ae9d41436659a49ba3da3004e888d8d69%2Fyouprep_notexistentcolumn.png?alt=media" alt=""><figcaption><p>Step applied on a column that does not exist</p></figcaption></figure>

The errors returned depend on how the dataset's plan is executed: if the dataset is executed on the data source, as in Native SQL pipelines, the error comes from the database itself.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-c766ffb41f13595eba3cc228d198cffd3a75609c%2FCapture%20d%E2%80%99e%CC%81cran%202025-02-14%20a%CC%80%2014.43.44.png?alt=media" alt=""><figcaption><p>Error from a NativeSQL pipeline</p></figcaption></figure>

You can find an exhaustive YouPrep reference [here](https://weaverbird.toucantoco.dev/docs/general-principles/).

### Timeout and Optimization

To ensure optimal performance for your queries and dashboard speed, follow the guidelines below. Remember, the **preview output is limited** to `30 seconds`.

Structuring your steps in the right order is crucial: it improves the performance of your data queries.

Optimize the order of steps in your dataset pipeline as follows:

**Filter**

Start by filtering your data as early as possible. This step reduces the size of your dataset, which in turn decreases memory usage and speeds up subsequent operations.

**Keep columns**

After filtering, select only the columns you need for your analysis.

**Text and Date transformations steps**

Perform text and date transformations early in your pipeline. These operations can be computationally expensive, so doing them on a reduced dataset is more efficient.

**Conditional column**

Create conditional columns after text and date transformations but before aggregations. This allows you to use the newly created columns in subsequent operations.

**Unique values**

If you need to find unique values, do this before aggregations or joins. It can help reduce the dataset size for subsequent operations.

**Sort**

Sorting should be done after filtering and transformations but before aggregations. This can improve the performance of subsequent group-by operations.

**Group By and Aggregate**

Perform group-by and aggregate operations after the dataset has been reduced and transformed. This minimizes the amount of data being processed in these computationally intensive operations.

**Join**

Perform joins after you've reduced and aggregated your datasets. This minimizes the amount of data being joined, which can significantly improve performance.

**Append**

Append operations should generally be done near the end of your pipeline, after you've processed and transformed your individual datasets.

**Pivot/Unpivot**

Pivot and unpivot operations are typically done as final reshaping steps after all other transformations.
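The ordering above can be sketched as a single pipeline. Here is a simplified Python illustration with made-up column names (`country`, `city`, `sales`), using plain lists of dictionaries rather than Toucan's engine:

```python
from collections import defaultdict

def run_pipeline(rows: list[dict]) -> dict:
    # 1. Filter as early as possible: shrinks everything downstream
    rows = [r for r in rows if r["country"] == "FR"]
    # 2. Keep only the columns the analysis needs
    rows = [{"city": r["city"], "sales": r["sales"]} for r in rows]
    # 3. Text transformations on the reduced dataset (normalize city names)
    rows = [{**r, "city": r["city"].strip().title()} for r in rows]
    # 4. Group by and aggregate last, on minimal, cleaned data
    totals: dict = defaultdict(float)
    for r in rows:
        totals[r["city"]] += r["sales"]
    return dict(totals)
```

Running the filter and column selection first means the expensive transformation and aggregation steps only touch the rows and columns that actually matter.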

#### Reduce the number of rows the service processes

You may have trouble getting a result within 30 seconds, for example when working on a refresh dataset that will be executed asynchronously.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-cc105d5d25e8c9e500fc400572ca173f7bdb0667%2Frows_subset.png?alt=media" alt=""><figcaption><p>source rows subset</p></figcaption></figure>

You can set the preview computation to a lower number of rows. By default, the value is `10 000`, but it can be reduced so that the preview transforms less data, allowing you to get a result before the timeout.

### Troubleshooting Infinite Loops in Data Preparation

#### Issue Description

Infinite loops can severely impact the functionality of an application, depending on the affected dataset, and may prevent the application from being published. If an infinite loop affects a dataset, it will also affect all child datasets.

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-4869d973c6acb634ef0122379cec6e2763de9803%2Finfiniteloop.gif?alt=media" alt=""><figcaption><p>infinite loop preventing a home from loading</p></figcaption></figure>

#### Identifying the Problem

To resolve this issue, you need to identify the affected dataset. When opening the dataset, you may encounter an error message similar to:

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-c52fd686ff7eff92454c2f68698bdd77ef9dd0c2%2F8ed9d397-262f-4680-8dfd-ee65db37b937.jpg?alt=media" alt=""><figcaption><p>Infinite loop error message in Youprep</p></figcaption></figure>

```
RecursiveQueryError: Recursive query detected {{uuid}} was already visited
```

#### Root Cause

This error occurs when a filter is used to filter the same dataset that the filter is based on, creating an infinite loop (hence the RecursiveQueryError message). It can also happen when a dataset references itself.

#### Example Scenario

All filters on the home page use a dataset called `{{dataset_name}}`. If this dataset is filtered by these same filters, it creates a recursive loop.

For instance, your filter `{{filter_name}}` uses the dataset `{{dataset_name}}`. This dataset is constructed in a way that it's filtered by a filter with the ID `{{uuid_dataset}}`, which is actually `{{filter_name}}` itself.

Additionally, you're using the filter `{{filter_name2}}`, which is filtered by `{{filter_name}}` and is also a child of the dataset that originates `{{filter_name2}}`. This prevents the visualization of both datasets due to the loop formation.
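Conceptually, the detection works like a walk over the dataset dependency graph that refuses to visit the same dataset twice along a single path. A simplified sketch (not Toucan's actual code, and with hypothetical dataset IDs in the usage below):

```python
class RecursiveQueryError(Exception):
    pass

def resolve(dataset_id: str, depends_on: dict, path: set = None) -> None:
    """Walk a dataset's dependencies; raise if a dataset reappears on the current path."""
    if path is None:
        path = set()
    if dataset_id in path:
        raise RecursiveQueryError(f"Recursive query detected: {dataset_id} was already visited")
    for dep in depends_on.get(dataset_id, []):
        # Extend the path with the current dataset before descending
        resolve(dep, depends_on, path | {dataset_id})
```

With `depends_on = {"filter_ds": ["base"], "base": ["filter_ds"]}`, resolving `"filter_ds"` raises `RecursiveQueryError`, which mirrors the error message shown above.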

#### Locating the Dataset

Each dataset has a unique ID. You can find the dataset using the following URL structure:

```
https://{{workspace_name}}.example.com/{{app_name}}/datahub/datasets/edit-dataset/{{uuid_dataset}}?stage=staging
```

#### Resolution Steps

To correct this issue, you have two options:

1. Remove the problematic filtering step
2. Use different source datasets for the filters

By implementing one of these solutions, you should be able to resolve the infinite loop and restore proper functionality to your application.

### Error Categories

Below is a list of error categories you might encounter while using Toucan. Each category includes a definition and, if possible, an example.

| Error Type                | Definition                                                                            | Example                                                                                           |
| ------------------------- | ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| PlanExecutionError        | Exception raised when a plan execution fails: one of your step configurations prevents its execution. | Check the configuration of your steps for type errors or non-existent columns.                    |
| StepInUnsupportedPosition | The step is in a position that prevents its execution.                                 |                                                                                                   |
| DatasourceNotFound        | The data source needed to perform the steps has been removed, preventing pipeline execution. | A data source has been deleted from data sources.                                           |
| QueryNotFoundError        | Exception raised when a query cannot be found.                                        |                                                                                                   |
| RecursiveQueryError       | Exception raised when a recursive query is detected.                                  | A filter has been used to filter the dataset used by that same filter, creating an infinite loop. |
| DatasetServiceError       | Raised when the dataset service faced an error when handling a query.                 | A dataset cannot be found, a step is not supported, a connector is not supported.                 |
| ValueError                | No data provider name.                                                                |                                                                                                   |
| TypeError                 | The value is a formula or a string literal that can't be parsed.                      | User attributes in response data were not understood.                                             |
| ValidationError           | Raised when dataset cannot be validated.                                              |                                                                                                   |

## Conclusion

By understanding these concepts related to data sources, connectors, datasets, and query configurations, you can effectively leverage Toucan to gain valuable insights from your data. Remember to optimize your steps for better performance and to handle errors effectively.

If necessary, please reach out to our support team for more information or assistance. We're here to help you make the most of Toucan.
