# Optimize data performance

*Here are a few things you should consider regarding data in Toucan! 💡*

### How do you ensure that your application is optimized for long-term use?

#### Data prep is the key.

Ensure all of your heavy data transformations are done in data preparation.

#### Factorize your pipeline

Make sure that your data preparation is factorized. Why run the same operation ten times when you can run it only once?
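As an illustration, the idea of factorizing is sketched below in plain Python. This is not Toucan's data preparation engine. A shared step, the hypothetical `filter_current_year`, runs once, and its result feeds several downstream aggregates instead of being recomputed for each one. The rows and function names are made up for the example.

```python
from collections import Counter

# Hypothetical raw rows; in Toucan these would come from a dataset.
ROWS = [
    {"year": 2023, "city": "Paris", "sales": 10},
    {"year": 2024, "city": "Paris", "sales": 12},
    {"year": 2024, "city": "Lyon", "sales": 7},
]

calls = Counter()  # counts how many times each step actually runs


def filter_current_year(rows, year):
    """Shared step (expensive in real life) needed by several outputs."""
    calls["filter_current_year"] += 1
    return [r for r in rows if r["year"] == year]


def total_sales(rows):
    return sum(r["sales"] for r in rows)


def sales_by_city(rows):
    out = {}
    for r in rows:
        out[r["city"]] = out.get(r["city"], 0) + r["sales"]
    return out


# Factorized: run the shared step once, then reuse its result everywhere.
current = filter_current_year(ROWS, 2024)
kpis = {"total": total_sales(current), "by_city": sales_by_city(current)}
```

Without factorization, each aggregate would call `filter_current_year` itself, and the counter would grow with every output that needs it.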

If loading is still slow, open the Network tab of your browser's developer tools to see which requests take the longest and how long they take. You can then improve the corresponding queries and check afterwards whether the loading time has improved.

#### Mongo indexes

In load mode, data is stored in a MongoDB database and read from it, which speeds up data reads and, consequently, the application. Without indexes, MongoDB must perform a collection scan: it examines every document in the collection to determine whether it matches the query. On a small collection, the difference in rendering time may not be noticeable, but it becomes significant on larger collections (for example, several thousand rows). Indexes speed up queries by limiting the number of documents (rows) MongoDB has to scan. They can be created on any field of a document and let MongoDB locate matching data more quickly.

For more information, refer to the [official documentation](https://docs.mongodb.com/manual/indexes/#create-an-index).
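The difference between a collection scan and an index lookup can be illustrated with a toy in-memory model. This is plain Python, not MongoDB's actual B-tree implementation. A scan examines every document, while a lookup through a prebuilt index on `city` only touches the matching ones. The helpers `find_with_scan`, `build_index`, and `find_with_index` are hypothetical.

```python
from collections import defaultdict

# Toy collection: 10,000 documents spread evenly over 4 cities.
CITIES = ["Paris", "Lyon", "Lille", "Nantes"]
collection = [{"_id": i, "city": CITIES[i % 4], "value": i} for i in range(10_000)]


def find_with_scan(docs, city):
    """Collection scan: every document is examined."""
    examined = 0
    result = []
    for doc in docs:
        examined += 1
        if doc["city"] == city:
            result.append(doc)
    return result, examined


def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        index[doc[field]].append(pos)
    return index


def find_with_index(docs, index, city):
    """Index lookup: only the matching documents are examined."""
    positions = index.get(city, [])
    return [docs[p] for p in positions], len(positions)


city_index = build_index(collection, "city")
scan_result, scan_examined = find_with_scan(collection, "Paris")
idx_result, idx_examined = find_with_index(collection, city_index, "Paris")
# scan_examined == 10000, idx_examined == 2500, and both return the same documents
```

Note that `build_index` itself needs a full pass over the data and extra memory, which is one reason indexing every field is not free.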

**Configuring MongoDB Indexes**

To configure MongoDB indexes, switch the application to staging mode, open `Settings`, go to `Advanced configuration`, select the `etl_config` section, and click edit. Indexes are defined per domain: each domain is a key in the `MONGO_INDEXES` configuration block. Example configuration:

```
MONGO_INDEXES:
  domain_a: [
    'year'
    ['city', 'kpi_code', 'version']
    ['city', 'entity', 'version']
  ]
  domain_b: [
    ['date', 'filter']
  ]
```

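The domain `domain_a` has 3 indexes:

* The first is an index on a single field, `year`.
* The second and third are compound indexes, each built on three fields.

The domain `domain_b` has a single compound index on `date` and `filter`.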
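To make the shape of the `MONGO_INDEXES` block explicit, the sketch below normalizes each entry: a single string becomes a single-field index and a list becomes a compound index. This is an assumption about how entries map to MongoDB index keys, not Toucan's actual code. It also assumes ascending order (`1`) for every field and produces key lists in the `[(field, direction), ...]` form accepted by PyMongo's `create_index`.

```python
# The MONGO_INDEXES block above, transcribed as a Python dict.
MONGO_INDEXES = {
    "domain_a": [
        "year",
        ["city", "kpi_code", "version"],
        ["city", "entity", "version"],
    ],
    "domain_b": [
        ["date", "filter"],
    ],
}

ASCENDING = 1  # same value as pymongo.ASCENDING


def to_index_keys(entry):
    """Normalize one entry into a [(field, direction), ...] key list.

    Assumption: every field is indexed in ascending order.
    """
    fields = [entry] if isinstance(entry, str) else list(entry)
    return [(field, ASCENDING) for field in fields]


def index_specs(config):
    """Return {domain: [key list, ...]} for every configured domain."""
    return {
        domain: [to_index_keys(entry) for entry in entries]
        for domain, entries in config.items()
    }


specs = index_specs(MONGO_INDEXES)
```

With PyMongo, each key list could then be passed to `collection.create_index(keys)`. Whether Toucan creates its indexes this way is not stated here.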

{% hint style="info" %}
The most effective way to design efficient MongoDB indexes is to analyze how your queries are structured. If datasets are consistently filtered on the same columns, adding a MongoDB index on those columns may be worthwhile.
{% endhint %}
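One way to follow this advice is to count which combinations of filter columns your queries use most often. The sketch below assumes you have collected the query filters as plain dicts. The `queries` list is hypothetical sample data. The sketch ranks the column sets by frequency, and the most frequent ones are index candidates.

```python
from collections import Counter

# Hypothetical sample of filters collected from screen queries.
queries = [
    {"city": "Paris", "kpi_code": "CA", "version": 7},
    {"kpi_code": "CA", "city": "Lyon"},
    {"city": "Paris", "kpi_code": "CA", "version": 6},
    {"date": "2024-01"},
]


def filter_column_usage(queries):
    """Count how often each set of filtered columns appears.

    Keys are sorted, so column order inside a query is ignored.
    """
    return Counter(tuple(sorted(q)) for q in queries)


usage = filter_column_usage(queries)
for columns, count in usage.most_common():
    print(count, columns)
```

Because a compound index also serves its prefixes, a single index on `['city', 'kpi_code', 'version']` would cover both of the first two column sets in this sample.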

For compound indexes, field order matters in the index but not in the query. Besides queries that match all the index fields, a compound index can also support queries that match a prefix of its fields (a leading subset, starting from the first field).

<details>

<summary>Query Examples</summary>

* Success with the index `['city', 'kpi_code', 'version']`:

  ```
  query:
    kpi_code: "CA"
    city: "Paris"
    version: 7
  ```
* Success with the index `['city', 'kpi_code', 'version']`:

  ```
  query:
    city: "Paris"
    kpi_code: "CA"
  ```
* Success with the index `['city', 'kpi_code', 'version']` but only for `city`:

  ```
  query:
    city: "Paris"
    version: 7
  ```
* Failure with the index `['city', 'kpi_code', 'version']` (city is missing):

  ```
  query:
    version: 7
    kpi_code: "CA"
  ```

</details>
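The prefix rule can be expressed as a small function. This is a simplified model that only considers equality filters and ignores sorts, ranges, and the query planner's cost decisions. It returns the leading index fields that the query filters on, which are the fields MongoDB can use to narrow the index scan. `usable_prefix` is a hypothetical helper, and the calls below reproduce the four query examples.

```python
def usable_prefix(index_fields, query):
    """Return the leading index fields that the query filters on.

    Simplified model: equality filters only. Field order in the query is
    irrelevant; field order in the index is what matters.
    """
    prefix = []
    for field in index_fields:
        if field not in query:
            break  # the prefix stops at the first index field the query skips
        prefix.append(field)
    return prefix


INDEX = ["city", "kpi_code", "version"]

print(usable_prefix(INDEX, {"kpi_code": "CA", "city": "Paris", "version": 7}))
# ['city', 'kpi_code', 'version']
print(usable_prefix(INDEX, {"city": "Paris", "kpi_code": "CA"}))
# ['city', 'kpi_code']
print(usable_prefix(INDEX, {"city": "Paris", "version": 7}))
# ['city']
print(usable_prefix(INDEX, {"version": 7, "kpi_code": "CA"}))
# []
```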

{% hint style="warning" %}
Creating indexes everywhere for everything is not a magic solution: indexes take time to build and consume memory.

Don’t forget to measure the improvements with the network tab of your browser inspector.
{% endhint %}

### Checklist to make sure your application is optimized

#### Queries in general:

* Check if queries are filtered first.
* Check if they return only what is displayed on the screen.
* Check whether part of the query could be moved to data preparation to speed up display.
* No hard-coded values: always prefer dynamic rules (e.g., argmax on year instead of 2021).
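For example, instead of filtering on a hard-coded `year == 2021`, a rule can select the most recent year present in the data, so the query keeps working when new data arrives. The following is a plain-Python sketch with hypothetical rows and a hypothetical `latest_year_rows` helper.

```python
ROWS = [
    {"year": 2021, "kpi": "CA", "value": 100},
    {"year": 2022, "kpi": "CA", "value": 120},
    {"year": 2023, "kpi": "CA", "value": 130},
]


def latest_year_rows(rows, year_field="year"):
    """Keep only the rows of the most recent year (argmax on year)."""
    if not rows:
        return []
    latest = max(row[year_field] for row in rows)
    return [row for row in rows if row[year_field] == latest]


current = latest_year_rows(ROWS)
```

When 2024 rows are loaded, the same rule returns them without any change to the query.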

#### Data architecture:

**Check the data pipeline: is it clear and easy to read?**

* Clear domain names.
* Keep only used domains.

**Check the date requester construction (if you are using the old one):**

* Is it prepared with Dataprep? Otherwise, the transformation is replayed every time a screen loads.
* Does it provide all the date formats needed to filter every screen query?
* Anticipate next year (year + 1): will it keep working?
* Nice to have: a year - 1 or month - 1 date column, if your screen queries need it for calculations.
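The year - 1 and month - 1 columns can be sketched in plain Python with `datetime.date`. The column names `month_minus_1` and `year_minus_1` are hypothetical, and dates are assumed to be at month granularity, so the day is normalized to 1. Note the January case, where month - 1 falls in the previous year. Handling it explicitly is what keeps the requester working across a year change.

```python
from datetime import date


def previous_month(d):
    """First day of the month before d (January rolls back to December)."""
    if d.month == 1:
        return date(d.year - 1, 12, 1)
    return date(d.year, d.month - 1, 1)


def previous_year(d):
    """Same month, one year earlier (day normalized to 1, avoiding Feb 29)."""
    return date(d.year - 1, d.month, 1)


def add_date_columns(row):
    """Add hypothetical comparison columns to a date-requester row."""
    d = row["date"]
    return {
        **row,
        "month_minus_1": previous_month(d),
        "year_minus_1": previous_year(d),
    }


example = add_date_columns({"date": date(2024, 1, 1)})
```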

**Check report requester construction:**

* Is it prepared with YouPrep? Otherwise, the transformation is replayed every time a screen loads.
* Nice to have: a column defining the display order (tip: use a conditional step to create it).
* Nice to have: a “type” column if you need one, or several for a hierarchical report (child type, parent type).
* Nice to have for hierarchical reports: both a parent/child column pair and one column per intermediate level for each child (tip: this can be done in YouPrep as data preparation, with a rollup + join).
* How long can this data architecture last before it needs improving (limits in data volume, in query preparation)?
* Data validation is in place.
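The intermediate-level columns for a hierarchical report can be derived from a parent/child column pair by walking up the hierarchy. This plain-Python sketch is not the YouPrep rollup + join itself. It only illustrates the shape of the result. The node names and the `level_0`, `level_1`, ... column names are hypothetical.

```python
# Hypothetical parent/child pairs (None marks the root).
PARENT = {
    "World": None,
    "Europe": "World",
    "France": "Europe",
    "Paris": "France",
    "Lyon": "France",
}


def ancestors(node, parent):
    """Path from the root down to node, e.g. ['World', 'Europe', 'France']."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return list(reversed(path))


def with_level_columns(parent):
    """One row per node: parent/child columns plus one column per level."""
    rows = []
    for node, par in parent.items():
        row = {"child": node, "parent": par}
        for depth, name in enumerate(ancestors(node, parent)):
            row[f"level_{depth}"] = name
        rows.append(row)
    return rows


rows = with_level_columns(PARENT)
```

Each row keeps the parent/child pair and gains the full path of levels, so screen queries can filter or group on any level without recomputing the hierarchy at load time.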

#### Loading speed:

* Check if the screens are fast enough.
* Check if the home page is fast enough.
* Check if Mongo indexes are implemented (if needed).
* Check if requesters should be used instead of filters.

#### Mobile use:

* Are my screens easily readable on mobile?

