Optimize data performance

Here are a few things you should consider regarding data in Toucan! 💡

How do you ensure that your application is optimized for long-term use?

Data prep is the key.

Ensure all of your heavy data transformations are done in data preparation.

Factorize your pipeline

Make sure that your data preparation is factorized. Why do the same operation ten times when you can do it only once?

If loading is still slow, open the Network tab of your browser's developer tools and check which requests take the longest and how long they take. You can then improve the corresponding queries and check afterwards whether the loading time has improved.

Mongo indexes

In load mode, data is stored in a MongoDB database and read from it, which speeds up data access and, consequently, application performance. Without indexes, MongoDB must perform a collection scan: it examines every document in the collection to determine whether it matches the query. For a small collection, the difference in rendering time may not be noticeable, but it becomes significant for larger collections (for example, several thousand lines). Indexes speed up queries by limiting the number of documents (lines) to scan. An index can be created on any field of a document, allowing MongoDB to locate matching data more quickly.
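To make the difference concrete, here is a minimal Python sketch that counts how many documents each strategy examines. It is a simplified model, not how MongoDB works internally: real MongoDB indexes are B-trees, and the sorted list searched with `bisect` below only stands in for one. The function names and the sample collection are hypothetical.

```python
from bisect import bisect_left, bisect_right


def collection_scan(docs, field, value):
    """Examine every document, as MongoDB does without a usable index."""
    matches = [doc for doc in docs if doc.get(field) == value]
    return matches, len(docs)  # every document was examined


def build_index(docs, field):
    """Sorted keys plus document positions: a stand-in for a B-tree index."""
    pairs = sorted((doc[field], pos) for pos, doc in enumerate(docs))
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def index_lookup(docs, index, value):
    """Binary-search the keys, then fetch only the matching documents."""
    keys, positions = index
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    matches = [docs[positions[i]] for i in range(lo, hi)]
    return matches, hi - lo  # only the matching documents were examined


# Hypothetical collection of 5,000 documents spread over 10 years.
docs = [{"year": 2015 + i % 10, "value": i} for i in range(5000)]

scan_hits, scan_examined = collection_scan(docs, "year", 2021)
year_index = build_index(docs, "year")
index_hits, index_examined = index_lookup(docs, year_index, 2021)
print(scan_examined, index_examined)  # 5000 500
```

Both strategies return the same 500 documents, but the scan has to look at all 5,000 to find them.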

For more information, refer to the official documentation.

Configuring MongoDB Indexes

To configure MongoDB indexes, switch the application to staging mode, go to Settings > Advanced configuration, select the etl_config section and click Edit. Indexes are defined per domain: each domain is a key in the MONGO_INDEXES configuration block. Example configuration:

MONGO_INDEXES:
  domain_a: [
    'year',
    ['city', 'kpi_code', 'version'],
    ['city', 'entity', 'version']
  ]
  domain_b: [
    ['date', 'filter']
  ]

The domain domain_a has 3 indexes:

  • The first is an index on a single field year.

  • The second and third are compound indexes (indexes on several fields).
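The rule illustrated by domain_a (a plain string is a single-field index, a list is a compound index) can be sketched in Python. This is an assumption about how such a block could be interpreted, not Toucan's own code: the hypothetical `to_index_specs` function turns each entry into a list of `(field, direction)` pairs, the shape that PyMongo's `Collection.create_index` accepts. Ascending order (`1`) is assumed because the configuration does not specify a direction.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def to_index_specs(entries):
    """Convert one domain's MONGO_INDEXES entries into (field, direction) lists."""
    specs = []
    for entry in entries:
        if isinstance(entry, str):
            # A plain string is a single-field index.
            specs.append([(entry, ASCENDING)])
        else:
            # A list of fields is a compound index; field order is preserved.
            specs.append([(field, ASCENDING) for field in entry])
    return specs


mongo_indexes = {
    "domain_a": ["year", ["city", "kpi_code", "version"], ["city", "entity", "version"]],
    "domain_b": [["date", "filter"]],
}

specs_a = to_index_specs(mongo_indexes["domain_a"])
for spec in specs_a:
    # With PyMongo, each spec could be passed to collection.create_index(spec);
    # which collection holds a domain's data is an assumption left open here.
    print(spec)
```

Keeping the field order intact matters, because a compound index on `city, kpi_code, version` is not the same index as one on `version, kpi_code, city`.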

The most effective way to design efficient MongoDB indexes is to analyze the structure of your queries. If datasets are consistently filtered on the same columns, it may be worth adding a MongoDB index on those columns.

For compound indexes, the order of fields matters in the index but not in the query. In addition to supporting queries that match all index fields, compound indexes can support queries that match a prefix of the index fields, that is, the first field or the first several fields in index order.

Query Examples
  • Success with the index ['city', 'kpi_code', 'version']:

    query:
      kpi_code: "CA"
      city: "Paris"
      version: 7
  • Success with the index ['city', 'kpi_code', 'version']:

    query:
      city: "Paris"
      kpi_code: "CA"
  • Success with the index ['city', 'kpi_code', 'version'] but only for city:

    query:
      city: "Paris"
      version: 7
  • Failure with the index ['city', 'kpi_code', 'version'] (city is missing):

    query:
      version: 7
      kpi_code: "CA"
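The prefix rule behind these examples can be sketched in a few lines of Python. The hypothetical `usable_prefix` function is a simplification of MongoDB's query planner: it only handles equality filters and ignores ranges, sorts and cost estimates. It returns the leading index fields that the query constrains, stopping at the first index field the query does not mention; an empty result means the index cannot be used.

```python
def usable_prefix(index_fields, query):
    """Return the leading index fields matched by an equality query.

    Simplified model: MongoDB's planner also weighs ranges, sorts and
    other factors that are not represented here.
    """
    prefix = []
    for field in index_fields:
        if field not in query:
            break  # the usable prefix stops at the first missing field
        prefix.append(field)
    return prefix


index = ["city", "kpi_code", "version"]
examples = [
    {"kpi_code": "CA", "city": "Paris", "version": 7},
    {"city": "Paris", "kpi_code": "CA"},
    {"city": "Paris", "version": 7},
    {"version": 7, "kpi_code": "CA"},
]
for query in examples:
    print(sorted(query), "->", usable_prefix(index, query) or "index not used")
```

The key order inside each query dictionary has no effect on the result, which matches the rule that field order matters in the index but not in the query.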

Creating indexes on every field is not a magic solution: each index takes time to build and maintain, and it consumes memory.

Don’t forget to measure the improvement in the Network tab of your browser's developer tools.

Checklist to make sure your application is optimized

Queries in general:

  • Check if queries are filtered first.

  • Check if they return only what is displayed on the screen.

  • Check if part of the query could be moved to data preparation to speed up display.

  • No hard-coded values: always prefer dynamic rules (e.g., argmax on year instead of 2021).
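Two of these points, filtering first and preferring a dynamic rule over a hard-coded year, can be illustrated with a small Python sketch on plain records. The data and function names are hypothetical; in Toucan the same logic would be expressed as query steps (for example an argmax on year), not as Python.

```python
rows = [
    {"year": 2020, "region": "North", "sales": 120},
    {"year": 2021, "region": "North", "sales": 150},
    {"year": 2021, "region": "South", "sales": 90},
    {"year": 2022, "region": "North", "sales": 170},
    {"year": 2022, "region": "South", "sales": 110},
]


def latest_year_rows(rows):
    """Keep only the rows of the most recent year (argmax on year)."""
    latest = max(row["year"] for row in rows)
    return [row for row in rows if row["year"] == latest]


def total_sales(rows, region):
    """Filter on the region first, then aggregate the remaining rows only."""
    filtered = [row for row in rows if row["region"] == region]
    return sum(row["sales"] for row in filtered)


# Hard-coded: keeps showing 2021 after newer data is loaded.
hard_coded = [row for row in rows if row["year"] == 2021]
# Dynamic: always follows the latest year present in the data.
dynamic = latest_year_rows(rows)

print(total_sales(hard_coded, "North"))  # 150
print(total_sales(dynamic, "North"))  # 170
```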

Data architecture:

Check the data pipeline: is it clear and easy to read?

  • Clear domain names.

  • Keep only used domains.

Check the date requester construction (if you are using the old one):

  • Is it prepared with Dataprep? Otherwise, the transformation is replayed every time a screen loads.

  • Does it provide every date format needed to filter queries on all screens?

  • Anticipate next year: will it keep working when year + 1 starts?

  • Nice to have: a year -1 or month -1 date column, if your screen queries need one for their calculations.
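A date requester dataset prepared once, with a few ready-made formats and offset columns, might look like the following Python sketch. The column names (`date`, `year`, `month`, `label`, `year_minus_1`, `month_minus_1`) are hypothetical; the point is that the table is generated from a year range rather than from a hard-coded list of years, so it keeps working when year + 1 arrives.

```python
from datetime import date


def build_date_requester(first_year, last_year):
    """One row per month, with several date formats and offset columns."""
    rows = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            day = date(year, month, 1)
            previous_month = date(year, month - 1, 1) if month > 1 else date(year - 1, 12, 1)
            rows.append({
                "date": day.isoformat(),          # e.g. 2022-03-01
                "year": year,
                "month": month,
                "label": day.strftime("%b %Y"),   # e.g. Mar 2022 (locale-dependent)
                "year_minus_1": date(year - 1, month, 1).isoformat(),
                "month_minus_1": previous_month.isoformat(),
            })
    return rows


# Derive the range from today's date instead of hard-coding years, so each
# data preparation run extends the requester to the current year.
current_year = date.today().year
requester = build_date_requester(current_year - 2, current_year)
print(len(requester), requester[-1]["date"])
```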

Check report requester construction:

  • Is it prepared with YouPrep? Otherwise, the transformation is replayed every time a screen loads.

  • Nice to have: a column for the order (tip: use a conditional step to create it).

  • Nice to have: a “type” column if you need one, or several if you have a hierarchical report (child type, parent type).

  • Nice to have for hierarchical reports: both a parent/child column pair and one column per intermediate level for each child (tip: this can be done in YouPrep as dataprep with a rollup followed by a join).

  • How long can this data architecture last before it needs improving (limits in data volume, in query preparation)?

  • Data validation is in place.
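For the report requester, the order column can be produced by a conditional step. The Python sketch below mimics such a step with hypothetical entity names and a hypothetical `ORDER_RULES` mapping, and it also derives a parent type column from the parent/child columns of a small hierarchical report.

```python
# Hypothetical hierarchical report: one parent/child pair per row.
report = [
    {"entity": "France", "parent": None, "type": "country"},
    {"entity": "Paris", "parent": "France", "type": "city"},
    {"entity": "Lyon", "parent": "France", "type": "city"},
    {"entity": "Other", "parent": "France", "type": "city"},
]

# Conditional step: "if entity is X then order is N", unlisted entities go last.
ORDER_RULES = {"France": 1, "Paris": 2, "Lyon": 3}
DEFAULT_ORDER = 99

types_by_entity = {row["entity"]: row["type"] for row in report}
for row in report:
    row["order"] = ORDER_RULES.get(row["entity"], DEFAULT_ORDER)
    # Parent type column (None for the root of the hierarchy).
    row["parent_type"] = types_by_entity.get(row["parent"])

report.sort(key=lambda row: row["order"])
print([row["entity"] for row in report])
```

Giving unlisted entities a default order keeps the report stable when new entities appear in the data.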

Loading speed:

  • Check if the screens are fast enough.

  • Check if the home page is fast enough.

  • Check if Mongo indexes are implemented (if needed).

  • Check if requesters should be used instead of filters.

Mobile use:

  • Are my screens easily readable on mobile?
