Adding indexes to stored datasets

When creating a stored dataset, Toucan stores it in a Mongo database.

Thus, when your app lags due to slow query execution, you can add Mongo Indexes to your dataset(s) to speed up the loading time.

What are MongoDB indexes?

We can make a comparison between all your datasets and a library. It could be painful to find the book youโ€™re looking for if all the books are not ordered. But if your library is ordered by categories/color/author, it will be childโ€™s play.

Without indexes, MongoDB has to do a collection scan, meaning it will scan each row of your dataset to see if it matches the query statement or a short collection; you wonโ€™t see any difference in rendering. Still, the difference will be huge for bigger collections (a few thousand lines, for example). Indexes can accelerate your queries by limiting the number of rows (=lines) to scan.

Indexes can be created on any document attribute: they allow MongoDB to know where to find data that will match your query statement quicker.

See the official doc for more information.

Configuring MongoDB Indexes in Toucan

MongoDB indexes are described in the etl_config file (accessible within the "Advanced configuration" interface of your App accessible from the top-right cogwheel icon). You can create indexes for each dataset. Each dataset is a key in the MONGO_INDEXES configuration block.

MONGO_INDEXES:
  dataset_a: [
    โ€˜yearโ€™
    [โ€˜cityโ€™, โ€˜kpi_codeโ€™, โ€˜versionโ€™]
    [โ€˜cityโ€™, โ€˜entityโ€™, โ€˜versionโ€™]
  ]
  dataset_b: [
    [โ€˜dateโ€™, โ€˜filterโ€™]
  ]

The dataset dataset_a has 3 indexes :

  • the 1st is an index on a single field year

  • the 2nd and the 3rd are compound indexes.

Once the indexes are configured, apply them by clicking on "Compute index".

Important

The best way to have efficient Mongo indexes is to analyze your query structure. If youโ€™re always filtering your datasets with the same columns, adding a Mongo index to those columns could be a great idea.

Note

For compound indexes, the order of the fields is important in the index but not in the query. In addition to supporting queries that match on all the index fields, compound indexes can support queries that match the index fields' prefix (a subset at the beginning of the set).

Warning

Creating indexes everywhere for everything is not a magical solution. It is time and memory-consuming.

Last updated