Aggregating indexes as secondary indexes

Overview

Aggregating indexes are useful for speeding up queries that use common aggregation patterns and aggregated results. They can also be used to speed up queries that filter by common patterns, acting as secondary indexes.

You can benefit from using an aggregating index as a secondary index when three conditions are met:

  1. Your query contains a filter over a high-cardinality field.
  2. The filtered field is not part of the scanned table primary index OR is part of the primary index but positioned in a less meaningful position in the primary index.
  3. The scanned table has a well-thought primary index definition that is designed to serve other frequent filtered fields.

Once you decide to create an secondary index, you may need to adjust your query pattern to consistently use the aggregating index for pruning. This involves changing your filter to use the following pattern: WHERE column IN (SELECT DISTINCT column FROM fact_table). By using the SELECT DISTINCT construction, Firebolt will prune your query using the aggregating index.

Example

One example of this is with an incremental ingest. A common pattern is to ingest files that have not been ingested yet, making use of the source_file_name metadata column in the external table and adding it to the fact table. The incremental insert would look like this:

INSERT INTO fact_table fSELECT * from external_table eWHERE e.source_file_name NOT IN (  SELECT DISTINCT source_file_name FROM fact_table);

If the fact table is very large, getting the distinct source_file_name values can be expensive.

However, by defining an Aggregating Index that includes source_file_name , Firebolt will use the much smaller AI to get the values, reading less data and running faster. The Aggregating Index would look like this:

CREATE AGGREGATING INDEX ix_agg_fact_table_filenameON fact_table(source_file_name));

There is an implicit count(*) included in every AI, so there is no need to specify an aggregation column.

An AI is not used in JOINS or if there are no aggregations in a single table query, so the filter pattern is required for the AI to be used:

...WHERE column1 IN (SELECT DISTINCT col1 FROM source table)

Running an EXPLAIN after creating the index will show you if the AI is being used in the query.