How to Warm Up an Aggregating Index in Firebolt

Introduction

Warming up an Aggregating Index in Firebolt can improve query performance by preloading data into the cache. Data will be stored in the cache as normal queries are executed, but running warm-up queries after an engine is started will result in those normal queries running faster. By the end of this article, you will know how to effectively warm up aggregating indexes using the CHECKSUM function.

TL;DR

  • Aggregating indexes are warmed up separately from their associated tables, only being loaded into the cache if they are used in the query being run.

  • Use the CHECKSUM function to warm up data in Firebolt.

  • Use filters to target specific data ranges.

  • Run multiple warm-up queries in parallel for large datasets.

  • Monitor cache usage during the warm-up process.

Step-by-Step Guide

All the example SQL code uses the Ultra Fast Gaming data set. To familiarize yourself with this data set, visit this link: Ultra Fast Gaming Dataset.

Step 1: Identify the Aggregating Indexes that should be warmed up

Aggregating indexes associated with large tables (millions of rows) that are frequently used are candidates for being warmed up. When the normal workload is run, those aggregating indexes will take the longest to read and have the biggest impact on initial query performance.

Step 2: Build the Warm-up Queries

An aggregating index is warmed up in Firebolt by running a CHECKSUM query on a subquery that uses the aggregation functions and filter columns defined in the aggregating index.

Example SQL code:

-- create an aggregating index on the playstats table
CREATE AGGREGATING INDEX idx_playstat_playerid ON playstats(
  playerid, 
  MAX(gameid),
  MAX(stattime),
  SUM(currentscore)
);

-- Warm-up the aggregating index
SELECT
CHECKSUM (*)
FROM(
  SELECT
    playerid, 
    MAX(gameid),
    MAX(stattime),
    SUM(currentscore)
  FROM playstats
  GROUP BY ALL
) t;

Step 3: Execute warm-up queries and monitor cache usage

While the warm-up queries are executing, the amount of cache being used will increase. You can monitor the increase by running two queries in another tab.

Example SQL code:

-- Use SHOW CACHE
SHOW CACHE;
-- This will return both a total and a percentage of cache used
-- for example: '36.35/1750.52 GB (2.08%)'

-- Use information_schema.engine_metrics_history
SELECT event_time,disk_used FROM information_schema.engine_metrics_history ORDER BY 1 DESC;
-- This will return a date/time and portion cache used
-- for example: 2024-07-02 20:33:00+00	0.021

If cache use approaches 80% after the warm-up queries have completed, this indicates the cache is getting filled up. While this will not cause any errors, some data may get evicted from the cache, affecting query performance. it may be necessary to warm-up fewer aggregating indexes. Another option would be to either increase the number of nodes or the size of existing nodes to provide more available cache and ensure the desired data stays in the cache to support normal workloads.

Full Example Code

CREATE AGGREGATING INDEX idx_playstat_playerid ON playstats(
  playerid, 
  MAX(gameid),
  MAX(stattime),
  SUM(currentscore)
);

SELECT
CHECKSUM (*)
FROM(
  SELECT
    playerid, 
    MAX(gameid),
    MAX(stattime),
    SUM(currentscore)
  FROM playstats
  GROUP BY
  ALL
) t;

SHOW CACHE;

SELECT event_time,disk_used FROM information_schema.engine_metrics_history ORDER BY 1 DESC;