How to Troubleshoot Data Being Read from Cold Storage in Firebolt

Introduction

Warmup strategies in Firebolt are essential for ensuring that data queries are quick and efficient by keeping frequently accessed data in the cache. However, sometimes data might still be read from cold storage, leading to slower query performance. By the end of this article, you will know how to troubleshoot and resolve issues related to data being read from cold storage despite warmup efforts.

TL;DR

  • Understand the role of data warmup.
  • Check for cold reads using engine_query_history and scanned_bytes_storage .
  • Check cache usage with SHOW CACHE .
  • Resolve issues by performing a warmup or resizing the engine.

Step-by-Step Guide

Step 1: Understand Warmup Concepts

Warmup is a process to preload frequently accessed data into the cache to ensure faster query performance. Review Understanding warmup and eviction for more details on how it works.

Step 2: Check for Cold Reads Using scanned_bytes_storage

After the engine has been in use for a while or after a warmup process has been executed, and you have a slow performing query, check if cold reads are occurring. Use the scanned_bytes_storage metric in the engine_query_history Information_schema table. This metric indicates how many bytes were read from cold storage. A value greater than 0 indicates not all the needed data was present in the cache. This query can be adjusted to find specific queries or tables if there are specific concerns.

Example SQL code:

-- Check for cold reads
    SELECT query_id, query_text, start_time, scanned_bytes_storage, query_text
    FROM Information_schema.engine_query_history
    WHERE start_time BETWEEN date_add('HOUR',-1,NOW()) AND NOW() 
    AND scanned_bytes_storage > 0
    -- AND query_text ILIKE '%<table_name or portion of query.%'
    ORDER BY start_time DESC;

Step 3: Check Cache Usage

If the query returns results, some queries are not getting their data from
cache. The next step is to check the cache to identify potential cache
eviction
issues. Use the SHOW CACHE command to inspect the cache status. This return both absolute storage and a percentage of available storage, e.g. 20.94/1750.52 GB (1.20%) .

Example SQL code:

-- Check cache usage
    SHOW CACHE;

Step 4: Resolve by Doing a Warmup or Resizing the Engine

If SHOW CACHE returns a value around 80%, then the cache is full, and the cold reads are most likely due to not enough cache to store necessary data. This would indicate the engine size should be increased, either by using larger nodes or increasing the number of nodes.

If SHOW CACHE returns a value of less than 80%, then the data needed by the query was not previously loaded. Perhaps the table was not warmed up at all, or partially warmed up, and the query was unexpectedly needing data from further in the past than the warmup process allowed for. The options here are to either let the data come into cache as it is needed, or run a warmup query to bring the data into cache.

Full Example Code

    SELECT query_id, query_text, start_time, scanned_bytes_storage, query_text
    FROM Information_schema.engine_query_history
    WHERE start_time BETWEEN date_add('HOUR',-1,NOW()) AND NOW() 
    AND scanned_bytes_storage > 0
    -- AND query_text ILIKE '%<table_name or portion of query.%'
    ORDER BY start_time DESC;
    
    SHOW CACHE;