How to Troubleshoot Data Being Read from Cold Storage in Firebolt

Introduction

Warmup strategies in Firebolt are essential for ensuring that data queries are quick and efficient by keeping frequently accessed data in the cache. However, sometimes data might still be read from cold storage, leading to slower query performance. By the end of this article, you will know how to troubleshoot and resolve issues related to data being read from cold storage.

TL;DR

  • Understand the role of data warmup.

  • Check for cold reads using engine_query_history and scanned_bytes_storage.

  • Check cache usage with SHOW CACHE.

  • Resolve issues by performing a warmup or resizing the engine.

Step-by-Step Guide

Step 1: Understand Warmup Concepts

Warmup is a process to preload frequently accessed data into the cache to ensure faster query performance. Review Understanding warmup and eviction for more details on how it works.

Step 2: Check for Cold Reads Using scanned_bytes_storage

If you have a slow performing query after the engine has been in use for a while or after a warmup process has been executed, check if cold reads are occurring. Use the scanned_bytes_storage metric in the engine_query_history Information_schema table. This metric indicates how many bytes were read from cold storage. A value greater than 0 indicates not all the needed data was present in the cache. This query can be adjusted to find specific queries or tables if there are specific concerns.

Example SQL code:

-- Check for cold reads
SELECT query_id, query_text, start_time, scanned_bytes_storage, query_text
FROM Information_schema.engine_query_history
WHERE start_time BETWEEN date_add('HOUR',-1,NOW()) AND NOW() 
AND scanned_bytes_storage > 0
-- AND query_text ILIKE '%<table_name or portion of query.%'
ORDER BY start_time DESC;

Step 3: Check Cache Usage

If the query returns results, some queries are not getting their data from cache. The next step is to check the cache to identify potential cache eviction issues. Use the SHOW CACHE command to inspect the cache status. This return both absolute storage and a percentage of available storage, e.g. 20.94/1750.52 GB (1.20%).

Example SQL code:

-- Check cache usage
SHOW CACHE;

Step 4: Resolve by Doing a Warmup or Resizing the Engine

If SHOW CACHE returns a value around 80%, then the cache is full, and the cold reads are most likely due to not enough cache to store necessary data. This would indicate the engine size should be increased, either by using larger nodes or increasing the number of nodes.

If SHOW CACHE returns a value of less than 80%, then the data needed by the query was not previously loaded. Perhaps the table was not warmed up at all, or partially warmed up, and the query was unexpectedly needing data not included in the warmup process. The options here are to either let the data come into cache as it is needed, or run a warmup query to bring the data into cache

Full Example Code

SELECT query_id, query_text, start_time, scanned_bytes_storage, query_text
FROM Information_schema.engine_query_history
WHERE start_time BETWEEN date_add('HOUR',-1,NOW()) AND NOW() 
AND scanned_bytes_storage > 0
-- AND query_text ILIKE '%<table_name or portion of query.%'
ORDER BY start_time DESC;

SHOW CACHE;