What is warmup and what is it used for?
“Warmup” is Firebolt’s term for transferring data from the database’s remote storage (S3) to the engine’s local SSD (aka cache). All Firebolt data is stored persistently in S3. When data is not cached and is read from S3, we call such reads “cold”. In general, queries that read cold data are slower than queries that use “warm” data, which is already loaded on the local disk. Therefore, in most use cases, it is preferable to query data that has been warmed (or pre-warmed) into the cache.
What is cache eviction and why do we need it?
The instance’s SSD has limited capacity, so to keep a healthy amount of disk space available, data on the disk must be cleaned up automatically (aka “cache eviction”). The eviction algorithm used is LRU (Least Recently Used): when cache utilization reaches 80% of capacity, the data that has gone unused for the longest time is purged from the local disk first.
What is being evicted when cache eviction kicks in?
Data is warmed at the tablet level, so eviction also happens at the tablet level: when data is evicted, the whole cached tablet that contains it is removed from the local disk.
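The eviction behavior described above can be illustrated with a small Python simulation. This is only a sketch of tablet-level LRU eviction with an 80% threshold, not Firebolt’s actual implementation. The class name TabletCache, the byte sizes, and the choice to keep evicting until utilization drops back below the threshold are all assumptions made for illustration.

```python
from collections import OrderedDict


class TabletCache:
    """Toy model of an LRU cache that loads and evicts whole tablets."""

    def __init__(self, capacity_bytes, threshold=0.8):
        self.capacity_bytes = capacity_bytes
        self.threshold = threshold    # eviction starts at 80% utilization
        self.tablets = OrderedDict()  # tablet_id -> size, least recently used first
        self.used_bytes = 0

    def read(self, tablet_id, size_bytes):
        """Return "warm" on a cache hit, "cold" when the tablet must be loaded."""
        if tablet_id in self.tablets:
            self.tablets.move_to_end(tablet_id)  # mark as most recently used
            return "warm"
        self.tablets[tablet_id] = size_bytes     # simulate loading from remote storage
        self.used_bytes += size_bytes
        self._evict()
        return "cold"

    def _evict(self):
        limit = self.capacity_bytes * self.threshold
        # Assumption: evict least recently used tablets until utilization is
        # below the threshold again, never evicting the tablet just loaded.
        while self.used_bytes >= limit and len(self.tablets) > 1:
            _, size = self.tablets.popitem(last=False)
            self.used_bytes -= size


cache = TabletCache(capacity_bytes=100)
cache.read("t1", 30)  # cold, cache at 30%
cache.read("t2", 30)  # cold, cache at 60%
cache.read("t1", 30)  # warm, t1 becomes the most recently used tablet
cache.read("t3", 30)  # cold, cache hits 90%, so t2 (least recently used) is evicted
```

In this toy run, reading t1 again before loading t3 is what protects it from eviction. The same idea applies to real workloads: tablets that are queried regularly tend to stay warm.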
How to warm up data?
- Run a query: when running a query, the data that is being accessed is automatically loaded into cache.
- Manual warmup: by running a CHECKSUM(*) query you can warm up ranges of data. More details are explained in How to Warmup a Table.
- Warm up only the data you need and are going to use in your queries, using the right filters, to reduce costs and to use the cache efficiently.
- Break up your warmup script into small chunks to avoid overwhelming the engine’s local SSD.
- To check how much of the cache has been used while warming up, run SHOW CACHE, or look at the disk_used column under information_schema.engine_metrics_history.
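As a rough illustration of the filtering and chunking advice above, the following Python sketch generates one CHECKSUM(*) warmup query per date range. The table name events, the column event_date, and the 7-day chunk size are hypothetical. The exact query shape (SELECT CHECKSUM(*) FROM ... WHERE ...) is an assumption based on the CHECKSUM(*) approach mentioned earlier, so check How to Warmup a Table for the precise syntax before using it.

```python
from datetime import date, timedelta


def warmup_queries(table, date_column, start, end, chunk_days=7):
    """Yield one CHECKSUM(*) query per date chunk covering [start, end)."""
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    step = timedelta(days=chunk_days)
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the last chunk may be shorter
        # Identifiers are interpolated directly, so pass trusted names only.
        yield (
            f"SELECT CHECKSUM(*) FROM {table} "
            f"WHERE {date_column} >= '{lo.isoformat()}' "
            f"AND {date_column} < '{hi.isoformat()}';"
        )
        lo = hi


# Hypothetical table and column names, for illustration only.
for query in warmup_queries("events", "event_date", date(2024, 1, 1), date(2024, 1, 20)):
    print(query)
```

Running the generated statements one at a time, and checking SHOW CACHE between chunks, makes it easier to stop before the cache approaches the 80% eviction threshold.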