What is warmup and what is it used for?
“Warmup” is Firebolt's term for transferring data from the database’s remote storage (S3) to the engine's local SSD (the cache).
All Firebolt data is stored persistently in S3. When a query reads data that is not cached and has to be fetched from S3, we call that read “cold”. Queries on cold data are generally slower than queries on “warm” data, that is, data already loaded on the local disk. In most use cases it is therefore preferable to query data that has already been warmed (or pre-warmed) into the cache.
What is cache eviction and why do we need it?
The instance SSD has a limited size, so to keep a healthy amount of disk space available, data on the disk has to be cleaned up automatically (“cache eviction”). The eviction algorithm is LRU (Least Recently Used). Eviction starts when cache utilization reaches 80% of capacity, and the data that has gone unused for the longest time is purged from the local disk first.
What is being evicted when cache eviction kicks in?
Data is warmed at the tablet level, so eviction also works on whole tablets: when a tablet is chosen for eviction, all of its data stored in the cache is removed.
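The interaction between the 80% threshold, LRU ordering, and tablet-level eviction can be illustrated with a toy model. This is only a sketch and not Firebolt's implementation: the class name, the byte sizes, and the choice to keep evicting until utilization drops back below the threshold are all assumptions made for illustration.

```python
from collections import OrderedDict


class TabletCache:
    """Toy model of an LRU cache that evicts whole tablets (illustrative only)."""

    def __init__(self, capacity_bytes, threshold=0.80):
        self.capacity_bytes = capacity_bytes
        self.threshold = threshold      # eviction starts at 80% utilization
        self.tablets = OrderedDict()    # tablet_id -> size, least recently used first
        self.used_bytes = 0

    def read(self, tablet_id, size_bytes):
        """Record a read of a tablet and return the ids of any evicted tablets."""
        if tablet_id in self.tablets:
            # Warm read: mark the tablet as most recently used.
            self.tablets.move_to_end(tablet_id)
        else:
            # Cold read: the tablet is loaded into the cache.
            self.tablets[tablet_id] = size_bytes
            self.used_bytes += size_bytes

        evicted = []
        limit = self.threshold * self.capacity_bytes
        # Assumption: evict least recently used tablets until utilization is
        # back below the threshold, never evicting the only remaining tablet.
        while self.used_bytes >= limit and len(self.tablets) > 1:
            victim, victim_size = self.tablets.popitem(last=False)
            self.used_bytes -= victim_size
            evicted.append(victim)
        return evicted


cache = TabletCache(capacity_bytes=100)
cache.read("a", 30)
cache.read("b", 30)
cache.read("a", 30)             # "a" becomes the most recently used tablet
evicted = cache.read("c", 30)   # utilization reaches 90%, so "b" is evicted
print(evicted)                  # ['b']
```

Note that "b" is evicted rather than "a", even though "a" was loaded first, because "a" was read again more recently.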
How to warm up data?
There are three ways to warm up data:
Pre-warmup: data is warmed into the cache/memory during engine startup (explained here).
Run a query: when you run a query, the data it touches is automatically loaded into the cache.
Manual warmup: by running a CHECKSUM(*) query you can warm up specific ranges of data (explained here).
What are the best practices for warming up data?
Size your engine so that its disk can hold the data needed by the broadest query that will run on that engine.
Warm up only the data you need and will actually query, using the right filters, to reduce costs and use the cache efficiently.
Break up your warmup script into small chunks to avoid overwhelming the engine’s local SSD.
To check how much of the cache has been used so far, run SHOW CACHE.
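One way to warm up only the needed data in small chunks is to generate one filtered CHECKSUM(*) statement per date range. The helper below is a minimal sketch: the function name, the table name events, the column event_date, and the date-range filter are hypothetical, and the exact statement form should be checked against the Firebolt documentation before use.

```python
from datetime import date, timedelta


def warmup_statements(table, date_column, start, end, days_per_chunk=7):
    """Yield one CHECKSUM(*) warmup statement per date chunk in [start, end)."""
    chunk_start = start
    while chunk_start < end:
        # The last chunk is shortened so it never runs past `end`.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk), end)
        yield (
            f"SELECT CHECKSUM(*) FROM {table} "
            f"WHERE {date_column} >= '{chunk_start.isoformat()}' "
            f"AND {date_column} < '{chunk_end.isoformat()}';"
        )
        chunk_start = chunk_end


# Hypothetical table and column names, for illustration only.
for statement in warmup_statements("events", "event_date",
                                   date(2024, 1, 1), date(2024, 1, 15)):
    print(statement)
```

Running the statements one at a time, and checking SHOW CACHE between chunks, makes it easier to stop before the cache approaches the eviction threshold.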