How to set up persistent query history

This is a short guide to setting up manual persistence of query history in Firebolt.

Today, Firebolt does not persist query history across engine restarts, and it enforces a 14-day time to live (TTL) on query history, so anything older than 14 days is automatically purged.

Many users want to persist the full query history so they can run performance analysis, understand how their Firebolt databases and engines are being used, and so on.

This can be done by creating a fact table to hold the query history, then periodically (once a day, for example) running an INSERT statement that populates the table with the latest data from information_schema.query_history (see https://docs.firebolt.io/general-reference/information-schema/query-history-view.html for details).

 
-- This table will hold the query_history data.
-- Note: analyzing events over narrow time ranges is made fast by the
-- PARTITION BY key and primary index. For efficient pruning, predicates
-- should also contain "query_start_date". If you usually filter on
-- "start_time", consider making it part of the primary index / PARTITION BY keys.
CREATE FACT TABLE IF NOT EXISTS "query_history_persisted" (
    "engine_name" TEXT NOT NULL,
    "engine_id" TEXT NOT NULL,
    "packdb_version" TEXT NOT NULL,
    "user_id" TEXT NOT NULL,
    "account_id" TEXT NOT NULL,
    "start_time" TIMESTAMP NOT NULL,
    "end_time" TIMESTAMP NOT NULL,
    "duration_usec" LONG NOT NULL,
    "query_optimization_us" LONG NOT NULL,
    "status" TEXT NOT NULL,
    "query_id" TEXT NOT NULL,
    "query_text" TEXT NOT NULL,
    "error_message" TEXT NOT NULL,
    "scanned_rows" LONG NOT NULL,
    "scanned_bytes" LONG NOT NULL,
    "scanned_bytes_cache" LONG NOT NULL,
    "scanned_bytes_storage" LONG NOT NULL,
    "inserted_rows" LONG NOT NULL,
    "inserted_bytes" LONG NOT NULL,
    "inserted_bytes_storage" LONG NOT NULL,
    "spilled_bytes_compressed" LONG NOT NULL,
    "spilled_bytes_uncompressed" LONG NOT NULL,
    "total_ram_consumed" LONG NOT NULL,
    "returned_rows" LONG NOT NULL,
    "returned_bytes" LONG NOT NULL,
    "query_start_date" DATE NOT NULL,
    "last_inserted_ts" TIMESTAMP NOT NULL
)
PRIMARY INDEX "query_id", "query_start_date"
PARTITION BY EXTRACT(MONTH FROM "query_start_date");

-- This part can be run periodically to get the increments.
-- The first run will populate the historical data (backfill).
INSERT INTO query_history_persisted
SELECT
    "engine_name",
    "engine_id",
    "packdb_version",
    "user_id",
    "account_id",
    "start_time",
    "end_time",
    "duration_usec",
    "query_optimization_us",
    "status",
    "query_id",
    "query_text",
    "error_message",
    "scanned_rows",
    "scanned_bytes",
    "scanned_bytes_cache",
    "scanned_bytes_storage",
    "inserted_rows",
    "inserted_bytes",
    "inserted_bytes_storage",
    "spilled_bytes_compressed",
    "spilled_bytes_uncompressed",
    "total_ram_consumed",
    "returned_rows",
    "returned_bytes",
    "start_time"::date as "query_start_date", 
    NOW() as "last_inserted_ts"FROM information_schema.query_historyWHERE "start_time" > (SELECT MAX(last_inserted_ts) FROM query_history_persisted);  --sample query, could also be used as part of the COPY command to export this to S3 for exampleselect * from query_history_persisted where "query_start_date" = now()::date and "query_text" ilike 'insert%';
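As one example of such an export, a COPY TO statement could write a day of persisted history to S3. The sketch below uses a hypothetical bucket, prefix, and IAM role; the exact COPY TO options available depend on your Firebolt version, so check the COPY TO documentation for the supported syntax.

-- Export today's persisted history to S3 as Parquet. This is a sketch:
-- the bucket, prefix, and role ARN below are placeholders.
COPY (
    SELECT *
    FROM query_history_persisted
    WHERE "query_start_date" = NOW()::date
)
TO 's3://my-bucket/query_history/'
CREDENTIALS = (AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-firebolt-role')
TYPE = PARQUET;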
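Once the table is being populated, it supports the kind of performance analysis mentioned above. The query below is a sketch against the schema defined here (it assumes interval arithmetic and SUBSTR are available in your Firebolt version): it lists the slowest queries of the last seven days, and it filters on "query_start_date" so the partition key and primary index can prune effectively, as recommended in the comments on the table definition.

-- Top 10 slowest queries over the last 7 days. Filtering on
-- "query_start_date" lets Firebolt prune partitions instead of
-- scanning the full history.
SELECT
    "query_id",
    "start_time",
    "duration_usec",
    "scanned_bytes",
    SUBSTR("query_text", 1, 100) AS "query_preview"
FROM query_history_persisted
WHERE "query_start_date" >= (NOW() - INTERVAL '7 days')::date
ORDER BY "duration_usec" DESC
LIMIT 10;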