When ingesting data into Firebolt incrementally, it is common practice to use a query which checks the SOURCE_FILE_TIMESTAMP
to ensure that only new files are selected for ingest, e.g.:
INSERT INTO incremental_table
SELECT
*,
source_file_name,
source_file_timestamp
FROM ex_incremental_table
WHERE source_file_timestamp > (SELECT MAX(source_file_timestamp) FROM incremental_table);
To improve performance of the WHERE
statement, we can add a very small aggregating index which will hold the MAX(source_file_timestamp)
value:
CREATE AGGREGATING INDEX ix_agg_incremental_table_max_source_file_timestamp
ON incremental_table
(
MAX(source_file_timestamp)
);
The size of the aggregating index will be extremely small and reduce the need for Firebolt to calculate this from the base table for your ingestion workloads.