Introduction
Optimizing ingestion workloads in Firebolt requires a deep understanding of engine observability metrics. By leveraging these metrics, you can monitor resource utilization, optimize query performance, and scale engines effectively. By the end of this article, you will know how to monitor and tune your ingestion workloads using Firebolt’s engine observability metrics.
For details on our internal benchmark, see Engine Sizing for Simple Ingestion.
TL;DR
- Use engine_metrics_history to monitor CPU and RAM usage.
- Start with the smallest node type and number of nodes, then adjust based on metrics.
- Scale up by increasing node size if CPU or RAM usage is high.
- Scale out by adding more nodes to parallelize the workload.
- Utilize engine_running_queries for real-time query insights.
Step-by-Step Guide
All example SQL code in this guide uses the Ultra Fast Gaming dataset. To familiarize yourself with it, see Ultra Fast Gaming Firebolt Sample Dataset.
Step 1: Start with the Smallest Node Type and Number of Nodes
Begin by configuring your Firebolt engine with the smallest possible node type and the smallest number of nodes. This approach ensures that you are utilizing resources efficiently from the start.
Example SQL code:
-- Create an engine with the smallest node type and number of nodes
CREATE ENGINE my_small_engine TYPE = 'S' NODES = 1;
Step 2: Monitor CPU and RAM Usage
As soon as you start running your workloads, monitor CPU and RAM usage with the engine_metrics_history view. These metrics are crucial for understanding whether your current configuration can handle the workload.
Example SQL code:
-- Retrieve CPU and RAM usage metrics
SELECT engine_cluster, event_time, cpu_used, memory_used
FROM information_schema.engine_metrics_history
WHERE event_time > CURRENT_DATE - INTERVAL '1 day';
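To distinguish sustained pressure from momentary spikes, you can aggregate these metrics per cluster. A minimal sketch, assuming cpu_used and memory_used are reported as utilization percentages:

```sql
-- Summarize utilization per cluster over the last day
-- (assumes cpu_used and memory_used are percentages)
SELECT
    engine_cluster,
    AVG(cpu_used)    AS avg_cpu,
    MAX(cpu_used)    AS peak_cpu,
    AVG(memory_used) AS avg_memory,
    MAX(memory_used) AS peak_memory
FROM information_schema.engine_metrics_history
WHERE event_time > CURRENT_DATE - INTERVAL '1 day'
GROUP BY engine_cluster;
```

Comparing average against peak utilization tells you whether the engine is consistently busy (scale up) or only occasionally saturated (perhaps acceptable).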
Step 3: Adjust Node Size if Resource Utilization is High
If you notice high CPU or RAM usage (specific thresholds vary by workload), scale up by increasing the node size. This provides more resources to handle the workload efficiently.
Example SQL code for scaling up:
-- Scale up the engine to the next larger node type
ALTER ENGINE my_small_engine SET TYPE = 'M';
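As a rough rule of thumb (an illustrative assumption, not an official Firebolt threshold), sustained average utilization above 80% is a common trigger for scaling up. A hedged sketch that flags clusters over that mark:

```sql
-- Flag clusters whose average CPU or RAM usage exceeded 80% in the last day
-- (the 80% threshold is illustrative; tune it to your workload)
SELECT engine_cluster
FROM information_schema.engine_metrics_history
WHERE event_time > CURRENT_DATE - INTERVAL '1 day'
GROUP BY engine_cluster
HAVING AVG(cpu_used) > 80 OR AVG(memory_used) > 80;
```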
Step 4: Scale Up Incrementally
If resource utilization remains high after scaling up, continue to increase the node size incrementally. Monitor the metrics after each adjustment to determine if further scaling is necessary.
Example SQL code for further scaling up:
-- Further scale up the engine if needed
ALTER ENGINE my_small_engine SET TYPE = 'L';
Step 5: Parallelize by Adding More Nodes
To parallelize the workload and process more files simultaneously, add more nodes to your engine configuration. This can significantly reduce ingestion time.
Example SQL code for scaling out:
-- Scale out the engine by adding more nodes
ALTER ENGINE my_small_engine SET NODES = 2;
Step 6: Utilize Engine Running Queries for Real-Time Insights
Use the engine_running_queries view to get real-time insights into currently running queries. This helps you identify and troubleshoot slow or stuck queries.
Example SQL code:
-- Retrieve real-time running queries
SELECT query_id, status, start_time
FROM information_schema.engine_running_queries;
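Building on the view above, you can surface the queries that have been running longest, which are the most likely candidates for being slow or stuck. A hypothetical sketch, assuming start_time is a timestamp comparable with NOW():

```sql
-- List running queries ordered by elapsed time to spot slow or stuck ones
SELECT
    query_id,
    status,
    start_time,
    NOW() - start_time AS elapsed
FROM information_schema.engine_running_queries
ORDER BY elapsed DESC;
```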
Full Example Code
-- Create an engine with the smallest node type and number of nodes
CREATE ENGINE my_small_engine TYPE = 'S' NODES = 1;
-- Retrieve CPU and RAM usage metrics
SELECT engine_cluster, event_time, cpu_used, memory_used
FROM information_schema.engine_metrics_history
WHERE event_time > CURRENT_DATE - INTERVAL '1 day';
-- Scale up the engine to the next larger node type
ALTER ENGINE my_small_engine SET TYPE = 'M';
-- Further scale up the engine if needed
ALTER ENGINE my_small_engine SET TYPE = 'L';
-- Scale out the engine by adding more nodes
ALTER ENGINE my_small_engine SET NODES = 2;
-- Retrieve real-time running queries
SELECT query_id, status, start_time
FROM information_schema.engine_running_queries;