Getting Started with Firebolt Engines

FireboltAutomations · November 8, 2023, 2:43pm

TL;DR

Start with a dev database for testing without affecting production.
Estimate memory needs for your largest file plus overhead.
Monitor CPU delays and adjust capacity as needed.
Choose the right instance family for your workload.
Scale up before out for better performance.

General Best Practices

Engine Sizing: Tailor engine size to your specific needs by testing various configurations to find a cost-effective balance for all workloads.
Monitoring: Use information_schema.query_history to track engine utilization, including memory and CPU usage.
Scaling Strategies:

For ingestion: Adjust based on file size and quantity. Use larger nodes for big files and more nodes for many small files, but do not use more nodes than there are files.
For querying: Favor fewer but larger nodes to boost performance and minimize result merging tasks.

Separate Engines: While a single engine can handle both ingestion and analytics, separate them if you face large ingestion volumes or need to manage query performance during intensive ingestion tasks.
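
The ingestion scaling rule above (more nodes for many small files, but never more nodes than files) can be sketched in a few lines of Python. This is only an illustration: the function name and the max_nodes parameter are hypothetical, and the reasoning that extra nodes would sit idle is an assumption drawn from the guideline, not a documented Firebolt behavior.

```python
def plan_ingestion_nodes(file_count: int, max_nodes: int) -> int:
    """Suggest a node count for ingestion that never exceeds the file count.

    file_count: number of files in the ingestion batch.
    max_nodes:  hypothetical upper limit you are willing to pay for.
    """
    if file_count < 1 or max_nodes < 1:
        raise ValueError("file_count and max_nodes must be at least 1")
    # Assumption: nodes beyond the number of files add cost without
    # adding parallelism, so the file count caps the node count.
    return min(file_count, max_nodes)


if __name__ == "__main__":
    print(plan_ingestion_nodes(file_count=3, max_nodes=8))    # few files
    print(plan_ingestion_nodes(file_count=500, max_nodes=8))  # many small files
```

Treat the result as a starting point for testing, not a final size. For a few very large files, the advice above is still to grow the node size rather than the node count.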

Choosing the Right Instance Family

Select an instance family based on your specific workload requirements:

Memory Optimized (r series): Ideal for memory-intensive operations with many joins and aggregations.
CPU-Optimized (c series): Best for CPU-heavy tasks with extensive filtering and high concurrency needs.
Balanced (m series): A good middle ground for workloads requiring both CPU and memory resources.
Storage/Cache-Optimized (i series): Choose when a large cache is necessary to maintain performance, especially when the data doesn't fit in memory on the other instance families.
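
As a rough way to reason about the list above, here is a small Python sketch that maps workload traits to a family letter. The Workload fields, the function name, and the order in which the checks are applied are all assumptions made for illustration. The post itself only describes what each family is suited for, not how to break ties.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical flags describing a workload; not a Firebolt API.
    join_or_aggregation_heavy: bool = False
    filter_heavy_or_high_concurrency: bool = False
    needs_large_cache: bool = False


def suggest_instance_family(w: Workload) -> str:
    """Return the family letter (r, c, m, or i) suggested by the traits."""
    # Assumption: a large-cache requirement takes priority over the rest.
    if w.needs_large_cache:
        return "i"
    # Needing both memory and CPU points to the balanced family.
    if w.join_or_aggregation_heavy and w.filter_heavy_or_high_concurrency:
        return "m"
    if w.join_or_aggregation_heavy:
        return "r"
    if w.filter_heavy_or_high_concurrency:
        return "c"
    # Assumption: with no strong signal, start from the balanced family.
    return "m"


if __name__ == "__main__":
    print(suggest_instance_family(Workload(join_or_aggregation_heavy=True)))
```

A suggestion like this is only a first guess. The general advice still applies: test several configurations and compare cost and performance.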

Ingestion Engine Tuning

Initial Setup

Development Database: Start by creating a development database to use as a sandbox, so that test ingestions never touch your production database.

Memory Management

Memory Requirements: Estimate memory based on the uncompressed size of your largest file plus an extra 15-20% for operational overhead.
Scaling for Memory: Start with engines that have a large memory capacity, possibly double your estimated needs, to identify the maximum memory requirements safely.
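
The arithmetic behind these two guidelines is simple enough to sketch in Python. The function names are hypothetical, the default overhead uses the upper end of the 15-20% range, and the headroom factor of 2 reflects the "possibly double" suggestion above. Adjust both to your own measurements.

```python
def estimate_ingestion_memory_gb(largest_file_uncompressed_gb: float,
                                 overhead: float = 0.20) -> float:
    """Largest uncompressed file size plus operational overhead, in GB."""
    if largest_file_uncompressed_gb <= 0:
        raise ValueError("file size must be positive")
    if overhead < 0:
        raise ValueError("overhead must not be negative")
    return largest_file_uncompressed_gb * (1 + overhead)


def starting_engine_memory_gb(estimated_gb: float,
                              headroom_factor: float = 2.0) -> float:
    """Memory to start testing with: the estimate times a safety factor."""
    return estimated_gb * headroom_factor


if __name__ == "__main__":
    # Example: the largest file is 10 GB once uncompressed.
    estimate = estimate_ingestion_memory_gb(10)
    print(f"estimated need: {estimate:.1f} GB")
    print(f"start testing with: {starting_engine_memory_gb(estimate):.1f} GB")
```

Once test ingestions show the real peak memory use, you can size down from the starting figure.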

File and CPU Handling

Small Files: Leverage more nodes to take advantage of parallel processing for numerous small files.
Large Files: Opt for larger nodes to handle fewer large files, rather than increasing the node count.
CPU Tuning: Monitor the cpu_delay_us metrics through information_schema.query_history. If you encounter high delays, boost CPU capacity to alleviate strain from intensive operations like joins or aggregations.
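
To make the CPU check concrete, here is a minimal Python sketch that looks at cpu_delay_us values after you have fetched them from information_schema.query_history by whatever client you use. The row shape (a list of dicts), the function name, and the threshold are assumptions. The post does not say what counts as a high delay, so pick a threshold from your own baseline.

```python
from statistics import mean
from typing import Iterable, Mapping, Optional


def cpu_delay_is_high(rows: Iterable[Mapping[str, Optional[int]]],
                      threshold_us: float) -> bool:
    """Return True when the average cpu_delay_us exceeds threshold_us.

    rows: one mapping per query, each with a 'cpu_delay_us' key
          (hypothetical shape; adapt to your client's result format).
    """
    # Skip rows where the metric is missing or NULL.
    delays = [r["cpu_delay_us"] for r in rows
              if r.get("cpu_delay_us") is not None]
    if not delays:
        return False  # nothing to judge yet
    return mean(delays) > threshold_us


if __name__ == "__main__":
    sample = [{"cpu_delay_us": 100}, {"cpu_delay_us": 300}]
    # threshold_us=150 is an arbitrary example value, not a recommendation.
    if cpu_delay_is_high(sample, threshold_us=150):
        print("CPU delay is above the chosen threshold; consider more CPU capacity")
```

An average is the simplest signal. If a few heavy joins or aggregations dominate, looking at the maximum or a high percentile may be more telling.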

Analytics Engine Tuning

Performance Optimization

Indexing: Ensure robust and effective indexing for optimized engine performance.
Engine Variety: Use different engine types for different query demands to ensure data consistency and prevent performance bottlenecks.

Engine Configuration

Instance Families: Select from Memory Optimized, CPU-Optimized, Balanced, or Storage/Cache-Optimized to suit specific workload demands.
Node Types: Generally, larger nodes offer better performance—scale up before scaling out.

Concurrency and Isolation

Separate Engines: For significant, infrequent ingestion tasks or when managing workload concurrency, use distinct engines for ingestion and analytics to avoid performance dips during heavy ingestion periods.

Conclusion

To sum up, effective Firebolt engine configuration hinges on understanding your data demands and performance goals. Through strategic sizing, monitoring, and instance selection, you can craft a robust data engine setup that balances efficiency, cost, and scalability.
