Query Diagnosis and Troubleshooting


If a query is not running as fast as desired, there are several areas to investigate for bottlenecks or restrictions. This blog post describes a comprehensive approach to debugging and optimizing queries.

Following are additional resources/ideas for diagnosing performance issues.


EXPLAIN plan

The EXPLAIN plan for a query shows how the data will be accessed, and can reveal opportunities to improve query performance by changing index configurations, joins, query structure, and more, before running the query. From the plan, consider whether one or more of these refinements could improve performance:

  • Change a primary index to better support the query pattern.

  • Add more filters to the WHERE and JOIN clauses, even if they seem redundant.

    • Additional criteria give Firebolt more flexibility to approach the query differently and apply the most aggressive filtering possible. This is especially true in outer joins.

  • Create join indexes.

  • Create aggregating indexes.

  • Change the query to use deep filters.

  • Use hashing (CITY_HASH) on JOIN or WHERE columns.

    • Large strings can decrease performance; in general, prefer working with numbers as much as possible. If a column contains large strings (more than 20 characters), convert them to a hash value (either at load time or at query time), then use the compact hash number instead of the large string in as many operations as possible.

    • CITY_HASH can be used to generate a single hash out of compound columns, reducing multiple join keys to one:

      • FROM table_a a INNER JOIN table_b b
        ON CITY_HASH(a.a,a.b,a.c,a.d) = CITY_HASH(b.a,b.b,b.c,b.d)

  • Materialize CTEs
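As a sketch of the CITY_HASH approach above, the following hypothetical query (the `events` and `users` tables and all column names are illustrative, not from this post) collapses a compound join key into a single hash comparison and adds an extra filter; prefixing it with EXPLAIN shows the plan without running the query:

```sql
-- Inspect the plan first; table and column names below are hypothetical.
EXPLAIN
SELECT e.event_type,
       count(*) AS events
FROM events e
INNER JOIN users u
  -- One compact hash comparison instead of four separate join keys
  ON CITY_HASH(e.user_id, e.tenant_id, e.region, e.source)
   = CITY_HASH(u.user_id, u.tenant_id, u.region, u.source)
WHERE e.event_date >= '2022-04-01'  -- extra filter, even if it seems redundant
GROUP BY e.event_type;
```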

Query statistics in query history

Query history contains information about what resources were used and how much time was spent in various steps of query execution. This information can provide insight on where a query was spending its time.

Some key metrics for a query can be retrieved from query history using a query like this:

SELECT
    start_time
    , end_time
    , status
    , duration_usec/1000000 as duration_secs
    , time_in_queue_ms
    , query_optimization_us/1000 as q_opt_ms
    , query_id
    , query_text
    , error_message
    , round(total_ram_consumed/1000000000,3) as total_ram_gb
    , round(scanned_bytes/1000000000,3) as scanned_gb
    , round(scanned_bytes_cache/1000000000,3) as scanned_cache_gb
    , round(scanned_bytes_storage/1000000000,3) as scanned_storage_gb
    , scanned_rows
    , inserted_rows
    , user_id
FROM information_schema.query_history
WHERE LOWER(query_text) not like '%query_history%'
AND LOWER(query_text) not like '%running_queries%'
AND lower(query_text) like '%<an identifying text pattern for the query>%'
AND status != 'STARTED_EXECUTION'
AND start_time BETWEEN '2022-04-21 15:49:00' and '2022-04-21 15:51:00'  -- UTC time range for query
ORDER BY start_time

Some of the columns most relevant to performance are:

  • round(scanned_bytes_storage/1000000000,3) as scanned_storage_gb

    • A high number here indicates that the data was not warm when the query was run. The data needed by the query was not available in memory or SSD cache, and had to be retrieved from F3, which is slower than the other two options.

  • scanned_rows

    • A high number here means that a lot of rows were not filtered out prior to retrieval. This could indicate the need for better filters on the query, adjustments to the primary index, or creation of join indexes or aggregating indexes.

  • query_optimization_us/1000 as q_opt_ms

    • A high number here indicates a complex query that takes time to compile and be prepared for execution. This could indicate the need for intermediate tables or materialized CTEs.

  • round(total_ram_consumed/1000000000,3) as total_ram_gb

    • A high number here indicates the query required a lot of memory, possibly due to aggregations or ordering data. This can reduce how much memory is available for concurrent queries and result in overall slower queries. It could indicate the need for better filters on the query, adjustments to the primary index, or creation of join indexes or aggregating indexes.
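Building on the query-history query above, the scanned-bytes columns can be combined into a quick cache-hit check: if most scanned bytes came from storage rather than cache, the data was cold. This is a sketch, assuming NULLIF is available to guard against division by zero:

```sql
SELECT query_id,
       round(scanned_bytes_cache   / 1000000000, 3) AS scanned_cache_gb,
       round(scanned_bytes_storage / 1000000000, 3) AS scanned_storage_gb,
       -- Percentage of scanned bytes that were already warm in cache
       round(scanned_bytes_cache * 100.0 / NULLIF(scanned_bytes, 0), 1) AS cache_hit_pct
FROM information_schema.query_history
WHERE status != 'STARTED_EXECUTION'
ORDER BY start_time DESC
LIMIT 20;
```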

Query execution history

If a query has been executed over a long period of time, and recently seems to be running slower, it can be useful to increase the time range examined in query history - perhaps to a week or a month.

Once the query execution information is retrieved, look for changes over time:

  • Increases in scanned_rows or total_ram_gb indicate that more data is being read than in the past, or reveal particularly "heavy" versions of the query. It may be necessary to increase the size of the engine, or to consider whether new indexes or query filters would make a difference now that there is more data than before.

  • Changes in scanned_storage_gb can indicate that more data is being read and the SSD cache needs to be increased. They could also indicate that the warmup method or warmup queries are not bringing the needed data into cache when the engine is started.
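To spot these trends, the query-history query can be aggregated per day over a longer window. A sketch, assuming DATE_TRUNC is available and reusing the placeholder text pattern from the query above:

```sql
SELECT date_trunc('day', start_time)                     AS run_day,
       count(*)                                          AS runs,
       avg(duration_usec) / 1000000                      AS avg_duration_secs,
       max(scanned_rows)                                 AS max_scanned_rows,
       round(max(scanned_bytes_storage) / 1000000000, 3) AS max_scanned_storage_gb
FROM information_schema.query_history
WHERE lower(query_text) like '%<an identifying text pattern for the query>%'
AND status != 'STARTED_EXECUTION'
AND start_time >= '2022-03-21'  -- look back a month instead of minutes
GROUP BY 1
ORDER BY 1;
```

A steady climb in max_scanned_rows across days suggests data growth is the cause, while a jump in max_scanned_storage_gb points at cold-cache reads.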

Query context

Other queries concurrently running on the engine can impact a query. It is useful to know how busy or under load an engine was at the time of query execution. The Firebolt Query History Dashboard can provide an overview of system activity at a given time to understand system load at the time the query ran.