How to Use the VACUUM Command in Firebolt for Optimizing Your Database

Introduction

In normal SQL operations, tuples that are deleted or made obsolete by an update are not physically removed from their table; they remain present until a VACUUM is done. Also, tables that have frequent, small inserts will have a large number of small tablets that are less efficient to access. Therefore, it’s necessary to execute VACUUM periodically on such tables to reclaim storage space and optimize table access by removing deleted or outdated data and combining smaller tablets into more efficient larger ones. NOTE: deleted data is marked for deletion, and the physical storage recovered over several days by a garbage collection process. By the end of this article, you will know how to effectively use the VACUUM command in Firebolt to keep your database optimized.

TL;DR

  • The VACUUM command in Firebolt helps reclaim storage space by removing deleted data and merging smaller tablets into larger ones.

  • The process involves identifying target tables, executing the command, and verifying the results.

Step-by-step Guide

All the example SQL code uses the Ultra Fast Gaming data set. To familiarize yourself with this data set, visit this link: Ultra Fast Gaming Sample Dataset.

Step 1: Identify the Target Tables

Before running the VACUUM command, identify the tables that need to be vacuumed. These are usually tables that undergo frequent updates or deletions.

Example SQL code:

-- FOR TESTING ONLY: delete a number of rows to cause fragmentation
DELETE FROM players WHERE playerid < 2000;

-- Query to identify tables that are candidates for VACUUM

SELECT table_name, number_of_rows, number_of_tablets, fragmentation, 
  ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2) AS avg_tablet_size_gb
FROM information_schema.tables
WHERE 
-- if the table has a large number of tablets that are small enough to be merged
(number_of_tablets > 5000 AND ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2) < 10)
-- if the table has a significant number of deleted/unused tablets
OR fragmentation > 0.1
-- if the average size of the tablets is small and there is enough data to merge
OR ((ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2)) < 2 AND (compressed_bytes/1024.0/1024.0/1024.0) > 50);

Step 2: Run VACUUM

Identify a time to execute the VACUUM command. VACUUM does not block DML or DQL, but it does use compute and storage resources. On large tables, it can run for an extended time.

After executing the VACUUM command, verify that the operation has successfully optimized the table. You can check the table before and after running the VACUUM command to ensure it has the desired effect.

Example SQL code:

-- Check the table statistics before VACUUM
SELECT table_name, number_of_rows, number_of_tablets, fragmentation, 
  ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2) AS avg_tablet_size_gb
FROM information_schema.tables
WHERE table_name = 'players';

-- Execute the VACUUM command
VACUUM players;

Step 3: Monitor the query

The VACUUM query can run for an extended period of time if the table is large. While it is running you can monitor it in several ways:

UI Statistics Tab

In the window where the query is running, click on the Statistics tab to see information about the query while it executes:

engine_running_queries

Run this query in a separate tab to see the running query and its statistics:

SELECT * FROM information_schema.engine_running_queries WHERE query_text ilike 'vacuum%';

engine_query_history

Once the query completes, you can see information about it in engine_query_history:

SELECT * FROM information_schema.engine_query_history WHERE query_text ilike 'vacuum%';

Step 4: Verify the results

Once the query completes, run a query on information_schema.tables and compare the results to what was returned for the same query before VACUUM was run. The number of tablets and fragmentation should be smaller, and the avg_table_size_gb should be larger.

-- Check the table statistics after VACUUM

SELECT table_name, number_of_rows, number_of_tablets, fragmentation, 
  ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2) AS avg_tablet_size_gb
FROM information_schema.tables
WHERE table_name = 'players';

Full Example Code

DELETE FROM players WHERE playerid < 2000;

SELECT table_name, number_of_rows, number_of_tablets, fragmentation, 
  ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2) AS avg_tablet_size_gb
FROM information_schema.tables
WHERE 
-- if the table has a large number of tablets that are small enough to be merged
(number_of_tablets > 5000 AND ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2) < 10)
-- if the table has a significant number of deleted/unused tablets
OR fragmentation > 0.1
-- if the average size of the tablets is small and there is enough data to merge
OR ((ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2)) < 2 AND (compressed_bytes/1024.0/1024.0/1024.0) > 50);
SELECT table_name, number_of_rows, number_of_tablets, fragmentation, 
  ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2) AS avg_tablet_size_gb
FROM information_schema.tables
WHERE table_name = 'players';

VACUUM players;

SELECT * FROM information_schema.engine_running_queries WHERE query_text ilike 'vacuum%';

SELECT * FROM information_schema.engine_query_history WHERE query_text ilike 'vacuum%';


SELECT table_name, number_of_rows, number_of_tablets, fragmentation, 
  ROUND((compressed_bytes/1024.0/1024.0/1024.0)/number_of_tablets,2) AS avg_tablet_size_gb
FROM information_schema.tables
WHERE table_name = 'players';