Introduction
Generating a series of dates and times is a common requirement in data analysis and reporting, especially when dealing with time series data. Firebolt provides powerful functions to create such series efficiently. By the end of this article, you will know how to generate a continuous series of dates and times for various intervals and use cases using Firebolt SQL syntax.
TL;DR
-
Use
GENERATE_SERIES
to create a sequence of numbers. -
Combine
GENERATE_SERIES
withDATE_ADD
to generate dates and times. -
Customize the start date, start time, and interval as needed.
-
Integrate the generated date and time series into your queries.
-
If needed, store the generated series in an array.
Step-by-Step Guide
All the example SQL code uses the Ultra Fast Gaming data set. To familiarize yourself with this dataset, visit this link: Ultra Fast Gaming Data Set.
Step 1: Generate a Sequence of Numbers
The first step in creating a date and time series is to generate a sequence of numbers using the GENERATE_SERIES function. This function creates a series of integers that we can later transform into dates and times.
Example SQL code:
-- Generate a series of numbers from 1 to 10
SELECT n FROM GENERATE_SERIES(1, 10) s(n);
In this query, GENERATE_SERIES(1, 10)
generates numbers from 1 to 10. The alias s(n)
assigns these numbers to a column n
.
Step 2: Add the Generated Numbers to a Start Date and Time
Next, use the DATE_ADD function to convert the generated numbers into dates and times. You specify the start date and time, and the interval (e.g., days, hours, minutes) to add to this date and time.
Example SQL code:
-- Generate a date series starting from 2023-01-01 for 10 days
SELECT DATE_ADD('day', n, '2023-01-01'::DATE) AS date_series FROM GENERATE_SERIES(0, 9) s(n);
This query generates a series of dates starting from 2023-01-01
, adding each number in the series to this start date.
To generate a time series, modify the interval to hours, minutes, or seconds:
Example SQL code:
-- Generate a time series starting from 2023-01-01 00:00:00 for 10 hours
SELECT DATE_ADD('hour', n, '2023-01-01 00:00:00'::TIMESTAMP) AS time_series FROM GENERATE_SERIES(0, 9) s(n);
In this query, each generated timestamp is one hour apart, starting from 2023-01-01 00:00:00
.
Step 3: Generate a Custom Interval Date and Time Series
You can customize the interval by changing the interval parameter in the DATE_ADD
function. This allows you to create date and time series with different intervals, such as minutes, hours, days, or months.
Example SQL code:
-- Generate a date series with a weekly interval starting from 2023-01-01 for 10 weeks
SELECT DATE_ADD('week', n, '2023-01-01'::DATE) AS date_series FROM GENERATE_SERIES(0, 9) s(n);
In this query, each generated date is one week apart, starting from 2023-01-01
.
Example SQL code:
-- Generate a time series with a 15-minute interval starting from 2023-01-01 00:00:00 for 10 intervals
SELECT DATE_ADD('minute', n * 15, '2023-01-01 00:00:00'::TIMESTAMP) AS time_series FROM GENERATE_SERIES(0, 9) s(n);
In this query, each generated timestamp is 15 minutes apart, starting from 2023-01-01 00:00:00
.
Step 4: Integrate the Date and Time Series into Your Queries
You can integrate the generated date and time series into your data analysis queries. This is particularly useful for time series analysis and reporting.
Example SQL code:
-- Integrate the date series into a query to count stats per day for a player
WITH date_series AS (
SELECT DATE_ADD('day', n, '2021-01-01'::DATE) AS date_series
FROM GENERATE_SERIES(0, 9) s(n))
SELECT ds.date_series, COUNT(p.*) AS stat_count
FROM date_series ds
LEFT JOIN playstats p ON ds.date_series = p.stattime::DATE
AND playerid = 5416
GROUP BY ds.date_series
ORDER BY ds.date_series;
In this query, the generated date series is used to count the number of statistics collected for a player per day, providing a comprehensive view of player activity.
Example SQL code:
WITH time_series AS (
SELECT DATE_ADD('hour', n, '2021-01-02 00:00:00'::TIMESTAMP) AS time_series
FROM GENERATE_SERIES(0, 23) s(n))
SELECT ts.time_series, COUNT(p.*) AS stat_count
FROM time_series ts
LEFT JOIN playstats p ON ts.time_series = p.stattime::DATE
AND playerid = 5416
GROUP BY ts.time_series
ORDER BY ts.time_series;
In this query, the generated date series is used to count the number of statistics collected for a player per hour, providing a comprehensive view of player activity.
Step 5: Return Date and Time Series as an Array
If needed, the generated date and time series can be returned as an array. This can be done with the ARRAY_AGG function.
Example SQL code:
-- Store the the date series into an array
SELECT array_agg(date_series) FROM (
SELECT DATE_ADD('day', n, '2021-01-01'::DATE) AS date_series
FROM GENERATE_SERIES(0, 9) s(n));
Full Example Code
SELECT n FROM GENERATE_SERIES(1, 10) s(n);
SELECT DATE_ADD('day', n, '2023-01-01'::DATE) AS date_series FROM GENERATE_SERIES(0, 9) s(n);
SELECT DATE_ADD('hour', n, '2023-01-01 00:00:00'::TIMESTAMP) AS time_series FROM GENERATE_SERIES(0, 9) s(n);
SELECT DATE_ADD('week', n, '2023-01-01'::DATE) AS date_series FROM GENERATE_SERIES(0, 9) s(n);
SELECT DATE_ADD('minute', n * 15, '2023-01-01 00:00:00'::TIMESTAMP) AS time_series FROM GENERATE_SERIES(0, 9) s(n);
WITH date_series AS (
SELECT DATE_ADD('day', n, '2021-01-01'::DATE) AS date_series
FROM GENERATE_SERIES(0, 9) s(n))
SELECT ds.date_series, COUNT(p.*) AS stat_count
FROM date_series ds
LEFT JOIN playstats p ON ds.date_series = p.stattime::DATE
AND playerid = 5416
GROUP BY ds.date_series
ORDER BY ds.date_series;
WITH time_series AS (
SELECT DATE_ADD('hour', n, '2021-01-02 00:00:00'::TIMESTAMP) AS time_series
FROM GENERATE_SERIES(0, 23) s(n))
SELECT ts.time_series, COUNT(p.*) AS stat_count
FROM time_series ts
LEFT JOIN playstats p ON ts.time_series = p.stattime::DATE
AND playerid = 5416
GROUP BY ts.time_series
ORDER BY ts.time_series;
SELECT array_agg(date_series) FROM (
SELECT DATE_ADD('day', n, '2021-01-01'::DATE) AS date_series
FROM GENERATE_SERIES(0, 9) s(n));