Introduction
When dealing with source data provided in text files, columns containing date, time, or numeric data are often stored as text. A common process in data engineering is to CAST these columns to their appropriate data types for storage in a database. However, if these columns contain empty strings, the CAST
operation will fail. By the end of this article, you will know how to handle empty strings when casting columns to various data types in Firebolt using the NULLIF
function, ensuring successful casting to the desired data type.
TL;DR
-
Empty strings can cause errors during
CAST
operations. -
Use the NULLIF function to convert empty strings to
NULL
. -
NULL
values can be successfully cast to the desired data type.
Step-by-Step Guide
Step 1: Understanding the Issue with Empty Strings
When data is ingested as text, some columns might contain empty strings instead of NULLs. Attempting to cast these empty strings directly to another data type (e.g., DATE
, NUMERIC
) results in an error.
Example SQL code:
CREATE EXTERNAL TABLE IF NOT EXISTS
tournaments_nullif_example (
dt TEXT NULL
)
URL = 's3://firebolt-sample-datasets-public-us-east-1/gaming/csv/invalid_formats/' OBJECT_PATTERN = '*generic_nullif_example.csv'
TYPE = (CSV SKIP_HEADER_ROWS = TRUE);
CREATE FACT TABLE IF NOT EXISTS
tournaments_nullif_example_fact
(dt DATE NULL);
INSERT INTO tournaments_nullif_example_fact
SELECT
dt
FROM tournaments_nullif_example;
The above will fail, as ''
cannot be casted to date. The Error message would be:Unable to cast TEXT '' to date. Invalid date constant: Use the ISO 8601 format YYYY-[M]M-[D]D
Step 2: Using NULLIF to Handle Empty Strings
The NULLIF
function converts empty strings to NULL
, which can then be safely cast to the desired data type.
Example SQL code:
CREATE EXTERNAL TABLE IF NOT EXISTS
tournaments_nullif_example (
dt TEXT NULL
)
URL = 's3://firebolt-sample-datasets-public-us-east-1/gaming/csv/invalid_formats/' OBJECT_PATTERN = '*generic_nullif_example.csv'
TYPE = (CSV SKIP_HEADER_ROWS = TRUE);
CREATE FACT TABLE IF NOT EXISTS
tournaments_nullif_example_fact
(dt DATE NULL);
INSERT INTO tournaments_nullif_example_fact
SELECT
NULLIF(dt,'')::date
FROM tournaments_nullif_example;