How to Use NULLIF to CAST Empty Strings

Introduction

When dealing with source data provided in text files, columns containing date, time, or numeric data are often stored as text. A common process in data engineering is to CAST these columns to their appropriate data types for storage in a database. However, if these columns contain empty strings, the CAST operation will fail. By the end of this article, you will know how to handle empty strings when casting columns to various data types in Firebolt using the NULLIF function, ensuring successful casting to the desired data type.

TL;DR

  • Empty strings can cause errors during CAST operations.

  • Use the NULLIF function to convert empty strings to NULL.

  • NULL values can be successfully cast to the desired data type.

Step-by-Step Guide

Step 1: Understanding the Issue with Empty Strings

When data is ingested as text, some columns might contain empty strings instead of NULLs. Attempting to cast these empty strings directly to another data type (e.g., DATE, NUMERIC) results in an error.

Example SQL code:

CREATE EXTERNAL TABLE IF NOT EXISTS 
    tournaments_nullif_example (
  dt TEXT NULL
  ) 
  URL = 's3://firebolt-sample-datasets-public-us-east-1/gaming/csv/invalid_formats/'  OBJECT_PATTERN = '*generic_nullif_example.csv'  
TYPE = (CSV SKIP_HEADER_ROWS = TRUE);

CREATE FACT TABLE IF NOT EXISTS 
  tournaments_nullif_example_fact
  (dt DATE NULL);

INSERT INTO tournaments_nullif_example_fact
SELECT 
  dt
  FROM tournaments_nullif_example;

The above will fail, as '' cannot be casted to date. The Error message would be:Unable to cast TEXT '' to date. Invalid date constant: Use the ISO 8601 format YYYY-[M]M-[D]D

Step 2: Using NULLIF to Handle Empty Strings

The NULLIF function converts empty strings to NULL, which can then be safely cast to the desired data type.

Example SQL code:

CREATE EXTERNAL TABLE IF NOT EXISTS 
    tournaments_nullif_example (
  dt TEXT NULL
  ) 
  URL = 's3://firebolt-sample-datasets-public-us-east-1/gaming/csv/invalid_formats/'  OBJECT_PATTERN = '*generic_nullif_example.csv'  
TYPE = (CSV SKIP_HEADER_ROWS = TRUE);

CREATE FACT TABLE IF NOT EXISTS 
  tournaments_nullif_example_fact
  (dt DATE NULL);

INSERT INTO tournaments_nullif_example_fact
SELECT 
 NULLIF(dt,'')::date
  FROM tournaments_nullif_example;