This error can be encountered during several different operations involving an external table based on CSV or TSV files:
SELECTing FROM an external table
INSERTing INTO a table FROM an external table
Creating a table AS SELECT FROM an external table (CTAS)
The error means that Firebolt was unable to convert the data found for one or more columns in one or more of the rows in the file to the column data type defined in the external table DDL.
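For reference, all three operations can surface the error. Against the example table defined below, they might look like this (my_fact_table is a hypothetical target table used only for illustration):
-- Any of these statements reads the external table and can hit the conversion error:
SELECT * FROM my_external_table;
INSERT INTO my_fact_table SELECT * FROM my_external_table;
CREATE TABLE my_fact_table AS SELECT * FROM my_external_table;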
Example
/************
 * Table DDL
 ************/
CREATE EXTERNAL TABLE my_external_table
(
c_id INT,
c_sum LONG
)
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '******')
URL = 's3://bucket/'
OBJECT_PATTERN = '*.csv'
TYPE = (CSV);
/************
 * FileA.csv
 ************/
100,200
1000,2000
/************
 * FileB.csv
 ************/
100,200
1000,"abc"
/************
 * FileC.csv
 ************/
"c_id","c_name"
100,200
1000,2000
In the above example, the following would happen:
FileA.csv can be accessed with no errors
FileB.csv will error due to the "abc" data in column c_sum in the second row. Firebolt will be unable to implicitly convert "abc" to a numeric value
FileC.csv will error due to the header row not being skipped. Firebolt will try to convert the header value "c_id" to an INT and fail. If the files have header rows, the DDL TYPE should include the SKIP_HEADER_ROWS parameter to skip them:
TYPE = (CSV SKIP_HEADER_ROWS=1)
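Applied to the example above, the corrected definition would look something like this (credentials masked as in the original):
-- Drop and recreate the external table if it already exists
CREATE EXTERNAL TABLE my_external_table
(
c_id INT,
c_sum LONG
)
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '******')
URL = 's3://bucket/'
OBJECT_PATTERN = '*.csv'
TYPE = (CSV SKIP_HEADER_ROWS=1);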
Troubleshooting tip
Use a text editor to review the data in the files and identify the data type mismatch. Most of the time the issue is obvious from the first few rows of the file, and the most common resolution is a simple change to the DDL so that it matches the file contents, as sketched below.
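For example, if a column such as c_sum in FileB.csv legitimately contains non-numeric values like "abc", one possible DDL change is to declare that column as TEXT in the external table so the rows can be read, and then convert it when querying. The table name my_external_table_raw and the use of TRY_CAST (which returns NULL when a conversion fails) are assumptions for illustration:
-- Sketch: declare the problematic column as TEXT so every row can be read
CREATE EXTERNAL TABLE my_external_table_raw
(
c_id INT,
c_sum TEXT
)
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '******')
URL = 's3://bucket/'
OBJECT_PATTERN = '*.csv'
TYPE = (CSV);
-- Convert back to a numeric type when reading; rows that cannot be converted return NULL
SELECT c_id, TRY_CAST(c_sum AS LONG) AS c_sum
FROM my_external_table_raw;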
However, it only takes one column in one row across the entire set of files to cause this error. If the problem is not obvious in the first few rows, it can be helpful to narrow down which row(s) are affected by using SELECT ... LIMIT. Using the table in the example above, run the query below multiple times, increasing the OFFSET value each time, to slice up the files and track down where the problematic rows are located. SOURCE_FILE_NAME is an auto-generated column that helps identify which file the data is in (if there are multiple files).
SELECT SOURCE_FILE_NAME, COUNT(*) FROM
(SELECT *, SOURCE_FILE_NAME FROM my_external_table LIMIT 10000 OFFSET 0)
GROUP BY SOURCE_FILE_NAME;

...<repeat if no error, increasing OFFSET each time>...

SELECT SOURCE_FILE_NAME, COUNT(*) FROM
(SELECT *, SOURCE_FILE_NAME FROM my_external_table LIMIT 10000 OFFSET 10000)
GROUP BY SOURCE_FILE_NAME;
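Once the failing slice points to a particular file, a further narrowing step (a sketch; match the exact value returned for SOURCE_FILE_NAME by the previous query, 'FileB.csv' here is illustrative) is to filter on SOURCE_FILE_NAME and slice within just that file:
-- Reproduce the error against a single file, then adjust LIMIT/OFFSET to locate the row
SELECT *, SOURCE_FILE_NAME
FROM my_external_table
WHERE SOURCE_FILE_NAME = 'FileB.csv'
LIMIT 10000 OFFSET 0;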