Troubleshooting CSV file ingestion

FireboltAutomations · September 1, 2023, 5:19pm

The Symptom

Sometimes when trying to ingest a CSV into an external table, the ingestion might fail. You'll see an error like this when querying the external table:

Cannot parse input: expected '|' but found '<CARRIAGE RETURN>' instead.: While handling s3 object 'some_bucket/some_folder/some_file.csv'

The above error indicates that the file delimiter in not what is defined in the table definition, or that the file contains a different number of columns than what is defined in the external table.

Troubleshooting

Check the file delimiter and make sure the FIELD_DELIMITER is defined correctly for what is in your CSV.
Take a look at the file and compare it column-by-column to the external table definition.
If the file is too big to easily download or look at, then you can try setting up a special external table definition that lets you look at the rows in the CSV file as a single string:

DROP TABLE IF EXISTS ext_tmp;CREATE EXTERNAL TABLE ext_tmp(blob text)URL = 's3://some_bucket/somefolder/'OBJECT_PATTERN = '*' TYPE = (CSV FIELD_DELIMITER=' ');-- In this case, what looks like a space for a column delimiter is actually ascii(255).  On a windows machine, you can create this character by holding the alt key and typing 255 on your numeric keypad.-- The trick is to use a character that will not appear in the normal data of the file. You could use another character for the delimeter, such as TYPE = (CSV FIELD_DELIMITER='^')-- (blob text) is just a column named blob with type text. You could replace it with any column name you want.-- Once this external table is created, there will be one field called “blob” in each row.

Once the table is created, you can run queries like this to look at the data and compare to the original external table definition.

-- shows the header row and first row of data
SELECT * FROM ext_tmp LIMIT 2;

-- shows how many delimiters are in each row, and orders the result by the count.  This can show if there are rows with different numbers of delimiters, which is not allowed
SELECT LENGTH(SPLIT(‘|’, blob)), blob FROM ext_tmp
ORDER BY LENGTH(SPLIT(‘|’, blob)) desc;

Topic		Replies	Views
What causes "Unable to cast 'TEXT' to xxx target data type..." errors when selecting from external tables Help Center	0	378	November 22, 2022
How to resolve "Unable to cast" errors during CSV Ingestion Help Center	4	456	June 20, 2024
Engine Sizing for Simple Ingestion	10	304	July 16, 2024
Simplifying SQL query analysis with EXPLAIN (ANALYZE)	0	281	June 20, 2024
Welcome to Firebolt Help Center Site Feedback	0	626	May 18, 2022

Troubleshooting CSV file ingestion

The Symptom

Troubleshooting

Related topics