Why are some columns blank or 0 when selecting from an external table linked to a parquet file?

Data in columns will not correctly display if there is a discrepancy between the column name in the parquet schema and the column name in the Firebolt DDL for the external table.

The parquet name and the Firebolt DDL name must match exactly - both spelling and case. If there is a mismatch (or if the DDL column names do not exist in the schema at all), then the mismatched columns will show as empty (for text data types), 0 (for numeric data types), or [] (for array data types).

As long as 1 column matches, data will be selected and returned. If no columns match, this error is returned

No external table columns were found in object metadata

Requirements

In order to compare the Firebolt DDL to the parquet schema, it is necessary to look at the schema. There are a number of tools/utilities available to show the schema for a parquet file. One of the most common is a utility called parquet-tools.

Examples

This section describes some scenarios that are not as obvious as misspelling column names or having an extra column in the external table

  • The case does not match.

    • The column name case has to be the same in schema and in the DDL, including mixed case.

Parquet schema:    message test_schema {    optional int64 l_Orderkey;    optional int64 l_Partkey;    }    Valid DDL:CREATE EXTERNAL TABLE ex_test (  "l_Orderkey" INT,  "l_Partkey" INT);Invalid DDL - the "o" in orderkey is not uppercase (the first column will show 0 when SELECTed):CREATE EXTERNAL TABLE ex_test (  "l_orderkey" INT,  "l_Partkey" INT);
  • The parquet columns are LIST elements with substructures

    • If the parquet schema shows a LIST structure, the DDL must be set up in one of two ways

      • Use the short column name and issue this command before SELECTing:

        • SET use_short_column_path_parquet = 1;

      • Use the fully qualified column name

Parquet schema:      message test_schema {        optional int64 country;        optional group sites (LIST) {          repeated group list {            required binary element (STRING);          }      }  Valid short column name:    CREATE EXTERNAL TABLE ex_test    ("country" int ,    "sites" ARRAY(TEXT));    -- must set this parameter before selecting data from the external table    SET use_short_column_path_parquet = 1;Valid fully qualified column name:    CREATE EXTERNAL TABLE IF NOT EXISTS test    ("country" int ,    "sites.list.element" ARRAY(TEXT)    )