Firebolt Sample Dataset: Zipper Retail Data Model Example

TL;DR

The "Zipper Retail" use case, highlighted in various help center articles, introduces an e-commerce platform built to exhibit Firebolt's capabilities in high-speed data analytics. It encompasses a range of table types, data ingestion strategies, advanced indexing, and diverse data types aimed at training, workshops, and documentation. This article will detail the model, relevant tables, and the SQL code needed to create it.

Fictional Company and Use Case Details

  • Industry: E-commerce
  • Name: Zipper Retail
  • Description: Zipper Retail transforms the online shopping experience with lightning-fast search, instant inventory updates, and rapid checkout processes. By harnessing state-of-the-art technologies, Zipper Retail ensures seamless and efficient transactions across a broad product spectrum on a global scale.
  • Use Case: Leveraging Firebolt's data warehouse capabilities, Zipper Retail aims to achieve real-time analytics on customer interactions, inventory management, and transaction processing. The primary focus is enhancing operational efficiency, reducing query response times, and delivering a superior customer experience.

Data Model - Detailed Description

The data model for Zipper Retail includes one main fact table and six dimension tables, designed to highlight Firebolt's performance and capabilities in handling diverse data types and structures.

Fact Table

  • orders: Records of customer orders.
    • order_id (STRING): Unique identifier for each order.
    • customer_id (STRING): Identifier linking to the customer dimension table.
    • product_id (STRING): Identifier linking to the product dimension table.
    • quantity (INT): Number of products ordered.
    • order_price (FLOAT): Total price of the order.
    • order_timestamp (TIMESTAMP): Timestamp when the order was placed.
    • payment_method (STRING): Identifier linking to the payment methods dimension table.
    • shipping_method (STRING): Identifier linking to the shipping options dimension table.

Dimension Tables

  • products: Details about products available for purchase.
    • product_id (STRING): Unique identifier for each product.
    • name (STRING): Product name.
    • category (STRING): Category of the product.
    • price (FLOAT): Price of the product.
    • availability_status (BOOLEAN): Availability status of the product (true for available, false for not available).
  • customers: Information about customers.
    • customer_id (STRING): Unique identifier for each customer.
    • name (STRING): Customer's name.
    • contact_info (JSON): Contact information, including email and phone number.
    • shopping_preferences (JSON): JSON structure capturing customer preferences.
  • inventory: Inventory levels of products across warehouses.
    • product_id (STRING): Identifier for the product, linked to the products table.
    • warehouse_id (STRING): Identifier for the warehouse where the product is stored.
    • quantity (INT): Quantity of the product available in the warehouse.
  • shipping_options: Shipping methods available.
    • shipping_method (STRING): Name of the shipping method.
    • cost (FLOAT): Shipping cost.
    • estimated_delivery_time (STRING): Estimated delivery time for the shipping method.
  • payment_methods: Information on payment methods.
    • payment_method (STRING): Type of payment method (e.g., credit card, PayPal).
    • processing_time (INT): Time taken to process the payment with this method.
  • event_logs: Logs of system events and user activities.
    • event_id (STRING): Unique identifier for each event.
    • event_timestamp (TIMESTAMP): Timestamp when the event occurred.
    • event_type (STRING): Type of event (e.g., login, logout, page view).
    • details (JSON): JSON structure with additional information about the event.

Analytical Questions

Below you will find selected analytical questions one might use for this use case:

  • What are the top-selling products in each category for a given time period? How has customer shopping behavior changed, and can we identify trends?
  • Which products frequently go out of stock, and how can we optimize inventory levels?
  • What is the average order value, and how does it vary based on payment methods and shipping options?
  • Can we predict customer churn based on historical data and customer interactions?
  • What are the peak shopping hours and days, and how can we staff accordingly to meet demand?
  • How can we personalize product recommendations for each customer to increase conversion rates?

Creating the Model in Firebolt

Please run the following scripts to recreate the model in your database.

DDL - Tables and Extrenal tables

-- External Table: External OrdersCREATE EXTERNAL TABLE IF NOT EXISTS external_orders(    order_id TEXT,    customer_id TEXT,    product_id TEXT,    quantity INT,    order_price FLOAT,    order_timestamp TIMESTAMP,    payment_method TEXT,    shipping_method TEXT)URL = 's3://firebolt-publishing-public/help_center_assets/zipper_use_case/'OBJECT_PATTERN = '*orders.csv'TYPE = (CSV SKIP_HEADER_ROWS = 1);-- External Table: External ProductsCREATE EXTERNAL TABLE IF NOT EXISTS external_products(    product_id TEXT,    name TEXT,    category TEXT,    price FLOAT,    availability_status BOOLEAN)URL = 's3://firebolt-publishing-public/help_center_assets/zipper_use_case/'OBJECT_PATTERN = '*products.csv'TYPE = (CSV SKIP_HEADER_ROWS = 1);-- External Table: External CustomersCREATE EXTERNAL TABLE IF NOT EXISTS external_customers(src TEXT NOT NULL)URL = 's3://firebolt-publishing-public/help_center_assets/zipper_use_case/'OBJECT_PATTERN = '*customers.json'TYPE = (JSON PARSE_AS_TEXT = TRUE);-- External Table: External InventoryCREATE EXTERNAL TABLE IF NOT EXISTS external_inventory(    product_id TEXT,    warehouse_id TEXT,    quantity INT)URL = 's3://firebolt-publishing-public/help_center_assets/zipper_use_case/'OBJECT_PATTERN = '*inventory.csv'TYPE = (CSV SKIP_HEADER_ROWS = 1);-- External Table: External Shipping OptionsCREATE EXTERNAL TABLE IF NOT EXISTS external_shipping_options(    shipping_method TEXT,    cost FLOAT,    estimated_delivery_time TEXT)URL = 's3://firebolt-publishing-public/help_center_assets/zipper_use_case/'OBJECT_PATTERN = '*shipping_options.csv'TYPE = (CSV SKIP_HEADER_ROWS = 1);-- External Table: External Payment MethodsCREATE EXTERNAL TABLE IF NOT EXISTS external_payment_methods(    payment_method TEXT,    processing_time INT)URL = 's3://firebolt-publishing-public/help_center_assets/zipper_use_case/'OBJECT_PATTERN = '*payment_methods.csv'TYPE = (CSV SKIP_HEADER_ROWS = 1);-- External Table: External Event LogsCREATE EXTERNAL TABLE IF NOT EXISTS external_event_logs(    event_id TEXT,    event_timestamp TIMESTAMP,    event_type TEXT,    details TEXT)URL = 's3://firebolt-publishing-public/help_center_assets/zipper_use_case/'OBJECT_PATTERN = '*event_logs.csv'TYPE = (CSV SKIP_HEADER_ROWS = 1);-- Create DDL for Internal Tables-- Fact Table: ordersCREATE FACT TABLE IF NOT EXISTS orders(    order_id TEXT ,    customer_id TEXT,    product_id TEXT,    quantity INT,    order_price FLOAT,    order_timestamp TIMESTAMP,    payment_method TEXT,    shipping_method TEXT);-- Dimension Table: productsCREATE DIMENSION TABLE IF NOT EXISTS products(    product_id TEXT ,    name TEXT,    category TEXT,    price FLOAT,    availability_status BOOLEAN);-- Dimension Table: customersCREATE DIMENSION TABLE IF NOT EXISTS customers(    customer_id TEXT,    name TEXT,    email TEXT,    phone_number TEXT,    shopping_preferences ARRAY(TEXT));-- Dimension Table: inventoryCREATE DIMENSION TABLE IF NOT EXISTS inventory(    product_id TEXT,    warehouse_id TEXT,    quantity INT);-- Dimension Table: shipping_optionsCREATE DIMENSION TABLE IF NOT EXISTS shipping_options(    shipping_method TEXT ,    cost FLOAT,    estimated_delivery_time TEXT);-- Dimension Table: payment_methodsCREATE DIMENSION TABLE IF NOT EXISTS payment_methods(    payment_method TEXT ,    processing_time INT);-- Dimension Table: event_logsCREATE DIMENSION TABLE IF NOT EXISTS event_logs(    event_id TEXT,    event_timestamp TIMESTAMP,    event_type TEXT,    details TEXT);

DML - Ingesting the data

-- Populate data from external_orders into its corresponding internal tableINSERT INTO orders SELECT *FROM external_orders;-- Populate data from external_products into its corresponding internal tableINSERT INTO products SELECT *FROM external_products;-- Populate data from external_customers into its corresponding internal tableINSERT INTO customersSELECT    JSON_EXTRACT(src, '/customer_id', 'text') as customer_id,    JSON_EXTRACT(src, '/name', 'text') as name,    JSON_EXTRACT(src, '/contact_info/email','text') as email,    JSON_EXTRACT(src, '/contact_info/phone_number','text') as phone_number,    JSON_EXTRACT(src, '/shopping_preferences/preferred_categories','array(text)') as preferred_shopping_categories,from     external_customers;-- Populate data from external_inventory into its corresponding internal tableINSERT INTO inventory SELECT *FROM external_inventory;-- Populate data from external_shipping_options into its corresponding internal tableINSERT INTO shipping_options SELECT *FROM external_shipping_options;-- Populate data from external_payment_methods into its corresponding internal tableINSERT INTO payment_methods SELECT *FROM external_payment_methods;-- Populate data from external_event_logs into its corresponding internal tableINSERT INTO event_logs SELECT *FROM external_event_logs;