Performance testing Firebolt with Locust.io

Locust.io is a useful Python-based open source tool for performance and load testing. You define user behaviour in code, and Locust then swarms your system with the specified number of user requests (and any usage patterns you define). You get a nice UI for all of this, you can export the results to analyze them further, and you can run headless through the CLI or a scheduled workflow. It can also run in distributed mode in case you want to generate massive traffic.

This article is a short how-to guide on performance testing Firebolt with Locust. This is a useful exercise for finding the optimal engine for your use case, avoiding bottlenecks, and ensuring good overall performance (during peak usage, for example).

For more information on Locust and the various ways of using it, see the official docs at https://docs.locust.io/.

Setting up your test suite

The first thing to do is to install Locust on your machine:

pip install locust

The next step is to define your test. To test Firebolt, there are two things we need to do:

  1. Create a login process for the users. We will use the Firebolt authentication REST API for this, which returns an auth token (see the Firebolt REST API documentation for details). We will reuse this token across all users that Locust spawns. This is important because we want to avoid logging in too many times: Firebolt has a mechanism to detect excessive login attempts and will block us in that case. You will need your Firebolt credentials for this step.
  2. Define your queries and the database/engine you want to use. In the sample below, we use a Python list where each element is a query that will be executed sequentially. This step is specific to the use case you are testing; in the sample below we imitate a Looker dashboard with 7 queries that need to be executed (one for each of the visuals on the dashboard).

Below is a sample Locust file that can be reused:

import os
import requests
from json import loads
from locust import HttpUser, task, between, events, tag


class FireboltUser(HttpUser):
    # this is the endpoint of the Firebolt engine that will be used in the test and the database name
    host = "https://adtechdb-v4-analytics.firebolt-demo.us-east-1.app.firebolt.io/?database=AdTechDB_v4"

    # wait time before each task (in this case a random time between 1 and 5 seconds)
    wait_time = between(1, 5)

    # a list of SQL queries to execute - these are the actual queries for the performance/load test
    firebolt_queries = [
        """
        --impressions
        SELECT
            COALESCE(SUM("ltv"."impressions_count"), 0) AS "ltv.total_impressions"
        FROM
            "AdTechDB_v4"."ltv" AS "ltv"
            LEFT JOIN "AdTechDB_v4"."owned_apps" AS "owned_apps" ON "ltv"."app_id" = "owned_apps"."app_slug"
        WHERE ((( "ltv"."ltv_timestamp_date" ) >= (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-01')))) AND ( "ltv"."ltv_timestamp_date" ) < (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-16')))))) AND "ltv"."media_source" = '3bc36bb9f1bded2703684dae170581dc' AND "owned_apps"."platform" IN ('android', 'ios', 'windowsphone');
        """,
        """
        --clicks
        SELECT
            COALESCE(SUM("ltv"."clicks_count"), 0) AS "ltv.total_clicks"
        FROM
            "AdTechDB_v4"."ltv" AS "ltv"
            LEFT JOIN "AdTechDB_v4"."owned_apps" AS "owned_apps" ON "ltv"."app_id" = "owned_apps"."app_slug"
        WHERE ((( "ltv"."ltv_timestamp_date" ) >= (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-01')))) AND ( "ltv"."ltv_timestamp_date" ) < (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-16')))))) AND "ltv"."media_source" = '3bc36bb9f1bded2703684dae170581dc' AND "owned_apps"."platform" IN ('android', 'ios', 'windowsphone');
        """,
        """
        --clickthrough rate
        SELECT
            COALESCE(SUM("ltv"."clicks_count"), 0) / NULLIF(COALESCE(SUM("ltv"."impressions_count"), 0), 0) AS "ltv.clickthrough_rate"
        FROM
            "AdTechDB_v4"."ltv" AS "ltv"
            LEFT JOIN "AdTechDB_v4"."owned_apps" AS "owned_apps" ON "ltv"."app_id" = "owned_apps"."app_slug"
        WHERE ((( "ltv"."ltv_timestamp_date" ) >= (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-01')))) AND ( "ltv"."ltv_timestamp_date" ) < (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-16')))))) AND "ltv"."media_source" = '3bc36bb9f1bded2703684dae170581dc' AND "owned_apps"."platform" IN ('android', 'ios', 'windowsphone');
        """,
        """
        --install rate
        SELECT
            COALESCE(SUM("ltv"."installs_count"), 0) / NULLIF(COALESCE(SUM("ltv"."clicks_count"), 0), 0) AS "ltv.install_rate"
        FROM
            "AdTechDB_v4"."ltv" AS "ltv"
            LEFT JOIN "AdTechDB_v4"."owned_apps" AS "owned_apps" ON "ltv"."app_id" = "owned_apps"."app_slug"
        WHERE ((( "ltv"."ltv_timestamp_date" ) >= (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-01')))) AND ( "ltv"."ltv_timestamp_date" ) < (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-16')))))) AND "ltv"."media_source" = '3bc36bb9f1bded2703684dae170581dc' AND "owned_apps"."platform" IN ('android', 'ios', 'windowsphone');
        """,
        """
        --installs
        SELECT
            COALESCE(SUM("ltv"."installs_count"), 0) AS "ltv.total_installs"
        FROM
            "AdTechDB_v4"."ltv" AS "ltv"
            LEFT JOIN "AdTechDB_v4"."owned_apps" AS "owned_apps" ON "ltv"."app_id" = "owned_apps"."app_slug"
        WHERE ((( "ltv"."ltv_timestamp_date" ) >= (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-01')))) AND ( "ltv"."ltv_timestamp_date" ) < (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-16')))))) AND "ltv"."media_source" = '3bc36bb9f1bded2703684dae170581dc' AND "owned_apps"."platform" IN ('android', 'ios', 'windowsphone');
        """,
        """
        --daily funnel (logarithmic scale)
        SELECT
            (DATE_FORMAT("ltv"."ltv_timestamp_date", '%Y-%m-%d')) AS "ltv.date_date",
            COALESCE(SUM("ltv"."impressions_count"), 0) AS "ltv.total_impressions",
            COALESCE(SUM("ltv"."clicks_count"), 0) AS "ltv.total_clicks",
            COALESCE(SUM("ltv"."installs_count"), 0) AS "ltv.total_installs"
        FROM
            "AdTechDB_v4"."ltv" AS "ltv"
            LEFT JOIN "AdTechDB_v4"."owned_apps" AS "owned_apps" ON "ltv"."app_id" = "owned_apps"."app_slug"
        WHERE ((( "ltv"."ltv_timestamp_date" ) >= (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-01')))) AND ( "ltv"."ltv_timestamp_date" ) < (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-16')))))) AND "ltv"."media_source" = '3bc36bb9f1bded2703684dae170581dc' AND "owned_apps"."platform" IN ('android', 'ios', 'windowsphone')
        GROUP BY 1
        ORDER BY 1
        """,
        """
        --performance summary
        SELECT
            "owned_apps"."name" AS "owned_apps.name",
            "ltv"."media_source" AS "ltv.media_source",
            "owned_apps"."platform" AS "owned_apps.platform",
            COALESCE(SUM("ltv"."impressions_count"), 0) AS "ltv.total_impressions",
            COALESCE(SUM("ltv"."clicks_count"), 0) / NULLIF(COALESCE(SUM("ltv"."impressions_count"), 0), 0) AS "ltv.clickthrough_rate",
            COALESCE(SUM("ltv"."clicks_count"), 0) AS "ltv.total_clicks",
            COALESCE(SUM("ltv"."installs_count"), 0) / NULLIF(COALESCE(SUM("ltv"."clicks_count"), 0), 0) AS "ltv.install_rate",
            COALESCE(SUM("ltv"."installs_count"), 0) AS "ltv.total_installs"
        FROM
            "AdTechDB_v4"."ltv" AS "ltv"
            LEFT JOIN "AdTechDB_v4"."owned_apps" AS "owned_apps" ON "ltv"."app_id" = "owned_apps"."app_slug"
        WHERE ((( "ltv"."ltv_timestamp_date" ) >= (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-01')))) AND ( "ltv"."ltv_timestamp_date" ) < (DATE_TRUNC('day', TO_TIMESTAMP(TO_DATE('2018-03-16')))))) AND "ltv"."media_source" = '3bc36bb9f1bded2703684dae170581dc' AND "owned_apps"."platform" IN ('android', 'ios', 'windowsphone')
        GROUP BY 1, 2, 3
        ORDER BY 4 DESC;
        """,
    ]

    # a testing task that will execute all defined SQL queries in sequence (to mock a dashboard, for example).
    # This can be extended if needed, or additional tasks can be added in case you want to test multiple dashboards in parallel.
    @tag('Dashboard1')
    @task
    def Dashboard1(self):
        self.client.headers = {"Authorization": f"Bearer {token}"}
        for query in self.firebolt_queries:
            response = self.client.post(url=self.host, data=query)


# login process - this is done only once and the auth token is reused by all users in all calls to the Firebolt REST API.
# This part can be left untouched. Note that you need to supply your Firebolt username and password
# as environment variables before launching the test.
@events.init.add_listener
def on_locust_init(environment, **_kwargs):
    global token
    login_headers = {"content-type": "application/json;charset=UTF-8"}
    login_request = requests.post(
        url="https://api.app.firebolt.io/auth/v1/login",
        json={"username": os.environ["FIREBOLT_USER"], "password": os.environ["FIREBOLT_PASSWORD"]},
        headers=login_headers,
    )
    token = loads(login_request.content.decode("utf-8"))["access_token"]

Replace the host variable with your Firebolt engine and database endpoint, and the firebolt_queries with the SQL queries for your test use case. You are now ready to run your first test.

Running the test suite

Once you have your Locust Python file defined, you can run the test. The following commands run the test with the Locust web UI, but there are various other ways to run it as well (such as headless). Consult the Locust docs for more: https://docs.locust.io/en/stable/configuration.html.

$ export FIREBOLT_USER=my_user@domain.com
$ export FIREBOLT_PASSWORD=my_password
$ locust -f locust_firebolt.py

Go to http://0.0.0.0:8089/ in your browser to start the test.


Define the number of users for peak concurrency (such as 10) and the spawn rate (the number of users created every second, starting from 1), then hit the Start swarming button. This immediately starts the performance test.

Note that you can also define various "load shapes", or patterns of user behaviour. For example, you can create a bell curve that starts with 5 users, gradually ramps up to 20 concurrent users, and then goes back down to 5. More on that in the Locust docs: https://docs.locust.io/en/stable/custom-load-shape.html

Analyzing the results

Once the test has finished, you can analyze the results, either in the Locust web UI or by exporting them as CSV. Locust provides metrics such as total RPS (requests per second), total users, median response times, and failure counts.
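If you export the results (for example by running Locust with --csv=firebolt_test), the stats file can be post-processed with a few lines of Python. The column names and sample rows below are assumptions modeled on recent Locust versions; check your own export before relying on them:

```python
import csv
import io

# Illustrative contents of the <prefix>_stats.csv file written by
# `locust --csv=<prefix>` (column names are an assumption -- verify
# against your own Locust export).
sample_stats = """\
Type,Name,Request Count,Failure Count,Median Response Time,Requests/s
POST,/?database=AdTechDB_v4,700,3,410.0,4.6
,Aggregated,700,3,410.0,4.6
"""


def failure_ratio(stats_csv: str) -> float:
    """Return the overall failure ratio from the 'Aggregated' summary row."""
    for row in csv.DictReader(io.StringIO(stats_csv)):
        if row["Name"] == "Aggregated":
            return int(row["Failure Count"]) / int(row["Request Count"])
    raise ValueError("no Aggregated row found in stats CSV")


print(f"failure ratio: {failure_ratio(sample_stats):.2%}")
```

The same pattern extends to checking that, say, the median response time stays under a target threshold as part of a scheduled headless run.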

These metrics can then inform your choice of Firebolt engine. For example, if the performance metrics (such as median or min/max response times) are not sufficient, you can tune your data model and queries, or test another engine type or configuration.


Conclusion

Locust is a very easy-to-use and useful tool for performance and load testing. This tutorial was a short introduction to using Locust with Firebolt, together with a sample Locust file to give you a head start.