2  CTE (Common Table Expression) improves code readability and reduces repetition

2.1 Why use a CTE

A CTE is a named select statement that can be reused in a single query.

Complex SQL queries often involve multiple subqueries. Multiple sub-queries make the code hard to read. Use a Common Table Expression (CTE) to make your queries readable.

CTEs also make testing complex queries simpler.

2.2 How to define a CTE

Use the WITH keyword to start defining a CTE; the WITH keyword is not necessary for consecutive CTE definitions.

%%sql
use prod.db
%%sql
-- CTE definition
WITH
  supplier_nation_metrics AS ( -- CTE 1 defined using WITH keyword
    SELECT
      n.n_nationkey,
      SUM(l.l_QUANTITY) AS num_supplied_parts
    FROM
      lineitem l
      JOIN supplier s ON l.l_suppkey = s.s_suppkey
      JOIN nation n ON s.s_nationkey = n.n_nationkey
    GROUP BY
      n.n_nationkey
  ),
  buyer_nation_metrics AS ( -- CTE 2 defined just as a name
    SELECT
      n.n_nationkey,
      SUM(l.l_QUANTITY) AS num_purchased_parts
    FROM
      lineitem l
      JOIN orders o ON l.l_orderkey = o.o_orderkey
      JOIN customer c ON o.o_custkey = c.c_custkey
      JOIN nation n ON c.c_nationkey = n.n_nationkey
    GROUP BY
      n.n_nationkey
  )
SELECT -- The final select will not have a comma before it
  n.n_name AS nation_name,
  s.num_supplied_parts,
  b.num_purchased_parts
FROM
  nation n
  LEFT JOIN supplier_nation_metrics s ON n.n_nationkey = s.n_nationkey
  LEFT JOIN buyer_nation_metrics b ON n.n_nationkey = b.n_nationkey
LIMIT 10;

Note that the last CTE does not have a , after it.

Let’s look at ane example: Calculate the money lost due to discounts. Use the lineitem to retrieve the prices of items (excluding discounts) that are part of an order and compare them to the order.

The l_extendedprice column does not include discounts & the o_totalprice column includes discounts.

%%sql
WITH lineitem_agg AS (
    SELECT 
        l_orderkey,
        SUM(l_extendedprice) AS total_price_without_discount
    FROM 
        lineitem
    GROUP BY 
        l_orderkey
)
SELECT 
    o.o_orderkey,
    o.o_totalprice, 
    l.total_price_without_discount - o.o_totalprice AS amount_lost_to_discount
FROM 
    orders o
JOIN 
    lineitem_agg l ON o.o_orderkey = l.l_orderkey
ORDER BY 
    o.o_orderkey;

Note how each CTE can correspond to getting data in a certain grain with necessary enrichments and then compare it to the order’s data, whose total price has been computed with discounts.

2.3 Recreating similar CTE is a sign that it should be a table

A sql query with multiple temporary tables is better than a 1000-line SQL query with numerous CTEs.

Keep the number of CTEs per query small (depends on the size of the query, but typically < 5)

Assume that you have stakeholders running the below query multiple times as needed.

When multiple stakeholders repeatedly run the exact CTE definition, it is usually an indication that the CTE should be created as a table or view to ensure stakeholders have a unified definition.

%%sql
WITH orders_cte AS (
    SELECT
        o_orderkey,
        o_custkey,
        o_orderstatus,
        CAST(o_orderdate AS TIMESTAMP) AS o_orderdate,
        o_orderpriority,
        o_clerk,
        o_shippriority,
        o_comment,
        o_totalprice
    FROM orders
),
stg_customers AS (
    SELECT
        c_custkey,
        c_name,
        c_address,
        c_nationkey,
        c_phone,
        c_acctbal,
        c_mktsegment,
        c_comment
    FROM customer
),
nation_cte AS (
    SELECT
        CAST(n_nationkey AS INT) AS n_nationkey,
        CAST(n_name AS STRING) AS n_name,
        CAST(n_regionkey AS INT) AS n_regionkey,
        CAST(n_comment AS STRING) AS n_comment
    FROM nation
),
dim_customers AS (
    SELECT
        c.c_custkey,
        c.c_name,
        c.c_address,
        c.c_nationkey,
        n.n_name AS nation_name,
        c.c_phone,
        c.c_acctbal,
        c.c_mktsegment,
        c.c_comment
    FROM stg_customers c
    INNER JOIN nation_cte n ON c.c_nationkey = n.n_nationkey
)
SELECT
    o.o_orderkey,
    o.o_custkey,
    o.o_orderstatus,
    o.o_orderdate,
    o.o_orderpriority,
    o.o_clerk,
    o.o_shippriority,
    o.o_totalprice,
    c.c_name AS customer_name,
    c.c_address AS customer_address,
    c.c_phone AS customer_phone,
    c.c_acctbal AS customer_account_balance,
    c.c_mktsegment AS customer_market_segment,
    c.nation_name AS customer_nation_name
FROM orders_cte o
INNER JOIN dim_customers c ON o.o_custkey = c.c_custkey;

2.4 Exercises

  1. Sellers (name) who have sold at least one of the top 10 selling parts (use CTE)