%%sql
use prod.db2 CTE (Common Table Expression) improves code readability and reduces repetition
2.1 Why use a CTE
A CTE is a named select statement that can be reused in a single query.
Complex SQL queries often involve multiple subqueries. Multiple sub-queries make the code hard to read. Use a Common Table Expression (CTE) to make your queries readable.
CTEs also make testing complex queries simpler.
2.2 How to define a CTE
Use the WITH keyword to start defining a CTE; the WITH keyword is not necessary for consecutive CTE definitions.
%%sql
-- CTE definition
WITH
supplier_nation_metrics AS ( -- CTE 1 defined using WITH keyword
SELECT
n.n_nationkey,
SUM(l.l_QUANTITY) AS num_supplied_parts
FROM
lineitem l
JOIN supplier s ON l.l_suppkey = s.s_suppkey
JOIN nation n ON s.s_nationkey = n.n_nationkey
GROUP BY
n.n_nationkey
),
buyer_nation_metrics AS ( -- CTE 2 defined just as a name
SELECT
n.n_nationkey,
SUM(l.l_QUANTITY) AS num_purchased_parts
FROM
lineitem l
JOIN orders o ON l.l_orderkey = o.o_orderkey
JOIN customer c ON o.o_custkey = c.c_custkey
JOIN nation n ON c.c_nationkey = n.n_nationkey
GROUP BY
n.n_nationkey
)
SELECT -- The final select will not have a comma before it
n.n_name AS nation_name,
s.num_supplied_parts,
b.num_purchased_parts
FROM
nation n
LEFT JOIN supplier_nation_metrics s ON n.n_nationkey = s.n_nationkey
LEFT JOIN buyer_nation_metrics b ON n.n_nationkey = b.n_nationkey
LIMIT 10;Note that the last CTE does not have a , after it.
Let’s look at ane example: Calculate the money lost due to discounts. Use the lineitem to retrieve the prices of items (excluding discounts) that are part of an order and compare them to the order.
The l_extendedprice column does not include discounts & the o_totalprice column includes discounts.
%%sql
WITH lineitem_agg AS (
SELECT
l_orderkey,
SUM(l_extendedprice) AS total_price_without_discount
FROM
lineitem
GROUP BY
l_orderkey
)
SELECT
o.o_orderkey,
o.o_totalprice,
l.total_price_without_discount - o.o_totalprice AS amount_lost_to_discount
FROM
orders o
JOIN
lineitem_agg l ON o.o_orderkey = l.l_orderkey
ORDER BY
o.o_orderkey;Note how each CTE can correspond to getting data in a certain grain with necessary enrichments and then compare it to the order’s data, whose total price has been computed with discounts.
2.3 Recreating similar CTE is a sign that it should be a table
A sql query with multiple temporary tables is better than a 1000-line SQL query with numerous CTEs.
Keep the number of CTEs per query small (depends on the size of the query, but typically < 5)
Assume that you have stakeholders running the below query multiple times as needed.
When multiple stakeholders repeatedly run the exact CTE definition, it is usually an indication that the CTE should be created as a table or view to ensure stakeholders have a unified definition.
%%sql
WITH orders_cte AS (
SELECT
o_orderkey,
o_custkey,
o_orderstatus,
CAST(o_orderdate AS TIMESTAMP) AS o_orderdate,
o_orderpriority,
o_clerk,
o_shippriority,
o_comment,
o_totalprice
FROM orders
),
stg_customers AS (
SELECT
c_custkey,
c_name,
c_address,
c_nationkey,
c_phone,
c_acctbal,
c_mktsegment,
c_comment
FROM customer
),
nation_cte AS (
SELECT
CAST(n_nationkey AS INT) AS n_nationkey,
CAST(n_name AS STRING) AS n_name,
CAST(n_regionkey AS INT) AS n_regionkey,
CAST(n_comment AS STRING) AS n_comment
FROM nation
),
dim_customers AS (
SELECT
c.c_custkey,
c.c_name,
c.c_address,
c.c_nationkey,
n.n_name AS nation_name,
c.c_phone,
c.c_acctbal,
c.c_mktsegment,
c.c_comment
FROM stg_customers c
INNER JOIN nation_cte n ON c.c_nationkey = n.n_nationkey
)
SELECT
o.o_orderkey,
o.o_custkey,
o.o_orderstatus,
o.o_orderdate,
o.o_orderpriority,
o.o_clerk,
o.o_shippriority,
o.o_totalprice,
c.c_name AS customer_name,
c.c_address AS customer_address,
c.c_phone AS customer_phone,
c.c_acctbal AS customer_account_balance,
c.c_mktsegment AS customer_market_segment,
c.nation_name AS customer_nation_name
FROM orders_cte o
INNER JOIN dim_customers c ON o.o_custkey = c.c_custkey;2.4 Exercises
- Sellers (name) who have sold at least one of the top 10 selling parts (use CTE)