%%sql
use prod.db
2 CTE (Common Table Expression) improves code readability and reduces repetition
2.1 Why use a CTE
A CTE is a named select statement that can be reused in a single query.
Complex SQL queries often involve multiple subqueries. Multiple sub-queries make the code hard to read. Use a Common Table Expression (CTE) to make your queries readable.
CTEs also make testing complex queries simpler.
2.2 How to define a CTE
Use the WITH
keyword to start defining a CTE; the WITH keyword is not necessary for consecutive CTE definitions.
%%sql
-- CTE definition
WITH-- CTE 1 defined using WITH keyword
supplier_nation_metrics AS (
SELECT
n.n_nationkey,
SUM(l.l_QUANTITY) AS num_supplied_parts
FROM
lineitem l= s.s_suppkey
JOIN supplier s ON l.l_suppkey = n.n_nationkey
JOIN nation n ON s.s_nationkey
GROUP BY
n.n_nationkey
),-- CTE 2 defined just as a name
buyer_nation_metrics AS (
SELECT
n.n_nationkey,
SUM(l.l_QUANTITY) AS num_purchased_parts
FROM
lineitem l= o.o_orderkey
JOIN orders o ON l.l_orderkey = c.c_custkey
JOIN customer c ON o.o_custkey = n.n_nationkey
JOIN nation n ON c.c_nationkey
GROUP BY
n.n_nationkey
)-- The final select will not have a comma before it
SELECT
n.n_name AS nation_name,
s.num_supplied_parts,
b.num_purchased_parts
FROM
nation n= s.n_nationkey
LEFT JOIN supplier_nation_metrics s ON n.n_nationkey = b.n_nationkey
LEFT JOIN buyer_nation_metrics b ON n.n_nationkey 10; LIMIT
Note that the last CTE does not have a ,
after it.
Let’s look at ane example: Calculate the money lost due to discounts. Use the lineitem to retrieve the prices of items (excluding discounts) that are part of an order and compare them to the order.
The l_extendedprice
column does not include discounts & the o_totalprice
column includes discounts.
%%sql
WITH lineitem_agg AS (
SELECT
l_orderkey,
SUM(l_extendedprice) AS total_price_without_discount
FROM
lineitem
GROUP BY
l_orderkey
)
SELECT
o.o_orderkey,
o.o_totalprice, - o.o_totalprice AS amount_lost_to_discount
l.total_price_without_discount
FROM
orders o
JOIN = l.l_orderkey
lineitem_agg l ON o.o_orderkey
ORDER BY ; o.o_orderkey
Note how each CTE can correspond to getting data in a certain grain with necessary enrichments and then compare it to the order’s data, whose total price has been computed with discounts.
2.3 Recreating similar CTE is a sign that it should be a table
A sql query with multiple temporary tables is better than a 1000-line SQL query with numerous CTEs.
Keep the number of CTEs per query small (depends on the size of the query, but typically < 5)
Assume that you have stakeholders running the below query multiple times as needed.
When multiple stakeholders repeatedly run the exact CTE definition, it is usually an indication that the CTE should be created as a table or view to ensure stakeholders have a unified definition.
%%sql
WITH orders_cte AS (
SELECT
o_orderkey,
o_custkey,
o_orderstatus,
CAST(o_orderdate AS TIMESTAMP) AS o_orderdate,
o_orderpriority,
o_clerk,
o_shippriority,
o_comment,
o_totalprice
FROM orders
),
stg_customers AS (
SELECT
c_custkey,
c_name,
c_address,
c_nationkey,
c_phone,
c_acctbal,
c_mktsegment,
c_comment
FROM customer
),
nation_cte AS (
SELECT
CAST(n_nationkey AS INT) AS n_nationkey,
CAST(n_name AS STRING) AS n_name,
CAST(n_regionkey AS INT) AS n_regionkey,
CAST(n_comment AS STRING) AS n_comment
FROM nation
),
dim_customers AS (
SELECT
c.c_custkey,
c.c_name,
c.c_address,
c.c_nationkey,
n.n_name AS nation_name,
c.c_phone,
c.c_acctbal,
c.c_mktsegment,
c.c_comment
FROM stg_customers c= n.n_nationkey
INNER JOIN nation_cte n ON c.c_nationkey
)
SELECT
o.o_orderkey,
o.o_custkey,
o.o_orderstatus,
o.o_orderdate,
o.o_orderpriority,
o.o_clerk,
o.o_shippriority,
o.o_totalprice,
c.c_name AS customer_name,
c.c_address AS customer_address,
c.c_phone AS customer_phone,
c.c_acctbal AS customer_account_balance,
c.c_mktsegment AS customer_market_segment,
c.nation_name AS customer_nation_name
FROM orders_cte o= c.c_custkey; INNER JOIN dim_customers c ON o.o_custkey
2.4 Exercises
- Sellers (name) who have sold at least one of the top 10 selling parts (use CTE)