Common Table Expressions

Create, Reuse and Repeat.

Introduction

A Common Table Expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs make complex queries easier to read and maintain by breaking them into smaller, modular components. Unlike subqueries, CTEs provide a more readable structure and can be reused within the same query.

CTEs are defined using the WITH keyword, followed by the query they represent. They are particularly useful in customer management for tasks like hierarchical data representation, recursive queries, and simplifying complex joins.

Syntax of a CTE

The general syntax for a CTE in MS SQL is:

WITH CTE_Name (Column1, Column2, ...)
AS
(
    -- CTE Query
    SELECT Column1, Column2, ...
    FROM TableName
    WHERE Condition
)
-- Main query using the CTE
SELECT Column1, Column2, ...
FROM CTE_Name;

Example

Let’s start with a simple example where we list all customers who have spent more than $500 in total.

WITH CustomerTotalSpend AS
(
    SELECT customer_id, SUM(total_amount) AS total_spent
    FROM Purchases
    GROUP BY customer_id
)
SELECT c.customer_id, c.first_name, t.total_spent
FROM Customers c
JOIN CustomerTotalSpend t ON c.customer_id = t.customer_id
WHERE t.total_spent > 500;

In this example, the CTE CustomerTotalSpend calculates the total amount spent by each customer. The main query retrieves the customer details for those whose total spending exceeds $500.

Multiple CTEs

You can define multiple CTEs in a single query by separating them with commas. For example, if you want to find customers who made a purchase in 2023 and spent over $500, you can use two CTEs.

WITH Customers2023 AS
(
    SELECT customer_id
    FROM Purchases
    WHERE YEAR(purchase_date) = 2023
),
CustomerTotalSpend AS
(
    SELECT customer_id, SUM(total_amount) AS total_spent
    FROM Purchases
    GROUP BY customer_id
)
SELECT c.customer_id, c.first_name, t.total_spent
FROM Customers c
JOIN Customers2023 y ON c.customer_id = y.customer_id
JOIN CustomerTotalSpend t ON c.customer_id = t.customer_id
WHERE t.total_spent > 500;

Here, the first CTE (Customers2023) filters customers who made purchases in 2023, while the second CTE (CustomerTotalSpend) calculates the total spending for those customers.

Recursive CTEs

A recursive CTE is a CTE that refers to itself. Recursive CTEs are useful for working with hierarchical data, such as customer referral systems or organizational structures.

  1. Recursive CTE Example: Customer Referrals

Assume we have a table that tracks customer referrals, where each customer can refer others. We want to find all customers referred by a specific customer (let’s say customer_id = 1), including customers referred by those they referred, and so on.

WITH RecursiveReferrals (customer_id, referred_by) AS
(
    -- Anchor member: Start with the original customer
    SELECT customer_id, referred_by
    FROM CustomerReferrals
    WHERE referred_by= 1

    UNION ALL

    -- Recursive member: Find customers referred by the customers in the anchor member
    SELECT cr.customer_id, cr.referred_by
    FROM CustomerReferrals cr
    INNER JOIN RecursiveReferrals rr ON cr.referred_by= rr.customer_id
)
SELECT customer_id
FROM RecursiveReferrals;

In this query, the CTE recursively finds all customers referred directly or indirectly by the original customer (customer_id = 1).

  1. Recursive CTE Example: Sales Team Hierarchy

If your customer management system tracks a sales team structure, where each salesperson reports to another, you can use a recursive CTE to explore this hierarchy.

WITH SalesHierarchy (salesperson_id, manager_id, Level) AS
(
    -- Base case: Start with the top-level manager
    SELECT salesperson_id, manager_id, 0 AS Level
    FROM Sales
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive step: Find employees reporting to the current level of managers
    SELECT s.salesperson_id, s.manager_id, h.Level + 1
    FROM Sales s
    INNER JOIN SalesHierarchy h ON s.manager_id = h.salesperson_id
)
SELECT salesperson_id, manager_id, Level
FROM SalesHierarchy
ORDER BY Level;

Here, the CTE builds a hierarchy of salespersons, where each level represents the depth in the organizational chart.

Benefits of Using CTEs

  • Readability: CTEs break down complex queries into manageable parts, improving query readability.
  • Reusability: A CTE can be referenced multiple times within the same query, reducing repetition.
  • Modularity: CTEs enable a modular approach to query building, allowing you to structure your query logic step by step.
  • Simplifying Recursive Queries: Recursive CTEs are a powerful tool for dealing with hierarchical and self-referential data.

Limitations of CTEs

  • Scope: CTEs are only valid within the query in which they are defined. Once the query execution is complete, the CTE result is discarded.
  • Performance: For very large datasets, complex recursive CTEs may have performance overhead. In such cases, using indexed tables or optimizing joins might be more efficient.

Conclusion

CTEs provide a powerful way to simplify complex queries and make them more readable and maintainable. They are especially useful in customer management for calculating aggregated data, handling hierarchical structures, and constructing recursive queries. When used effectively, CTEs can significantly enhance the flexibility of your MS SQL queries, whether you're analyzing customer purchases, tracking referrals, or managing team hierarchies.

Asked in Interview

  • What is a Common Table Expression (CTE) in SQL, and why is it used?
  • What is the difference between a CTE and a derived table?
  • Can a CTE reference itself in SQL?
  • What is the difference between a CTE and a temporary table?
  • Can you update data in a table using a CTE?
  • Can a CTE contain multiple queries or multiple CTEs?
  • What is a Common Table Expression (CTE) in SQL, and how is it different from a subquery?
  • What are the advantages of using a CTE over a derived table or subquery?
  • What is a recursive CTE, and how does it work?
  • How does recursion terminate in a recursive CTE?
  • How does a CTE affect query performance compared to a subquery?