Subqueries

Nested querry or Inner querry.

Introduction

In SQL, a subquery is a query nested inside another query. Subqueries are often used to perform intermediate steps or calculations that help the main query achieve more complex results. You can place subqueries in various SQL statements like SELECT, INSERT, UPDATE, or DELETE.

In customer management, subqueries are particularly useful for:

  • Finding specific customer data who have purchased certain products.
  • Aggregating data like total purchases, last transactions, or average spending per customer.

Key Characteristics of Subqueries

  • Subqueries must be enclosed in parentheses.
  • The output of a subquery can be used as a condition or as part of the result set for the outer query.
  • They can return a single value (scalar subquery), a list of values (row subquery), or even a full result set (table subquery).

Syntax

SELECT column_name
FROM table_name
WHERE column_name = (SELECT column_name FROM another_table WHERE condition);

Types of Subqueries

  1. Scalar Subquery

A scalar subquery returns a single value, which can be used in conditions or inserted directly into the select list. It is commonly used in comparison operations.

Example: Find the name of Customer who made a purchase after the last purchase date of 'Customer X'.

SELECT customer_name
FROM Sales
WHERE purchase_date > (
    SELECT Max(purchase_date )
    FROM Sales 
    WHERE customer_name= 'Customer X'
);
  1. Multi-Row Subqueries These subqueries return more than one row and are often used with operators like IN, ANY, ALL.

Example: List Customers who made purchases at stores located in the same city as other Customers.

SELECT customer_id, customer_name
FROM Customers 
WHERE City IN (SELECT store_city 
               FROM Purchases 
               WHERE Purchases.customer_id = Customers.customer_id);
  1. Subqueries in the SELECT Clause Subqueries can be used within the SELECT clause to provide additional derived values in the output.

Example: Show each customer’s total purchases.

SELECT customer_name, 
       (SELECT SUM(total_amount) 
        FROM Purchases 
        WHERE Purchases.customer_id = Customers.customer_id) AS TotalSpent
FROM Customers;

Here, the subquery calculates the total spent by each customer.

  1. Subqueries in the WHERE Clause

Subqueries in the WHERE clause are frequently used to filter results based on data from another table.

Example: List Customers who have spent more than $500.

SELECT customer_name
FROM Customers 
WHERE customer_id IN (SELECT customer_id 
                     FROM Purchases 
                     GROUP BY customer_id 
                     HAVING SUM(total_amount) > 500);
  1. Subqueries in the FROM Clause (Derived Tables)

You can use subqueries in the FROM clause to act as temporary tables.

Example: List customers with their average purchase amount.

SELECT customer_name, AvgPurchase 
FROM Customers c
JOIN (SELECT customer_id, AVG(total_amount) AS AvgPurchase 
      FROM Purchases 
      GROUP BY customer_id) p
ON c.customer_id = p.customer_id;

The subquery acts as a derived table and is joined with the Customers table.

  1. Correlated Subqueries

A correlated subquery references columns from the outer query. Each row processed by the outer query triggers execution of the inner query.

In correlated subqueries (also known as a repeating subquery), the subquery depends on the outer query for its values. which means that the subquery is executed repeatedly, once for each row that might be selected by the outer query.

Example: List customers whose total purchases are above the average amount spent by all customers.

SELECT customer_name
FROM Customers c
WHERE (SELECT SUM(total_amount) 
       FROM Purchases p 
       WHERE p.customer_id = c.customer_id) > 
      (SELECT AVG(total_amount) FROM Purchases);

Here, the inner query compares the customer's total spending to the average spending of all customers.

  1. EXISTS with Subqueries

The EXISTS keyword is used to check the existence of rows returned by a subquery. It returns TRUE if the subquery returns one or more rows.

Example: Find customers who made a purchase in 2024.

SELECT customer_name
FROM Customers c 
WHERE EXISTS (SELECT 1 
              FROM Purchases p 
              WHERE p.customer_id = c.customer_id 
              AND YEAR(PurchaseDate) = 2024);

Best Practices

  • Simplify complex queries by breaking them into smaller, more manageable subqueries.
  • Common Table Expressions (CTEs) are often more readable than subqueries, especially for complex queries.
  • Correlated subqueries are evaluated for each row, which can be slow. Use joins or CTEs when possible.
  • Avoid Subqueries in WHERE Clauses for Large Datasets. These can cause performance issues. Use joins or indexed columns instead.
  • Use EXISTS for Better Performance: In place of IN, EXISTS is often faster for checking the existence of rows.
  • Avoid deeply nested subqueries, as they can be hard to read and maintain.
  • Use Subqueries in the SELECT Clause Sparingly, as these can be expensive if repeated for every row.

Conclusion

Subqueries allow you to write complex queries by nesting one query within another. They are powerful for filtering data, computing aggregate values, and applying conditions across multiple tables. In customer management scenarios, subqueries help answer questions like finding top spenders, filtering by behavior, and deriving insights from complex purchase patterns.

Asked in Interview

  • What is the difference between a subquery and a JOIN?
  • What is a correlated subquery? Can you give an example?
  • Explain the difference between correlated and non-correlated subqueries.
  • How do you use a subquery with an EXISTS clause?
  • What is the difference between IN and EXISTS in subqueries?
  • What are the performance considerations when using subqueries in SQL?
  • Can a subquery be used in a SELECT statement? Give an example.