SQL Joins

Working with multiple tables.

Introduction

SQL logo

SQL joins are required in relational databases, where data is frequently divided over several tables, in order to merge rows from two or more tables according to a shared column.

There are various join kinds, and each has a distinct function:

  • INNER JOIN: Returns only matching rows from both tables.
  • LEFT JOIN: Returns all rows from the left table, and matching rows from the right table (with nulls if no match is found).
  • RIGHT JOIN:: Returns all rows from the right table, and matching rows from the left table (with nulls if no match is found).
  • FULL JOIN: Returns all rows when there is a match in either table, with nulls for non-matching rows.
  • CROSS JOIN: Returns the Cartesian product of the two tables.
  • SELF JOIN: A table joined with itself.

Syntax

SELECT column_name(s)
FROM table1 t1
JOIN table2 12
ON t1.column_name = t2.column_name;

Understanding table relationships

Let's understand how relational databases work. Tables in relational databases are often related by primary keys and foreign keys. These keys create a relationship between rows in different tables, and joins are used to retrieve data based on those relationships.

Example:

In a Customer and Orders relationship,

Customer has a primary key, customer_id. The Orders table contains a foreign key, customer_id, which relates back to the Customer table.

Now, let's understand basic SQL joins one by one.

Inner Join

Retrievs matching data.

The INNER JOIN is the most common type of join, used to retrieve rows that have matching values in both tables.

Example:

SELECT 
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.order_date
FROM 
    Customers c
INNER JOIN 
    Orders o
ON 
    c.customer_id = o.customer_id;

In this example, only customers who have placed orders will be included in the result set. Rows without a matching order will be excluded.

When to Use INNER JOIN:

  • When you only want to retrieve records that have matching values in both tables.
  • Useful for data integrity and ensuring there are no incomplete rows in the result.

Left Join

Including non-matching rows, but retrieves all rows from the left table.

A LEFT JOIN returns all rows from the left table (the first table in the query), and the matching rows from the right table. If no match is found, NULL values are returned for columns from the right table.

Example:

SELECT 
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.order_date
FROM 
    Customers c
LEFT JOIN 
    Orders o
ON 
    c.customer_id = o.customer_id;

In this case, even if a customer has not placed an order, they will still be included in the result set, but the order_id and order_date fields will contain NULL values for that customer.

When to Use LEFT JOIN:

  • When you want to include all records from the left table, even if there are no corresponding records in the right table.
  • Useful for reporting scenarios where you need to see all entries, including those with no matches.
  • When you want to see missing information in the right table.

Right Join

Including Non-Matching Rows, but retrieves all rows from the right table.

The RIGHT JOIN works similarly to the LEFT JOIN, but it returns all rows from the right table and the matching rows from the left table. If no match is found, NULL values are returned for columns from the left table.

Example:

SELECT 
    o.order_id,
    o.order_date,
    c.customer_name
FROM 
    Orders o
RIGHT JOIN 
    Customers c
ON 
    o.customer_id = c.customer_id;

Here, even if an order does not have a corresponding customer, it will still appear in the result set, but with NULL values in the customer fields.

When to Use RIGHT JOIN:

  • When you need to include all records from the right table, regardless of whether there are matching records in the left table.
  • It’s less commonly used than LEFT JOIN, but useful in certain scenarios where the right table is the primary data set.
  • When you want to see missing information in the left table.

Full Join

Including All Rows from Both Tables.

A FULL OUTER JOIN returns all rows when there is a match in either table, with NULL values in places where one table doesn’t have a match.

Example:

SELECT 
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.order_date
FROM 
    Customers c
FULL JOIN 
    Orders o
ON 
    c.customer_id = o.customer_id;

In this case, the result will include all customers and all orders, regardless of whether they have matching records. If a customer has no orders, the order_id and order_date fields will be NULL. If an order has no corresponding customer, the customer_name field will be NULL.

When to Use FULL JOIN:

  • When you want to return all records from both tables, regardless of matches.
  • Useful for situations where data completeness is more important than filtering by matching records.

Best Practices

  • Use Appropriate Join Type: Choose the correct join type based on the relationship between tables and the result you want to achieve.
  • Filter Early: When working with large datasets, apply filters in the WHERE clause to reduce the number of rows before joining.
  • Use Aliases for Clarity: When joining multiple tables, use table aliases to make your queries more readable.

Note: We will learn about SELF Joins and CROSS joins in intermediate section.

Conclusion

Joins are one of the fundamental concepts of SQL that allow you to work with multiple tables in a powerful and flexible way. By mastering the different types of joins and understanding when to use them, you can query your data more effectively, combining information from multiple tables to gain insights and create meaningful reports.

Asked in Interview

  • What is a JOIN in SQL, and why is it used?
  • What are Joins in SQL? Explain the different types.
  • Explain the difference between an INNER JOIN and an OUTER JOIN (LEFT, RIGHT, FULL).
  • Explain the performance implications of using JOINs.
  • Write a query to combine rows from multiple tables using a UNION and explain the difference between UNION and JOIN.
  • What happens when you perform an INNER JOIN between two tables, and there are no matching records?
  • Write a query to fetch common records from two tables using INNER JOIN.
  • How do NULL values affect an INNER JOIN?
  • What are potential performance issues when using INNER JOIN on large datasets, and how can you optimize it?
  • Can INNER JOIN be used for data validation? How?