Indexes in SQL

Speed up data retrieval

Introduction

Indexes in MS SQL Server are special structures associated with tables or views, designed to speed up data retrieval. They allow SQL Server to find the desired rows much more efficiently, just like an index in a book helps you quickly locate information without flipping through every page.

Indexes store the location of rows in a table based on the values of one or more columns, reducing the number of I/O operations required to fetch data. While indexes can drastically improve query performance, they also come with overhead for data inserts, updates, and deletes.

Types of Indexes in MS SQL

Clustered Index

A clustered index determines the physical order of data in a table. A table can have only one clustered index because there can only be one physical order for rows.
When you create a primary key on a table, a clustered index is automatically created by default.

Example: Creating a clustered index on the customer_id column of the Customers table.

CREATE CLUSTERED INDEX IX_customer_id
ON Customers (customer_id);

In this example, the rows in the Customers table will be physically organized according to the customer_id.

Non-Clustered Index

Non-clustered indexes do not affect the physical order of the data. Instead, they create a separate structure that holds the column values and pointers to the actual data rows. You can have multiple non-clustered indexes on a table.

Example: Creating a non-clustered index on the last_name column of the Customers table.

CREATE NONCLUSTERED INDEX IX_last_name
ON Customers (last_name);

This index allows faster searching based on the last_name, while the actual data in the table remains ordered by the clustered index.

Advantages of Indexes

Improved Query Performance: Indexes speed up data retrieval, especially in large tables. Queries that search or filter data using indexed columns can skip reading unnecessary rows.
Faster Sorting and Searching: Indexes enable faster sorting of data, as SQL Server can directly access the sorted index instead of performing a full table scan.
Enhanced JOIN Operations: Indexes can significantly improve the performance of join operations between tables, especially when they are applied to the foreign keys involved.

Drawbacks of Indexes

While indexes enhance data retrieval, they also have some drawbacks:

Maintenance Overhead: When data is inserted, updated, or deleted, indexes need to be updated as well, which can slow down write operations.
Storage Costs: Indexes consume additional disk space since they create separate data structures to store indexed values and references to rows.

Indexing Strategies

Choosing the Right Columns

Index columns that are frequently used in WHERE, JOIN, ORDER BY, or GROUP BY clauses. Columns that are used to filter or sort data benefit the most from indexing.

Example: If most of your queries filter by city, creating an index on the city column makes retrieval faster.

CREATE NONCLUSTERED INDEX IX_city
ON Customers (city);

Composite Indexes

A composite index (or multi-column index) includes multiple columns in a single index. These indexes are useful when you query combinations of columns frequently.

Example: Creating a composite index on last_name and first_name to speed up searches based on both names.

CREATE NONCLUSTERED INDEX IX_FullName
ON Customers (last_name, first_name);

This index is efficient for queries like:

SELECT * FROM Customers
WHERE last_name = 'Smith' AND first_name = 'John';

Covering Indexes

A covering index contains all the columns needed by a query, reducing the need for SQL Server to access the table data at all.

Example: If you frequently query customer_id, first_name, and last_name together, you could create an index that includes all of these columns.

CREATE NONCLUSTERED INDEX IX_CustomerFullInfo
ON Customers (customer_id, first_name, last_name);

Indexes on Foreign Keys

Indexing foreign key columns can enhance the performance of JOIN operations between tables.

Example: Indexing the customer_id column in the Orders table to improve join performance with the Customers table.

CREATE NONCLUSTERED INDEX IX_Orders_customer_id
ON Orders (customer_id);

Filtered Indexes

A filtered index is a specialized form of a non-clustered index that only includes rows that meet a specified condition. Filtered indexes are useful for large tables where only a subset of data is queried frequently.

Example: Create an index on Orders where the status is "Completed."

CREATE NONCLUSTERED INDEX IX_CompletedOrders
ON Orders (order_date)
WHERE order_status = 'Completed';

This index optimizes queries that look for completed orders without indexing the entire table.

Index Maintenance

Rebuilding and Reorganizing Indexes

Over time, indexes can become fragmented due to frequent insertions, deletions, and updates, reducing their efficiency. Regularly rebuilding or reorganizing indexes can improve performance.

Rebuild Index: Recreates the entire index, compacting it and removing fragmentation.

ALTER INDEX IX_customer_id
ON Customers REBUILD;

Reorganize Index: Defragments the index without completely rebuilding it, making it less resource-intensive.

ALTER INDEX IX_customer_id
ON Customers REORGANIZE;

Best Practices

Limit the number of indexes: Having too many indexes can slow down write operations like inserts and updates.
Index selectively: Focus on indexing columns that are frequently used in queries.
Use clustered indexes wisely: Select a column that ensures rows are uniquely and logically ordered.

Conclusion

Indexes are a powerful tool to optimize the performance of your SQL queries in MS SQL Server, especially in large databases with complex queries. By selecting the right columns to index, creating composite and covering indexes where necessary, and maintaining indexes regularly, you can ensure efficient data retrieval and high-performing applications.

Asked in Interview

What is Indexing? How does it improve query performance?

What is the difference between Clustered and Non-Clustered Indexes?

How does an Index improve query performance?

What are the disadvantages of using Indexes?

Can an Index be created on multiple columns? If yes, give an example.

Can you create an Index on a computed column?

What is Index Fragmentation, and how do you address it?

When would an index not be used by the query optimizer?

Explain the concept of an Index Rebuild vs. an Index Reorganize.

Basic Constraints

Orgranizing data

Beginner

Intermediate

Advanced

Resources