Performance tuning of a complex SQL query involves understanding its execution flow, optimizing query structure, and improving database interactions. In this article, I will discuss the approach which can be helpful for performance tuning of a complex SQL query.
To understand the approach, we will going to use some examples. In these examples, we have used 2 tables (sales.Customer & Sales.SalesOrderHeader) from the sample database AdventureWorks2022.
We have created the copy of these 2 tables to use in this article as shown below
select * into sales.Customer_demo from sales.Customer
select * into Sales.SalesOrderHeader_demo from Sales.SalesOrderHeader
1. Understand the Query Structure
In first step, we need to break down the SQL query to mutiple parts and identiy those part which doing heavy operations like joins, subqueries and aggregations like group by etc.
- Break down the SQL query into smaller parts (SELECT, JOIN, WHERE clauses).
- Identify heavy operations such as subqueries, joins, and aggregations.
Example:
select * from sales.Customer_demo cust
inner join Sales.SalesOrderHeader_demo soh on cust.CustomerID=soh.CustomerID
WHERE soh.orderdate >= '2013-01-01'
2. Analyze Execution Plan
- In this step we need to identify which parts of the query are consuming the most resources (scans, sorts, joins). For this, we can use tools like
EXPLAIN
, EXPLAIN ANALYZE
(PostgreSQL), or the SQL Server Execution Plan.
Example with SQL Server Execution Plan:
If we fetch the execution plan for the above SQL query, the execution plan show that a full table scan is happening instead of an indexed lookup as shown below:-
We can also use SET SHOWPLAN_TEXT ON
to get the execution plan as shown below
3. Index Optimization
- Create or modify indexes on columns used in
JOIN
, WHERE
, and ORDER BY
clauses. - Use composite indexes for multiple conditions.
- Use Covering Index to include all the columns of the select query in the indexes.
- Avoid using indexes for small tables, as this can unnecessarily increase overhead.
- Identify and delete the indexes which are no longer referenced as these indexes unnecessarily increase overhead and slow down the query performance.
- Fragmentation of indexes should be monitored regularly and indexes should be rebuild/reorganise accordingly.
- Clustered index should be avoided on the table which get very frequent DML changes due to high cost of maintaining the clustered index. It affect the performance very bad.
Example:
CREATE INDEX idx_SalesOrderHeader_orderdate ON Sales.SalesOrderHeader_demo(CustomerID,orderdate) include (Totaldue);
CREATE INDEX idx_CustomerID ON sales.Customer_demo(CustomerID,AccountNumber);
4. Avoid SELECT *
- Specify only the columns you need. Retrieving unnecessary columns increases data transfer time and degrade the performance.
Example: Instead of
SELECT * FROM sales.Customer_demo cust JOIN Sales.SalesOrderHeader_demo soh ON cust.CustomerID = soh.CustomerID WHERE soh.orderdate >= '2013-01-01';
below query should be used.
SELECT cust.AccountNumber,soh.TotalDue FROM sales.Customer_demo cust JOIN Sales.SalesOrderHeader_demo soh ON cust.CustomerID = soh.CustomerID WHERE soh.orderdate >= '2013-01-01';
5. Optimize Joins and Subqueries
- Prefer JOINs over subqueries/ corelated subqueries when possible, as correlated subqueries may re-execute for each row. Using Joins instead of subqueries will most of the time reduce the query execution time.
- Use inner joins instead of outer joins if you don’t need non-matching data.
Example: Instead of using a subquery:
SELECT cust.AccountNumber, (SELECT COUNT(*) FROM Sales.SalesOrderHeader_demo soh WHERE soh.CustomerID = cust.CustomerID) as total_orders
FROM sales.Customer_demo cust order by total_orders desc ;
Use a JOIN:
SELECT cust.AccountNumber, COUNT(soh.CustomerID) as total_orders
FROM sales.Customer_demo cust
LEFT JOIN Sales.SalesOrderHeader_demo soh ON cust.CustomerID = soh.CustomerID
GROUP BY cust.AccountNumber<br />order by total_orders desc ;
6. Limit Data Processing Early
- Use
WHERE
clauses to filter data before performing JOIN
operations or sorting. - If applicable, use LIMIT in MYSQL to reduce the number of rows processed. In case os SQL server, you can use the Row_number function or OFSET/ FETCH.. Next to reduce the number of rows as show below.
Example:
SELECT cust.AccountNumber,soh.TotalDue FROM sales.Customer_demo cust
JOIN Sales.SalesOrderHeader_demo soh ON cust.CustomerID = soh.CustomerID
WHERE soh.orderdate >= '2013-01-01' order by cust.AccountNumber
OFFSET 5 ROWS
FETCH NEXT 5 ROWS ONLY
7. Refactor Complex Logic
- Break down complex queries into temporary tables or Common Table Expressions (CTEs). This simplifies the query and enables optimization at different stages.
Example:
WITH recent_orders AS ( SELECT soh.CustomerID, soh.totaldue FROM Sales.SalesOrderHeader_demo soh WHERE soh.orderdate >= '2013-01-01' )
SELECT cust.AccountNumber, ro.totaldue FROM sales.Customer_demo cust JOIN recent_orders ro ON cust.customerid = ro.customerid;
8. Avoid Functions on Indexed Columns
- Avoid applying functions to indexed columns in
WHERE
clauses, as this negates the benefit of indexing.
Example: Instead of:
select * from Sales.SalesOrderHeader_demo soh where YEAR(soh.orderdate) = '2013'
Use:
select * from Sales.SalesOrderHeader_demo soh where soh.orderdate >= '2013-01-01' AND soh.orderdate < '2014-01-01';
9. Monitor and Tune
- Continuously monitor the database using tools like SQL Server Profiler, pg_stat_statements (PostgreSQL), or AWR Reports (Oracle).
- Focus on reducing I/O, CPU, and memory usage for resource-intensive queries.
10 .Use of “set nocount on”
Unless there is a need to know the number of rows affected by the execution of the query/stored procedure, we should use “set nocount on” at the beginning of the code block and “set nocount off” at the end of the code block as calculation of the number of rows affected adds extra cost to the performance.
Conclusion:
Tuning a complex SQL query involves analyzing its execution plan, optimizing indexes, and refactoring logic for better performance. By following this approach, you can significantly improve query execution time and reduce resource consumption.
The post How to approach for performance tuning of a Complex SQL Query appeared first on Technology with Vivek Johari.