Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / web / ASP.NET

Improving SQL query performance using JOIN

4.57/5 (5 votes)
30 Nov 2015CPOL2 min read 13K  
Faced slow performance in certain stored procedure, which caused timeout error. This was improved using WITH clause and JOINs

Introduction

We faced a performance issue with one of our web service methods and the root cause of the issue was traced to a poor performing stored procedure. I have described where the performance bottleneck was and how we removed it.

Description

We had a call to a webservice method from biztalk and it was taking a long time, around 2 hrs for processing around 6k records and caused a timeout issue. To investigate it, we made a replica of the production environment in a test machine. While debugging, we found that it took an unusually long time to execute a stored procedure. 

So we went and examined it and found it to take a very long time. It has used cursors and was taking around 2 hours to process a file of 11MB size. The slow performing queries and logic are given below.

Original Procedure

SQL
DECLARE @CommData TABLE (ID INT, CommodityIndicatorID INT, _
	CommodityID INT, ParentCommodityID INT, Date DATETIME) 
INSERT @CommData SELECT ID, CommodityIndicatorID, CommodityID, _
	ParentCommodityID, Date FROM CommodityRawData _
	WHERE CommodityIndicatorID = @CRD_IndicatorID AND IsSelected = 0 AND IsTreated = 0

DECLARE @ParentCommID INT
DECLARE @CommID INT
DECLARE @RawDate DATETIME
DECLARE @RawDataID INT
DECLARE @TempData TABLE (ID INT)
DECLARE RawData_Cursor CURSOR FOR SELECT ID FROM @CommData 
OPEN RawData_Cursor
-- Perform the first fetch.
FETCH NEXT FROM RawData_Cursor into @RawDataID
-- Check @@FETCH_STATUS to see if there are any more rows to fetch.
WHILE @@FETCH_STATUS = 0
BEGIN

SET @ParentCommID = (Select ParentCommodityID FROM @CommData where ID = @RawDataID)
SET @CommID = (Select CommodityID FROM @CommData where ID = @RawDataID)
SET @RawDate = (Select Date FROM @CommData where ID = @RawDataID)

IF(@ParentCommID = 0 OR @ParentCommID IS NULL)
	INSERT @TempData SELECT ID FROM CommodityRawData _
	WHERE CommodityIndicatorID = @CRD_IndicatorID _
	AND (CommodityID = @CommID OR ParentCommodityID = @CommID) 
		AND Convert(varchar(20), @RawDate, 101) = Convert(varchar(20), _
		CommodityRawData.Date, 101)
ELSE
	INSERT @TempData SELECT ID FROM CommodityRawData _
	WHERE CommodityIndicatorID = @CRD_IndicatorID _
	AND (CommodityID = @ParentCommID OR ParentCommodityID = @ParentCommID) 
		AND Convert(varchar(20), @RawDate, 101) = Convert(varchar(20), _
		CommodityRawData.Date, 101)

-- This is executed as long as the previous fetch succeeds.
FETCH NEXT FROM RawData_Cursor into @RawDataID

END
CLOSE RawData_Cursor
DEALLOCATE RawData_Cursor

UPDATE CommodityRawData SET IsSelected = 1, _
	IsTreated = 0 WHERE ID IN (SELECT ID FROM @TempData)

SELECT * FROM CommodityRawData WHERE ID IN (SELECT DISTINCT ID FROM @TempData)

As seen, there is a row by row processing using a certain business logic and applying certain flags for selected records. Since it uses cursors, it was very much slow. We wondered if there was any other way in which the same logic can be applied. Then, came the WITH clause and JOIN. Rather than identifying each and every row, we used joins to do batch querying and storing the list of ids in a temporary table. Once we have all the ids in a temp table, we joined it with the main table to update the flags in the main table. This turned out to be much faster than the cursor based approach and we didn't get any timeout expiry after that. Hopefully, this knowledge will help someone in a similar scenario.

Changed Procedure

SQL
Changed Queries:

DECLARE @TempData TABLE (ID INT);

WITH CommData(ID, CommodityIndicatorID, CommodityID, ParentCommodityID, Date) AS 
(
SELECT ID, CommodityIndicatorID, CommodityID, ParentCommodityID, Date FROM CommodityRawData 
WHERE CommodityIndicatorID = @CRD_IndicatorID AND IsSelected = 0 AND IsTreated = 0
)
INSERT @TempData  
SELECT c.ID FROM CommodityRawData c, CommData
WHERE c.CommodityIndicatorID=@CRD_IndicatorID and 
(c.CommodityID = CommData.ParentCommodityID OR c.ParentCommodityID = CommData.ParentCommodityID)
and Convert(varchar(20), CommData.Date, 101) = Convert(varchar(20), c.Date, 101)
and (CommData.ParentCommodityID is not null and CommData.ParentCommodityID <> 0)
UNION
SELECT c.ID FROM CommodityRawData c, CommData
WHERE c.CommodityIndicatorID=@CRD_IndicatorID and 
(c.CommodityID = CommData.CommodityID OR c.ParentCommodityID = CommData.CommodityID)
and Convert(varchar(20), CommData.Date, 101) = Convert(varchar(20), c.Date, 101)
and (CommData.ParentCommodityID is  null or CommData.ParentCommodityID = 0)

UPDATE CommodityRawData SET IsSelected = 1, IsTreated = 0 
FROM CommodityRawData c inner join @TempData t
ON c.ID = t.ID

SELECT c.* FROM CommodityRawData c inner join @TempData t ON c.ID = t.ID

Test Result

The original query with CURSOR took 2 hrs to process a set of rows in a 11 MB file. The changed query using WITH and JOIN processed the same data within 10 minutes. Hence, using JOIN gives much better performance than using CURSOR.  

Learning

Important learning is to use joins wherever possible instead of cursors. Cursors are recommended only when there is complete business logic which absolutely cannot be accomplished by joins. Also, we should ensure that the join based approach returns the exact same result set returned by cursors based approach.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)