Introduction
We faced a performance issue with one of our web service methods and the root cause of the issue was traced to a poor performing stored procedure. I have described where the performance bottleneck was and how we removed it.
Description
We had a call to a webservice method from biztalk and it was taking a long time, around 2 hrs for processing around 6k records and caused a timeout issue. To investigate it, we made a replica of the production environment in a test machine. While debugging, we found that it took an unusually long time to execute a stored procedure.
So we went and examined it and found it to take a very long time. It has used cursors and was taking around 2 hours to process a file of 11MB size. The slow performing queries and logic are given below.
Original Procedure
DECLARE @CommData TABLE (ID INT, CommodityIndicatorID INT, _
CommodityID INT, ParentCommodityID INT, Date DATETIME)
INSERT @CommData SELECT ID, CommodityIndicatorID, CommodityID, _
ParentCommodityID, Date FROM CommodityRawData _
WHERE CommodityIndicatorID = @CRD_IndicatorID AND IsSelected = 0 AND IsTreated = 0
DECLARE @ParentCommID INT
DECLARE @CommID INT
DECLARE @RawDate DATETIME
DECLARE @RawDataID INT
DECLARE @TempData TABLE (ID INT)
DECLARE RawData_Cursor CURSOR FOR SELECT ID FROM @CommData
OPEN RawData_Cursor
FETCH NEXT FROM RawData_Cursor into @RawDataID
WHILE @@FETCH_STATUS = 0
BEGIN
SET @ParentCommID = (Select ParentCommodityID FROM @CommData where ID = @RawDataID)
SET @CommID = (Select CommodityID FROM @CommData where ID = @RawDataID)
SET @RawDate = (Select Date FROM @CommData where ID = @RawDataID)
IF(@ParentCommID = 0 OR @ParentCommID IS NULL)
INSERT @TempData SELECT ID FROM CommodityRawData _
WHERE CommodityIndicatorID = @CRD_IndicatorID _
AND (CommodityID = @CommID OR ParentCommodityID = @CommID)
AND Convert(varchar(20), @RawDate, 101) = Convert(varchar(20), _
CommodityRawData.Date, 101)
ELSE
INSERT @TempData SELECT ID FROM CommodityRawData _
WHERE CommodityIndicatorID = @CRD_IndicatorID _
AND (CommodityID = @ParentCommID OR ParentCommodityID = @ParentCommID)
AND Convert(varchar(20), @RawDate, 101) = Convert(varchar(20), _
CommodityRawData.Date, 101)
FETCH NEXT FROM RawData_Cursor into @RawDataID
END
CLOSE RawData_Cursor
DEALLOCATE RawData_Cursor
UPDATE CommodityRawData SET IsSelected = 1, _
IsTreated = 0 WHERE ID IN (SELECT ID FROM @TempData)
SELECT * FROM CommodityRawData WHERE ID IN (SELECT DISTINCT ID FROM @TempData)
As seen, there is a row by row processing using a certain business logic and applying certain flags for selected records. Since it uses cursors, it was very much slow. We wondered if there was any other way in which the same logic can be applied. Then, came the WITH
clause and JOIN
. Rather than identifying each and every row, we used join
s to do batch querying and storing the list of ids in a temporary table. Once we have all the ids in a temp table, we joined it with the main table to update the flags in the main table. This turned out to be much faster than the cursor based approach and we didn't get any timeout expiry after that. Hopefully, this knowledge will help someone in a similar scenario.
Changed Procedure
Changed Queries:
DECLARE @TempData TABLE (ID INT);
WITH CommData(ID, CommodityIndicatorID, CommodityID, ParentCommodityID, Date) AS
(
SELECT ID, CommodityIndicatorID, CommodityID, ParentCommodityID, Date FROM CommodityRawData
WHERE CommodityIndicatorID = @CRD_IndicatorID AND IsSelected = 0 AND IsTreated = 0
)
INSERT @TempData
SELECT c.ID FROM CommodityRawData c, CommData
WHERE c.CommodityIndicatorID=@CRD_IndicatorID and
(c.CommodityID = CommData.ParentCommodityID OR c.ParentCommodityID = CommData.ParentCommodityID)
and Convert(varchar(20), CommData.Date, 101) = Convert(varchar(20), c.Date, 101)
and (CommData.ParentCommodityID is not null and CommData.ParentCommodityID <> 0)
UNION
SELECT c.ID FROM CommodityRawData c, CommData
WHERE c.CommodityIndicatorID=@CRD_IndicatorID and
(c.CommodityID = CommData.CommodityID OR c.ParentCommodityID = CommData.CommodityID)
and Convert(varchar(20), CommData.Date, 101) = Convert(varchar(20), c.Date, 101)
and (CommData.ParentCommodityID is null or CommData.ParentCommodityID = 0)
UPDATE CommodityRawData SET IsSelected = 1, IsTreated = 0
FROM CommodityRawData c inner join @TempData t
ON c.ID = t.ID
SELECT c.* FROM CommodityRawData c inner join @TempData t ON c.ID = t.ID
Test Result
The original query with CURSOR
took 2 hrs to process a set of rows in a 11 MB file. The changed query using WITH
and JOIN
processed the same data within 10 minutes. Hence, using JOIN
gives much better performance than using CURSOR
.
Learning
Important learning is to use join
s wherever possible instead of cursor
s. Cursor
s are recommended only when there is complete business logic which absolutely cannot be accomplished by join
s. Also, we should ensure that the join
based approach returns the exact same result set returned by cursor
s based approach.