This is part 2 of a discussion about replacing cursors with SQL statements for significant speed improvements.
In a recent post, I discussed cursors and mentioned, at a high level, an example of replacing a cursor with a set-based series of SQL statements. In this post, I’m going to provide a simplified version of the SQL for both the cursor and the replacement. I’ll also discuss how I did the SQL conversion and why the new version is so much faster.
The following SQL has been shortened and cleaned up slightly from the original version. Although the version as shown will not run stand-alone, it will illustrate how the cursor was transformed to the set-based version.
Here is the original cursor which took about 20 minutes to run:
DECLARE CornerCursor CURSOR FAST_FORWARD FOR
SELECT
StudentNumber,
SchoolCode,
CurTransEligibility,
GradeLevel,
GeoCode,
HomeZone,
BilingualCode,
SpedCode
FROM
OPEN CornerCursor
FETCH NEXT FROM CornerCursor INTO
@StudentNumber,
@SchoolCode,
@CurTransEligibility,
@GradeLevel,
@GeoCode,
@HomeZone,
@BilingualCode,
@SpedCode
WHILE @@FETCH_STATUS = 0
BEGIN
IF EXISTS(SELECT Sch FROM TransWalkingDistance
WHERE SchCode = @SchoolCode
AND GeoCode = @GeoCode
AND GradeLevel = @GradeLevel)
BEGIN
UPDATE
EnrollmentRecord
SET
TransCode = '',
LastUpdateUserID= 'CScrpW'
WHERE
StudentNo = @StudentNo
AND SchCode = @SchoolCode
GOTO FetchNext
END
IF ((rtrim(@BilingualCode) = ''
OR RTRIM(isnull(@BilingualCode,'')) = 'S')
AND (@SpedCode = ''
OR SUBSTRING(@SpedCode, 1, 1) IN ('S', 'R')
OR @SpedCode IN ('V1', 'V2', 'P1', 'P2')))
BEGIN
IF @GradeLevel <> 'H'
AND NOT EXISTS
(SELECT School FROM BufferZoneLookup
WHERE School = @School
AND GradeLevel = @GradeLevel
AND (GeoCode = @GeoCode
OR HomeZone IN (@HomeZone, 'C','H')))
BEGIN
END
GOTO FetchNext
END
FetchNext:
FETCH NEXT FROM CornerCursor INTO
@StudentNumber,
@SchoolCode,
@CurTransEligibility,
@GradeLevel,
@GeoCode,
@HomeZone,
@BilingualCode,
@SpedCode
END
END
CLOSE CornerCursor
DEALLOCATE CornerCursor
Notice with the above SQL that there are a sequence of update
statements surrounded by if
/else
conditions. The updates also run in a way where if one update runs, none of the following ones can run. I used these clues to help me design the set-based version. I decided to do things in the following order:
- Encapsulate the
if
/else
conditions referred to in the cursor into a temporary table with multiple bit columns. For these, more than one of the conditions [bits] can be set. - Use the
temp
table with the bit field settings to determine what the transportation eligibility code should be. The update
statements simply modify the transportation code and last update user, so that’s what I set when populating the second temporary table. In the logic, the order of the case
statement matches the order of the updates in the cursor. - Use the transportation codes set in #2 to actually do the updates.
Here’s the resulting SQL:
SELECT
studentdata.StudentNumber,
studentdata.SchoolCode,
studentdata.TransCode,
studentdata.GradeLevel,
studentdata.BilingualCode,
studentdata.SpedCode,
studentdata.GeoCode,
studentdata.HomeZone,
(CASE WHEN studentdata.sch IN('4261','1010','1020','1340')
THEN 1 ELSE 0 END) AS 'CanNotBeOutOfZone',
(CASE WHEN isnull(SchZone.sch, '') <> '' THEN 1 ELSE 0 END)
AS 'InZoneOrBufferZone',
(CASE WHEN isnull(TransWalkingDistance.sch, '') <> '' THEN 1 ELSE 0 END)
AS 'IsInWalkingDistance',
(CASE WHEN studentdata.grade IN ('K0', 'K1')
AND studentdata.isExtendedDay = 0 THEN 1 ELSE 0 END)
AS 'IsHalfDayKindergarten',
(CASE WHEN NOT isnull(studentdata.BilingualCode, '') = ''
AND NOT (isnull(studentdata.BilingualCode,'')) = 'S' THEN 1 ELSE 0 END)
AS 'BilingualIneligibleForC0RR',
INTO #StudentsToProcess
FROM
SELECT studentdata.studentno, studentdata.transcode AS 'curTransCode',
(CASE WHEN IsInWalkingDistance = 1 THEN ''
WHEN BilingualIneligibleForC0RR = 0
AND NOT sys.grade IN('09','10','11','12')
AND InZoneOrBufferZone = 0
AND CanNotBeOutOfZone = 0 THEN 'C0RR'
WHEN (IsHalfDayKindergarten = 1
AND isnull(sys.transcode, '') = '') THEN 'C5*R'
WHEN (IsHalfDayKindergarten = 1
AND isnull(sys.transcode, '') <> '') THEN sys.transcode
ELSE 'C5RR' END) AS 'PropsedTransCodeBasedOnData'
INTO #ProposedTranscodeChanges
FROM
UPDATE studentdata SET transcode = propsedTransCodeBasedOnData,
LastUpdateUser = (CASE isnull(propsedTransCodeBasedOnData, '')
WHEN 'c5rr' THEN 'CScrpC'
WHEN 'c0rr' THEN 'CScrpZ'
WHEN '' THEN 'CScrpW'
WHEN 'C5*R' THEN 'CScrpK'
ELSE 'CScrpX' END)
FROM
I could have likely combined the SQL statements above so only 1 or 2 queries would be necessary. But doing so would have been at the cost of increased query complexity and maintainability. The queries run in under 10 seconds, a significant performance improvement over the cursor-based version!
Why is the cursor so slow? It is simple – each student is processed one by one. To make matters worse, multiple queries are run within the body of the cursor, so over 250,000 queries could theoretically be run for 55,000 students. Why run that many queries when 3 queries will do?