Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Replacing Cursors with Set-Based SQL Queries – Part 2

5.00/5 (1 vote)
22 Jul 2011CPOL2 min read 10.6K  
This is part 2 of a discussion about replacing cursors with SQL statements for significant speed improvements.

This is part 2 of a discussion about replacing cursors with SQL statements for significant speed improvements.

In a recent post, I discussed cursors and mentioned, at a high level, an example of replacing a cursor with a set-based series of SQL statements. In this post, I’m going to provide a simplified version of the SQL for both the cursor and the replacement. I’ll also discuss how I did the SQL conversion and why the new version is so much faster.

The following SQL has been shortened and cleaned up slightly from the original version. Although the version as shown will not run stand-alone, it will illustrate how the cursor was transformed to the set-based version.

Here is the original cursor which took about 20 minutes to run:

SQL
DECLARE  CornerCursor   CURSOR FAST_FORWARD FOR
	SELECT 
		StudentNumber,
		SchoolCode,
		CurTransEligibility,
		GradeLevel,
		GeoCode,
		HomeZone,
		BilingualCode,
		SpedCode
	FROM 
		--  [Tables for getting the student data]
 
OPEN CornerCursor
 
FETCH NEXT FROM CornerCursor INTO 
	@StudentNumber,
	@SchoolCode,
	@CurTransEligibility,
	@GradeLevel,
	@GeoCode,
	@HomeZone,
	@BilingualCode,
	@SpedCode
 
WHILE @@FETCH_STATUS = 0
BEGIN
	-- For the grade level, see if the student is ineligible 
              -- for transportation [too close to school].
	IF EXISTS(SELECT Sch FROM TransWalkingDistance 
                       WHERE SchCode = @SchoolCode 
                           AND GeoCode = @GeoCode 
                           AND GradeLevel = @GradeLevel)
	BEGIN
		UPDATE 
		      EnrollmentRecord
		SET
		      TransCode = '',	
                                 LastUpdateUserID= 'CScrpW' -- ineligible
		WHERE 
		      StudentNo = @StudentNo 
                                  AND SchCode = @SchoolCode
 
		GOTO FetchNext
	END 
 
              IF ((rtrim(@BilingualCode) = '' 
                   OR RTRIM(isnull(@BilingualCode,''))  = 'S')
                   AND (@SpedCode = '' 
                           OR SUBSTRING(@SpedCode, 1, 1) IN ('S', 'R') 
                           OR @SpedCode IN ('V1', 'V2', 'P1', 'P2'))) 
	BEGIN
		IF @GradeLevel <> 'H' 
			AND NOT EXISTS
			(SELECT School FROM BufferZoneLookup 
			      WHERE School = @School  
                                               AND GradeLevel = @GradeLevel 
			      AND (GeoCode = @GeoCode 
                                               OR HomeZone IN (@HomeZone, 'C','H')))
		BEGIN
			-- Code here to run a similar update to the update 
                                         -- statement above, but setting the transportation
			-- code to 'C0RR' and the LastUpdateUser to 'CScrpZ'
		END
 
		GOTO FetchNext
	END
 
	-- Two additional update statements here, each with different 
              -- conditions.  There's a default one as well if none of the above 
              -- updates were used.
 
	FetchNext:
		FETCH NEXT FROM CornerCursor INTO 
		@StudentNumber,
		@SchoolCode,
		@CurTransEligibility,
		@GradeLevel,
		@GeoCode,
		@HomeZone,
		@BilingualCode,
		@SpedCode
	END
END
CLOSE CornerCursor
DEALLOCATE CornerCursor

Notice with the above SQL that there are a sequence of update statements surrounded by if/else conditions. The updates also run in a way where if one update runs, none of the following ones can run. I used these clues to help me design the set-based version. I decided to do things in the following order:

  1. Encapsulate the if/else conditions referred to in the cursor into a temporary table with multiple bit columns. For these, more than one of the conditions [bits] can be set.
  2. Use the temp table with the bit field settings to determine what the transportation eligibility code should be. The update statements simply modify the transportation code and last update user, so that’s what I set when populating the second temporary table. In the logic, the order of the case statement matches the order of the updates in the cursor.
  3. Use the transportation codes set in #2 to actually do the updates.

Here’s the resulting SQL:

SQL
-- This query will populate a table with different bit fields 
-- representing the if/else conditions used throughout the cursor.  
-- So essentially this is centralizing disperse if/else conditions into 
-- one record for each student that can be then used by the next query.
SELECT
	studentdata.StudentNumber,
	studentdata.SchoolCode,
	studentdata.TransCode,
	studentdata.GradeLevel,
	studentdata.BilingualCode,
	studentdata.SpedCode,
	studentdata.GeoCode,
	studentdata.HomeZone,
	(CASE WHEN studentdata.sch IN('4261','1010','1020','1340')
		THEN 1 ELSE 0 END) AS 'CanNotBeOutOfZone', -- Citywide
              (CASE WHEN isnull(SchZone.sch, '') <> '' THEN 1 ELSE 0 END) 
                                                            AS 'InZoneOrBufferZone',
	(CASE WHEN isnull(TransWalkingDistance.sch, '') <> '' THEN 1 ELSE 0 END)
                                                            AS 'IsInWalkingDistance',
	(CASE WHEN studentdata.grade IN ('K0', 'K1') 
                     AND studentdata.isExtendedDay = 0 THEN 1 ELSE 0 END) 
                                                            AS 'IsHalfDayKindergarten',
	-- Bilingual non-spanish students are always transportation eligible.
	(CASE WHEN NOT isnull(studentdata.BilingualCode, '') = '' 
                  AND NOT (isnull(studentdata.BilingualCode,'')) = 'S' THEN 1 ELSE 0 END)
                                                            AS 'BilingualIneligibleForC0RR',
	-- A few additional case statement options omitted [SpedIneligibleForC0RR].
INTO #StudentsToProcess
FROM
	--  [List of tables] similar to the one in the cursor definition.   
 
-- Use the bit fields to set what the transportation code would 
-- be for the student.  The case statement to set PropsedTransCodeBasedOnData 
-- runs in the same precedence order as the original cursor.  In other words, 
-- order is important to keep the results the same.
SELECT  studentdata.studentno, studentdata.transcode AS 'curTransCode',
	-- in the walkeligible range
	(CASE WHEN IsInWalkingDistance = 1 THEN ''
	-- Set out of zone.  High school grades and some other 
              -- categories cannot be out of zone.
	WHEN BilingualIneligibleForC0RR = 0
	    AND NOT sys.grade IN('09','10','11','12') 
	    AND InZoneOrBufferZone = 0 
	    AND CanNotBeOutOfZone = 0 THEN 'C0RR'
	WHEN (IsHalfDayKindergarten = 1 
                                   AND isnull(sys.transcode, '') = '') THEN 'C5*R'
	WHEN (IsHalfDayKindergarten = 1 
                                   AND isnull(sys.transcode, '') <> '') THEN sys.transcode
	ELSE 'C5RR' END) AS 'PropsedTransCodeBasedOnData'
INTO #ProposedTranscodeChanges
FROM 
	-- [List of tables] 
 
-- Run the update - set trans eligibility codes [c5rr, c0rr, etc].  
-- Each update used a different user ID tag, so set this appropriately.
UPDATE studentdata SET transcode = propsedTransCodeBasedOnData,
LastUpdateUser = (CASE isnull(propsedTransCodeBasedOnData, '') 
              WHEN 'c5rr' THEN 'CScrpC'
	WHEN 'c0rr' THEN 'CScrpZ' 
              WHEN '' THEN 'CScrpW' 
              WHEN 'C5*R' THEN 'CScrpK' 
              ELSE 'CScrpX' END)
FROM 
	-- [List of tables]

I could have likely combined the SQL statements above so only 1 or 2 queries would be necessary. But doing so would have been at the cost of increased query complexity and maintainability. The queries run in under 10 seconds, a significant performance improvement over the cursor-based version!

Why is the cursor so slow? It is simple – each student is processed one by one. To make matters worse, multiple queries are run within the body of the cursor, so over 250,000 queries could theoretically be run for 55,000 students. Why run that many queries when 3 queries will do?

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)