Purpose
The purpose is too provide an efficient method for sorting and paging large datasets on a Microsoft SQL Server 2000 database that does not require the use of concatenated SQL. Primarily, this is a proof-of-concept piece and you will need to do your own performance testing to see if it meets your goals.
Introduction
Ever since I heard about SQL injection attacks and the hope of improving a query's performance and maintainability by using the same query every time, I've been trying to avoid concatenated SQL. At the very least, static SQL tends to be easier to understand -- which generally results in fewer bugs. But, I always had a problem when trying to implement a paging solution that had dynamic sort values.
Then I ran into this article and noticed that the queries using ROWCOUNT
tended to outperform other queries. But getting that to work for a dynamic sort value without concatenation tended to involve a temporary table, and that performed worse than using a CURSOR
when you got past the first couple of pages. So I decided there had to be a way to do it...
The SQL
This stored procedure is built to run on Microsoft's Adventure Works sample database. I just picked a table that had a decent amount of rows. Without further ado, here's the SQL.
CREATE PROCEDURE dbo.up_GetSortedSalesOrdersByPageUsingRowset
@orderedOnStart datetime,
@orderedOnEnd datetime,
@pageNumber int,
@pageSize int,
@sortExpression varchar(100),
@sortOrder varchar(4),
@virtualCount int OUTPUT
AS
IF @pageNumber < 1
BEGIN
SET @pageNumber = 1
END
SELECT
@virtualCount = COUNT(*)
FROM
Sales.SalesOrderHeader Header
WHERE
Header.[OrderDate] >= @orderedOnStart
AND Header.[OrderDate] < @orderedOnEnd
DECLARE @lastKeyValue numeric(18,0)
DECLARE @lastAscendingSortValue SQL_Variant
DECLARE @lastDescendingSortValue SQL_Variant
DECLARE @numberToIgnore int
SET @numberToIgnore = (@pageNumber-1) * @pageSize
IF @numberToIgnore > 0
BEGIN
SET ROWCOUNT @numberToIgnore
SELECT
@lastKeyValue = [UniqueValue],
@lastAscendingSortValue = [AscendingSort],
@lastDescendingSortValue = [DescendingSort]
FROM
(
SELECT
Header.[SalesOrderID] AS [UniqueValue],
CASE
WHEN
UPPER(@sortOrder) = 'DESC'
AND UPPER(@sortExpression) = 'CUSTOMERID'
THEN
CONVERT(SQL_Variant, [CustomerID])
WHEN
UPPER(@sortOrder) = 'DESC'
AND UPPER(@sortExpression) = 'TOTALDUE'
THEN
CONVERT(SQL_Variant, [TotalDue])
WHEN
UPPER(@sortOrder) = 'DESC'
AND UPPER(@sortExpression) = 'ORDERDATE'
THEN
CONVERT(SQL_Variant, [OrderDate])
ELSE
NULL
END AS [DescendingSort],
CASE
WHEN
UPPER(@sortOrder) = 'ASC'
AND UPPER(@sortExpression) = 'CUSTOMERID'
THEN
CONVERT(SQL_Variant, [CustomerID])
WHEN
UPPER(@sortOrder) = 'ASC'
AND UPPER(@sortExpression) = 'TOTALDUE'
THEN
CONVERT(SQL_Variant, [TotalDue])
WHEN
UPPER(@sortOrder) = 'ASC'
AND UPPER(@sortExpression) = 'ORDERDATE'
THEN
CONVERT(SQL_Variant, [OrderDate])
ELSE
NULL
END AS [AscendingSort]
FROM
Sales.SalesOrderHeader Header
WHERE
Header.[OrderDate] >= @orderedOnStart
AND Header.[OrderDate] < @orderedOnEnd
) AS Derived
ORDER BY
[AscendingSort] ASC,
[DescendingSort] DESC,
[UniqueValue] ASC
END
SET ROWCOUNT @pageSize
SELECT
[SalesOrderID],
[OrderDate],
[TotalDue],
[CustomerID]
FROM
(
SELECT
[SalesOrderID],
[OrderDate],
[TotalDue],
[CustomerID],
[SalesOrderID] As [UniqueValue],
CASE
WHEN
UPPER(@sortOrder) = 'DESC'
AND UPPER(@sortExpression) = 'CUSTOMERID'
THEN
CONVERT(SQL_Variant, [CustomerID])
WHEN
UPPER(@sortOrder) = 'DESC'
AND UPPER(@sortExpression) = 'TOTALDUE'
THEN
CONVERT(SQL_Variant, [TotalDue])
WHEN
UPPER(@sortOrder) = 'DESC'
AND UPPER(@sortExpression) = 'ORDERDATE'
THEN
CONVERT(SQL_Variant, [OrderDate])
ELSE
NULL
END AS [DescendingSort],
CASE
WHEN
UPPER(@sortOrder) = 'ASC'
AND UPPER(@sortExpression) = 'CUSTOMERID'
THEN
CONVERT(SQL_Variant, [CustomerID])
WHEN
UPPER(@sortOrder) = 'ASC'
AND UPPER(@sortExpression) = 'TOTALDUE'
THEN
CONVERT(SQL_Variant, [TotalDue])
WHEN
UPPER(@sortOrder) = 'ASC'
AND UPPER(@sortExpression) = 'ORDERDATE'
THEN
CONVERT(SQL_Variant, [OrderDate])
ELSE
NULL
END AS [AscendingSort]
FROM
Sales.SalesOrderHeader Header
WHERE
Header.[OrderDate] >= @orderedOnStart
AND Header.[OrderDate] < @orderedOnEnd
) Derived
WHERE
(
@lastAscendingSortValue IS NULL
AND @lastDescendingSortValue IS NULL
AND @lastKeyValue IS NULL
)
OR
(
(@lastAscendingSortValue IS NOT NULL)
AND
(
([AscendingSort] > @lastAscendingSortValue)
OR
(
[AscendingSort] = @lastAscendingSortValue
AND [UniqueValue] > @lastKeyValue
)
)
)
OR
(
(@lastDescendingSortValue IS NOT NULL)
AND
(
([DescendingSort] < @lastDescendingSortValue)
OR
(
[DescendingSort] = @lastDescendingSortValue
AND [UniqueValue] > @lastKeyValue
)
)
)
ORDER BY
[AscendingSort] ASC,
[DescendingSort] DESC,
[SalesOrderID] ASC
SET ROWCOUNT 0
GO
A Quick Caveat
This method requires that your data set has at least 1 unique value (or combination of values) in it per row. I think that tends to be most of what we use these for anyway.
The Explanation
What does it do? It gets a page of sales orders that are in a particular date range and sorted in a particular way.
How does it work? First, the paging. This is described elsewhere is better detail, but the idea behind using ROWCOUNT
is that you ignore a specific series of rows. So, if I'm on the 4th page and I have 10 records on a page, I want to ignore the first 40 rows. How do we do that? ROWCOUNT
specifies the number of rows that will be returned, in our case 40, and we simply store unique information about the last row (row #40) and force the last select
statement to return rows after #40.
As mentioned, this causes some difficulty when you add sorting. A common response to using this method has been, "Well, that works if I hardcode my sort value", or, "Well, that only works if my sort value is unique." So, enter SQL_VARIANT
. SQL_VARIANT
is interesting in that it preserves the data types, not unlike variant types in other languages like JavaScript and Visual Basic. So, we store your sort values and your unique value from the last row. It's important that we have both because what if I want to sort by a field with non-unique values, like [OrderDate]
? In order to guarantee a sequence in SQL, we need to make sure that something unique about the row is included in the ORDER BY
clause. This is the "magic" that gets us around the sort value uniqueness problem.
But what about the hardcoded sort values? Well, that's what those CASE
statements are doing. If you look, you'll see that the values coming out of the CASE
statements are just used in the ORDER BY
. I guess there's a sense in which we've moved from hardcoding a single sort value to hardcoding the available sort values, but that's something I'm willing to live with. If you really need ad hoc queries, there are other methods which will probably suit your needs better. Since SQL_VARIANT
preserves the types of the sort values and we're returning the columns we want to sort on in the derived query, the sort works just like it should. The IS NULL
checks are there because if we're on the first page, those values should be NULL
.
That's It?
You wanted more?!? Personally, this was just something that made me smack myself in the forehead and think, "OF COURSE! THAT'S how you do it." There are some things left to do, however.
- Performance testing really needs to be done on larger sets of data. If anyone wants to, I'll gladly post the results (giving credit where credit is due, of course). I ran this procedure at home on SQL Server 2005 (because that's what I have) and it looked very speedy.
At any rate, I hope this helps some of you. Happy coding!