Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

Use SQL Server's Ranking Functions Wisely

4.88/5 (5 votes)
3 May 2013CPOL3 min read 41K  
Beware of What We Ask Them to Do

Introduction

SQL Server's Ranking functions can be very useful and handy if we use them in a right way and thoughtfully; otherwise, they become evil very fast! In this tip I will talk about my experience with NTILE function but the same story can be expanded to all other ranking functions.

Background

  After spending several hours on why a script was generating different results against same sets of data, I discovered the evil. It was a set of unwise NTILE function calls. For those who may not be familiar with this function, please check this MSDN page. All rank functions contain an over clause which tells the function which column(s) to be used as reference.

The Problem

Here is an example for NTILE and how it becomes evil -

Imagine the following table of data (named MyTable):

NameCost
C1
B1
D4
A1
F3
E2

I picked simle names A to F in order to make it easier to follow the scenario.

Now assume we have the following script:

SQL
SELECT [Name], [Cost], NTILE(6) OVER(ORDER BY [Cost]) AS [Rank] FROM [MyTable]

Running the script agains MyTable you will have the following result set:

NameCostRank
C11
B12
A13
E24
F35
D46

As you can see, NTILE does exactly what it is asked to do. Let's run the same script against the same table with slightly different data order:

NameCost
B1
A1
E2
C1
D4
F3

As you can see, the tables contents are the same but only the order of data has changed. After running the very same script, you will get the following result set:

NameCostRank
B11
A12
C13
E24
F35
D46

Now, let's compare the rankings in two result sets side by side:

Result 1Result 2
NameRankRank
A32
B21
C13
D44
E55
F66

 Can you see the evil? Running the same script against the same data sets with different data order results in totally different outputs. NTILE by itself is not an evil but the way we assumed it would work was wrong. In script we are asking it to order the Cost column and then group them into 6 groups. The only order it considers is Cost because that's what we asked for. Who to blame?!

How to fix it?

When we use rank functions we need to be very careful. We need to consider many things and check what we are asking them to do wisely. Rank functions can receive several orders. What we need to do is that we need to make sure all required orderings are in place. That way we can make sure the data order does not affect our results. In this case in particular, one way to fix the discrepancy is to add a new order to our NTILE function:

SELECT [Name], [Cost], NTILE(6) OVER(ORDER BY [Cost], [Name]) AS [Rank] FROM [MyTable]

 We are adding Name column to the order clause. It does not matter if it as ASC or DESC; what matters is that we will get the same results if we run it against any data set with the same data but different order.

NameCostRank
A11
B12
C13
E24
F35
D46

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)