I had a situation recently where I had a table full of people records, where the people were divided into families. The business logic that needed to be followed was that I had to assign a “Twin Code” to each record. This meant that for each family in the database, if two or more members were born on the same day, they should be treated as twins. The twins should be assigned a number enumerating them in order of birth. If the member was not a twin, they should just receive the twin code of 1.
Here’s an example table:
PersonID | FamilyID | FirstName | LastName | DateOfBirth |
1 | 1 | Joe | Johnson | 2000-10-23 13:00:00 |
2 | 1 | Jim | Johnson | 2001-12-15 05:45:00 |
3 | 2 | Karly | Matthews | 2000-05-20 04:00:00 |
4 | 2 | Kacy | Matthews | 2000-05-20 04:02:00 |
5 | 2 | Tom | Matthews | 2001-09-15 11:52:00 |
There are lots of ways to achieve the desired result, but the simplest is to just use a simple SELECT
statement combined with the ROW_NUMBER()
function with a couple parameters as to how to number the rows!
ROW_NUMBER()
provides you with the number of a row in a given recordset, where you provide details on how to number the records. For example, if I just had to number the records above based solely upon the date of birth (ignoring families), then I would use this query:
SELECT
[PersonID]
,[FamilyID]
,[FirstName]
,[LastName]
,[DateOfBirth]
,ROW_NUMBER() over (ORDER BY DateOfBirth) AS Number
FROM
People
ORDER BY
PersonID
This just tells the ROW_NUMBER()
function to order its numbering ascending by DateOfBirth
. Notice that I apply an order myself later on in the query, which is different than the row_number()
order. I would get these results:
PersonID | FamilyID | FirstName | LastName | DateOfBirth | Number |
1 | 1 | Joe | Johnson | 2000-10-23 13:00:00 | 3 |
2 | 1 | Jim | Johnson | 2001-12-15 05:45:00 | 5 |
3 | 2 | Karly | Matthews | 2000-05-20 04:00:00 | 1 |
4 | 2 | Kacy | Matthews | 2000-05-20 04:02:00 | 2 |
5 | 2 | Tom | Matthews | 2001-09-15 11:52:00 | 4 |
The number field that is assigned to each record is in the order of DateOfBirth
.
Ordering my numbering by DateOfBirth
is just half of the picture. I also need to “group” the records by the FamilyID
. This is where a clause in T-, SQL that you might not be very familiar with comes into play: “PARTITION BY
”. The PARTITION BY
clause allows us to group the results within the call to ROW_NUMBER()
without grouping them ourselves via a GROUP BY
. It just tells the ROW_NUMBERR
what groupings to use when it does its counting.
Here is our final SQL statement, which achieves the business logic we wanted to implement.
SELECT
[PersonID]
[FamilyID]
,[FirstName]
,[LastName]
,[DateOfBirth]
,ROW_NUMBER() over(PARTITION BY FamilyID,
CONVERT(NVARCHAR(25), DateOfBirth, 111)
ORDER BY DateOfBirth ASC) TwinCode
FROM [People]
ORDER BY PersonID
IIn the ROW_NUMBER
function above, I am doing several things. I’m grouping on FamilyID
, and also grouping on a converted DateOfBirth
. I convert the DateOfBirth
to an nvarchar
using the 111 conversion code, because that gets results like ‘2009/10/11
′ and ‘2009/10/12
′ which can easily be grouped by to achieve distinct dates.
Grouping on the Family
, DateOfBirth
, and then sorting by DateOfBirth
ascending achieves the desired result for the ROW_NUMBER
. Here are the results of the query:
PersonID | FamilyID | FirstName | LastName | DateOfBirth | TwinCode |
1 | 1 | Joe | Johnson | 2000-10-23 13:00:00 | 1 |
2 | 1 | Jim | Johnson | 2001-12-15 05:45:00 | 1 |
3 | 2 | Karly | Matthews | 2000-05-20 04:00:00 | 1 |
4 | 2 | Kacy | Matthews | 2000-05-20 04:02:00 | 2 |
5 | 2 | Tom | Matthews | 2001-09-15 11:52:00 | 1 |
As you can see, the two people who qualify as twins above (Karly
and Kacy
) are enumerated correctly, with Karly
receiving a 1
and Kacy
receiving a 2
. All the records that are not twins properly receive a 1.