One common problem is the need to have data on a result set row from another row in the same result set. For example, in order to calculate an interval between dates on two separate rows, typically a self join is needed. This requires to add the same table (or set) twice into the query. With a simple query, this won't be any problem but if the result set is created using a more complex query, creating the same complexity twice can cause problems and of course create performance problems.
In SQL Server 2012, two functions with windowing support are introduced to address this problem: LAG
(to look backwards) and LEAD
(to look forward). Both can be instructed to look over several records, not just the previous or next.
First we need a simple table with some test data:
CREATE TABLE LagTest (
Event VARCHAR(100) NOT NULL,
AddTime DATETIME NOT NULL
);
And some data with random dates in the future:
INSERT INTO LagTest VALUES ('E#1', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#2', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#3', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#4', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#5', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#6', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#7', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#8', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#9', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
INSERT INTO LagTest VALUES ('E#10', CURRENT_TIMESTAMP + ABS( RAND() ) * 10);
Now, in order to fetch the following information:
- What row I'm on?
- What was the previous event?
- What's the datetime on previous row?
- What's the interval in hours between the previous and this event?
- What event pair is going to be next?
The query could be like:
SELECT sub.Event AS ActualRow,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY sub.AddTime) = 1 THEN 'First event'
ELSE sub.PrevEvent + ' -> ' + sub.Event
END AS LookBack,
sub.PrevTime,
DATEDIFF( hour, sub.PrevTime, sub.AddTime) AS IntervalInHours,
CASE
WHEN sub.NextEvent IS NOT NULL
THEN 'Next: ' + sub.Event + ' -> ' + sub.NextEvent
ELSE 'Last event'
END AS LookForward
FROM ( SELECT lt.Event,
lt.AddTime,
LAG(lt.Event, 1) OVER (ORDER BY lt.AddTime) AS PrevEvent,
LAG(lt.AddTime, 1) OVER (ORDER BY lt.AddTime) AS PrevTime,
LEAD(lt.Event, 1) OVER (ORDER BY lt.AddTime) AS NextEvent
FROM LagTest lt) AS sub
First, an inline view (named sub
) is used to gather the data. Since LAG
and LEAD
are window functions, the ordering must be specified. In this example the AddTime
column defines the order of the rows.
An example output from the inline view could look like the following:
Event AddTime PrevEvent PrevTime NextEvent
----- ------- --------- -------- ---------
E#5 2011-07-29 05:17:44.680 NULL NULL E#1
E#1 2011-07-29 18:30:02.053 E#5 2011-07-29 05:17:44.680 E#4
E#4 2011-08-01 01:39:16.430 E#1 2011-07-29 18:30:02.053 E#10
E#10 2011-08-01 14:32:50.040 E#4 2011-08-01 01:39:16.430 E#2
E#2 2011-08-02 19:30:16.483 E#10 2011-08-01 14:32:50.040 E#8
E#8 2011-08-02 20:20:19.920 E#2 2011-08-02 19:30:16.483 E#9
E#9 2011-08-03 01:23:00.940 E#8 2011-08-02 20:20:19.920 E#6
E#6 2011-08-03 02:09:39.343 E#9 2011-08-03 01:23:00.940 E#7
E#7 2011-08-04 05:30:01.203 E#6 2011-08-03 02:09:39.343 E#3
E#3 2011-08-05 11:16:57.253 E#7 2011-08-04 05:30:01.203 NULL
The main query is basically just formatting, concatenation of event column data, interval calculation etc. The actual output would be like:
ActualRow LookBack PrevTime IntervalInHours LookForward
--------- -------- -------- --------------- -----------
E#5 First event NULL NULL Next: E#5 -> E#1
E#1 E#5 -> E#1 2011-07-29 05:17:44.680 13 Next: E#1 -> E#4
E#4 E#1 -> E#4 2011-07-29 18:30:02.053 55 Next: E#4 -> E#10
E#10 E#4 -> E#10 2011-08-01 01:39:16.430 13 Next: E#10 -> E#2
E#2 E#10 -> E#2 2011-08-01 14:32:50.040 29 Next: E#2 -> E#8
E#8 E#2 -> E#8 2011-08-02 19:30:16.483 1 Next: E#8 -> E#9
E#9 E#8 -> E#9 2011-08-02 20:20:19.920 5 Next: E#9 -> E#6
E#6 E#9 -> E#6 2011-08-03 01:23:00.940 1 Next: E#6 -> E#7
E#7 E#6 -> E#7 2011-08-03 02:09:39.343 27 Next: E#7 -> E#3
E#3 E#7 -> E#3 2011-08-04 05:30:01.203 30 Last event
Having these two functions makes it easier to get data from next or previous rows.