This code snippet demonstrates how to convert a date range into a set of rows in SQL, including handling overlapping date ranges and splitting ranges into day-wise records. It provides two examples: one for splitting date ranges into individual days and another for splitting ranges into overlapping or related ranges. The code includes recursive CTEs and joins to achieve the desired results efficiently.
Background
I had a case where I had to convert a date range into a set of rows. For example, I had a record saying a product discount from 1st June till 10th June. This single row needs to be transformed into 10 rows, one for each day of sales. How could I do this in a fast and scalable manner, since I had thousands and thousands of records which might result in millions? That's not all.
- There could be different types of products
- Overlapping date ranges, leading to more discounts (sum)
Split Ranges to Day Wise Range
Here, we are going to convert the date range to possible day records.
Data
DECLARE @tblDateRange TABLE (
Id INT IDENTITY(1, 1),
TypeId INT,
FromDateTime DATETIME,
ToDateTime DATETIME
)
INSERT INTO @tblDateRange (TypeId, FromDateTime, ToDateTime)
VALUES
(1, '2023-01-01 10:00:00.000', '2023-01-01 10:00:00.000'),
(2, '2023-02-02 00:00:00.000', '2023-02-04 23:59:59.000'),
(3, '2023-03-05 10:00:00.000', '2023-03-06 23:59:59.000'),
(4, '2023-04-07 00:00:00.000', '2023-04-08 21:00:00.000'),
(5, '2023-05-09 11:00:00.000', '2023-05-11 11:00:00.000'),
(6, '2023-06-01 10:00:00.000', '2023-06-01 22:00:00.000');
Split Query
In this split process, we are using recursion to populate new rows inside a CTE.
WITH
DateRanges(Id, LevelNo, [Date], DateWiseStartDateTime, DateWiseEndDateTime)
AS
(
SELECT
p.Id,
1,
CAST(p.FromDateTime AS DATE),
p.FromDateTime,
IIF(DATEADD(SECOND, -1, DATEADD(DAY, DATEDIFF(DAY, 0, p.FromDateTime) + _
1, 0)) < p.ToDateTime, DATEADD(SECOND, -1, _
DATEADD(DAY, DATEDIFF(DAY, 0, p.FromDateTime) + 1, 0)), p.ToDateTime)
FROM @tblDateRange p
UNION ALL
SELECT
c.Id,
p.LevelNo + 1,
CAST(DATEADD(DAY, DATEDIFF(DAY, 0, _
DATEADD(DAY, 1, p.DateWiseStartDateTime)), 0) AS DATE),
DATEADD(DAY, DATEDIFF(DAY, 0, DATEADD(DAY, 1, p.DateWiseStartDateTime)), 0),
IIF(c.ToDateTime < DATEADD(DAY, 1, p.DateWiseEndDateTime), _
c.ToDateTime, DATEADD(DAY, 1, p.DateWiseEndDateTime))
FROM @tblDateRange c
JOIN DateRanges p ON c.Id = p.Id
WHERE DATEADD(DAY, DATEDIFF(DAY, 0, _
DATEADD(DAY, 1, p.DateWiseStartDateTime)), 0) < _
c.ToDateTime
)
SELECT
*
FROM DateRanges
ORDER BY Id, LevelNo
OPTION (MAXRECURSION 30000)
By default, it is going to support 100 recursive calls, and we don't need to use OPTION (MAXRECURSION 30000)
. If it requires more than that, we need to use that OPTION
clause where the max value can be 32767. Use 0 for no limit.
Join Split Rows to Actual Rows
Let's join the split rows with actual data. Here is the new select
query in the CTE:
SELECT
d.*,
r.LevelNo,
r.Date,
r.DateWiseStartDateTime,
r.DateWiseEndDateTime
FROM @tblDateRange d
JOIN DateRanges r ON d.Id = r.Id
ORDER BY d.Id, r.LevelNo
OPTION (MAXRECURSION 30000)
Limitations
Imagine we have ranges for 10 or 20 years. For each year, it's going to generate about 356 rows each year. The end query will be a bit slow. Another way is splitting them into small ranges based on overlapping and nonoverlapping ranges.
Split Ranges to Overlapping/Related Ranges
Here, we are going to convert the date range to all possible small ranges. Here is an idea:
1 12
|-------|-----------------------------|---------|-----------|--------|
2 6
|-----------------------------|
8 10
|-----------|
Input
Range Part
1-12 1
2-6 1
8-10 1
Expected Result
NewRange SUM of Parts
1-2 1
3-6 2
7-8 1
9-10 2
11-12 1
As we can see, here overlapping and nonoverlapping ranges are available.
Data
DECLARE @tableDataRanges TABLE(
Id INT IDENTITY(1, 1),
TypeId INT,
FromDateTime DATETIME,
ToDateTime DATETIME,
Points INTEGER
);
INSERT INTO @tableDataRanges (TypeId, FromDateTime, ToDateTime, Points)
VALUES
(1, '2023-01-13', '2023-01-20 23:59:59', 20),
(1, '2023-02-10', '2023-02-20 23:59:59', 10),
(1, '2023-02-15', '2023-02-25 23:59:59', 10),
(2, '2023-02-10 12:00:00', '2023-02-20 23:00:00', 20),
(2, '2023-02-15 06:00:00', '2023-02-25 23:59:59', 20),
(3, '2023-02-10', '2023-02-20 23:59:59', 30),
(3, '2023-02-05', '2023-02-25 23:59:59', 20);
Split Query
WITH
PosibleStartDateTimes
AS
(
SELECT FromDateTime AS StartDateTime, TypeId, 1 AS Cover
FROM @tableDataRanges
UNION ALL
SELECT DATEADD(SECOND, 1, ToDateTime) AS StartDateTime, TypeId, -1 AS Cover
FROM @tableDataRanges
)
,AggregatedStartDateTimes
AS
(
SELECT StartDateTime, TypeId, SUM(Cover) AS Cover
FROM PosibleStartDateTimes
GROUP BY StartDateTime, TypeId
)
,StartDateTimeToRanges
AS
(
SELECT
StartDateTime AS FromDateTime,
TypeId,
LEAD(StartDateTime) OVER (PARTITION BY TypeId ORDER BY StartDateTime) _
AS ToDateTime,
SUM(Cover) OVER (PARTITION BY TypeId ORDER BY StartDateTime) AS NumberOfParts
FROM AggregatedStartDateTimes
)
,PossibleRanges
AS
(
SELECT
FromDateTime,
DATEADD(SECOND, -1, ToDateTime) AS ToDateTime,
TypeId,
NumberOfParts
FROM StartDateTimeToRanges
WHERE NumberOfParts > 0
)
SELECT
*
FROM PossibleRanges
ORDER BY TypeId, FromDateTime;
We have different TypeId
data in the table. That's why in the OVER
clause, we are using PARTITION BY TypeId
partitioning by row type.
Join Split Rows to Actual Rows
Let's join the split range rows with actual data. Here is the new select
query in the CTE:
SELECT
d.*,
r.NumberOfParts,
r.FromDateTime AS RangeFromDateTime,
r.ToDateTime AS RangeToDateTime
FROM @tableDataRanges d
JOIN PossibleRanges r ON d.TypeId = r.TypeId _
AND (d.FromDateTime <= r.FromDateTime AND d.ToDateTime >= r.ToDateTime)
ORDER BY d.Id, r.FromDateTime;
Others
If there is no type difference, simply we can do a query without using PARTITION BY
:
DECLARE @tableRanges TABLE(
Id INT IDENTITY(1, 1),
FromDateTime DATETIME,
ToDateTime DATETIME,
Points INTEGER
);
INSERT INTO @tableRanges (FromDateTime, ToDateTime, Points)
VALUES
('2018-01-01', '2018-01-31 23:59:59', 80),
('2018-01-07', '2018-01-10 23:59:59', 10),
('2018-01-07', '2018-01-31 23:59:59', 10),
('2018-01-11', '2018-01-31 23:59:59', 5),
('2018-01-25', '2018-01-27 23:59:59', 5),
('2018-02-02', '2018-02-23 23:59:59', 100);
WITH
PosibleStartDateTimes
AS
(
SELECT FromDateTime AS StartDateTime, Points, 1 AS Cover
FROM @tableRanges
UNION ALL
SELECT DATEADD(SECOND, 1, ToDateTime) AS StartDateTime, -1 * Points, -1 AS Cover
FROM @tableRanges
)
,AggregatedStartDateTimes
AS
(
SELECT StartDateTime, SUM(Points) AS Points, SUM(Cover) AS Cover
FROM PosibleStartDateTimes
GROUP BY StartDateTime
)
,StartDateTimeToRanges
AS
(
SELECT
StartDateTime AS FromDateTime,
LEAD(StartDateTime) OVER (ORDER BY StartDateTime) AS ToDateTime,
SUM(Points) OVER (ORDER BY StartDateTime) AS Points,
SUM(Cover) OVER (ORDER BY StartDateTime) AS NumberOfParts
FROM AggregatedStartDateTimes
)
,PossibleRanges
AS
(
SELECT
FromDateTime,
DATEADD(SECOND, -1, ToDateTime) AS ToDateTime,
Points,
NumberOfParts
FROM StartDateTimeToRanges
WHERE NumberOfParts > 0
)
SELECT
*
FROM PossibleRanges
ORDER BY FromDateTime;
References
Conclusion
Nonrecursive PARTITION BY
query was the fastest of them all
Code/Environment
Find the same code .sql file as the zip, if things not working as expected let me know. Tested code in Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64).
History
- 1st July, 2023: Initial version