Introduction
This article describes my solution on how to keep date ranges non-overlapping. I will use SQL instead of pseudo code to make the code practical but it should be easy to code it using a different language.
The Problem
One way of storing temporal or time-based data is by assigning it a date range. An example is a table of product price. The price is stored with the date range when that price is effective.
A common problem with date ranges is how to keep it non-overlapping. Like for the sample product price table, the data, in this case the price, will be ambiguous if there are two ranges that overlap. E.g. If the table contains a record Jan 1, 2009-Jan 30, 2009, 10USD; and another record Jan 15, 2009-Mar 31, 2009, 15USD; what will be the price for Jan 16, 2009?
A Solution
A strategy will be to disallow entry of a range if it will overlap with an existing one. This means existing ranges will need to be manually adjusted first before the new data can be entered. This will be at least cumbersome and at worst error-prone.
My Preferred Solution
My assumption is that the entered data is more valid than the existing ones so users should not trouble themselves with adjustments.
My preferred strategy then is to allow the entry of the range that will overlap (UI), automate the adjustments (SQL), then insert the entered range (SQL). So how do we do this?
Sample Data
Let's start with a sample table:
CREATE TABLE [dbo].[test](
[Start] [datetime] NULL,
[End] [datetime] NULL,
[data] [nvarchar](50) NULL
)
and some data:
INSERT INTO [dbo].[test] VALUES ('Jan 1, 2009','Jan 12, 2009','A')
INSERT INTO [dbo].[test] VALUES ('Jan 13, 2009','Jan 25, 2009','B')
INSERT INTO [dbo].[test] VALUES ('Feb 1, 2009','Feb 28, 2009','C')
INSERT INTO [dbo].[test] VALUES ('Mar 12, 2009','Mar 13, 2009','D')
Querying the table will yield the following:
Start End data
---------- ---------- ----
2009-01-01 2009-01-12 A
2009-01-13 2009-01-25 B
2009-02-01 2009-02-28 C
2009-03-12 2009-03-13 D
Cases 1, 2 and 3
Given the data above, what will happen if the user will enter the following data?
If we look at the dates, we will see that the new range will overlap all of the existing ranges but not quite in the same way.
Case 1: The new range overlaps the start and the end of an existing range
This is true for data B and C. These are no longer valid so we delete them.
delete from [dbo].[test]
where start>=@start
and [end]<=@end
Case 2: The new range overlaps only the end of an existing range
This is true for A. The range is no longer valid but for the start up to the new range start - 1 so the end will be adjusted.
Note that the code below assumes that the code above has already run. If not, then we will need to adjust the where
condition to explicitly exclude ranges that also have their starts within the new range.
update [dbo].[test]
set [end]=@start-1
where [end]>=@start
and [end]<=@end
Case 3: The new range overlaps only the start of an existing range
This is true for D. The range start will need to be adjusted to the end of the new range + 1.
Note that this also assumes that it will be used with the code above. If not, then ranges that also end within the new range must be excluded.
update [dbo].[test]
set start=@end+1
where start>=@start
and start<=@end
After running the three codes, we will insert the new range:
INSERT INTO [dbo].[test] VALUES (@start,@end,@data)
Querying the table will now yield the following:
Start End data
---------- ---------- ----
2009-01-01 2009-01-01 A
2009-01-02 2009-03-12 E
2009-03-13 2009-03-13 D
Case 4: The New Range is Within an Existing Range
Now what if the new range is within an existing one? Let's start with the data above and work on the data below.
set @start='Jan 10, 2009'
set @end='Mar 1, 2009'
set @data='F'
This range F is within the previously added range E. This will make the middle dates invalid. We can correct this by splitting the date range E to two, using a middle point within the new range, then apply cases 2 and 3. Or we can do the following.
Copy the old range with an adjusted end:
insert into [dbo].[test]
select Start,@start-1,data
from [dbo].[test]
where start<@start
and [end]>@end
Adjust the old range's start:
update [dbo].[test]
set start=@end+1
where start<@start
and [end]>@end
And finally, insert the new range:
INSERT INTO [dbo].[test] VALUES (@start,@end,@data)
Querying now will yield:
Start End data
---------- ---------- ----
2009-01-01 2009-01-01 A
2009-01-02 2009-01-09 E
2009-01-10 2009-03-01 F
2009-03-02 2009-03-12 E
2009-03-13 2009-03-13 D
Using the Code
The attached zip file contains all the queries from creating the table to selecting the resulting data. For actual use, I recommend creating a stored procedure with the new range and data as parameters.
Points of Interest
I'm sure that this was tackled before for it is not really a new problem, but I did not find any similar articles (or I was not savvy at searching :) ). Feel free to point me to a link if you find anything related.
Some DBAs might prefer marking data for deletion rather than permanently deleting them, i.e. purging. While others might prefer keeping invalidated data as historical data.
Another solution that comes to mind that allows for historical data is to allow overlaps and just record the date of entry. I think this will be sufficient to get the latest price, i.e. the matching range with the latest entry date, but I will leave that for another article ;).
History
- Created September 24, 2009 by acarpio1975
- Edited September 28, 2009 by acarpio1975