Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2016

A Practical Approach of Time Travel Concept in Business Intelligence Scenarios using Temporal Tables on Data Warehouses

5.00/5 (5 votes)
10 Feb 2016CPOL6 min read 16.6K  
SQL Server 2016 introduces support for system-versioned temporal tables as a database feature that brings built-in support for providing information about data stored in the table at any point in time rather than only the data that is correct at the current moment in time.

Introduction

When comparing the differences between operational systems and so called business intelligence (BI) solutions, it is a fact that the operational viewpoint is that of the current, latest data values whereas the importance of Business Intelligence provides the ability to analyze historical trends and patterns of data changes over time. So, it is not possible to analyze data if the time context is not defined. The time-space continuum must be defined at least before to analyze several data sources. This data context means that we need to add a “semantic value” for any proposition in study of data.

Data on Context: Semantics

Semantic data is the information that allows machines to understand the meaning of information [Th1998]. It describes the technologies and methods that convey the meaning of the information. This term was minted by Tim Berners-Lee. The semantic data model is a software engineering model based on relationships between stored symbols and the real world [Be2001]. The data is organized in such a way that it can be interpreted meaningfully without human intervention. Semantic data has a history dating back to the 1970s and is currently used in a wide variety of data management systems and applications. Databases designed around this concept have greater applicability and are more easily integrated into other databases. The time line data have a lack of interpretation if you run a query without specific periods and provide more useful information about the prospective of a data. But if we give a “memory” to the group of data, we can introduce at least one variable of control in our mind model, and, of course, useful information.

Using Common Practice in Business Intelligence

A common practice of supporting historical analysis in the Data Warehouse (DW) design is the definition of a Slowly Changing Dimension (SCD). The Data Warehouse provides by itself a historical database as the foundation for business analysis supporting Facts and Dimensions in a “Dimensional Model.” Facts are defined as numeric values that are used to assess the business, such as revenue, costs, profits, salaries, etc. By another part, Dimensions add context to the analysis of facts, for instance, analyzing sales by region or profits by year or average manager salary by department. In these examples, region, year, and department would be dimensions in the Data Warehouse with sales, profits, and salary implemented as facts.

Deep into Code

In the above mentioned scenario, consider that when a new department is assigned to the budget, their expenses of this one should be applied to the whole year average or only the time period that they were assigned? If we choose the former, we are defining an “SCD Type 1,” which only stores current data values and the potential for inaccurate detailed analysis.

For true analysis over time periods, it’s necessary to choose the last option, which requires the careful design of an “SCD Type 2” for historical data. The typical design method of an SCD Type 2 is to provide the ability for multiple rows for a given dimension member to allow for storage of historical changes by the addition of a “surrogate” key. The surrogate key provides uniqueness and allows duplicate values of the business key used in the operational system. Thus it is added also, and it would be a start date/time and an end date/time column for accurate time analysis. The design and population of this type of dimension is usually a complex, manual, time-consuming process.

However, we can build a third option based on the last SCD Type 2 using “Temporal Data”. It could be possible, e.g., with the automation of this process via built-in datatypes called Temporal Tables in SQL Server 2016. This feature is fully compliant with the ANSI SQL 2011 specification.

This one introduces support for system-versioned temporal tables as a database feature that brings built-in support for providing information about data stored in the table at any point in time rather than only the data that is correct at the current moment in time [MS2016]. The following script is an example of a temporal table DML definition.

SQL
DROP TABLE IF EXISTS [dbo].[DimDepartment]
;
CREATE TABLE [dbo].[DimDepartment](
	[DimDepartmentKey] [int] IDENTITY(1,1) NOT NULL,
	[ParentDimDepartmentKey] [int] NULL,
	[DimDepartmentName] [nvarchar](50) NULL,
	[SysStartTime] datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
	[SysEndTime]   datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
	PERIOD FOR SYSTEM_TIME  (SysStartTime , SysEndTime),	
	CONSTRAINT [PK_DimDepartment] PRIMARY KEY CLUSTERED 
(	[DimDepartmentKey] ASC )
WITH (	PAD_INDEX = OFF, 
		STATISTICS_NORECOMPUTE = OFF, 
		IGNORE_DUP_KEY = OFF, 
		ALLOW_ROW_LOCKS = ON, 
		ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
;

ALTER TABLE [dbo].[DimDepartment] 
   SET (SYSTEM_VERSIONING = ON ) 
;

/*** Another relationships ***/

ALTER TABLE [dbo].[DimDepartment]  
WITH CHECK ADD  CONSTRAINT [FK_DimDepartment_DimDepartment] 
FOREIGN KEY([ParentDimDepartmentKey])
REFERENCES [dbo].[DimDepartment] ([DimDepartmentKey])
;

ALTER TABLE [dbo].[DimDepartment]
CHECK CONSTRAINT [FK_DimDepartment_DimDepartment]
;

Once created, the temporal table will automatically track changes and maintain the start and end date and time columns, effectively implementing an SCD Type 2 via the history table while keeping the current table as normal with the latest data values. Both tables are visible in the SSMS Object Explorer, as showed at Figure 1 and they can be queried as separate tables, as required. The anonymous history table name for the above code was MSSQL_TemporalHistoryFor MSSQL_TemporalHistoryFor_1650104919.

Figure 1

The SysEndTime column set to the maximum value for datatype datetime2 to indicate current values as showed in Figure 2.

Figure 2

Changes over rows are stored on System-Versioned Temporal Tables if the optional sentence is applied:

SQL
ALTER TABLE [dbo].[DimDepartment] 
   SET (SYSTEM_VERSIONING = ON) 
;

Temporal data can be used for auditing or for recovering data values after accidental updates. The elements of this table can be copied to a dimension table in a Data Warehouse and at this point, any manipulation using complex processing would not be necessary when incrementally loading data.

To resolve the initial problem, when new department is assigned to the budget and their expenses should be applied only to the time period, first, we have a Fact table with the budget info for every department assignment. So, we’re going to depict this fact with the available code in this script. Here, we are assuming that each table does not exist yet for consistency purposes:

SQL
CREATE TABLE [dbo].[DimDate](
	[DateKey] [int] NOT NULL,
	[FullDateAlternateKey] [date] NOT NULL,
	[DayNumberOfWeek] [tinyint] NOT NULL,
	[EnglishDayNameOfWeek] [nvarchar](10) NOT NULL,
	[SpanishDayNameOfWeek] [nvarchar](10) NOT NULL,
	[FrenchDayNameOfWeek] [nvarchar](10) NOT NULL,
	[DayNumberOfMonth] [tinyint] NOT NULL,
	[DayNumberOfYear] [smallint] NOT NULL,
	[WeekNumberOfYear] [tinyint] NOT NULL,
	[EnglishMonthName] [nvarchar](10) NOT NULL,
	[SpanishMonthName] [nvarchar](10) NOT NULL,
	[FrenchMonthName] [nvarchar](10) NOT NULL,
	[MonthNumberOfYear] [tinyint] NOT NULL,
	[CalendarQuarter] [tinyint] NOT NULL,
	[CalendarYear] [smallint] NOT NULL,
	[CalendarSemester] [tinyint] NOT NULL,
	[FiscalQuarter] [tinyint] NOT NULL,
	[FiscalYear] [smallint] NOT NULL,
	[FiscalSemester] [tinyint] NOT NULL,
 CONSTRAINT [PK_DimDate_DateKey] PRIMARY KEY CLUSTERED 
(
	[DateKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, _
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
 CONSTRAINT [AK_DimDate_FullDateAlternateKey] UNIQUE NONCLUSTERED 
(
	[FullDateAlternateKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, _
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
;

CREATE TABLE [dbo].[DimAccount](
	[AccountKey] [int] IDENTITY(1,1) NOT NULL,
	[ParentAccountKey] [int] NULL,
	[AccountCodeAlternateKey] [int] NULL,
	[ParentAccountCodeAlternateKey] [int] NULL,
	[AccountDescription] [nvarchar](50) NULL,
	[AccountType] [nvarchar](50) NULL,
	[Operator] [nvarchar](50) NULL,
	[CustomMembers] [nvarchar](300) NULL,
	[ValueType] [nvarchar](50) NULL,
	[CustomMemberOptions] [nvarchar](200) NULL,
 CONSTRAINT [PK_DimAccount] PRIMARY KEY CLUSTERED 
(
	[AccountKey] ASC
)WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
;

CREATE TABLE [dbo].[DimOrganization](
	[OrganizationKey] [int] IDENTITY(1,1) NOT NULL,
	[OrganizationName] [nvarchar](50) NULL
CONSTRAINT [PK_DimOrganization] PRIMARY KEY CLUSTERED 
(	[OrganizationKey] ASC)
WITH (
PAD_INDEX = OFF, 
STATISTICS_NORECOMPUTE = OFF, 
IGNORE_DUP_KEY = OFF,     
ALLOW_ROW_LOCKS = ON, 
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
;

CREATE TABLE [dbo].[DimScenario](
	[ScenarioKey] [int] IDENTITY(1,1) NOT NULL,
	[ScenarioName] [nvarchar](50) NULL,
 CONSTRAINT [PK_DimScenario] PRIMARY KEY CLUSTERED 
(	[ScenarioKey] ASC)
WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
;

CREATE TABLE [dbo].[FactBudget](
	[FinanceKey] [int] IDENTITY(1,1) NOT NULL,
	[DateKey] [int] NOT NULL,
	[OrganizationKey] [int] NOT NULL,
	[DepartmentKey] [int] NOT NULL,
	[ScenarioKey] [int] NOT NULL,
	[AccountKey] [int] NOT NULL,
	[Amount] [float] NOT NULL,
	[Date] [datetime] NULL
) ON [PRIMARY]
;

ALTER TABLE [dbo].[FactBudget] 
WITH CHECK ADD  CONSTRAINT [FK_FactBudget_DimAccount_1] FOREIGN KEY([AccountKey])
REFERENCES [dbo].[DimAccount] ([AccountKey])
;

ALTER TABLE [dbo].[FactBudget] CHECK CONSTRAINT [FK_FactBudget_DimAccount_1]
;

ALTER TABLE [dbo].[FactBudget]
WITH CHECK ADD  CONSTRAINT [FK_FactBudget_DimDate_1] FOREIGN KEY([DateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
;

ALTER TABLE [dbo].[FactBudget] CHECK CONSTRAINT [FK_FactBudget_DimDate_1]
GO

ALTER TABLE [dbo].[FactBudget] 
WITH CHECK ADD  CONSTRAINT [FK_FactBudget_DimDepartment_1] 
FOREIGN KEY([DepartmentKey])
REFERENCES [dbo].[DimDepartment] ([DimDepartmentKey])
;

ALTER TABLE [dbo].[FactBudget] CHECK CONSTRAINT [FK_FactBudget_DimDepartment_1]
;

ALTER TABLE [dbo].[FactBudget] 
WITH CHECK ADD  CONSTRAINT [FK_FactBudget_DimOrganization_1] FOREIGN KEY([OrganizationKey])
REFERENCES [dbo].[DimOrganization] ([OrganizationKey])
;

ALTER TABLE [dbo].[FactBudget] CHECK CONSTRAINT [FK_FactBudget_DimOrganization_1]
;

ALTER TABLE [dbo].[FactBudget] 
WITH CHECK ADD  CONSTRAINT [FK_FactBudget_DimScenario_1] FOREIGN KEY([ScenarioKey])
REFERENCES [dbo].[DimScenario] ([ScenarioKey])
;

ALTER TABLE [dbo].[FactBudget] CHECK CONSTRAINT [FK_FactBudget_DimScenario_1]
;

After it has inserted data using the above mentioned tables, it is good practice to review the dates. Here, we can appreciate that the maximum values belongs to the last budget as shown in the GlobalDate column.

Figure 3. Exploring Maximum Date values for Budget Fact Table by Department.

However, all departments had been created recently because the table was created recently. And the following code shows below how do it to query using time-traveling feature:

SQL
SELECT DimDepartmentName, sum(FactBudget.Amount) AS Total
FROM DimDepartment
FOR SYSTEM_TIME
BETWEEN '2016-01-01 00:00:00.0000000' AND '2016-03-01 00:00:00.0000000'
 INNER JOIN [FactBudget] on DimDepartment.DimDepartmentKey = FactBudget.DepartmentKey
WHERE DimDepartmentKey = 7
GROUP BY DimDepartmentName
ORDER BY DimDepartmentName;

And, the follow code warns about if we check the possibility to review the event before this period, we expect for no results.

SQL
    SELECT DimDepartmentName, sum(FactBudget.Amount) AS Total
FROM DimDepartment
FOR SYSTEM_TIME
BETWEEN '2005-01-01 00:00:00.0000000' AND '2005-03-01 00:00:00.0000000'
INNER JOIN [FactBudget] on DimDepartment.DimDepartmentKey = FactBudget.DepartmentKey
WHERE DimDepartmentKey = 7
GROUP BY DimDepartmentName
ORDER BY DimDepartmentName;

Both results are shown in the following figure below:

Figure 4. Checking Time-Traveling query with several periods with respect to the same fact

If an update occurs due to some organizational restructuration, maybe something called “Sales and Marketing” now will be renamed “Global Sales” department after the date of the table was created, it could warns this event with the property of versioning. The MSSQL_TemporalHistoryFor MSSQL_TemporalHistoryFor_1650104919 table is checked to test the old value where valid data against Budget fact table is proven in the period utilized in the clause:

SQL
FOR SYSTEM_TIME BETWEEN '2016-01-01 00:00:00.0000000' AND '2016-03-01 00:00:00.0000000'

The following picture where it illustrates this behavior:

Figure 5. Temporal Table – Using inherited System Versioning.

Periods tested with the property of temporal table include the intervals of:

  1. Before the update:
    SQL
    '2016-01-01 00:00:00.0000000' AND '2016-02-05 00:00:00.0000000'
    
  2. Including the events with the old and new name:
    SQL
    '2016-02-10 01:00:00.0000000' AND '2016-03-01 00:00:00.0000000'
  3. After the update sentence:
    SQL
    '2016-02-10 02:00:00.0000000' AND '2016-03-01 00:00:00.0000000'

The following picture in Figure 6 shows SQL sentence that keeps the context of the data through the time-traveling concept and this proves the correct use of the semantic within BI scenarios.

Figure 6. Example of query using time-traveling concept.

Bibliography

  • [Th1998] Semantics in Databases. THALHEIM, Bernhard; LIBKIN , Leonid. Lecture Notes in Computer Science. Vol: 1358. Springer-Verlag Berlin Heidelberg. Berlin, 1998. Softcover ISBN: 978-3-540-64199-5.
  • [Be2001] Berners-Lee, Tim; James Hendler; Ora Lassila (May 17, 2001). "The Semantic Web A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities". Scientific American 284: 34–43. doi:10.1038/scientificamerican0501-34.
  • [MD2016] Microsoft Development Network.
    URL: https://msdn.microsoft.com/en-US/library/dn935015.aspx
  • [MS2016] Samples for SQL Server 2016 CTP.
    URL: http://msftdbprodsamples.codeplex.com/releases/view/618193

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)