Introduction
Dimension table loading in BI environment using Execute SQL Task gives better performance than SCD transformation in SSIS Transformation.
- SCD stands for Slow Changing Dimension
- This code has been tested for SQL Server 2008
Why it is needed?
In a BI environment there are several methods to load the Dimension table, e.g., by using Slow Changing Dimension transformation or by using merge join transformation etc., but both methods are perfect for small amounts of data because these transformations are asynchronous and they perform operation for each row of data source, so suppose we have millions of records in
the data source it will take a long time to load the dimension table and I think your business will not
be allowed to do so. Hence we can take a simple Execute SQL Task in SSIS to decrease
the processing time even when we have millions of records in the data source.
There are three types of Slow Changing Dimensions: SCD Type 1 (changing attributes) that means no history will be maintained for the column(s) which
are defined as SCD type 1; e.g., phone number of employee. Second is SCD type 2 (historical attributes) that means history will be maintained by using
the start date / end date or by using version number; e.g., when any Employee's work location
is changed, we will maintain the history of the old employee's location by using the
start date/ end date. And third SCD type is what we are considering in this article.
We can also maintain both SCD type 1 and SCD type 2 while loading the Dimension table by using this method (i.e., Execute SQL Task Transformation in SSIS).
Understanding the scripts that will be run under the Execute SQL Task
For SCD type 2:
For SCD type 2 I have considered the following columns:
Town
CustomerName,DateOfBirth,TelephoneNo,DrivingLicenceNo,Occupation
Declare @tempHistoricalAttribute As Table(CustomerId Int)
Insert Into @tempHistoricalAttribute(CustomerId)
select src.CustomerId from
(
select CustomerId,town from Customer
EXCEPT
select CustomerId,town from CustomerUpdate Where EndDate Is NULL
) src
Inner Join CustomerUpdate dst On src.CustomerId = dst.CustomerId
Where dst.EndDate Is NULL
Update CustomerUpdate Set EndDate = GETDATE() from
@tempHistoricalAttribute src
Inner Join CustomerUpdate dst On src.CustomerId = dst.CustomerId
Where dst.EndDate Is NULL
Insert Into CustomerUpdate(CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,
DrivingLicenceNo,Occupation,StartDate,EndDate)
Select CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,
DrivingLicenceNo,Occupation,GETDATE(),null From Customer where CustomerId in
(Select customerid from @tempHistoricalAttribute)
For SCD type 1:
For SCD type 1 I have considered the following columns:
CustomerName,DateOfBirth,TelephoneNo,DrivingLicenceNo,Occupation
Update CustomerUpdate SET
CustomerName=src.CustomerName,DateOfBirth=src.DateOfBirth,
TelephoneNo = src.TelephoneNo,DrivingLicenceNo = src.DrivingLicenceNo,
Occupation = src.Occupation
from
(
select CustomerId,CustomerName,DateOfBirth,TelephoneNo,DrivingLicenceNo,
Occupation from Customer
EXCEPT
select CustomerId,CustomerName,DateOfBirth,TelephoneNo,DrivingLicenceNo,
Occupation from CustomerUpdate where EndDate Is Null
)src
INNER JOIN CustomerUpdate dest ON src.CustomerId = dest.CustomerId
where dest.EndDate Is Null
For new record:
Insert Into CustomerUpdate(CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,
DrivingLicenceNo,Occupation,StartDate,EndDate)
select src.CustomerId,src.CustomerName,src.DateOfBirth,src.Town,
src.TelephoneNo,src.DrivingLicenceNo,src.Occupation,GETDATE(),NULL
from
(
select CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,DrivingLicenceNo,
Occupation from Customer
EXCEPT
select CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,DrivingLicenceNo,
Occupation from CustomerUpdate where EndDate Is Null
)src
Important thing to note
The script should run in the following sequence. Execute SQL Task will run
the script in the following sequence:
- Seq 1: SCD type 2 script
- Seq 2: SCD type 1 script
- Seq 3: New record script
Create project
Table script
if exists (select * from sys.tables where name = 'Customer')
drop table Customer
go
create table Customer
(
CustomerId int not null primary key,
CustomerName varchar(30), DateOfBirth date, Town varchar(50),
TelephoneNo varchar(30), DrivingLicenceNo varchar(30), Occupation varchar(30)
)
go
declare @i int, @si varchar(10), @startdate date
set @i = 1
while @i <= 10
begin
set @si = right('0'+CONVERT(varchar(10), @i),2)
insert into Customer
( CustomerId, CustomerName, DateOfBirth, Town, TelephoneNo, DrivingLicenceNo, Occupation)
values
( @i, 'Customer'+@si, DATEADD(d,@i-1,'2000-01-01'), 'Town'+@si,
'Phone'+@si, 'Licence'+@si, 'Occupation'+@si)
set @i = @i + 1
end
go
if exists (select * from sys.tables where name = 'CustomerUpdate')
drop table CustomerUpdate
go
create table CustomerUpdate
(
Dim_CustomerId int identity(1,1),
CustomerId int not null,
CustomerName varchar(30), DateOfBirth date, Town varchar(50),
TelephoneNo varchar(30), DrivingLicenceNo varchar(30), Occupation varchar(30),
StartDate DateTime,EndDate Datetime
)
go
declare @i int, @si varchar(10), @startdate date
set @i = 1
while @i <= 5
begin
set @si = right('0'+CONVERT(varchar(10), @i),2)
insert into CustomerUpdate
( CustomerId, CustomerName, DateOfBirth, Town, TelephoneNo,
DrivingLicenceNo, Occupation,StartDate,EndDate)
values
( @i, 'Customer'+@si, DATEADD(d,@i-1,'2000-01-01'), 'Town'+@si,
'Phone'+@si, 'Licence'+@si, 'Occupation'+@si,GETDATE(),NULL)
set @i = @i + 1
end
go
Step 1. Create SSIS Package
Open SQL Server Business Intelligence Development Studio and create New project as SQL
Server Integration Services, after adding a new package.
Step 2. Add Execute SQL task from toolbox, rename as Load Dimension, then
edit SQL task. Then set the connection credential and database in the Connection property.
Step 3: Click on button in SQL statement property and then write the
SQL script:
Test the Package
Test Step 1:
Run package and see the result :
5 new records have been inserted from customerid 6 to 10.
Test Step 2:
Update record using script:
UPDATE [Customer]
SET [CustomerName] = 'aaa1'
,[DateOfBirth] = '2012-01-01'
WHERE [CustomerId] =1
GO
UPDATE [Customer]
SET [Town] = 'ttt1'
WHERE [CustomerId] =2
Now run package again and you will see the result:
CustomerName and dateOfBirth record will be updated for customer id = 1 but no new record will be inserted because of SCD type 1.
Moreover, end date will be updated for customer id =2 and a new record will be inserted for customer id =2 but new surrogate key
will be generated, i.e., Dim_CustomerId = 11
with end date as null because of SCD type 2.
Points of interest
This approach performs better than SCD transformation and Merge
Join used for dimension loading in BI environments.