Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Dimension loading with executing SQL tasks in SSIS for SCD type 1 and type 2

4.95/5 (5 votes)
29 Sep 2013CPOL3 min read 31.9K  
Dimension loading using SQL task gives better performance than SCD transformation.

Introduction

Dimension table loading in BI environment using Execute SQL Task gives better performance than SCD transformation in SSIS Transformation.

  • SCD stands for Slow Changing Dimension 
  • This code has been tested for SQL Server 2008 

Why it is needed?

In a BI environment there are several methods to load the Dimension table, e.g., by using Slow Changing Dimension transformation or by using merge join transformation etc., but both methods are perfect for small amounts of data because these transformations are asynchronous and they perform operation for each row of data source, so suppose we have millions of records in the data source it will take a long time to load the dimension table and I think your business will not be allowed to do so. Hence we can take a simple Execute SQL Task in SSIS to decrease the processing time even when we have millions of records in the data source.

There are three types of Slow Changing Dimensions: SCD Type 1 (changing attributes) that means no history will be maintained for the column(s) which are defined as SCD type 1; e.g., phone number of employee. Second is SCD type 2 (historical attributes) that means history will be maintained by using the start date / end date or by using version number; e.g., when any Employee's work location is changed, we will  maintain the history of the old employee's location by using the start date/ end date. And third SCD type is what we are considering in this article.

We can also maintain both SCD type 1 and SCD type 2 while loading the Dimension table by using this method (i.e., Execute SQL Task Transformation in SSIS).

Understanding the scripts that will be run under the Execute SQL Task

For SCD type 2:

For SCD type 2 I have considered the following columns:

Town

CustomerName,DateOfBirth,TelephoneNo,DrivingLicenceNo,Occupation

SQL
----For SCD type 2 Historical Attribute---
Declare @tempHistoricalAttribute As Table(CustomerId Int)
 
Insert Into @tempHistoricalAttribute(CustomerId)
select src.CustomerId from 
(
select CustomerId,town from Customer
EXCEPT
select CustomerId,town from CustomerUpdate Where EndDate Is NULL
) src 
Inner Join CustomerUpdate dst On src.CustomerId = dst.CustomerId
Where dst.EndDate Is NULL
 
Update CustomerUpdate Set EndDate = GETDATE()  from 
 @tempHistoricalAttribute  src
Inner Join CustomerUpdate dst On src.CustomerId = dst.CustomerId
Where dst.EndDate Is NULL
 
Insert Into CustomerUpdate(CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,
DrivingLicenceNo,Occupation,StartDate,EndDate)
Select CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,
DrivingLicenceNo,Occupation,GETDATE(),null From Customer where CustomerId in 
(Select customerid from  @tempHistoricalAttribute) 

For SCD type 1:

For SCD type 1 I have considered the following columns:

CustomerName,DateOfBirth,TelephoneNo,DrivingLicenceNo,Occupation  

SQL
-- To Maintain SCD type 1 Changing Attribute----
Update CustomerUpdate SET  
CustomerName=src.CustomerName,DateOfBirth=src.DateOfBirth,
TelephoneNo = src.TelephoneNo,DrivingLicenceNo = src.DrivingLicenceNo,
Occupation = src.Occupation
from
(
select CustomerId,CustomerName,DateOfBirth,TelephoneNo,DrivingLicenceNo,
Occupation from Customer
EXCEPT
select CustomerId,CustomerName,DateOfBirth,TelephoneNo,DrivingLicenceNo,
Occupation from CustomerUpdate where EndDate Is Null
)src 
INNER JOIN CustomerUpdate dest ON src.CustomerId = dest.CustomerId
where dest.EndDate Is Null

For new record:

SQL
--To Insert new Record------
Insert Into CustomerUpdate(CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,
DrivingLicenceNo,Occupation,StartDate,EndDate)
select src.CustomerId,src.CustomerName,src.DateOfBirth,src.Town,
src.TelephoneNo,src.DrivingLicenceNo,src.Occupation,GETDATE(),NULL 
from
(
select CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,DrivingLicenceNo,
Occupation from Customer
EXCEPT
select CustomerId,CustomerName,DateOfBirth,Town,TelephoneNo,DrivingLicenceNo,
Occupation from CustomerUpdate where EndDate Is Null
)src

Important thing to note

The script should run in the following sequence. Execute SQL Task will run the script in the following sequence:

  • Seq 1: SCD type 2 script 
  • Seq 2: SCD type 1 script 
  • Seq 3: New record script 

Create project 

Table script

SQL
if exists (select * from sys.tables where name = 'Customer')
drop table Customer
go
create table Customer
( 
CustomerId int not null primary key,
CustomerName varchar(30), DateOfBirth date, Town varchar(50),
TelephoneNo varchar(30), DrivingLicenceNo varchar(30), Occupation varchar(30)
)
go
 
-- Populate Customer Source
 
declare @i int, @si varchar(10), @startdate date
set @i = 1
while @i <= 10
begin
 set @si = right('0'+CONVERT(varchar(10), @i),2)
 insert into Customer
 ( CustomerId, CustomerName, DateOfBirth, Town, TelephoneNo, DrivingLicenceNo, Occupation)
 values
 ( @i, 'Customer'+@si, DATEADD(d,@i-1,'2000-01-01'), 'Town'+@si, 
       'Phone'+@si, 'Licence'+@si, 'Occupation'+@si)
 set @i = @i + 1
end
go
 
-- Create customer table
 
if exists (select * from sys.tables where name = 'CustomerUpdate')
drop table CustomerUpdate
go
 
create table CustomerUpdate
(
Dim_CustomerId int identity(1,1), 
CustomerId int not null,
CustomerName varchar(30), DateOfBirth date, Town varchar(50),
TelephoneNo varchar(30), DrivingLicenceNo varchar(30), Occupation varchar(30),
StartDate DateTime,EndDate Datetime
)
go
 
-- Populate Customer Destination
 
declare @i int, @si varchar(10), @startdate date
set @i = 1
while @i <= 5
begin
 set @si = right('0'+CONVERT(varchar(10), @i),2)
 insert into CustomerUpdate
 ( CustomerId, CustomerName, DateOfBirth, Town, TelephoneNo, 
        DrivingLicenceNo, Occupation,StartDate,EndDate)
 values
 ( @i, 'Customer'+@si, DATEADD(d,@i-1,'2000-01-01'), 'Town'+@si, 
         'Phone'+@si, 'Licence'+@si, 'Occupation'+@si,GETDATE(),NULL)
 set @i = @i + 1
end
go 

Image 1

Step 1. Create SSIS Package 

Open SQL Server Business Intelligence Development Studio and create New project as SQL Server Integration Services, after adding a new package.

Image 2

 Step 2. Add Execute SQL task from toolbox, rename as Load Dimension, then edit SQL task. Then set the connection credential and database in the Connection property.  

Image 3

Step 3: Click on button in SQL statement property and then write the SQL script:

Image 4

Test the Package

Test Step 1:

Run package and see the result :

5 new records have been inserted from customerid 6 to 10.

Image 5

Test Step 2:  

Update record using script:

SQL
--for SCD type 1-- 
UPDATE [Customer]
   SET [CustomerName] = 'aaa1'
      ,[DateOfBirth] = '2012-01-01'
 WHERE [CustomerId] =1
GO
 
--for SCD type 2--
UPDATE [Customer]
 SET [Town] = 'ttt1'
 WHERE [CustomerId] =2 

Now run package again and you will see the result:

CustomerName and dateOfBirth record will be updated for customer id = 1 but no new record will be inserted because of SCD type 1.

Moreover, end date will be updated for customer id =2 and a new record will be inserted for customer id =2 but new surrogate key will be generated, i.e., Dim_CustomerId = 11 with end date as null because of SCD type 2.

Image 6

Points of interest

This approach performs better than SCD transformation and Merge Join used for dimension loading in BI environments.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)