Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2008

SSIS: Load Data in Dimension Table by Merge Join

5.00/5 (6 votes)
13 Jan 2013CPOL4 min read 70.2K   1.2K  
How to load data from an Excel file to load in a dimension table in an incremental way.

Introduction

In this tutorial I will show how to load data from an Excel file to load in a dimension table in an incremental way.

Background

Each time you want to populate your dimension table data you need to truncate the destination and reload from the source. But if the table size is huge then it is not the best solution. So instead of delete insert, we will populate data by insert new row and update changed row.

What you need

This script was tested in SQL Server 2008.

Create project

First create a database and table from script.

SQL
use master
go

if db_id ('bi_Stage') is not null
drop database bi_Stage;
go

create database bi_Stage 
on primary (name = 'bi_Stage'
, filename = 'E:\BI\DB\Stage\bi_Stage.mdf'
, size = 5 MB, filegrowth = 5 MB)
log on (name = 'bi_Meta_log'
, filename = 'E:\BI\DB\Stage\bi_Stage_log.ldf'
, size = 2 MB, filegrowth = 512 KB)
collate SQL_Latin1_General_CP1_CI_AS
go

alter database bi_Stage set recovery simple 
go
alter database bi_Stage set auto_shrink off
go
alter database bi_Stage set auto_create_statistics off
go
alter database bi_Stage set auto_update_statistics off
go


SET ANSI_NULLS ON
GO

use bi_Stage
go

if exists 
  (select * from sys.tables 
   where name = 'chart_of_account')
drop table chart_of_account
go

BEGIN
CREATE TABLE [dbo].[chart_of_account](
	[account_id] [int] NULL,
	[parent_account_id] [int] NULL,
	[account_code] [nvarchar](50) NULL,
	[account_name] [nvarchar](150) NULL,
	[account_type] [nvarchar](50) NULL
) ON [PRIMARY];
END
GO
SET ANSI_NULLS OFF
GO

Then open SQL Server Business Intelligence Development Studio.

Image 1

Then go to File->New->Project and select Integration Service Project.

Image 2

Select "Data Flow Task" from "Control Menu" and drag it on "Control Flow" tab. Then double click it.

Image 3

Now select "Excel Source" from "Data Flow Sources" and drag it on "Data Flow" tab. Then double click it to assign file source.

Image 4

Image 5

Image 6

Image 7

Image 8

Click OK and then again click OK.

Select "Data Conversion" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Excel Source". Then double click it. We need to convert data which is coming from source.

Image 9

Rename "Output Alias" field, change "Data Type" value from default value to add prefix "src_" to distinguish, and change "Length". Click OK.

Image 10

Select "OLE DB Source" from "Data Flow Transformation" and drag it on "Control Flow" tab. Then double click it. We need to assign data source.

Image 11

Click New button for create New connection or select from already existing connection.

Image 12

Click New button.

Image 13

Assign "Server Name", select "Authentication", select "Database", and click "OK".

Image 14

Image 15

From left tab select "Column" and then click OK.

Image 16

Select "Sort" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Data Conversion". Then double click it.

Image 17

Select column which needs to be sorted.

Image 18

Select another "Sort" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "OLE DB Source". Then double click it. Select column which needs to be sorted.

Image 19

Select "Merge Join" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Sort".

Image 20

Select Join Type as "Left outer join" and select all fields from left panel and all fields from right panel. Click OK. Connect "Merge Join" from "Sort Dest".

Image 21

Select "Conditional Split" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Merge Join". Right-click the Conditional Split and click Edit to open the Conditional Split Editor.

Image 22

Expand the NULL Functions folder in the upper right of the Conditional Split Transformation Editor. Expand the Columns folder in the upper left side of the Conditional Split Transformation Editor. Click in the "Output Name" column and enter "New" as the name of the first output. From the NULL Functions folder, drag and drop the "ISNULL( <<expression>> )" function to the Condition column of the New Rows condition:

Next, drag account_id from the columns folder and drop it onto the "<<expression>>" text in the Condition column. "New" should now be defined by the condition "ISNULL( [account_id] )". This defines the WHERE clause for new rows - setting it to "WHERE account_id Is NULL".

Type "Update" into a second Output Name column. Add the expression "((src_code != account_code) || (src_name != account_name) || (src_parent_id != parent_account_id) || (src_type != account_type))" to the Condition column for the Update Rows output. This defines our WHERE clause for detecting changed rows - setting it to "WHERE ((src_code != account_code) || (src_name != account_name) || (src_parent_id != parent_account_id) || (src_type != account_type))". Note "||" is used to convey "OR" in SSIS Expressions:

Change the "Default output name" from "Conditional Split Default Output" to "Unchanged".

Image 23

Select "OLE DB Destination" from "Data Flow Destination" and Drag it on "Control Flow" tab.

Image 24

Connect it from "Conditional Split" and select "New" and assign connection and map field.

Image 25

Image 26

Image 27

Select "OLE DB Command" from "Data Flow Transformation" and Drag it on "Control Flow" tab.

Image 28

Connect it from "Conditional Split" and select "Update" and assign "Connection Manager" and map field.

Image 29

Image 30

Image 31

Image 32

Image 33

Click OK. Now we are ready to execute the SSIS.

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Image 34

Now Execute the command to test:

SQL
SELECT [account_id]
      ,[parent_account_id]
      ,[account_code]
      ,[account_name]
      ,[account_type]
  FROM [bi_Stage].[dbo].[chart_of_account]

Image 35

Now I have changed some data in Excel file and again execute the package with debugging (press F5), the package should succeed and appear as shown here:

Image 36

Image 37

Conclusion

I have used one more "Sort Dest" for sorting destination table. I think it will be better if we use "ORDER BY" clause in "OLE DB Source" rather than use "Sort". Performance can be varied. I hope this might be helpful to you!

History

None so far.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)