SSIS: Load Data in Dimension Table by Merge Join

mparvez

5.00/5 (6 votes)

13 Jan 2013CPOL4 min read

70.2K

1.2K

How to load data from an Excel file to load in a dimension table in an incremental way.

Introduction

In this tutorial I will show how to load data from an Excel file to load in a dimension table in an incremental way.

Background

Each time you want to populate your dimension table data you need to truncate the destination and reload from the source. But if the table size is huge then it is not the best solution. So instead of delete insert, we will populate data by insert new row and update changed row.

What you need

This script was tested in SQL Server 2008.

Create project

First create a database and table from script.

SQL

use master
go

if db_id ('bi_Stage') is not null
drop database bi_Stage;
go

create database bi_Stage 
on primary (name = 'bi_Stage'
, filename = 'E:\BI\DB\Stage\bi_Stage.mdf'
, size = 5 MB, filegrowth = 5 MB)
log on (name = 'bi_Meta_log'
, filename = 'E:\BI\DB\Stage\bi_Stage_log.ldf'
, size = 2 MB, filegrowth = 512 KB)
collate SQL_Latin1_General_CP1_CI_AS
go

alter database bi_Stage set recovery simple 
go
alter database bi_Stage set auto_shrink off
go
alter database bi_Stage set auto_create_statistics off
go
alter database bi_Stage set auto_update_statistics off
go


SET ANSI_NULLS ON
GO

use bi_Stage
go

if exists 
  (select * from sys.tables 
   where name = 'chart_of_account')
drop table chart_of_account
go

BEGIN
CREATE TABLE [dbo].[chart_of_account](
	[account_id] [int] NULL,
	[parent_account_id] [int] NULL,
	[account_code] [nvarchar](50) NULL,
	[account_name] [nvarchar](150) NULL,
	[account_type] [nvarchar](50) NULL
) ON [PRIMARY];
END
GO
SET ANSI_NULLS OFF
GO

Then open SQL Server Business Intelligence Development Studio.

Then go to File->New->Project and select Integration Service Project.

Select "Data Flow Task" from "Control Menu" and drag it on "Control Flow" tab. Then double click it.

Now select "Excel Source" from "Data Flow Sources" and drag it on "Data Flow" tab. Then double click it to assign file source.

Click OK and then again click OK.

Select "Data Conversion" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Excel Source". Then double click it. We need to convert data which is coming from source.

Rename "Output Alias" field, change "Data Type" value from default value to add prefix "src_" to distinguish, and change "Length". Click OK.

Select "OLE DB Source" from "Data Flow Transformation" and drag it on "Control Flow" tab. Then double click it. We need to assign data source.

Click New button for create New connection or select from already existing connection.

Click New button.

Assign "Server Name", select "Authentication", select "Database", and click "OK".

From left tab select "Column" and then click OK.

Select "Sort" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Data Conversion". Then double click it.

Select column which needs to be sorted.

Select another "Sort" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "OLE DB Source". Then double click it. Select column which needs to be sorted.

Select "Merge Join" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Sort".

Select Join Type as "Left outer join" and select all fields from left panel and all fields from right panel. Click OK. Connect "Merge Join" from "Sort Dest".

Select "Conditional Split" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Merge Join". Right-click the Conditional Split and click Edit to open the Conditional Split Editor.

Expand the NULL Functions folder in the upper right of the Conditional Split Transformation Editor. Expand the Columns folder in the upper left side of the Conditional Split Transformation Editor. Click in the "Output Name" column and enter "New" as the name of the first output. From the NULL Functions folder, drag and drop the "ISNULL( <<expression>> )" function to the Condition column of the New Rows condition:

Next, drag account_id from the columns folder and drop it onto the "<<expression>>" text in the Condition column. "New" should now be defined by the condition "ISNULL( [account_id] )". This defines the WHERE clause for new rows - setting it to "WHERE account_id Is NULL".

Type "Update" into a second Output Name column. Add the expression "((src_code != account_code) || (src_name != account_name) || (src_parent_id != parent_account_id) || (src_type != account_type))" to the Condition column for the Update Rows output. This defines our WHERE clause for detecting changed rows - setting it to "WHERE ((src_code != account_code) || (src_name != account_name) || (src_parent_id != parent_account_id) || (src_type != account_type))". Note "||" is used to convey "OR" in SSIS Expressions:

Change the "Default output name" from "Conditional Split Default Output" to "Unchanged".

Select "OLE DB Destination" from "Data Flow Destination" and Drag it on "Control Flow" tab.

Connect it from "Conditional Split" and select "New" and assign connection and map field.

Select "OLE DB Command" from "Data Flow Transformation" and Drag it on "Control Flow" tab.

Connect it from "Conditional Split" and select "Update" and assign "Connection Manager" and map field.

Click OK. Now we are ready to execute the SSIS.

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Now Execute the command to test:

SQL

SELECT [account_id]
      ,[parent_account_id]
      ,[account_code]
      ,[account_name]
      ,[account_type]
  FROM [bi_Stage].[dbo].[chart_of_account]

Now I have changed some data in Excel file and again execute the package with debugging (press F5), the package should succeed and appear as shown here:

Conclusion

I have used one more "Sort Dest" for sorting destination table. I think it will be better if we use "ORDER BY" clause in "OLE DB Source" rather than use "Sort". Performance can be varied. I hope this might be helpful to you!

History

None so far.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)