Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

SSIS: Normalize Table by Lookup

4.00/5 (1 vote)
17 Jan 2013BSD4 min read 34.8K   717  
How to normalize a table by using lookup transformation.

Introduction

In this tutorial I will show how to normalize a table by using lookup transformation.

Background

Image 1

  • A stage is an internal data store used for transforming and preparing the data obtained from the source systems, before the data is loaded to other data stores in a data warehouse.
  • A normalized data store (NDS) is an internal master data store in the form of one or more normalized relational databases for the purpose of integrating data from various source systems captured in a stage, before the data is loaded to a user-facing data store.
  • A dimensional data store (DDS) is a user-facing data store, in the form of one or more relational databases, where the data is arranged in dimensional format for the purpose of supporting analytical queries.

Image 2

The NDS is a master data store containing the complete data sets, including all historical transaction data and all historical versions of master data. The NDS contains master tables and transaction tables. A transaction table is a table that contains a business transaction or business event. A master table is a table that contains the persons or objects involved in the business event.

When either importing data or ETL (Extract, Transform & Load) for a data warehouse, specially in data transfer from Stage to NDS where all data should be normalize. You assign a Surrogate key for referrance table and you are replacing the original text/ID with reference Surrogate key.

This is my sample denormalize Customer table which i need to normalize with:

Image 3

and this is my partial "Country" referrance table:

Image 4

and this is my partial "City" reference table:

Image 5

What you need

  • Download script and execute in SSMS.
  • SQL
    -- Create database
    create database Test
    go
    use Test
    go
     
    USE [Test]
    GO
    
    -- Create customer table
    if exists (select * from sys.tables where name = 'customer_source')
    drop table customer_source
    go
    
    CREATE TABLE [dbo].[customer_source](
    	[customer_number] [varchar](10) NOT NULL,
    	[gender] [char](1) NULL,
    	[date_of_birth] [datetime] NULL,
    	[address1] [varchar](50) NULL,
    	[address2] [varchar](50) NULL,
    	[city] [varchar](50) NULL,
    	[zipcode] [varchar](10) NULL,
    	[country] [varchar](50) NULL
    
    ) 
    
    if exists
    ( select * from sys.tables
      where name = 'DimCity')
    drop table DimCity
    go
    
    create table DimCity
    ( 
    city_id         int identity(1,1),
    city_name        varchar(50)
    )
    go
    
    
    if exists
    ( select * from sys.tables
      where name = 'DimCountry')
    drop table DimCountry
    go
    
    create table DimCountry
    ( 
    country_id         int identity(1,1),
    country_code        char(2),
    country_name        varchar(50)
    )
    go
    
    -- Create customer table with Normalization
    
    if exists (select * from sys.tables where name = 'DimCustomer')
    drop table DimCustomer
    go
    
    CREATE TABLE [dbo].DimCustomer(
    	[customer_number] [varchar](10) NOT NULL,
    	[gender] [char](1) NULL,
    	[date_of_birth] [datetime] NULL,
    	[address1] [varchar](50) NULL,
    	[address2] [varchar](50) NULL,
    	[zipcode] [varchar](10) NULL,
    	[city_id] int,
    	[country_id] int
    
    )
  • Download Customer.xls and import data into customer_source table.
  • Download Country.dat and import data into DimCountry table.
  • Download City.txt and import data into DimCity table.

This script was tested in SQL Server 2008.

Create Project

Open SQL Server Business Intelligence Development Studio.

Image 6

Then go to File->New->Project and select Integration Service Project.

Image 7

Select "Data Flow Task" from "Control Flow Items" and Drag it on "Control Flow" tab. Then double click it.

Image 8

Select "OLE DB Source" from "Data Flow Source" and Drag it on "Data Flow" tab. Double click on “OLE DB Source” task to configure it.

Image 9

Click Connection for new connection or select from existing connection.

Image 10

Click New button to create new Data Connection or select from left tab.

Image 11

Select "Server Name", "Authentication" and "Database" which will be "Test" for this example. Click Test Connection for checking then Click Ok and then again click OK.

Image 12

Select "customer_source" table then again click OK.

Image 13

Select "Multicast" from "Data Flow Transformation" and Drag it on "Data Flow" tab and connect extended green arrow from “OLE DB Source” to your "Multicast".

Image 14

Select "Lookup" from "Data Flow Transformation" and Drag it on "Data Flow" tab and rename it "Lookup - Country" and connect extended green arrow from “Multicast” to your "Lookup". Double click on “Lookup” task to configure it.

Image 15

Select "Ignore failure" value from "General" tab.

Image 16

Select "DimCountry" value from "Connection" tab.

Image 17

Map source field with lookup field and select id from "Column" tab.

Image 18

Select another "Lookup" from "Data Flow Transformation" and Drag it on "Data Flow" tab and rename it "Lookup - City" and connect extended green arrow from “Multicast” to your "Lookup". Double click on “Lookup” task to configure it.

Image 19

Select "Ignore failure" value from "General" tab.

Image 20

Select "DimCity" value from "Connection" tab.

Image 21

Map source field with lookup field and select id from "Column" tab.

Image 22

Select "Sort" from "Data Flow Transformation" and Drag it on "Data Flow" tab and rename it "Sort - Country" and connect extended green arrow from "Lookup - Country" to your "Sort". Double click on "Sort" task to configure it.

Image 23

Assign value for sort.

Image 24

Select another "Sort" from "Data Flow Transformation" and Drag it on "Data Flow" tab and rename it "Sort" and connect extended green arrow from "Lookup" to your "Sort". Double click on "Sort" task to configure it.

Image 25

Select another "Merge Join" from "Data Flow Transformation" and Drag it on "Data Flow" tab and connect extended green arrow from "Sort - Country" and "Sort" to your "Merge Join". Double click on "Merge Join" task to configure it.

Image 26

Assign value.

Image 27

Select another "OLE DB Destination" from "Data Flow Destination" and Drag it on "Data Flow" tab and connect extended green arrow from "Merge Join" to your "OLE DB Destination". Double click on "OLE DB Destination" task to configure it.

Image 28

Assign value.

Image 29

Image 30

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Image 31

To check what is happining here:

SQL
select * from [Test].[dbo].[DimCustomer]

Image 32

Now our destination table is normalized.

Conclusion

You can see there are some NULL value in city_id and country_id column which means no value was matched with reference value. It's also possible to collect all the unmatched value, store then in another table and merge then with lookup and again execute the process so all value is matched.

I hope this might be helpful to you!

References

  • Carla Sabotta
  • Vincent Rainardi

History

None so far.

License

This article, along with any associated source code and files, is licensed under The BSD License