Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

SSIS: Transform data using Regex

5.00/5 (1 vote)
21 Jan 2013BSD3 min read 43.3K   401  
Transform data by using regex and Script Component.

Introduction

In this tutorial I will show how to transform data using a Script Component within a Data Flow Task and regex. My source data contains customer name, and I need to separate it to first, middle, and last names.

Background

Regular Expressions are a powerful way for you to search for patterns in strings of text. In SSIS, we can use Regular Expressions to assist us in cleansing or transforming data.

This is my partial "Customer" reference table.

Image 1

What you need

Download script and execute in SSMS.

SQL
-- Create database
create database Test
go
use Test
go
 
USE [Test]
GO

-- Create customer table
if exists (select * from sys.tables where name = 'customer_destination')
drop table customer_destination
go

CREATE TABLE [dbo].[customer_destination](
	[customer_number] [varchar](10) NULL,
	[customer_type] [varchar](1) NULL,
	[name] [varchar](100) NULL,
	[gender] [varchar](1) NULL,
	[email_address] [varchar](200) NULL,
	[date_of_birth] [datetime] NULL,
	[occupation] [varchar](50) NULL,
	[status] [varchar](3) NULL,
	[fname] [varchar](50) NULL,
	[mname] [varchar](50) NULL,
	[lname] [varchar](50) NULL
)

This script was tested in SQL Server 2008.

Create Project

Open SQL Server Business Intelligence Development Studio.

Image 2

Then go to File->New->Project and select Integration Service Project.

Image 3

Select "Data Flow Task" from "Control Flow Items" and drag it on "Control Flow" tab. Then double click it.

Image 4

Select "Excel Source" from "Data Flow Source" and drag it on "Data Flow" tab. Double click on “Excel Source” task to configure it.

Image 5

Click New button for new OLE DB connection or select from existing connection.

Image 6

Click Browse button to select the "Customer.xls" file from your local path.

Image 7

Select the Excel sheet name.

Image 8

Check the available column and click OK.

Image 9

Select "Data Conversion" from "Data Flow Transformation" and drag it on the "Data Flow" tab and connect the extended green arrow from “Excel Source” to your "Data Conversion".

Image 10

Convert data. The length should be the same, otherwise you will get a warning message. Add "convert" prefix in each value.

Image 11

Select "Script Component" from "Data Flow Transformation" and drag it on the "Data Flow" tab. Connect the extended green arrow from “Data Conversion” to your "Script Component".

Image 12

Image 13

Click Input Column tab and select the "convert_name" column.

Image 14

Click the Output Column tab and create three outputs: "fname", "mname", and "lname".

Image 15

Click the "Script" tab and click "Edit Script". I have chosen my ScriptLanguage as "Microsoft Visual C# 2008".

Image 16

In the Main.cs file we need to add a reference. We need this for Regular Expressions.

Image 17

C#
using System.Text.RegularExpressions;

In the Main.cs file it's time to write some code in the Input0_ProcessInputRow procedure.

Image 18

C#
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    /*
      Add your code here
    */
    string keyVal = Row.convertname.ToString();
    Match match = Regex.Match(keyVal, @"^(?<first>\w+) (?<last>\w+)(?: (?<middle>\w+))?$");
    if (match.Success)
    {
        string f = match.Groups["first"].Value;
        string l = match.Groups["last"].Value;
        string m = null;
        if (match.Groups["middle"].Success)
        {
            m = match.Groups["middle"].Value;
        }

        Row.fname = f;
        Row.lname = l;
        Row.mname = m;
    }
    else
        Row.fname = Row.convertname.ToString();
}</middle></last></first>

Select "OLE DB Destination" from "Data Flow Destination" and drag it on the "Data Flow" tab. Connect the extended green arrow from “Script Component” to your "OLE DB Destination".

Image 19

Double click on the "OLE DB Destination" task to configure it.

Image 20

Click New to create a new connection.

Image 21

Select "Server Name", "Authentication", and "Database" which will be "Test" for this example. Click Test Connection for checking, then click OK and then again click OK.

Image 22

Select the "customer_destination" table.

Image 23

Click the mapping tab and map each field and click OK.

Image 24

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Image 25

To check what is happening here:

SQL
SELECT [customer_number]
      ,[customer_type] as [type]
      ,[name]
      ,[gender]
      ,[email_address]
      ,[date_of_birth]
      ,[occupation]
      ,[status]
      ,[fname]
      ,[mname]
      ,[lname]
  FROM [Test].[dbo].[customer_destination]

Image 26

Conclusion

You can see that the fname, mname, and lname fields have been populated. The .NET Framework includes full support for regular expressions (Regex), in the System.Text.RegularExpressions namespace. Regex provides an incredibly powerful way of defining and finding string patterns. You can use them for string pattern match, data cleaning, and data transformation. Email, postal code, and phone number verification are a few of the uses. I have included some Regex here, though all are not perfect, they are collected from Google.

Mail: [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

Phone: ^[0-9+\(\)#\.\s\/ext-]+$

First character capital:- ^[A-Z][a-zA-Z0-9]+$

I hope this might be helpful to you!

References

History

None so far.

License

This article, along with any associated source code and files, is licensed under The BSD License