Introduction
In this tutorial I will show how to transform data using a Script Component within a Data Flow Task and regex.
My source data contains customer name, and I need to separate it to first, middle, and last names.
Background
Regular Expressions are a powerful way for you to search for patterns in strings of text. In SSIS, we can use Regular Expressions to assist us in cleansing or transforming data.
This is my partial "Customer" reference table.
What you need
Download script and execute in SSMS.
create database Test
go
use Test
go
USE [Test]
GO
if exists (select * from sys.tables where name = 'customer_destination')
drop table customer_destination
go
CREATE TABLE [dbo].[customer_destination](
[customer_number] [varchar](10) NULL,
[customer_type] [varchar](1) NULL,
[name] [varchar](100) NULL,
[gender] [varchar](1) NULL,
[email_address] [varchar](200) NULL,
[date_of_birth] [datetime] NULL,
[occupation] [varchar](50) NULL,
[status] [varchar](3) NULL,
[fname] [varchar](50) NULL,
[mname] [varchar](50) NULL,
[lname] [varchar](50) NULL
)
This script was tested in SQL Server 2008.
Create Project
Open SQL Server Business Intelligence Development Studio.
Then go to File->New->Project and select Integration Service Project.
Select "Data Flow Task" from "Control Flow Items" and drag it on "Control Flow" tab. Then double click it.
Select "Excel Source" from "Data Flow Source" and drag it on "Data Flow" tab. Double click on “Excel Source” task to configure it.
Click New button for new OLE DB connection or select from existing connection.
Click Browse button to select the "Customer.xls" file from your local path.
Select the Excel sheet name.
Check the available column and click OK.
Select "Data Conversion" from "Data Flow Transformation" and drag it on the "Data Flow" tab and connect
the extended green arrow from “Excel Source” to your "Data Conversion".
Convert data. The length should be the same, otherwise you will get a warning message. Add "convert" prefix in each value.
Select "Script Component" from "Data Flow Transformation" and drag it on the "Data Flow" tab.
Connect the extended green arrow from “Data Conversion” to your "Script Component".
Click Input Column tab and select the "convert_name" column.
Click the Output Column tab and create three outputs: "fname", "mname", and "lname".
Click the "Script" tab and click "Edit Script". I have chosen my ScriptLanguage as "Microsoft Visual C# 2008".
In the Main.cs file we need to add a reference. We need this for Regular Expressions.
using System.Text.RegularExpressions;
In the Main.cs file it's time to write some code in the Input0_ProcessInputRow
procedure.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
string keyVal = Row.convertname.ToString();
Match match = Regex.Match(keyVal, @"^(?<first>\w+) (?<last>\w+)(?: (?<middle>\w+))?$");
if (match.Success)
{
string f = match.Groups["first"].Value;
string l = match.Groups["last"].Value;
string m = null;
if (match.Groups["middle"].Success)
{
m = match.Groups["middle"].Value;
}
Row.fname = f;
Row.lname = l;
Row.mname = m;
}
else
Row.fname = Row.convertname.ToString();
}</middle></last></first>
Select "OLE DB Destination" from "Data Flow Destination" and drag it on the "Data Flow" tab. Connect
the extended green arrow from “Script Component” to your "OLE DB Destination".
Double click on the "OLE DB Destination" task to configure it.
Click New to create a new connection.
Select "Server Name", "Authentication", and "Database" which will be "Test" for this example. Click Test Connection for checking, then
click OK and then again click OK.
Select the "customer_destination" table.
Click the mapping tab and map each field and click OK.
If you execute the package with debugging (press F5), the package should succeed and appear as shown here:
To check what is happening here:
SELECT [customer_number]
,[customer_type] as [type]
,[name]
,[gender]
,[email_address]
,[date_of_birth]
,[occupation]
,[status]
,[fname]
,[mname]
,[lname]
FROM [Test].[dbo].[customer_destination]
Conclusion
You can see that the fname, mname, and lname fields have been populated. The .NET Framework includes full support for regular
expressions (Regex), in the System.Text.RegularExpressions
namespace. Regex provides an incredibly powerful way of defining and finding string patterns.
You can use them for string pattern match, data cleaning, and data
transformation. Email, postal code, and phone number verification are a few of the
uses.
I have included some Regex here, though all are not perfect, they are collected from
Google.
Mail: [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
Phone: ^[0-9+\(\)#\.\s\/ext-]+$
First character capital:- ^[A-Z][a-zA-Z0-9]+$
I hope this might be helpful to you!
References
History
None so far.