An indespensible SSIS transformation component - Script Component
Table of Content
- Introduction
- Background
- Example 1: A String Splitter Program in SSIS using Script Component Transformation.
- Example 2: Parse XML files using Script Component and insert them into database using Bulk Insert Task.
- Conclusion
Script component is a SSIS transformation component whose task is to run custom script code. Many times it happens that for some situation we do not have a built in transformation component; however, we can do so by writing some code snippet for the needed transformation. Script component come into play in such situations.
Transformation is an integral part of most of the SSIS life cyle. Once the raw data comes to our hand, it is responsibility of the transformation components to make the needed morphisms and bring the data in the needed format. Though various kind of transformations are available in SSIS, but some time we need to have some kind of custom transformation for which no component is available. We can either use Script component in such situations or make our own custom component. In this article, we will look into the script component transformation into action by using two real time examples while usage of custom component will be discussed in another article.
Context
In this program we will read the file contents which is given as under
Id Value
1 Name1, Name2, Name3
2 Name4, Name5, Name1
and by using Script Component transformation we will bring the below output
Step to be carried out
Step 1:
Open Bids.Choose Integration Services Project from the available project type.Drag and drop a Dataflow Task in the control flow designer.
Drag and drop Flat File Source in the Data Flow designer.Right click on the Flat File Source component and from the popup, click Edit… to bring the Flat File Source Editor. Alternatively we can double click on the Flat File Source component for bringing up the Flat File Source Editor. In the connection manager of the Flat File Source Editor, click on the New button and specify the source file
In the Columns tab, the Row delimiter should be {CR}{LF} while the Columns delimiter should be Tab {t}
Click OK button.
Step 2:
Add a Script Component Transformation and set it as transformation.
Step 3:
Add precedence constraint from Flat file Source to the Script component.Right click on Script component and from the popup, click Edit… to bring up the Script Transformation Editor.
Step 4: Configuring the script component
In the Input Columns tab, add the two available columns: Column 0, Column 1
In Input and Outputs tab, select the Output 0, and rename it as Result.Set the SynchronousInputID property to None which will rather change the script component to asynchronous.
N.B. ~ there are two types of transformation in SSIS.
Synchronous Transformation
The output is synchronized with the input and the input data will be processed on a row by row basis.
Asynchronous Transformation
The output is not synchronized with the input. All the input data will be fetched initially, then all the rows will be read and followed by the output generation.
Add the below output columns under Result
Column name |
DataType |
ID |
string [DT_STR] |
CustomerName |
string [DT_STR] |
In Script tab, set Script Language as Microsoft Visual C# 2008, and let's click on Edit Script button
Override the Input0_ProcessInput method to fetch all data till end of file as below:
public override void Input0_ProcessInput(Input0Buffer Buffer)
{
while (Buffer.NextRow())
{
Input0_ProcessInputRow(Buffer);
}
if (Buffer.EndOfRowset())
{
ResultBuffer.SetEndOfRowset();
}
}
Next we need to override the Input0_ProcessInputRow method to add new rows to output as below:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
var arr = Row.Column1.Split(',');
foreach (string str in arr)
{
ResultBuffer.AddRow();
if (!string.IsNullOrEmpty(Row.Column1))
{
ResultBuffer.ID = Row.Column0;
ResultBuffer.CustomerName = str;
}
}
}
Build the application and close it.
Step 5:
Add a Row Sampling and enable the data viewer. The final package design looks as under
Let us now run the package and we will the below output
What we learnt from Example 1
From this small example, we have seen how to configure the component,write script for transformation, what asynchronous and synchronous modes are etc. Now, we can go ahead and do more complex transformations with this powerful component. We will see one such example in the next section.
Context
Consider the below sample player information in xml format.
<Players>
<BelongsTo CountryName = "Some Country">
<Player Id = "1" Name = "Player Name" MatchPlayed="1" RunsMade = "1" WicketTaken="1"/>
</BelongsTo>
</Players>
It's a simple xml structure that contains some player attributes like ID, Name, etc. and also information about the country the player(s) belongs to.
We have many such player information xml files as our source (for the sake of experiment, we will use only four such xml files as PlayerList[n].xml where n=1...4]. The source xml files are attached in the zipped file.
Any ways, our task is to read the player information from the xm l files, and to insert them into the database in one go.
Step to be carried out
Step 1:
Open Bids. Choose Integration Services Project from the available project type. Drag and drop the below components into the Control Flow designer in the following sequence
- 1 Execute SQL task
- 1 Dataflow Task embedded within 1 Foreach Loop Container
- 1 Bulk Insert Task.
And in the Data Flow designer, let us add a Script Component. Kindly note that in the Select Script Component Type pop up, the radio button should be set to "Source".
Step 2: Configure the Execute SQL Task
Double click on the Execute SQL Task for bringing up the Execute SQL Task Editor and let us do the below changes
In the SQL Statement section, let us write the below query
-- Drop the table if it exists
IF EXISTS (SELECT * FROM sys.objects WHERE name = N'tbl_PlayerInfo' AND type = 'U')
DROP TABLE tbl_PlayerInfo
GO
--Create the table
CREATE TABLE [dbo].[tbl_PlayerInfo](
[PlayerId] [int] NOT NULL,
[PlayerName] [varchar](50) NOT NULL,
[BelongsTo] [varchar](50) NOT nULL,
[MatchPlayed] [int] NOT NULL,
[RunsMade] [int] NOT NULL,
[WicketTaken] [int] NOT NULL
) ON [PRIMARY]
GO
That's all about the Execute SQL Task configuration.
Step 3: Configure the Foreach Loop Container
Double click on the Foreach Loop container for opening up the Foreach Loop Editor.
Out of the various Foreach enumerators found in the Collection tab, we will choose only the Foreach File enumerator. We will also specify the source folder and since we will deal only with xml files, so the Files will be filtered by .xml extension as shown below.
In the variable mapping section, we will choose <New Variable>
The Add Variable screen will be configure as under
Click OK.Since after every iteration, the Foreach enumerator will return only one value, so the variable's index should be mapped to 0.
Click OK to save the settings for the Foreach Loop container.
Step 4: Configure the Script Component
Double click on the Script Component to bring up the Script Transformation Editor. In the Script tab available on the left, set the user defined variable (varCompleteFileName) in the ReadOnlyVariable Properties. For doing so, let us first click on the elipses(…) marked as 1 in the below figure. For the Select Variables Screen, choose the user define variable which is varCompleteFileName in this case (marked as 2 in the figure). Click OK button and the value will be set as marked as 3 in the figure.
Next, in the Input and Outputs tab, select the Output 0, and rename it as PlayerProcessedInfoAdd the below output columns under Output Columns
Column Name |
Data Type |
PlayerID |
string [DT_STR] |
PlayerName |
string [DT_STR] |
BelongsTo |
string [DT_STR] |
MatchPlayed |
string [DT_STR] |
RunsMade |
string [DT_STR] |
WicketTaken |
string [DT_STR] |
At this stage, it looks as under
In Script tab, set Script Language as Microsoft Visual C# 2008, and let's click on Edit Script button.
In the ScriptWindow, let us first create an object of XMLDocument Class as well as StreamWriter class
XmlDocument xDoc = new XmlDocument();
StreamWriter sw = null;
In the PreExecute Method, let us initialize the StreamWriterClass as shown under
var destinationFileName = @"D:\All_PlayerInfo.txt";
sw = new StreamWriter(destinationFileName,true);
As can be figured out, this can even be set as a variable and be pass at runtime.
Next override the CreateNewOutputRows() by the below code piece
xDoc.Load(Variables.varCompleteFileName);
foreach (XmlNode xBelongsToNode in xDoc.SelectNodes("//BelongsTo"))
{
foreach (XmlNode xPlayerNode in xDoc.SelectNodes("//Player"))
{
var playerInfo = string.Concat(xPlayerNode.Attributes[0].Value, ",",
xPlayerNode.Attributes[1].Value, ",",
xBelongsToNode.Attributes[0].Value, ",",
xPlayerNode.Attributes[2].Value, ",",
xPlayerNode.Attributes[3].Value, ",",
xPlayerNode.Attributes[4].Value );
sw.Write(playerInfo + Environment.NewLine);
}
}
Basically we are looping through the PlayerNode for getting the PlayerInformation for every xml file available in the source directory and then by using the StreamWriter class, we are writing the information to the destination location.
Finally, in the PostExecute Method, let's dispose the StreamWriter object.
The full piece of code is given as under
using System;
using System.IO;
using System.Xml;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
XmlDocument xDoc = new XmlDocument();
StreamWriter sw = null;
public override void PreExecute()
{
base.PreExecute();
var destinationFileName = @"D:\All_PlayerInfo.txt";
sw = new StreamWriter(destinationFileName,true);
}
public override void PostExecute()
{
base.PostExecute();
sw.Close();
}
public override void CreateNewOutputRows()
{
xDoc.Load(Variables.varCompleteFileName);
foreach (XmlNode xBelongsToNode in xDoc.SelectNodes("//BelongsTo"))
{
foreach (XmlNode xPlayerNode in xDoc.SelectNodes("//Player"))
{
var playerInfo = string.Concat(xPlayerNode.Attributes[0].Value, ",",
xPlayerNode.Attributes[1].Value, ",",
xBelongsToNode.Attributes[0].Value, ",",
xPlayerNode.Attributes[2].Value, ",",
xPlayerNode.Attributes[3].Value, ",",
xPlayerNode.Attributes[4].Value );
sw.Write(playerInfo + Environment.NewLine);
}
}
}
}
If we run the application, we can see the below output generated in the All_PlayerInfo.txt
1,India Player1,India,10,900,24
2,India Player2,India,12,456,10
3,India Player3,India,6,88,34
4,India Player4,India,9,345,22
5,India Player5,India,11,889,4
1,Australia Player1,Australia,6,500,54
2,Australia Player2,Australia,7,123,5
3,Australia Player3,Australia,15,888,14
4,Australia Player4,Australia,77,5000,223
5,Australia Player5,Australia,55,5567,18
1,Canada Player1,Canada,16,112,64
2,Canada Player2,Canada,5,173,51
1,USA Player1,USA,1,16,3
2,USA Player2,USA,2,14,1
Our last task is to insert this record into the database by using the Bulk Insert task.
Step 5: Configure the Bulk Inset Task
Double click on the Bulk Insert Task to open up the Bulk Insert Task Editor. In the Connection Section, let us do the below changes
- Give the Proper Connection String and the Destination Table Name
- The Column Delimiter should be Comma Seperated {,} in this case
- Enter the source file name which is All_PlayerInfo.txt here.
That's it.
Step 6:
Let us now run the package and it will happily execute as depicted under
Inorder to verify that our player records has been inserted properly into the database, let us issue the below query in the SSMS query window
select * from [dbo].[tbl_PlayerInfo]
and the result pane turns out with the below result
What we learnt from Example 2
In this lesson, we learnt how to configure and use Execute SQL task, Bulk Insert Task, picking up the file names at runtime using variables defined within the scope of a Foreach Loop Container and reading the same inside a script component.
So we have seen two small examples of script component and we have learnt how to configure this component, how to use it in conjunction with other components etc. But these are only a few of the various scenarios where the power of this component comes into picture.I will add more example about this component in this post very soon.
Thanks for reading.