Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

An indespensible SSIS transformation component - Script Component

0.00/5 (No votes)
9 May 2011 1  
This article will demonstrate as how to work with SSIS Script component with two live example

An indespensible SSIS transformation component - Script Component

Table of Content

  1. Introduction
  2. Background
  3. Example 1: A String Splitter Program in SSIS using Script Component Transformation.
  4. Example 2: Parse XML files using Script Component and insert them into database using Bulk Insert Task.
  5. Conclusion

Introduction

Script component is a SSIS transformation component whose task is to run custom script code. Many times it happens that for some situation we do not have a built in transformation component; however, we can do so by writing some code snippet for the needed transformation. Script component come into play in such situations.

Background

Transformation is an integral part of most of the SSIS life cyle. Once the raw data comes to our hand, it is responsibility of the transformation components to make the needed morphisms and bring the data in the needed format. Though various kind of transformations are available in SSIS, but some time we need to have some kind of custom transformation for which no component is available. We can either use Script component in such situations or make our own custom component. In this article, we will look into the script component transformation into action by using two real time examples while usage of custom component will be discussed in another article.

Example 1: A String Splitter Program in SSIS using Script Component Transformation.

Context

In this program we will read the file contents which is given as under

Id	Value
1	Name1, Name2, Name3
2	Name4, Name5, Name1

and by using Script Component transformation we will bring the below output

1.jpg

Step to be carried out

Step 1:

Open Bids.Choose Integration Services Project from the available project type.Drag and drop a Dataflow Task in the control flow designer.

Drag and drop Flat File Source in the Data Flow designer.Right click on the Flat File Source component and from the popup, click Edit… to bring the Flat File Source Editor. Alternatively we can double click on the Flat File Source component for bringing up the Flat File Source Editor. In the connection manager of the Flat File Source Editor, click on the New button and specify the source file

2.jpg

In the Columns tab, the Row delimiter should be {CR}{LF} while the Columns delimiter should be Tab {t}

3.jpg

Click OK button.

Step 2:

Add a Script Component Transformation and set it as transformation.

4.jpg

Step 3:

Add precedence constraint from Flat file Source to the Script component.Right click on Script component and from the popup, click Edit… to bring up the Script Transformation Editor.

Step 4: Configuring the script component

In the Input Columns tab, add the two available columns: Column 0, Column 1

5.jpg

In Input and Outputs tab, select the Output 0, and rename it as Result.Set the SynchronousInputID property to None which will rather change the script component to asynchronous.

6.jpg

N.B. ~ there are two types of transformation in SSIS.

Synchronous Transformation

The output is synchronized with the input and the input data will be processed on a row by row basis.

Asynchronous Transformation

The output is not synchronized with the input. All the input data will be fetched initially, then all the rows will be read and followed by the output generation.

Add the below output columns under Result

Column name DataType
ID string [DT_STR]
CustomerName string [DT_STR]

7.jpg

In Script tab, set Script Language as Microsoft Visual C# 2008, and let's click on Edit Script button

Override the Input0_ProcessInput method to fetch all data till end of file as below:

public override void Input0_ProcessInput(Input0Buffer Buffer)
    {
        while (Buffer.NextRow())
        {
            Input0_ProcessInputRow(Buffer);
        }

        if (Buffer.EndOfRowset())
        {
            ResultBuffer.SetEndOfRowset();
        }
    }

Next we need to override the Input0_ProcessInputRow method to add new rows to output as below:

public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
      var arr = Row.Column1.Split(','); // Splitting the rows of Names column

      foreach (string str in arr)
      {
         ResultBuffer.AddRow(); //Adding rows to the Result Buffer

         //If the Names are not empty or Null, then set the values to the  
         //  corresponding Result Buffer properties
            if (!string.IsNullOrEmpty(Row.Column1)) 
            {
                ResultBuffer.ID = Row.Column0;
                ResultBuffer.CustomerName = str;
            }
        }
    }

Build the application and close it.

Step 5:

Add a Row Sampling and enable the data viewer. The final package design looks as under

8.jpg

Let us now run the package and we will the below output

9.jpg

What we learnt from Example 1

From this small example, we have seen how to configure the component,write script for transformation, what asynchronous and synchronous modes are etc. Now, we can go ahead and do more complex transformations with this powerful component. We will see one such example in the next section.

Example 2: Parse XML files using Script Component and insert them into database using Bulk Insert Task.

Context

Consider the below sample player information in xml format.

<Players>
	<BelongsTo CountryName = "Some Country">
		<Player Id = "1" Name = "Player Name" MatchPlayed="1"  RunsMade = "1" WicketTaken="1"/>	
	</BelongsTo>
</Players>

It's a simple xml structure that contains some player attributes like ID, Name, etc. and also information about the country the player(s) belongs to.

We have many such player information xml files as our source (for the sake of experiment, we will use only four such xml files as PlayerList[n].xml where n=1...4]. The source xml files are attached in the zipped file.

Any ways, our task is to read the player information from the xm l files, and to insert them into the database in one go.

Step to be carried out

Step 1:

Open Bids. Choose Integration Services Project from the available project type. Drag and drop the below components into the Control Flow designer in the following sequence

  1. 1 Execute SQL task
  2. 1 Dataflow Task embedded within 1 Foreach Loop Container
  3. 1 Bulk Insert Task.

10.jpg

And in the Data Flow designer, let us add a Script Component. Kindly note that in the Select Script Component Type pop up, the radio button should be set to "Source".

11.jpg

Step 2: Configure the Execute SQL Task

Double click on the Execute SQL Task for bringing up the Execute SQL Task Editor and let us do the below changes

12.jpg

In the SQL Statement section, let us write the below query

-- Drop the table if it exists
IF EXISTS (SELECT * FROM sys.objects WHERE name = N'tbl_PlayerInfo' AND type = 'U')
    DROP TABLE tbl_PlayerInfo
GO
--Create the table
CREATE TABLE [dbo].[tbl_PlayerInfo](
	[PlayerId] [int] NOT NULL,
	[PlayerName] [varchar](50) NOT NULL,
	[BelongsTo] [varchar](50) NOT nULL,
	[MatchPlayed] [int] NOT NULL,
	[RunsMade] [int] NOT NULL,
	[WicketTaken] [int] NOT NULL
) ON [PRIMARY]
GO

That's all about the Execute SQL Task configuration.

Step 3: Configure the Foreach Loop Container

Double click on the Foreach Loop container for opening up the Foreach Loop Editor.

13.jpg

Out of the various Foreach enumerators found in the Collection tab, we will choose only the Foreach File enumerator. We will also specify the source folder and since we will deal only with xml files, so the Files will be filtered by .xml extension as shown below.

14.jpg

In the variable mapping section, we will choose <New Variable>

15.jpg

The Add Variable screen will be configure as under

16.jpg

Click OK.Since after every iteration, the Foreach enumerator will return only one value, so the variable's index should be mapped to 0.

17.jpg

Click OK to save the settings for the Foreach Loop container.

Step 4: Configure the Script Component

Double click on the Script Component to bring up the Script Transformation Editor. In the Script tab available on the left, set the user defined variable (varCompleteFileName) in the ReadOnlyVariable Properties. For doing so, let us first click on the elipses(…) marked as 1 in the below figure. For the Select Variables Screen, choose the user define variable which is varCompleteFileName in this case (marked as 2 in the figure). Click OK button and the value will be set as marked as 3 in the figure.

18.jpg

Next, in the Input and Outputs tab, select the Output 0, and rename it as PlayerProcessedInfoAdd the below output columns under Output Columns

Column Name Data Type
PlayerID string [DT_STR]
PlayerName string [DT_STR]
BelongsTo string [DT_STR]
MatchPlayed string [DT_STR]
RunsMade string [DT_STR]
WicketTaken string [DT_STR]

At this stage, it looks as under

19.jpg

In Script tab, set Script Language as Microsoft Visual C# 2008, and let's click on Edit Script button.

In the ScriptWindow, let us first create an object of XMLDocument Class as well as StreamWriter class

XmlDocument xDoc = new XmlDocument();
StreamWriter sw = null; 

In the PreExecute Method, let us initialize the StreamWriterClass as shown under

//Initialize the StreamWriter at the PreExecute method.
var destinationFileName = @"D:\All_PlayerInfo.txt";        
sw = new StreamWriter(destinationFileName,true);     

As can be figured out, this can even be set as a variable and be pass at runtime.

Next override the CreateNewOutputRows() by the below code piece

//Load the file name
xDoc.Load(Variables.varCompleteFileName);

//Visit the nodes and get the records
 foreach (XmlNode xBelongsToNode in xDoc.SelectNodes("//BelongsTo"))
 {
     foreach (XmlNode xPlayerNode in xDoc.SelectNodes("//Player"))
     {
		var playerInfo = string.Concat(xPlayerNode.Attributes[0].Value,// Player ID
									   ",",
									   xPlayerNode.Attributes[1].Value,// Player Name
									   ",",
									   xBelongsToNode.Attributes[0].Value, // Belongs To
									   ",",
									   xPlayerNode.Attributes[2].Value, //Match Played
									   ",",
									   xPlayerNode.Attributes[3].Value, // Runs Made
									   ",",
									   xPlayerNode.Attributes[4].Value // Wickets taken
									   );

		sw.Write(playerInfo + Environment.NewLine);

	}
}

Basically we are looping through the PlayerNode for getting the PlayerInformation for every xml file available in the source directory and then by using the StreamWriter class, we are writing the information to the destination location.

Finally, in the PostExecute Method, let's dispose the StreamWriter object.

The full piece of code is given as under

using System;
using System.IO;
using System.Xml;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{    
    XmlDocument xDoc = new XmlDocument();
    StreamWriter sw = null;   

    public override void PreExecute()
    {
        base.PreExecute();

        //Initialize the StreamWriter at the PreExecute method.
        var destinationFileName = @"D:\All_PlayerInfo.txt";        
        sw = new StreamWriter(destinationFileName,true);        
    }

    public override void PostExecute()
    {
        base.PostExecute();
        //Close the StreamWriter when the work is over
        sw.Close();
    }

    public override void CreateNewOutputRows()
    {    
        //Load the vaious file names
        xDoc.Load(Variables.varCompleteFileName);

		//Visit the nodes and get the records
        foreach (XmlNode xBelongsToNode in xDoc.SelectNodes("//BelongsTo"))
        {
            foreach (XmlNode xPlayerNode in xDoc.SelectNodes("//Player"))
            {
                var playerInfo = string.Concat(xPlayerNode.Attributes[0].Value,// Player ID
                                               ",",
                                               xPlayerNode.Attributes[1].Value,// Player Name
                                               ",",
                                               xBelongsToNode.Attributes[0].Value, // Belongs To
                                               ",",
                                               xPlayerNode.Attributes[2].Value, //Match Played
                                               ",",
                                               xPlayerNode.Attributes[3].Value, // Runs Made
                                               ",",
                                               xPlayerNode.Attributes[4].Value // Wickets taken
                                               );

                sw.Write(playerInfo + Environment.NewLine);

            }
        }
    }
}

If we run the application, we can see the below output generated in the All_PlayerInfo.txt

1,India Player1,India,10,900,24
2,India Player2,India,12,456,10
3,India Player3,India,6,88,34
4,India Player4,India,9,345,22
5,India Player5,India,11,889,4
1,Australia Player1,Australia,6,500,54
2,Australia Player2,Australia,7,123,5
3,Australia Player3,Australia,15,888,14
4,Australia Player4,Australia,77,5000,223
5,Australia Player5,Australia,55,5567,18
1,Canada Player1,Canada,16,112,64
2,Canada Player2,Canada,5,173,51
1,USA Player1,USA,1,16,3
2,USA Player2,USA,2,14,1

Our last task is to insert this record into the database by using the Bulk Insert task.

Step 5: Configure the Bulk Inset Task

Double click on the Bulk Insert Task to open up the Bulk Insert Task Editor. In the Connection Section, let us do the below changes

  1. Give the Proper Connection String and the Destination Table Name
  2. The Column Delimiter should be Comma Seperated {,} in this case
  3. Enter the source file name which is All_PlayerInfo.txt here.

20.jpg

That's it.

Step 6:

Let us now run the package and it will happily execute as depicted under

21.jpg

Inorder to verify that our player records has been inserted properly into the database, let us issue the below query in the SSMS query window

	select * from [dbo].[tbl_PlayerInfo]

and the result pane turns out with the below result

22.jpg

What we learnt from Example 2

In this lesson, we learnt how to configure and use Execute SQL task, Bulk Insert Task, picking up the file names at runtime using variables defined within the scope of a Foreach Loop Container and reading the same inside a script component.

Conclusion

So we have seen two small examples of script component and we have learnt how to configure this component, how to use it in conjunction with other components etc. But these are only a few of the various scenarios where the power of this component comes into picture.I will add more example about this component in this post very soon.

Thanks for reading.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here