Introduction
SSIS is a great ETL tool and has been there now for a long time.But not always we get all the desired component that we need for transformation.For that we go ahead with Script component. But the disadvantage being that it is application specific.If we need to do the same kind of operation in other SSIS packages, then script component won't help.Instead we can go ahead with our own custom transformation component and even that can be reusable.This article will focus on building, deploying and using a custom script transformation component in a step by step manner
Background
SSIS transformation components helps us to transform the source data to some desirable format. Though SSIS provides many transformation components, but, sometime we may need tomperform certain actions for which there is no components available.In such cases custom components comes very handy.In this article we will look into how we can make a custom SSIS transformation component.The custom component is a simple email validator that will read the emails from some source and will let us know if it is a valid email or not. We will use C# as the language of choice.
Using the code
Step 1:
To begin with, let us create a Class Library project (say DFCEmailValidator).
Step 2:
Once the project is created, let us add the following dll references in our project.
- Microsoft.SqlServer.Dts.Design
- Microsoft.SqlServer.DTSPipelineWrap
- Microsoft.SqlServer.DTSRuntimeWrap
- Microsoft.SqlServer.ManagedDTS
- Microsoft.SqlServer.PipelineHost
Step 3:
Let us create a class by the name CustomEmailValidator.cs .Inherit the class from the PipelineComponent class and decorate it with DtsPipelineComponent attribute as under
using System;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
namespace DFCEmailValidator
{
[DtsPipelineComponent(DisplayName = "CustomEmailValidator"
, Description = "Custom Email Validator of Data Flow
Component-Niladri Biswas"
, ComponentType = ComponentType.Transform)]
public class CustomEmailValidator : PipelineComponent
{
}
}
As we are preparing a custom Data Flow Component henceforth it needs to inherit PipelineComponent Base class. The presence of the DTSPipelineComponentAttribute makes this class as a Transformation type Data Flow Component. Pipeline components are the backbone of SSIS components and works in conjunction with Connection Manager for moving and transforming the data.
Step 4: Override the ProvideComponentProperties method
Component initialization happens in the ProvideComponentProperties method. This method is called when a component is first added to the data flow task, to initialize the Microsoft.SqlServer.Dts.Pipeline.PipelineComponent.ComponentMetaData of the component. We will perform the below action in this method
- Set the component information
- Add an input object
- Add an output object
- Add an error object
public override void ProvideComponentProperties()
{
ComponentMetaData.Name = "CustomEmailValidator";
ComponentMetaData.Description = "Custom Email Validator of Data Flow Component-Niladri Biswas";
ComponentMetaData.ContactInfo = "mail2nil123@gmail.com";
RemoveAllInputsOutputsAndCustomProperties();
IDTSInput100 inputObject = ComponentMetaData.InputCollection.New();
inputObject.Name = "InputToCustomEmailValidator";
inputObject.ErrorRowDisposition = DTSRowDisposition.RD_FailComponent;
IDTSOutput100 outputObject = ComponentMetaData.OutputCollection.New();
outputObject.Name = "OutputFromCustomEmailValidator";
outputObject.SynchronousInputID = inputObject.ID;
AddErrorOutput("ErrorFromCustomEmailValidator", inputObject.ID, outputObject.ExclusionGroup);
}
Step 5: Override the Validate method
This method verifies if the component is correctly configured.It returns a DTSValidationStatus enum which is as under
Enum Member Name | Value | Descrption |
VS_ISVALID | 0 | The component is correctly configured and ready for execution |
VS_ISBROKEN | 1 | The component is incorrectly configured; typically, this indicates that a property is set incorrectly |
VS_NEEDSNEWMETADATA | 2 | The component's metadata is outdated or corrupt, and a call to Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSDesigntimeComponent100.ReinitializeMetaData()will repair the component |
VS_ISCORRUPT | 3 | The component is irreparably damaged and must be completely reset. The designer calls the component's Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSDesigntimeComponent100.ProvideComponentProperties() method in response |
We will perform the below action in this method
- Check whether the column type is DT_STR/DT_WSTR or not
- Dynamically create the output column of Boolean type for every input column e.g. If the input column is name as MyInput, then the output column will be name as IsValidEmail_ MyInput
public override DTSValidationStatus Validate()
{
bool pbCancel = false;
IDTSInput100 input = ComponentMetaData.InputCollection[0];
string errorMsg = "Wrong datatype specified for {0}. It accepts DT_STR and DT_WSTR";
for (int x = 0; x < input.InputColumnCollection.Count; x++)
{
if (!(input.InputColumnCollection[x].DataType == DataType.DT_STR
|| input.InputColumnCollection[x].DataType == DataType.DT_WSTR))
{
ComponentMetaData.FireError(
0
, ComponentMetaData.Name
, String.Format(errorMsg, input.InputColumnCollection[x].Name)
, string.Empty
, 0
, out pbCancel);
return DTSValidationStatus.VS_ISCORRUPT;
}
}
IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
foreach (IDTSInputColumn100 inputColumn in input.InputColumnCollection)
{
bool IsPresent = false;
foreach (IDTSOutputColumn100 outputColumn in output.OutputColumnCollection)
{
if (outputColumn.Name == "IsValidEmail_" + inputColumn.Name)
{
IsPresent = true;
}
}
if (!IsPresent)
{
IDTSOutputColumn100 outputcol = output.OutputColumnCollection.New();
outputcol.Name = "IsValidEmail_" + inputColumn.Name;
outputcol.Description = String.Format("Tells if {0} is a Valid Email Address.", inputColumn.Name);
outputcol.SetDataTypeProperties(DataType.DT_BOOL, 0, 0, 0, 0);
}
}
return DTSValidationStatus.VS_ISVALID;
}
Step 6: Override ReinitializeMetaData method
This method repairs any errors identified during validation that cause the component to return Microsoft.SqlServer.Dts.Pipeline.Wrapper.DTSValidationStatus.VS_NEEDSNEWMETADATA at design time.
public override void ReinitializeMetaData()
{
ComponentMetaData.RemoveInvalidInputColumns();
ReinitializeMetaData();
}
Step 7: Override InsertOutputColumnAt method
Prevents addition of new output column from Advanced Editor as we have already adding the output column in the Validate method.
public override IDTSOutputColumn100 InsertOutputColumnAt(
int outputID,
int outputColumnIndex,
string name,
string description)
{
throw new Exception(
string.Format("Fail to add output column name to {0} ", ComponentMetaData.Name)
, null);
}
Step 8: Override the PreExecute method
It is call in the data execution flow once per component.
public override void PreExecute()
{
IDTSInput100 input = ComponentMetaData.InputCollection[0];
inputBufferColumnIdx = new int[input.InputColumnCollection.Count];
Enumerable
.Range(0, input.InputColumnCollection.Count)
.ToList()
.ForEach(i =>
{
IDTSInputColumn100 inputCol = input.InputColumnCollection[i];
inputBufferColumnIdx[i] = BufferManager
.FindColumnByLineageID(input.Buffer, inputCol.LineageID);
});
IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
outputBufferColumnIdx = new int[output.OutputColumnCollection.Count];
Enumerable
.Range(0, input.InputColumnCollection.Count)
.ToList()
.ForEach(i =>
{
IDTSOutputColumn100 outputCol = output.OutputColumnCollection[i];
outputBufferColumnIdx[i] = BufferManager
.FindColumnByLineageID(input.Buffer, outputCol.LineageID);
});
}
Step 9: Override the ProcessInput method
It is called repeatedly at run time when a PipelineBuffer from an upstream component is available to the component to let the component process the incoming rows. In this method, the logic for email validation is implemented.
public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
if (!buffer.EndOfRowset)
{
while (buffer.NextRow())
{
for (int x = 0; x < inputBufferColumnIdx.Length; x++)
{
bool isValidEmail = false;
DataType BufferColDataType = buffer.GetColumnInfo(inputBufferColumnIdx[x]).DataType;
if (BufferColDataType == DataType.DT_STR ||
BufferColDataType == DataType.DT_WSTR)
{
isValidEmail = IsValidEmail(buffer.GetString(inputBufferColumnIdx[x]));
}
buffer.SetBoolean(outputBufferColumnIdx[x], isValidEmail);
}
}
}
}
And the IsValidEmail method is as under
private bool IsValidEmail(string email)
{
string regexPattern = @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$";
return new Regex(regexPattern, RegexOptions.IgnoreCase).IsMatch(email);
}
The full code is as under
using System;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
namespace DFCEmailValidator
{
[DtsPipelineComponent(DisplayName = "CustomEmailValidator"
, Description = "Custom Email Validator of Data Flow Component-Niladri Biswas"
, ComponentType = ComponentType.Transform)]
public class CustomEmailValidator : PipelineComponent
{
private int[] inputBufferColumnIdx;
private int[] outputBufferColumnIdx;
#region ProvideComponentProperties
public override void ProvideComponentProperties()
{
ComponentMetaData.Name = "CustomEmailValidator";
ComponentMetaData.Description = "Custom Email Validator of Data Flow Component-Niladri Biswas";
ComponentMetaData.ContactInfo = "mail2nil123@gmail.com";
RemoveAllInputsOutputsAndCustomProperties();
IDTSInput100 inputObject = ComponentMetaData.InputCollection.New();
inputObject.Name = "InputToCustomEmailValidator";
inputObject.ErrorRowDisposition = DTSRowDisposition.RD_FailComponent;
IDTSOutput100 outputObject = ComponentMetaData.OutputCollection.New();
outputObject.Name = "OutputFromCustomEmailValidator";
outputObject.SynchronousInputID = inputObject.ID;
AddErrorOutput("ErrorFromCustomEmailValidator", inputObject.ID, outputObject.ExclusionGroup);
}
public override DTSValidationStatus Validate()
{
bool pbCancel = false;
IDTSInput100 input = ComponentMetaData.InputCollection[0];
string errorMsg = "Wrong datatype specified for {0}. It accepts DT_STR and DT_WSTR";
for (int x = 0; x < input.InputColumnCollection.Count; x++)
{
if (!(input.InputColumnCollection[x].DataType == DataType.DT_STR
|| input.InputColumnCollection[x].DataType == DataType.DT_WSTR))
{
ComponentMetaData.FireError(
0
, ComponentMetaData.Name
, String.Format(errorMsg, input.InputColumnCollection[x].Name)
, string.Empty
, 0
, out pbCancel);
return DTSValidationStatus.VS_ISCORRUPT;
}
}
IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
foreach (IDTSInputColumn100 inputColumn in input.InputColumnCollection)
{
bool IsPresent = false;
foreach (IDTSOutputColumn100 outputColumn in output.OutputColumnCollection)
{
if (outputColumn.Name == "IsValidEmail_" + inputColumn.Name)
{
IsPresent = true;
}
}
if (!IsPresent)
{
IDTSOutputColumn100 outputcol = output.OutputColumnCollection.New();
outputcol.Name = "IsValidEmail_" + inputColumn.Name;
outputcol.Description = String.Format("Tells if {0} is a Valid Email Address.", inputColumn.Name);
outputcol.SetDataTypeProperties(DataType.DT_BOOL, 0, 0, 0, 0);
}
}
return DTSValidationStatus.VS_ISVALID;
}
public override void ReinitializeMetaData()
{
ComponentMetaData.RemoveInvalidInputColumns();
ReinitializeMetaData();
}
public override IDTSOutputColumn100 InsertOutputColumnAt(
int outputID,
int outputColumnIndex,
string name,
string description)
{
throw new Exception(
string.Format("Fail to add output column name to {0} ", ComponentMetaData.Name)
, null);
}
#endregion
#region Runtime methods
public override void PreExecute()
{
IDTSInput100 input = ComponentMetaData.InputCollection[0];
inputBufferColumnIdx = new int[input.InputColumnCollection.Count];
Enumerable
.Range(0, input.InputColumnCollection.Count)
.ToList()
.ForEach(i =>
{
IDTSInputColumn100 inputCol = input.InputColumnCollection[i];
inputBufferColumnIdx[i] = BufferManager
.FindColumnByLineageID(input.Buffer, inputCol.LineageID);
});
IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
outputBufferColumnIdx = new int[output.OutputColumnCollection.Count];
Enumerable
.Range(0, input.InputColumnCollection.Count)
.ToList()
.ForEach(i =>
{
IDTSOutputColumn100 outputCol = output.OutputColumnCollection[i];
outputBufferColumnIdx[i] = BufferManager
.FindColumnByLineageID(input.Buffer, outputCol.LineageID);
});
}
public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
if (!buffer.EndOfRowset)
{
while (buffer.NextRow())
{
for (int x = 0; x < inputBufferColumnIdx.Length; x++)
{
bool isValidEmail = false;
DataType BufferColDataType = buffer.GetColumnInfo(inputBufferColumnIdx[x]).DataType;
if (BufferColDataType == DataType.DT_STR ||
BufferColDataType == DataType.DT_WSTR)
{
isValidEmail = IsValidEmail(buffer.GetString(inputBufferColumnIdx[x]));
}
buffer.SetBoolean(outputBufferColumnIdx[x], isValidEmail);
}
}
}
}
private bool IsValidEmail(string email)
{
string regexPattern = @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$";
return new Regex(regexPattern, RegexOptions.IgnoreCase).IsMatch(email);
}
#endregion Run Time Methods
}
}
Step 10: Build and deploy the component into the Gac
Now, we should build the class library, generate a Strong name and then deploy it to the gac by using the command
C:\Program Files\Microsoft Visual Studio 9.0\VC>gacutil -i "D:\DFCEmailValidator\bin\Debug\DFCEmailValidator.dll"
N.B.~ The dll path will vary... So change it accordingly
If successful we will get the message
Assembly successfully added to the cache
We can even verify this browsing to C:\WINDOWS\assembly where we will find the assembly by the name DFCEmailValidator
Step 11: Copy the dll and put into the PipeLineComponents folder
Once the previous step is over, next we can copy the dll to the PipelineComponents folder which will be in C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents.
Step 12:
Open BIDS. Go to the DataFlow editor’s toolbox. Right Click -> Choose Items.Visit the SSIS DataFlow Items tab and locate the CustomEmailValidator. Tick the checkbox option and click the OK button.
We can find that the custom transformation has been added to the toolbox
Step 13:
Drag and drop a Flat File Source and configure it. Let’s say we have a text file (Source.txt) whose content is as under
testmailwrong
testmail@test.com
invalid@invalid
valid@gmail.com
valid_123@yahoo.co.in
123@123@123@1123#456~xud.com
Next drag and drop our CustomEmailValidator and add the data flow path from Flat File Source to it. Double click on the CustomEmailValidator for opening the Editor.
The screen shows that it has three tabs viz Custom Properties, Input columns and Input Output Properties.
The first tab shows the Custom Properties that we set.
Visiting the Input Columns tab we find the available input columns and that needs to be selected. In this case it is Column0.
And in the InputOutput Properties, if we expand the Output columns , we can find that our Output column has been created automatically whose name is IsValidEmail_Column0.
Click Ok.Next drag and drop a Conditional Split and add the data flow path from the CustomEmailValidator to it. And Configure the Conditional Split component as under
Output Name | Condition |
Success | [IsValidEmail_Column0] == TRUE |
Failure | [IsValidEmail_Column0] == FALSE |
Lastly add two flat file destination component one of whose source will be the success path of the Conditional path and the other being the Failure. The destination files are named respectively as ValidEmail.txt and InvalidEmail.txt.
The entire package is as under
Step 14:
Let us run the application and we will see the valid emails in ValidEmail.txt
testmail@test.com
valid@gmail.com
valid_123@yahoo.co.in
while the InvalidEmail.txt contains the content as
testmailwrong
invalid@invalid
123@123@123@1123#456~xud.com
Conclusion
This article has shown us with a very simple example as how to build a custom transformation component with a step by step approach. I hope it will be helpful for many developers who are new to this concept. We can follow the concept and can build any kind of custom component in SSIS. In the next article we will look into how to build custom task in SSIS.
Thanks for reading