Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2008

Create custom transformation component(Email Validator) in SSIS- Step by step

4.71/5 (6 votes)
3 Aug 2011CPOL6 min read 49.6K   829  
In this article we will look into how we can make a custom SSIS transformation component.The custom component is a simple

Introduction

SSIS is a great ETL tool and has been there now for a long time.But not always we get all the desired component that we need for transformation.For that we go ahead with Script component. But the disadvantage being that it is application specific.If we need to do the same kind of operation in other SSIS packages, then script component won't help.Instead we can go ahead with our own custom transformation component and even that can be reusable.This article will focus on building, deploying and using a custom script transformation component in a step by step manner

Background

SSIS transformation components helps us to transform the source data to some desirable format. Though SSIS provides many transformation components, but, sometime we may need tomperform certain actions for which there is no components available.In such cases custom components comes very handy.In this article we will look into how we can make a custom SSIS transformation component.The custom component is a simple email validator that will read the emails from some source and will let us know if it is a valid email or not. We will use C# as the language of choice.

Using the code

Step 1:

To begin with, let us create a Class Library project (say DFCEmailValidator).

Step 2:

Once the project is created, let us add the following dll references in our project.

  1. Microsoft.SqlServer.Dts.Design
  2. Microsoft.SqlServer.DTSPipelineWrap
  3. Microsoft.SqlServer.DTSRuntimeWrap
  4. Microsoft.SqlServer.ManagedDTS
  5. Microsoft.SqlServer.PipelineHost

Step 3:

Let us create a class by the name CustomEmailValidator.cs .Inherit the class from the PipelineComponent class and decorate it with DtsPipelineComponent attribute as under

using System;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

namespace DFCEmailValidator
{
    [DtsPipelineComponent(DisplayName = "CustomEmailValidator"
                        , Description = "Custom Email Validator of Data Flow 
                                         Component-Niladri Biswas"
                        , ComponentType = ComponentType.Transform)]
    public class CustomEmailValidator : PipelineComponent
    {
    }
}

As we are preparing a custom Data Flow Component henceforth it needs to inherit PipelineComponent Base class. The presence of the DTSPipelineComponentAttribute makes this class as a Transformation type Data Flow Component. Pipeline components are the backbone of SSIS components and works in conjunction with Connection Manager for moving and transforming the data.

Step 4: Override the ProvideComponentProperties method

Component initialization happens in the ProvideComponentProperties method. This method is called when a component is first added to the data flow task, to initialize the Microsoft.SqlServer.Dts.Pipeline.PipelineComponent.ComponentMetaData of the component. We will perform the below action in this method

  1. Set the component information
  2. Add an input object
  3. Add an output object
  4. Add an error object
public override void ProvideComponentProperties()
{
    // Set component information
    ComponentMetaData.Name = "CustomEmailValidator";
    ComponentMetaData.Description = "Custom Email Validator of Data Flow Component-Niladri Biswas";
    ComponentMetaData.ContactInfo = "mail2nil123@gmail.com";

    // Reset the component.
    //Deletes each Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSInput100 
    //and Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSOutput100 object from the component.
    RemoveAllInputsOutputsAndCustomProperties();

    // Add an input object
    IDTSInput100 inputObject = ComponentMetaData.InputCollection.New();
    inputObject.Name = "InputToCustomEmailValidator";
    inputObject.ErrorRowDisposition = DTSRowDisposition.RD_FailComponent;

    // Add an output object.
    IDTSOutput100 outputObject = ComponentMetaData.OutputCollection.New();
    outputObject.Name = "OutputFromCustomEmailValidator";
    outputObject.SynchronousInputID = inputObject.ID; //Synchronous Transform           

    //Add an error object
    AddErrorOutput("ErrorFromCustomEmailValidator", inputObject.ID, outputObject.ExclusionGroup);

}

Step 5: Override the Validate method

This method verifies if the component is correctly configured.It returns a DTSValidationStatus enum which is as under

Enum Member NameValueDescrption
VS_ISVALID0The component is correctly configured and ready for execution
VS_ISBROKEN1The component is incorrectly configured; typically, this indicates that a property is set incorrectly
VS_NEEDSNEWMETADATA2The component's metadata is outdated or corrupt, and a call to Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSDesigntimeComponent100.ReinitializeMetaData()will repair the component
VS_ISCORRUPT3The component is irreparably damaged and must be completely reset. The designer calls the component's Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSDesigntimeComponent100.ProvideComponentProperties() method in response

We will perform the below action in this method

  1. Check whether the column type is DT_STR/DT_WSTR or not
  2. Dynamically create the output column of Boolean type for every input column e.g. If the input column is name as MyInput, then the output column will be name as IsValidEmail_ MyInput
public override DTSValidationStatus Validate()
{
    bool pbCancel = false;
    IDTSInput100 input = ComponentMetaData.InputCollection[0];
    string errorMsg = "Wrong datatype specified for {0}. It accepts DT_STR and DT_WSTR";

    //Check whether the column type is DT_STR/DT_WSTR or not
    for (int x = 0; x < input.InputColumnCollection.Count; x++)
    {
	if (!(input.InputColumnCollection[x].DataType == DataType.DT_STR
	   || input.InputColumnCollection[x].DataType == DataType.DT_WSTR))
	{
	    ComponentMetaData.FireError(
					0
					, ComponentMetaData.Name
					, String.Format(errorMsg, input.InputColumnCollection[x].Name)
					, string.Empty
					, 0
					, out pbCancel);
	    return DTSValidationStatus.VS_ISCORRUPT;


	}
    }

   // Dynamically create the output column of Boolean type
    IDTSOutput100 output = ComponentMetaData.OutputCollection[0];

    foreach (IDTSInputColumn100 inputColumn in input.InputColumnCollection)
    {
	bool IsPresent = false;
	foreach (IDTSOutputColumn100 outputColumn in output.OutputColumnCollection)
	{
	    if (outputColumn.Name == "IsValidEmail_" + inputColumn.Name)
	    {
		IsPresent = true;
	    }
	}

	if (!IsPresent)
	{
	    IDTSOutputColumn100 outputcol = output.OutputColumnCollection.New();
	    outputcol.Name = "IsValidEmail_" + inputColumn.Name;
	    outputcol.Description = String.Format("Tells if {0} is a Valid Email Address.", inputColumn.Name);
	    outputcol.SetDataTypeProperties(DataType.DT_BOOL, 0, 0, 0, 0);
	}
    }
    return DTSValidationStatus.VS_ISVALID;
}

Step 6: Override ReinitializeMetaData method

This method repairs any errors identified during validation that cause the component to return Microsoft.SqlServer.Dts.Pipeline.Wrapper.DTSValidationStatus.VS_NEEDSNEWMETADATA at design time.

public override void ReinitializeMetaData()
{
    ComponentMetaData.RemoveInvalidInputColumns();
    ReinitializeMetaData();
}

Step 7: Override InsertOutputColumnAt method

Prevents addition of new output column from Advanced Editor as we have already adding the output column in the Validate method.

public override IDTSOutputColumn100 InsertOutputColumnAt(
							     int outputID,
							     int outputColumnIndex,
							     string name,
							     string description)
{
    throw new Exception(
	string.Format("Fail to add output column name to {0} ", ComponentMetaData.Name)
	, null);

}

Step 8: Override the PreExecute method

It is call in the data execution flow once per component.

public override void PreExecute()
{
    IDTSInput100 input = ComponentMetaData.InputCollection[0];
    inputBufferColumnIdx = new int[input.InputColumnCollection.Count];

    Enumerable
	.Range(0, input.InputColumnCollection.Count)
	.ToList()
	.ForEach(i =>
	{
	    IDTSInputColumn100 inputCol = input.InputColumnCollection[i];
	    inputBufferColumnIdx[i] = BufferManager
				       .FindColumnByLineageID(input.Buffer, inputCol.LineageID);
	});


    IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
    outputBufferColumnIdx = new int[output.OutputColumnCollection.Count];

    Enumerable
	.Range(0, input.InputColumnCollection.Count)
	.ToList()
	.ForEach(i =>
	{
	    IDTSOutputColumn100 outputCol = output.OutputColumnCollection[i];
	    outputBufferColumnIdx[i] = BufferManager
					.FindColumnByLineageID(input.Buffer, outputCol.LineageID);
	});
}

Step 9: Override the ProcessInput method

It is called repeatedly at run time when a PipelineBuffer from an upstream component is available to the component to let the component process the incoming rows. In this method, the logic for email validation is implemented.

public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
    if (!buffer.EndOfRowset)
    {
	while (buffer.NextRow())
	{
	    for (int x = 0; x < inputBufferColumnIdx.Length; x++)
	    {
		bool isValidEmail = false;
		DataType BufferColDataType = buffer.GetColumnInfo(inputBufferColumnIdx[x]).DataType;

		if (BufferColDataType == DataType.DT_STR ||
		    BufferColDataType == DataType.DT_WSTR)
		{
		    isValidEmail = IsValidEmail(buffer.GetString(inputBufferColumnIdx[x]));
		}

		buffer.SetBoolean(outputBufferColumnIdx[x], isValidEmail);
	    }
	}
    }
}

And the IsValidEmail method is as under

private bool IsValidEmail(string email)
{
    string regexPattern = @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$";
    return new Regex(regexPattern, RegexOptions.IgnoreCase).IsMatch(email);
}

The full code is as under

using System;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

namespace DFCEmailValidator
{
    [DtsPipelineComponent(DisplayName = "CustomEmailValidator"
                        , Description = "Custom Email Validator of Data Flow Component-Niladri Biswas"
                        , ComponentType = ComponentType.Transform)]
    public class CustomEmailValidator : PipelineComponent
    {
        private int[] inputBufferColumnIdx;
        private int[] outputBufferColumnIdx;

        #region ProvideComponentProperties
        public override void ProvideComponentProperties()
        {
            // Set component information
            ComponentMetaData.Name = "CustomEmailValidator";
            ComponentMetaData.Description = "Custom Email Validator of Data Flow Component-Niladri Biswas";
            ComponentMetaData.ContactInfo = "mail2nil123@gmail.com";

            // Reset the component.
            //Deletes each Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSInput100 
            //and Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSOutput100 object from the component.
            RemoveAllInputsOutputsAndCustomProperties();

            // Add an input object
            IDTSInput100 inputObject = ComponentMetaData.InputCollection.New();
            inputObject.Name = "InputToCustomEmailValidator";
            inputObject.ErrorRowDisposition = DTSRowDisposition.RD_FailComponent;

            // Add an output object.
            IDTSOutput100 outputObject = ComponentMetaData.OutputCollection.New();
            outputObject.Name = "OutputFromCustomEmailValidator";
            outputObject.SynchronousInputID = inputObject.ID; //Synchronous Transform           

            //Add an error object
            AddErrorOutput("ErrorFromCustomEmailValidator", inputObject.ID, outputObject.ExclusionGroup);

        }
       
        public override DTSValidationStatus Validate()
        {
            bool pbCancel = false;
            IDTSInput100 input = ComponentMetaData.InputCollection[0];
            string errorMsg = "Wrong datatype specified for {0}. It accepts DT_STR and DT_WSTR";

            //Check whether the column type is DT_STR/DT_WSTR or not
            for (int x = 0; x < input.InputColumnCollection.Count; x++)
            {
                if (!(input.InputColumnCollection[x].DataType == DataType.DT_STR
                   || input.InputColumnCollection[x].DataType == DataType.DT_WSTR))
                {
                    ComponentMetaData.FireError(
                                                0
                                                , ComponentMetaData.Name
                                                , String.Format(errorMsg, input.InputColumnCollection[x].Name)
                                                , string.Empty
                                                , 0
                                                , out pbCancel);
                    return DTSValidationStatus.VS_ISCORRUPT;
                    
                    
                }
            }

           // Dynamically create the output column of Boolean type
            IDTSOutput100 output = ComponentMetaData.OutputCollection[0];

            foreach (IDTSInputColumn100 inputColumn in input.InputColumnCollection)
            {
                bool IsPresent = false;
                foreach (IDTSOutputColumn100 outputColumn in output.OutputColumnCollection)
                {
                    if (outputColumn.Name == "IsValidEmail_" + inputColumn.Name)
                    {
                        IsPresent = true;
                    }
                }

                if (!IsPresent)
                {
                    IDTSOutputColumn100 outputcol = output.OutputColumnCollection.New();
                    outputcol.Name = "IsValidEmail_" + inputColumn.Name;
                    outputcol.Description = String.Format("Tells if {0} is a Valid Email Address.", inputColumn.Name);
                    outputcol.SetDataTypeProperties(DataType.DT_BOOL, 0, 0, 0, 0);
                }
            }
            return DTSValidationStatus.VS_ISVALID;
        }
       
        public override void ReinitializeMetaData()
        {
            ComponentMetaData.RemoveInvalidInputColumns();
            ReinitializeMetaData();
        }
        
        public override IDTSOutputColumn100 InsertOutputColumnAt(
             int outputID,
             int outputColumnIndex,
             string name,
             string description)
        {
            throw new Exception(
                string.Format("Fail to add output column name to {0} ", ComponentMetaData.Name)
                , null);

        }
        #endregion


        #region Runtime methods

        public override void PreExecute()
        {
            IDTSInput100 input = ComponentMetaData.InputCollection[0];
            inputBufferColumnIdx = new int[input.InputColumnCollection.Count];

            Enumerable
                .Range(0, input.InputColumnCollection.Count)
                .ToList()
                .ForEach(i =>
                {
                    IDTSInputColumn100 inputCol = input.InputColumnCollection[i];
                    inputBufferColumnIdx[i] = BufferManager
                                               .FindColumnByLineageID(input.Buffer, inputCol.LineageID);
                });


            IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
            outputBufferColumnIdx = new int[output.OutputColumnCollection.Count];

            Enumerable
                .Range(0, input.InputColumnCollection.Count)
                .ToList()
                .ForEach(i =>
                {
                    IDTSOutputColumn100 outputCol = output.OutputColumnCollection[i];
                    outputBufferColumnIdx[i] = BufferManager
                                                .FindColumnByLineageID(input.Buffer, outputCol.LineageID);
                });
        }

        public override void ProcessInput(int inputID, PipelineBuffer buffer)
        {
            if (!buffer.EndOfRowset)
            {
                while (buffer.NextRow())
                {
                    for (int x = 0; x < inputBufferColumnIdx.Length; x++)
                    {
                        bool isValidEmail = false;
                        DataType BufferColDataType = buffer.GetColumnInfo(inputBufferColumnIdx[x]).DataType;

                        if (BufferColDataType == DataType.DT_STR ||
                            BufferColDataType == DataType.DT_WSTR)
                        {
                            isValidEmail = IsValidEmail(buffer.GetString(inputBufferColumnIdx[x]));
                        }

                        buffer.SetBoolean(outputBufferColumnIdx[x], isValidEmail);
                    }
                }
            }
        }


        private bool IsValidEmail(string email)
        {
            string regexPattern = @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$";
            return new Regex(regexPattern, RegexOptions.IgnoreCase).IsMatch(email);
        }

        #endregion Run Time Methods
    }
}

Step 10: Build and deploy the component into the Gac

Now, we should build the class library, generate a Strong name and then deploy it to the gac by using the command

C:\Program Files\Microsoft Visual Studio 9.0\VC>gacutil  -i "D:\DFCEmailValidator\bin\Debug\DFCEmailValidator.dll"

N.B.~ The dll path will vary... So change it accordingly

If successful we will get the message

Assembly successfully added to the cache

1.jpg

We can even verify this browsing to C:\WINDOWS\assembly where we will find the assembly by the name DFCEmailValidator

2.jpg

Step 11: Copy the dll and put into the PipeLineComponents folder

Once the previous step is over, next we can copy the dll to the PipelineComponents folder which will be in C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents.

3.jpg

Step 12:

Open BIDS. Go to the DataFlow editor’s toolbox. Right Click -> Choose Items.Visit the SSIS DataFlow Items tab and locate the CustomEmailValidator. Tick the checkbox option and click the OK button.

4.jpg

We can find that the custom transformation has been added to the toolbox

5.jpg

Step 13:

Drag and drop a Flat File Source and configure it. Let’s say we have a text file (Source.txt) whose content is as under

testmailwrong
testmail@test.com
invalid@invalid
valid@gmail.com
valid_123@yahoo.co.in
123@123@123@1123#456~xud.com

Next drag and drop our CustomEmailValidator and add the data flow path from Flat File Source to it. Double click on the CustomEmailValidator for opening the Editor.

6.jpg

The screen shows that it has three tabs viz Custom Properties, Input columns and Input Output Properties.

The first tab shows the Custom Properties that we set.

Visiting the Input Columns tab we find the available input columns and that needs to be selected. In this case it is Column0.

7.jpg

And in the InputOutput Properties, if we expand the Output columns , we can find that our Output column has been created automatically whose name is IsValidEmail_Column0.

8.jpg

Click Ok.Next drag and drop a Conditional Split and add the data flow path from the CustomEmailValidator to it. And Configure the Conditional Split component as under

9.jpg

Output NameCondition
Success[IsValidEmail_Column0] == TRUE
Failure[IsValidEmail_Column0] == FALSE

Lastly add two flat file destination component one of whose source will be the success path of the Conditional path and the other being the Failure. The destination files are named respectively as ValidEmail.txt and InvalidEmail.txt.

The entire package is as under

10.jpg

Step 14:

Let us run the application and we will see the valid emails in ValidEmail.txt

testmail@test.com
valid@gmail.com
valid_123@yahoo.co.in

while the InvalidEmail.txt contains the content as

testmailwrong
invalid@invalid
123@123@123@1123#456~xud.com

11.jpg

Conclusion

This article has shown us with a very simple example as how to build a custom transformation component with a step by step approach. I hope it will be helpful for many developers who are new to this concept. We can follow the concept and can build any kind of custom component in SSIS. In the next article we will look into how to build custom task in SSIS.

Thanks for reading

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)