Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2008

Getting Started With SSIS Script transforms

4.50/5 (6 votes)
20 Aug 2012CPOL9 min read 116.6K  
Using a script transform with C# to convert Julian (ordinal?) dates for a date dimension.

Introduction

This article was written to demonstrate using a script transform to convert Julian dates and create extra columns for building a date dimension. It will show the basics of wiring up one of these script components and how to use it to create an output row for every row input/processed. I had trouble finding useful examples the first time that I tried to use one of these SSIS components and I wrote this believing that others might benefit from what I learned.  I also want to include a warning. I believe there is a limitation limitation in this code for dates prior to 2000. I have an older version of it in vb6 where it checks to see if a date is prior to 2000 by checking to see if the length is less than 6. It may also need to be modified to handle leap years.

Background 

I wrote this script transform as part of a project to move JD Edwards transactional data out of an OLTP system with a DB2 backend. The company had decided to close several lines of business and would no longer need to use the JD Edwards system for new transactions. Looking ahead they decided that they were not going to renew their license for JD Edwards and would therefore lose access to the system. A decision was also made to sell off the IBM iSeries hardware that ran the system.

My job was to create a copy of the data on a SQL Server database and create reports that would let them review the data at any point in the future. SQL Server was chosen because the remaining business segment would continue to maintain the databases on that server to support continuing operations. So I had a new Data Warehouse project to develop.

The company used JD Edwards for order transactions and so one of the first parts that I needed to build was an order detail fact table and some dimensions such as product, customer and date.  So far it looks like a pretty typical Data Warehouse project. In the order detail table there were three important dates. The first was the transaction date, the second was the ship date and from an
accounting standpoint the all-important invoice date which is when the revenue is recognized.

The problem I ran into was that the dates were in a format called Julian. The JD Edwards system does a fine job of presenting a normal looking date in the user interfaces and reports but the underlying date in the system was in this Julian format. So for example 1/1/2012 is represented as 112001 and 2/1/2012 is 112032. The first three digits indicate the year. 2011 would be 111. The last three are the day of the year.

I decided that since the company only needed the last four or five years of history the date dimension would not be very large, only about 365 rows for each year minus weekends, holidays and any other days where there was no activity. I would use the Julian date as the key to the date dimension and assign it to an integer data type for efficiency.

What follows is the part of the ETL process which converts the Julian date into a regular date for populating the date dimension with all existing dates and new dates entered between now and the end of the year when the software will no longer be used.

Building the ETL

So my ETL process for the basic conversion looks like this:

Image 1

I have an OLE DB data source that looks at the order detail table as it exists on the SQL Server and it queries a distinct list of the dates.

Image 2

For simplicity what you see above is a single column retrieved from the OrderDetail table.

The next step is to add a script component to the data flow. When you do this you are presented with the choice to use the script component for a source, destination or a transform. In this case the choice is transformation.

Image 3

Now you have a scrip component on the Data Flow and the first thing that you want to do is connect it to the OLE DB Source so that it has an input. If you skip this step you will not be able to fully configure the component and if you try you will get the following warning:

Image 4

Now right click on the script component and choose edit. The left pane contains a list of items to configure and the main pane is where you make changes and settings. So for the first item Script we can set the properties. The following properties appear in bold and you can change them. You have Name, Description and Script Language. I use C# for most of my projects so the code example that follows will be in C#.

Image 5

Now we can look at the next item which is input columns. Below you will see the single column returned in the select statement. It is always better to write an explicit SQL statement telling the SSIS OLE DB source exactly what you want returned. This keeps things efficient by bringing back only the data you need and preventing issues that may arise in the future if the source table is modified. Notice the default name for the input column (input 0). The names of the inputs and outputs are important because they match the method signatures in the script code that you see when you click the edit script button that you see in the image above.

Image 6

Ok, since InvoiceDate is checked off in the above image we are done with this item. If we were bringing in multiple columns or the entire table we could chose columns using the check boxes. Now we move on the next and final item that we need to configure which is the Inputs and Outputs item. In the image below I have expanded both inputs and outputs. The input is what you would expect, InvoiceDate. Notice that we cannot set the data type because the component knows the underlying data type of that column. In the JD Edwards source database the Julian date field might even be a string but I am using it as an integer in my project because it will be the key to the date dimension and it will be an index and integers are more efficient in an index. It also just makes sense because incrementing the field as an integer does exactly what you would expect to the date value and that is, it adds a day within the domain of 0-365.

Also notice that there is an output (Output 0) but there are no columns. You should also know that you can rename both the input(s) and the output(s). If you do they will be changed in the script. The point is that if you are going to change them then it is best to do it before you code the script section. Now in this example I want to add two columns to the output and I do this by using the add column button. This button will be enabled after clicking on and highlighting the Output Columns folder.

Image 7

In the image below you can see that I have added two columns which are JulianDate and RegularDate. I want the Julian date column to have the original Julian date as it appears in the source table and the regular date will be a normal date in the usual format (or at least a format that can be modified by common built-in functions).

Image 8

OK now we can do some coding by choosing Script, the first item on the left. When you do so, you will once again have access to the Edit Script button that appears in one of the images above. Below is what you would see in this example which uses the default input and output names.

Image 9

I know that it is somewhat hard to see the details but you will see it clearly on your machine. After adding code my Input0_ProcessInputRow looks like this:

C#
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
  String julianDate = Convert.ToString(Row.InvoiceDate);
    DateTime dt_date;
    if (julianDate.Length == 6)
    {
        try
        {
            string sDay = julianDate.Substring(3);
            string sYear = "20" + julianDate.Substring(1, 2);
            dt_date = DateTime.Now;
            int iCurrentDay = Convert.ToInt16(sDay) - dt_date.DayOfYear;

            dt_date = dt_date.AddDays(iCurrentDay);
            dt_date = dt_date.AddYears(Convert.ToInt16(sYear) - dt_date.Year);
            julianDate = Convert.ToString(dt_date.ToShortDateString());
            Output0Buffer.AddRow();
            Output0Buffer.RegularDate = Convert.ToDateTime(julianDate);
            Output0Buffer.JulianDate = Row.InvoiceDate;
        }
        catch
        {
        }
    }
}

The first part of the code does the Julian date conversion and it is pretty straight forward. The last three lines in the try block create the output on a row by row basis without any need for a loop. notice the use of the name Output0 in the Output0Buffer statements. If you change the input name the method name will change. If you change the output name you will have to adapt these statements as well so do any renaming first. So the first of those statements adds an output row for each input row. Because I have added two output columns I can now output both the regular date in the second statement and the original Julian date in the third statement. I will be adding additional columns for dayOfWeek, Year etc.

Now when I put a dataviewer on the input of the script component and one on the output and send it to a Union All to hold the data and enable me to run it you can see one column of Julian values going in and two columns coming out (one Julian and one Regular). If you would like to see some Webinars that demo SSIS tasks try Pragmatic Works and look on the training menu. You can choose to sign up for future webinars and view past ones as well.

Image 10

Image 11

Conclusion  

Hopefully you can see the power available in this component. Best practice says that you should not use the script component to accomplish what the existing components can do for you because they will typically be more efficient and besides why waste the time (unless of course you enjoy it like I do Smile | <img src= " src="http://www.codeproject.com/script/Forums/Images/smiley_smile.gif" />. I also hope that this example will help someone get started. The first time I tried to use this component I did so without any instruction and I did not understand the nuances of adding columns, out put rows and input processing. Once I figured out how all that worked I was off and running, hopefully you will be too!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)