An Introduction to Sql 11 (Code Name Denali) –Part V (SSIS Features in CTP 3)
Table of Content
- Introduction
- Background
- SSIS New Features And Enhancements
- SSIS Expression Task
- DQS Cleansing Component
- Load files with variable row formats
- Displaying Success and failures in different styles
- Sort packages by name
- Shared Connection Managers
- New expression language functions(Left,Token,TokenCount)
- Package Parameter Tab
- Removal of Data Sources and Data Source Views folders
- Solution Explorer now have Project Parameters node
- Conclusion
After the release of Denali CTP 1 on 8th November, 2010, the next big hit was on 12th July 2011 when Denali CTP 3 has been launched. CTP3 is available both in 32-bit and 64-bit versions. As usual, Denali has brought some new features for Sql lovers may that be developers, Administrators or Business Intelligence (BI) professionals. In this series we will explore on some of the new features of SSIS that has been included in this CTP.
In the last few years, Microsoft has brought many technologies under the developers’ hood. A drastic change has been made in the Sql Server jargon with the advent of Sql Server 2005(code name Yukon) and in the subsequent releases like Sql Server 2008(code name Katmai) and Sql 11(code name Denali), the same pace has been kept with introduction to new features and enhancements as well as improvements. In this article we will explore some of the new features that Denali CTP 3 has already offer us from SSIS perspective.Earlier we have seen the enhancements made in Denali CTP1 . The subsequent articles will focus on the enhancements made in the other areas.
You can read my other articles on Denali as given under
- An Introduction to Sql 11 (Code Name Denali) –Part I (SSMS Features in CTP 1)
- An Introduction to Sql 11 (Code Name Denali) –Part II (T-Sql Features in CTP 1)
- An Introduction to Sql 11 (Code Name Denali) –Part III (SSIS Features in CTP 1)
- An Introduction to Sql 11 (Code Name Denali) -Part IV (Contained Database in CTP 1)
I.SSIS Expression Task
We have a new Task component in the toolbox by the name Expression Task. It's purpose is to assign the result of a SSIS expression to a variable at runtime.
Let us see this into action.We will form an expression that will give the age of a person.
Open bids and in the toolbox of the Control Flow panel, we will find the Expression Task
Now let us create a package level variable.First let us right click on the designer and choose Variables
Then in the variable window that appears, create a variable by the name Age of type Int32 and it's scope will be Package
>
Drag and drop an Expression Task. Double click and the Expression Builder will open where we will find the package variable created
In the expression text box, let us write the below expression
@[User::Age] = DATEDIFF("YY", (DT_DBTIMESTAMP)"01/01/1999",GetDate())
and click on the Evaluate Expression button.
Click OK.Next drag and drop a Script Task component.
Click OK button. Then in the ReadOnlyVariable properties, let us put the variable that we created
Next, let us click the Edit Script button and add the below code
public void Main()
{
string displayMessage = "Your age is {0}";
MessageBox.Show(string.Format(displayMessage,Dts.Variables[0].Value.ToString()));
Dts.TaskResult = (int)ScriptResults.Success;
}
Run the application and we will find the result
N.B.~We should take care of the fact that while creating the expression, we should not give any space;else it will complain error as shown under
II.DQS Cleansing Component
Another new component has been added as part of the transformation componet. It is the Data Quality Services (DQS) Cleansing transformation component. It helps us to cleanse our data based on the knowledge base we will build. This article will not focus on DQS as it is a seperate topic altogether and will be published soon. In this we will focus on creation of our own knowledge base and then validating the record based on that.
We need to create our own Knowledge base.
Our intension is that, our knowledge base will have Country domain values and the rules will be such that the country length should be more than or equal to 5 and should not be abbreviated. Also if the user record has "England" as the value , it will be corrected to "United Kingdom".
For doing so, let us follow the below steps
Connect to the DQS server
Once connected properly, next click on the New Knowledge base button that comes under the Knowledge Base Management category
Once done, we can now create the new knowledge base
Let us enter a Domain name, give some description and click on the create button.Next create a domain.
After clicking on the Create a domain button, we will get the Create Domain popup window.We need to enter a domain name and leave the reset as it is. Click on OK button.
To the right hand side we can see our options for setting the Data Quality rules that apply to the domain in question
Next we should create some rules that will be applied. For that, let us click on the Domain Rules tab
Next we need to click on the Add a new domain rule button for creating a new domain rule.Once clicked,we will get the below screen
Enter Name,Description and from the Build a rule drop down choose "Length greater than or equal to" and set the value to 5 as shown under
Along with this, we also want to apply that no country name should be abbreviated.For that, let us click on the Adds a new condition to the selected clause button
From the dropdown , let us choose “Value does not contain” and set the value to "."(dot)
After that click on Apply All Rules button.
Click YES and we will receive the success message stating that the domain rule has been applied successfully
Once we have set up the domain rule, the next step will be to add the domain values.So let us visit the Domain Values tab and click on the Add new domain value button
Now let us add some domain values.Observe that, if anyone gives the value of England, the output will be corrected to United Kingdom.
Once done, click Finish button.
As a last step , we need to click no the Publish button and the KB(Knowledge base) will be available publickly
And we can make out that “OurTestDomain” has been created
For more information about DQS Knowledge Bases and Domains, we can refer to this article
So, our knowledge base is now prepared. Next we need to prepare our source data. In this example we will use Excel as our data source.The excel data source is as under
Open bids and drag and drop a Data flow task onto the Control Flow designer.In the dataflow designer, first of all make an Excel Source and give the appropriate File name and path and the needed configuration.Next let us drag and drop a DQS Cleansing component.
Double click on that to open DQS Cleansing Transformation editor.
We need to give the Connection manager. So click on the New Button and the connection manager window opens.
Specify the server name and then click on the test connection to get the successful message.
Next we need to specify the Data quality Knowledge base. We will choose “OurTestDomain” which we just created. And the available domain will be listed here.
Next let's visit to the Mapping column
Check the checkbox next to Country Input column.
From the Domain dropdown,choose Country. Give some name to Corrected and Status Output alias.Click OK.Add a Row sampling and a data viewer.
Run the package.The output in the DataViewer is as under
The first 4 rows is easy to understand as it has acted in the way we have defined in the rule. Country lengthmust be >=5 and hence USA Country status is Invalid. It cannot have any (.) hence, U.K. is invalid. The name England must be changed to United Kingdom which is being reflected. Since there is no enter for Vatican City, hence the Country status is “Unknown”.And it is a success
N.B.~This operation is an asynchronous one as can be reveal from the below image
III.Load files with variable row formats
It was not at all a smooth approach to load file contents where the row formats are variable.Consider the below example
In this example,we can figure out that we have some parent-child content in the text file.The headers contains the information about the managers while the detail section about the employees that works under the managers
In the Pre-Denali CTP 3 approach, in the Flat File source component, we could have previwed the content as under
As can be figured out that, the record set has been messed up as highlighted in red color.One of the way to handle the same in PreDenali CTP 3 approach is to change the source as under
And then the preview for the Flat file source component will be as under
Suppose, now we want to segregate the data as Manager records and ManagerEmployee Records, so we can go ahead with by adding a conditional split componet and specifying the below condition
Then adding RowSampling as under, we can get the records as
But the case in different in Denali CTP 3. When we load such a file, the preview looks as under
Running the package with the same design, yields the below result
IV.Displaying Success and failures in different styles
In Pre-Denali CTP 3 approach, the way of representing a success / failure is as under
But Denali CTP 3 has changed that to
V.Sort packages by name
Another small feature is that, we can now sort packages by their name as shown under
Before sorting package name
After sorting package name
VI.Shared Connection Managers
In earlier version , it is not possible to share connection among the packages. For example ,in Package1, we are using a Flat file source component.
As can be seen that we have a Connection Manager by the name TxtFileConnManager in Package1.dtsx.Now let us create another package by the name Package2.dtsx
As can be figure out that there is no option to reuse the existing connection. We need to create a new connection.
But in Denali CTP 3, we have the Connection Manager by which we can share the connection
Clicking on the Connection Manager, opens up the SSIS Connection Manager window
Choose FLATFILE and click on ADD button and give the Connection Manager Name.
Click OK.
As can be seen that our new connection manager has been added. Since it is shared so we can access it in any package.Let us open Package1.dtsx and we will find the Connection Manager at the down
The same applies for Package2.dtsx.
VII.New expression language functions(Left,Token,TokenCount)
We have three new functions in the expression language viz. Left , Token and TokenCount.
Let us analyze these functions one by one
Left:Returns the left part of a character string with the specified number of characters.This function is already there in TSql for a long time. Also it's counter part Right was already available in SSIS for a long time.
So if we run the below code in the Expression Task, we will get the following output
@[User::Result1] = LEFT("Test for Left function", 4)
The left function in this case strips out the first 4 character from the supplied string
Token:Returns the specified occurrence of a token in a string
So if we run the below code in the Expression Task, we will get the following output
@[User::Result1] = TOKEN("Test for Token Function"," ",2)
Because for appeared in the second place
TokenCount:Returns the number of tokens in a string
So if we run the below code in the Expression Task, we will get the following output
@[User::Result2] = TOKENCOUNT("Test for TokenCount function"," ")
Because there are 4 tokens
We can even combine Token and TokenCount as under
@[User::Result1] = TOKEN("Test for Token Function"," ", TOKENCOUNT("Test for TokenCount function"," "))
Because, TokenCount function yields 4 and in the fourth place , we have "Function"
VIII.Package Parameter Tab
We now have a dedicated tabe for Package-scoped Parameters within the designer
IX.Removal of Data Sources and Data Source Views folders
In SSIS 2008 R2
In Denali CTP 3
X.Solution Explorer now have Project Parameters node
In Denali CTP1, Project Parameters were visible upon right clicking in the Solution Explorer.But in Denali CTP 3 , they have their own seperate node
In the current series we have seen some of the cool enhancements made in SSIS Denali CTP 3 like Expression Task,DQS Cleansing Component,Load files with variable row formats , New expression language functions(Left,Token,TokenCount) etc.But that is not all.There are many more to explore.I will keep on adding those topics as and when I will explore them. So stay tune and share your opinion about the article.