Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

SSIS Do I Union All or Merge??

4.43/5 (3 votes)
28 Mar 2014CPOL2 min read 37.1K   1  
SSIS - Do I Union All or Merge?

The Union All and Merge SSIS data flow transformations provide confusingly similar functionality by combining result sets. So what are the differences and when should you use one or the other? Despite the almost identical results, there are some clearly defined rules as when to use the Union All over the Merge and visa versa. The Union All should be used over Merge when:

  1. The transformation inputs are not sorted
  2. The combined output does not need to be sorted
  3. The transformation has more than two inputs

Probably the most notable limitation of a Merge transformation that the Union All overcomes is that a Merge can accept ONLY two inputs while the Union All supports multiple inputs. Another detractor to the Merge is that both inputs must be sorted. This can be done by using a Sort transformation task before the Merge or using an ORDER BY clause in the source and setting the IsSorted property and setting the SortKeyPosition of the columns in the sort order desired. Technet outlines setting the sort order here.

Both transformations require columns in the inputs have matching metadata, for example they must have compatible data types. This requirement is the same as any T-SQL set operator.

To demonstrate these two transformations, create an SSIS package first with an Execute SQL task that will be used to populate two different tables with an identical schema to be used for both transformations:

SQL
USE tempdb;
GO

IF EXISTS(SELECT * FROM sys.tables WHERE name = ‘employees’)BEGIN
DROP TABLE employeesEND

IF EXISTS(SELECT * FROM sys.tables WHERE name = ‘customers’)BEGIN
DROP TABLE customers
END

SELECT TOP 10 BusinessEntityID,
      FirstName,
      MiddleName,
      LastName
INTO employees
FROM AdventureWorks2012.Person.Person 
ORDER BY BusinessEntityID

SELECT BusinessEntityID,
      FirstName,
      MiddleName,
      LastName
INTO customers
FROM AdventureWorks2012.Person.Person _
WHERE BusinessEntityID IN(3, 5, 8, 277, 45, 14847, 3456, 76, 9874, 15937) 

Now, add a data flow task that will be used for the Union All transformations with two OLEDB data sources, one for the customers table and one for the employees. Then, add a Union All transformation and connect both sources:

image Add a flat file destination and connect the Union All transformation.

imageAdd another data flow task and again add two sources, one for customers and employees, each source should use a query with an ORDER BY clause using the BusinessEntityID:

SQL
SELECT BusinessEntityID,
      FirstName,
      MiddleName,
      LastName
FROM employees/customers
ORDER BY BusinessEntityID

Add a Merge transformation and connect both sources to the Merge transformation. An error now appears saying that the IsSorted property must be configured for the Merge transformation. Right click on both customer and employee sources and go to the advanced editor and select the Input and Output Properties tab and select the OLE DB Source Output. In the Common Properties pane, set the IsSorted property to True:

image Once the IsSorted property has been set, you must still set the column(s) that are sorted. Open the OLE DB Source Output and the Output Columns and select the BusinessEntityID and set the SortKeyPosition to 1:

image Again, add a flat file destination and connect the Merge transformation.

Running the package and reviewing the output files shows near identical results with the exception that the NamesMerged file has the same records, but they are ordered by the BusinessEntityID.

The sample package outlined can be downloaded here.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)