Introduction
When using data from a relational database, a denormalized form, like a spread sheet, often gives us a better presentation and understanding. One of the challenges a developer faces is data transformation from rows to columns. In this article, I’ll discuss a custom solution using Dynamic SQL.
Problem to Solve
A fictional database, EmployeeDB, is created, which keeps employee’s contact, education and job history information. There are three source tables, Employee
, Education
and Jobhistory
, that hold relational data, and three target tables, Trans_Employee
, Trans_Education
and Trans_JobHistory
, in which the data transformed from rows to columns are saved. Let’s take a look at a screen shot below. It shows the education records for EmployeeID=3.
The top portion of the image lists three records, consisting of three columns, SchoolName
, Degree
and YearOfGraducation
. They are selected from Education
table. The bottom part illustrates the transformed data saved in target table Trans_Education
. The colored boxes and arrows indicate the positions of the rows from Education
table to the columns in Trans_Education
table. 3 rows are transformed to 9 columns.
These two tables are defined as follows.
Education (source)
Trans_Education (target)
It can be seen in the target Trans_Education
table that column names follow a naming convention:
A columnname from_Education_table
+ a hyphen(-)
+ a corresponding trailing number(1, 2, or 3)
The trailing number matches a row number in the selected records from Education
table. In other words, the first row is placed into the first group of 3 columns with a trailing number “-1”, the second into that with a trailing number “-2” and the third into that with a trailing number “-3”. We only transform a maximum of three rows here. Therefore, in the target Trans_Education
table, there are three groups of columns with a trailing number 1, 2 and 3 respectively. If more rows need to be transformed, more columns have to be added to Trans_Education
table.
Employee’s job history records consist of five columns (excluding two ID columns). Similar to the above, a maximum of 3 records are selected from source JobHistory
table and are transformed to 15 columns in Trans_JobHistory
table. The screen shot that illustrates data transformation, and the definition of source JobHistory
and target Trans_JobHistory
are shown below.
JobHistory (source)
Trans_JobHistory (target)
For employee’s contact information, we’ll transform 1 row from source Employee
table to 6 columns in Trans_Employee
table. Screen shot for data transformation, and source Employee
table and target Trans_Employee
table are presented below.
Employee (source)
Trans_Employee (target)
Although only 1 row is involved in transformation, the same naming convention is used in target Trans_Employee
table in order to achieve a certain level of automation using dynamic SQL that is to be discussed later.
Logical Steps to Transform Rows to Columns
Now let’s look at logical steps required to transform rows from source Education
table to columns in target Trans_Education
table for EmployeeID=3.
Step 1: insert a new record into target Trans_Education
table with a value of 3 for the EmployeeID column and the rest of columns equal to NULL (default value). The NULL columns will be updated later with appropriate rows pulled from the source Education table. If a record for this employee exists in the target table, the record should be deleted before inserting a new one.
Listing 1:
DELETE FROM Trans_Education WHERE EmployeeID = 3
INSERT INTO Trans_Education(EmployeeID) VALUES( 3 )
Step 2: pull top 3 records from the Education
table and place them into a global temporary table, ##GlobalTempTable
.
Listing 2:
SELECT TOP 3 [SchoolName],[Degree],[YearOfGraduation]
INTO ##GlobalTempTable
FROM [Education] WHERE EmployeeID=3
ORDER BY YearOfGraduation DESC
Using a global temporary table may not seem to be necessary. However, when all required SQL statements are prepared and placed into their respective stored procedures, a local temporary table created in one stored procedure is out of scope in another. The global temporary table avoids that issue.
Step 3: declare a SQL cursor populated with the data from ##GlobalTempTable
. Iterate through each record to update relevant columns of the record created in Step 1 in the Trans_Education
table.
Listing 3:
DECLARE @p1 VARCHAR(250), @p2 VARCHAR(250), @p3 VARCHAR(250)
DECLARE @Counter INT, @UpdateRevised NVARCHAR(4000), @Params NVARCHAR(4000)
SELECT @Counter = 0 , @params = '@p1 varchar(250), @p2 varchar(250), @p3 varchar(250)'
DECLARE DataCursor CURSOR FOR
SELECT [SchoolName],[Degree],[YearOfGraduation]
FROM ##GlobalTempTable
OPEN DataCursor;
FETCH NEXT FROM DataCursor INTO @p1, @p2, @p3
WHILE @@Fetch_Status = 0
BEGIN
SELECT @Counter = @Counter + 1
SELECT @UpdateRevised =
REPLACE('update Trans_Education
set [SchoolName-0]=@p1,
[Degree-0]=@p2,
[YearOfGraduation-0]=@p3
Where EmployeeID=3','-0','-' + CONVERT(VARCHAR, @Counter))
EXECUTE sp_ExecuteSql @UpdateRevised, @params, @p1, @p2, @p3
FETCH NEXT FROM DataCursor INTO @p1, @p2, @p3
END
CLOSE DataCursor
DEALLOCATE DataCursor
DROP TABLE ##GlobalTempTable
In Listing 3, a SQL system stored procedure sp_ExecuteSql
is executed to update relavant columns. The stored procedure has the following syntax:
EXECUTE sp_ExecuteSql
@UpdateRevised,
@params,
@p1, @p2, @p3
Parameters @p1
, @p2
, @p2
and @params
are declared based on the the requirements of the system stored procedure. The reason that only three parameters, @p1
, @p2
and @p3
, are declared instead of more or less is that each row consists of three columns.
The initial update statement is:
Listing 4:
update Trans_Education
set [SchoolName-0]=@p1, [Degree-0]=@p2, [YearOfGraduation-0]=@p3
Where EmployeeID=3
Notice that column names have a trailing number “-0”. In the first fetch, @p1
, @p2
and @p3
are assigned with values from the first record in the cursor, and the trailing number “-0” is replaced by “-1”. As a result, the statement becomes:
Listing 5:
update Trans_Education
set [SchoolName-1]=@p1, [Degree-1]=@p2, [YearOfGraduation-1]=@p3
Where EmployeeID=3
The modifed update statement is placed in a parameter @UpdateRevised
and is then executed with the system stored procedure sp_ExecuteSql
to update the first group of 3 columns that have a trailing number “-1” in Trans_Education
table. Subsequently, in next two fetches, trailing numbers are incremented through a parameter @Counter
and the udpate statements are modified to update other columns with trailing number “-2” and “-3” respectively as listed in Listing 6.
Listing 6:
update Trans_Education
set [SchoolName-2]=@p1, [Degree-2]=@p2, [YearOfGraduation-2]=@p3
Where EmployeeID=3
update Trans_Education
set [SchoolName-3]=@p1, [Degree-3]=@p2, [YearOfGraduation-3]=@p3
Where EmployeeID=3
The above three steps have transformed an employee’s education records from rows to columns. In a similar fashion, SQL scripts may be prepared to do transformation for employee’s job history records. In this case, five parameters @p1
through @p5
should be declared. Top 3 records should be pulled from JobHistory
table and saved into ##GlobalTempTable
. The update statements are written according to the definitions of Trans_JobHistory
table.
For employee contact information, six parameters @p1
through @p6
should be declared. Top 1 record should be pulled and saved. Its update statements are based on the definitions of Trans_Employee
table.
Obviously,these are repeating tasks with different number of parameters and different update statements associated with a particular target table. Instead of re-writing similar SQL statements over and over, we need a way to automate the process. Here is where dynamic SQL becomes handy.
Dynamic SQL
In SQL database, definitions for user defined tables are saved in system tables. We are able to see the definitions, such as column name, number of columns, data type, size, etc. from the system tables. With this information, it is possible to construct SQL statements dynamically to transform data from rows to columns for any target table involved.
The stored procedure, sp_EmployeeDB_DataTransformation_Data_Processing, in Listing 7 presents complete SQL scripts that dynamically builds and executes SQL statements for the data transformation.
Listing 7:
CREATE PROCEDURE [dbo].[sp_EmployeeDB_DataTransformation_Data_Processing]
@EmployeeID int
,@TargetTableName VARCHAR(100)
AS
BEGIN
SET NOCOUNT ON
DECLARE @SqlTemp VARCHAR(1000)
SELECT @SqlTemp='delete from ' + @TargetTableName + ' where EmployeeID=' + CONVERT(VARCHAR, @EmployeeID) + CHAR(10)
SELECT @SqlTemp=@SqlTemp + 'insert into ' + @TargetTableName + '(EmployeeID) values(' + CONVERT(VARCHAR, @EmployeeID) + ')'
EXECUTE(@SqlTemp)
IF @TargetTableName='Trans_Employee'
BEGIN
EXEC sp_EmployeeDB_Employee_Select @EmployeeID
END
IF @TargetTableName='Trans_Education'
BEGIN
EXEC sp_EmployeeDB_Education_Select @EmployeeID
END
IF @TargetTableName='Trans_JobHistory'
BEGIN
EXEC sp_EmployeeDB_JobHistory_Select @EmployeeID
END
DECLARE @Tbl_Column_Info TABLE(ID INT IDENTITY(1, 1),ColName VARCHAR(150))
INSERT INTO @Tbl_Column_Info(ColName)
SELECT REPLACE(c.Name, '-1', '')
FROM sys.syscolumns c JOIN sys.sysobjects o ON c.id = o.id
WHERE o.Name = @TargetTableName AND SUBSTRING(c.Name, PATINDEX('%-%',c.Name), 2)='-1'
DECLARE @ParamDefinitions VARCHAR(1000),@CursorParameterList VARCHAR(500)
,@CursorSelectStatement VARCHAR(4000) ,@UpdateSql VARCHAR(8000)
SELECT @ParamDefinitions='', @CursorParameterList='', @CursorSelectStatement='select '
SELECT @UpdateSql='update ' + @TargetTableName + ' set ' + CHAR(9)
DECLARE @ColumnName VARCHAR(100), @NextRowID INT, @CurrentRowID INT
SELECT @NextRowID=0, @CurrentRowID=0
SELECT TOP 1 @NextRowID = ID, @ColumnName = ColName FROM @Tbl_Column_Info ORDER BY ID
WHILE @NextRowID IS NOT NULL
BEGIN
SELECT @CurrentRowID = @NextRowID
SELECT @ParamDefinitions=@ParamDefinitions + '@p' + CONVERT(varchar, @CurrentRowID) + ' varchar(250), ';
SELECT @CursorParameterList=@CursorParameterList + '@p' + CONVERT(varchar, @CurrentRowID) + ', ';
SELECT @CursorSelectStatement = @CursorSelectStatement + '[' + @ColumnName + '],'
SELECT @UpdateSql = @UpdateSql + '[' + @ColumnName + '-0]=@p' + CONVERT(varchar(5), @CurrentRowID) + ', '
SELECT @NextRowID=NULL
SELECT TOP 1 @NextRowID = ID, @ColumnName = colName FROM @Tbl_Column_Info
WHERE ID > @CurrentRowID ORDER BY ID
END
SELECT @ParamDefinitions=SUBSTRING(@ParamDefinitions, 1, LEN(@ParamDefinitions)-1)
SELECT @CursorParameterList=SUBSTRING(@CursorParameterList, 1, LEN(@CursorParameterList)-1)
SELECT @CursorSelectStatement=SUBSTRING(@CursorSelectStatement, 1, LEN(@CursorSelectStatement)-1) + CHAR(10) + 'from ##GlobalTempTable' + CHAR(10)
SELECT @UpdateSql=LEFT(@UpdateSql, LEN(@UpdateSql)-1) + ' Where EmployeeID=' + CONVERT(VARCHAR, @EmployeeID)
DECLARE @CursorSql VARCHAR(8000)
SELECT @CursorSql='Declare ' + @ParamDefinitions + CHAR(10)
SELECT @CursorSql=@CursorSql + 'Declare @Counter INT, @UpdateRevised nvarchar(4000), @Params nvarchar(4000) ' + CHAR(10)
SELECT @CursorSql=@CursorSql + 'select @Counter=0, @params=' + QUOTENAME(@ParamDefinitions, CHAR(39)) +''+ CHAR(10)
SELECT @CursorSql=@CursorSql + 'Declare DataCursor CURSOR FOR ' + CHAR(10) + @CursorSelectStatement + CHAR(10)
SELECT @CursorSql=@CursorSql + 'Open DataCursor;' + CHAR(10) + 'Fetch next from DataCursor into ' + @CursorParameterList + CHAR(10)
SELECT @CursorSql=@CursorSql + 'While @@Fetch_Status=0 ' + CHAR(10) + 'Begin ' + CHAR(10)
SELECT @CursorSql=@CursorSql + CHAR(9) + 'select @Counter=@Counter + 1 ' + CHAR(10)
SELECT @CursorSql=@CursorSql + CHAR(9) + 'select @UpdateRevised=REPLACE(''' + @UpdateSql + ''', ''-0'', ''-'' + CONVERT(VARCHAR, @Counter))'+ CHAR(10)
SELECT @CursorSql=@CursorSql + CHAR(9) + 'Execute sp_ExecuteSql @UpdateRevised, @params, ' + @CursorParameterList + CHAR(10)
SELECT @CursorSql=@CursorSql + CHAR(9) + 'Fetch next from DataCursor into ' + @CursorParameterList + CHAR(10)
SELECT @CursorSql=@CursorSql + 'End ' + CHAR(10) + 'Close DataCursor' + CHAR(10) + 'Deallocate DataCursor' + CHAR(10)
EXEC(@CursorSql)
DROP TABLE ##GlobalTempTable
END
This is a relatively long stored procedure with intensive string concatenation. Let’s take a detailed look to see how it works. Two parameters are passed in: @EmployeeID
– for retrieving employee’s data and @TargetTableName
- a target table name.
First of all, remove the record created in a previous execution from the target table and then insert a new record with the passed-in value of @EmployeeID
and the rest of the columns default to NULL. See Listing 8.
Listing 8:
DECLARE @SqlTemp VARCHAR(1000)
SELECT @SqlTemp='delete from ' + @TargetTableName + ' where EmployeeID=' + CONVERT(VARCHAR, @EmployeeID) + CHAR(10)
SELECT @SqlTemp=@SqlTemp + 'insert into ' + @TargetTableName + '(EmployeeID) values(' + CONVERT(VARCHAR, @EmployeeID) + ')'
EXECUTE(@SqlTemp)
Secondly, retrieve relevant records for this employee and save the records into ##GlobalTempTable
as shown in Listing 9. Data retrieval is accomplished through a stored procedure that is related a particular target table. For Trans_Education, sp_EmployeeDB_Education_Select @EmployeeID
is executed. For Trans_JobHistory, sp_EmployeeDB_JobHistory_Select @EmployeeID
, and for Trans_Employee table, sp_EmployeeDB_Employee_Select @EmployeeID
.
It should be noted that, in this demo, data is simply selected from one of the source tables, Education, JobHistory
or Employee
. However, in a real business application, the data could come from various tables, views, functions and stored procedures with complex joints and conditions as long as the returned columns match that in the target table.
Listing 9:
IF @TargetTableName='Trans_Employee'
BEGIN
EXEC sp_EmployeeDB_Employee_Select @EmployeeID
END
IF @TargetTableName='Trans_Education'
BEGIN
EXEC sp_EmployeeDB_Education_Select @EmployeeID
END
IF @TargetTableName='Trans_JobHistory'
BEGIN
EXEC sp_EmployeeDB_JobHistory_Select @EmployeeID
END
Thirdly, utilize system tables, syscolumns
and sysobjects
, to pull out columns of the target table into a table variable, @Tbl_Column_Info
, as shown in Listing 10. Only first group of column names with trailing number “-1” are pulled. The trailing number is removed during the process so that the column names are the same as that in its relevant source table and an appropirate trailing numbers will be added dynamically.
Listing 10:
DECLARE @Tbl_Column_Info TABLE(ID INT IDENTITY(1, 1),ColName VARCHAR(150))
INSERT INTO @Tbl_Column_Info(ColName)
SELECT REPLACE(c.Name, '-1', '')
FROM sys.syscolumns c JOIN sys.sysobjects o ON c.id = o.id
WHERE o.Name = @TargetTableName AND SUBSTRING(c.Name, PATINDEX('%-%',c.Name), 2)='-1'
Fourthly, loop through the table variable @Tbl_Column_Info
to create a parameter definition string: @ParamDefinitions
, a parameter list: @CursorParameterList<code>
, and to build SQL cursor's select statement: @CursorSelectStatement
and to build an update statement: @UpdateSql
. For simplicity, parameters are all declared as varchar(250) which are long enough for all columns involved. Parameter declarations are based on the requirement of the SQL system stored procedure – sp_ExecuteSql
, as described earlier. This work is accomplished using SQL scripts in Listing 11.
Listing 11:
DECLARE @ParamDefinitions VARCHAR(1000),@CursorParameterList VARCHAR(500)
,@CursorSelectStatement VARCHAR(4000) ,@UpdateSql VARCHAR(8000)
SELECT @ParamDefinitions='', @CursorParameterList='', @CursorSelectStatement='select '
SELECT @UpdateSql='update ' + @TargetTableName + ' set ' + CHAR(9)
DECLARE @ColumnName VARCHAR(100), @NextRowID INT, @CurrentRowID INT
SELECT @NextRowID=0, @CurrentRowID=0
SELECT TOP 1 @NextRowID = ID, @ColumnName = ColName FROM @Tbl_Column_Info ORDER BY ID
WHILE @NextRowID IS NOT NULL
BEGIN
SELECT @CurrentRowID = @NextRowID
SELECT @ParamDefinitions=@ParamDefinitions + '@p' + CONVERT(varchar, @CurrentRowID) + ' varchar(250), ';
SELECT @CursorParameterList=@CursorParameterList + '@p' + CONVERT(varchar, @CurrentRowID) + ', ';
SELECT @CursorSelectStatement = @CursorSelectStatement + '[' + @ColumnName + '],'
SELECT @UpdateSql = @UpdateSql + '[' + @ColumnName + '-0]=@p' + CONVERT(varchar(5), @CurrentRowID) + ', '
SELECT @NextRowID=NULL
SELECT TOP 1 @NextRowID = ID, @ColumnName = colName FROM @Tbl_Column_Info
WHERE ID > @CurrentRowID ORDER BY ID
END
SELECT @ParamDefinitions=SUBSTRING(@ParamDefinitions, 1, LEN(@ParamDefinitions)-1)
SELECT @CursorParameterList=SUBSTRING(@CursorParameterList, 1, LEN(@CursorParameterList)-1)
SELECT @CursorSelectStatement=SUBSTRING(@CursorSelectStatement, 1, LEN(@CursorSelectStatement)-1) + CHAR(10) + 'from ##GlobalTempTable' + CHAR(10)
SELECT @UpdateSql=LEFT(@UpdateSql, LEN(@UpdateSql)-1) + ' Where EmployeeID=' + CONVERT(VARCHAR, @EmployeeID)
Finally, Listing 12 puts everything together to create a SQL cursor and to execute the dynamically constructed SQL scripts to transform the rows saved in ##GlobalTempTable
to the columns in the target table. Listing 3 in a previous section are the sample scripts generated by the stored procedure when @EmployeeID
=3 and @TargetTableName
=’Trans_Education’ are passed in.
Listing 12:
DECLARE @CursorSql VARCHAR(8000)
SELECT @CursorSql='Declare ' + @ParamDefinitions + CHAR(10)
SELECT @CursorSql=@CursorSql + 'Declare @Counter INT, @UpdateRevised nvarchar(4000), @Params nvarchar(4000) ' + CHAR(10)
SELECT @CursorSql=@CursorSql + 'select @Counter=0, @params=' + QUOTENAME(@ParamDefinitions, CHAR(39)) +''+ CHAR(10)
SELECT @CursorSql=@CursorSql + 'Declare DataCursor CURSOR FOR ' + CHAR(10) + @CursorSelectStatement + CHAR(10)
SELECT @CursorSql=@CursorSql + 'Open DataCursor;' + CHAR(10) + 'Fetch next from DataCursor into ' + @CursorParameterList + CHAR(10)
SELECT @CursorSql=@CursorSql + 'While @@Fetch_Status=0 ' + CHAR(10) + 'Begin ' + CHAR(10)
SELECT @CursorSql=@CursorSql + CHAR(9) + 'select @Counter=@Counter + 1 ' + CHAR(10)
SELECT @CursorSql=@CursorSql + CHAR(9) + 'select @UpdateRevised=REPLACE(''' + @UpdateSql + ''', ''-0'', ''-'' + CONVERT(VARCHAR, @Counter))'+ CHAR(10)
SELECT @CursorSql=@CursorSql + CHAR(9) + 'Execute sp_ExecuteSql @UpdateRevised, @params, ' + @CursorParameterList + CHAR(10)
SELECT @CursorSql=@CursorSql + CHAR(9) + 'Fetch next from DataCursor into ' + @CursorParameterList + CHAR(10)
SELECT @CursorSql=@CursorSql + 'End ' + CHAR(10) + 'Close DataCursor' + CHAR(10) + 'Deallocate DataCursor' + CHAR(10)
EXEC(@CursorSql)
DROP TABLE ##GlobalTempTable
That is how the SQL statements for data transformation are dynamically constructed and executed. To process all target tables for a particular employee, run the procedure multiple times by passing a different target table name in each run as illustrated in Listing 13.
Listing 13:
CREATE PROCEDURE [dbo].[sp_EmployeeDB_DataTransformation_Main]
@EmployeeID int
AS
BEGIN
SET NOCOUNT ON
EXEC sp_EmployeeDB_DataTransformation_Data_Processing @EmployeeID, 'Trans_Employee'
EXEC sp_EmployeeDB_DataTransformation_Data_Processing @EmployeeID, 'Trans_Education'
EXEC sp_EmployeeDB_DataTransformation_Data_Processing @EmployeeID, 'Trans_JobHistory'
END
The transformed data may be presented in a SQL view or in a stored procedure using Trans_Employee
table left join Trans_Education
and Trans_JobHistory
, similar to Listing 14 below.
Listing 14:
SELECT emp.[EmployeeID] ,
[FirstName-1] ,
[LastName-1] ,
[PhoneNumber-1] ,
[Fax-1] ,
[Email-1] ,
[DateHired-1] ,
[SchoolName-1] ,
[Degree-1] ,
[YearOfGraduation-1] ,
[SchoolName-2] ,
[Degree-2] ,
[YearOfGraduation-2] ,
[SchoolName-3] ,
[Degree-3] ,
[YearOfGraduation-3] ,
[Company-1] ,
[Position-1] ,
[Class-1] ,
[Supervisor-1] ,
[TerminationDate-1] ,
[Company-2] ,
[Position-2] ,
[Class-2] ,
[Supervisor-2] ,
[TerminationDate-2] ,
[Company-3] ,
[Position-3] ,
[Class-3] ,
[Supervisor-3] ,
[TerminationDate-3]
FROM Trans_Employee emp
LEFT JOIN Trans_Education edu ON emp.EmployeeID = edu.EmployeeID
LEFT JOIN Trans_JobHistory his ON his.EmployeeID = emp.EmployeeID
WHERE emp.EmployeeID = @EmployeeID
Download
The download consists of a SQL script file for the entire sample database, EmployeeDB, and an ASP.NET web site project.
Open the SQL script file in SSMS. Make sure that the database folder path at the beginning of the scripts exists on your local machine. If it does not exist, create one or revise the scripts to point to a different folder on your machine. After EmployeeDB database is created, execute the stored procedure, sp_EmployeeDB_DataTransformation_Main @EmployeeID, to transform data and sp_EmployeeDB_DataTransformation_Result_Select @EmployeeID to view results.
The SQL version is: Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64) Apr 2 2010 15:48:46 Copyright (c) Microsoft Corporation Express Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
The ASP.NET web site project, DynamicSql, presents another mean to execute stored procedures and to view results. The database, EmployeeDB, is located inside App_Data folder. Load the web site project into Visual Studio. Right click on the default.aspx and select View in Browser from contact menu to run the application.
The Visual Studio version in the demo is: Visual Studio 2012, Update 1.
Further Study
Simplifications have been applied in the demo. For example, only varchar data type is used for all columns in target tables. In addition, parameters involved in sp_ExecuteSql execution are all declared as varchar(250). These simplifications make the process of building SQL statements much easier and are acceptable in many business cases. However, there might be times that different data types are strictly required. Further work needs to be done in this area.
Another possible issue is that the SQL scripts cannot handle concurrent execution by multiple users. Since a single name for the global temporary table - ##GlobalTempTable
is used, collision could occur during concurrent execution. To avoid the issue, a possible modification is to append a login user name to the end of the global temporary table name, like ##GlobalTempTable_[LoginUserName]
. Accordingly, the login user name should be included in the target tables as well as in the dynamically constructed SQL scripts.
Reference
MSDN: sp_ExecuteSql (Transaction SQL)