Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

Calculate aggregates for dynamic columns using UNPIVOT

5.00/5 (1 vote)
17 Jan 2013CPOL3 min read 32.4K   48  
This tip describes how to calculate an aggregate for dynamically defined columns using UNPIVOT clause.

Introduction 

Sometimes an aggregate calculation such as SUM or AVG is needed on dynamically defined columns. For example, if you have several columns in a table, you may need to calculate a SUM for columns 1 and 2 or in another situation for columns 2 and 3.

This can be done in several ways, for example using a cursor, fetching all the values and aggregating the needed result. Another way could be to define a condition for each column if the column is included in the calculation or not. Based on that information, do the aggregation and so on.

This tip demonstrates how this task can be accomplished by using UNPIVOT with dynamically defined set of columns.  

Test data 

First we need some test data. For that let's create a table.

SQL
CREATE TABLE TestData (
   RowId int,
   Col1  int,
   Col2  int,
   Col3  int
); 

And populate it with some data.

SQL
INSERT INTO TestData VALUES (1, 1, 2, 3);
INSERT INTO TestData VALUES (2, 4, 5, 6);
INSERT INTO TestData VALUES (3, 7, 8, 9);
INSERT INTO TestData VALUES (4, 10, 11, 12);

Intermediate results 

To understand the solution, let's have a look how the data from separate columns is transferred to rows.

For the first situation, values from columns Col1 and Col2 are needed in a single result column in order to calculate a sum. For this, the following statement can be used.

SQL
SELECT td2.RowId, td2.ColHeader, td2.ColValues
FROM (SELECT RowId, Col1, Col2 
      FROM TestData) td
UNPIVOT (ColValues FOR ColHeader IN (Col1, Col2)) AS td2; 

The query above selects the values from both Col1 and Col2 and places them in the result column called ColValues. The name of the source column is placed in ColHeader. The RowId is included to show from which row the data is coming from. 

So the result looks like the following

RowId   ColHeader   ColValues
-----   --------    ---------
1       Col1        1
1       Col2        2
2       Col1        4
2       Col2        5
3       Col1        7
3       Col2        8
4       Col1        10
4       Col2        11 

The same query to get values from Col2 and Col3 would look like 

SQL
SELECT td2.RowId, td2.ColHeader, td2.ColValues
FROM (SELECT RowId, Col2, Col3 
      FROM TestData) td
UNPIVOT (ColValues FOR ColHeader IN (Col2, Col3)) AS td2; 

And the result 

RowId   ColHeader   ColValues
-----   --------    ---------
1       Col2        2
1       Col3        3
2       Col2        5
2       Col3        6
3       Col2        8
3       Col3        9
4       Col2        11
4       Col3        12 

To calculate the aggregate 

Now, when we have the intermediate results, let's do the aggregate calculation. In order to keep this in a single query, the queries above are used as inline view in the SQL statement 

SQL
SELECT SUM(iv.ColValues) AS TotalSum
FROM (SELECT td2.RowId, td2.ColHeader, td2.ColValues
      FROM (SELECT RowId, Col1, Col2 
            FROM TestData) td
      UNPIVOT (ColValues FOR ColHeader IN (Col1, Col2)) AS td2
	  ) iv;  

So the previous example query is just surrounded with an outer SELECT statement and the aggregation is done in the outermost statement. The result from the query above is

TotalSum
--------
48 

The same query for case 2 (columns Col2 and Col3) would be

SQL
SELECT SUM(iv.ColValues) AS TotalSum
FROM (SELECT td2.RowId, td2.ColHeader, td2.ColValues
      FROM (SELECT RowId, Col2, Col3 
            FROM TestData) td
      UNPIVOT (ColValues FOR ColHeader IN (Col2, Col3)) AS td2
	  ) iv;

And the result

TotalSum
--------
56 

To dynamically select the columns to include in the calculation

The last step of this tip is to include a variable to define which columns are used for the calculation.  For the first test case the code could look like the following 

SQL
DECLARE @columns VARCHAR(1000);
SET @columns = 'Col1, Col2';
 
EXECUTE ('SELECT SUM(iv.ColValues) AS TotalSum
          FROM (SELECT td2.RowId, td2.ColHeader, td2.ColValues
                FROM (SELECT RowId, ' + @columns + ' 
                      FROM TestData) td
                UNPIVOT (ColValues FOR ColHeader IN (' + @columns + ')) AS td2
	       ) iv'); 

So basically the query is just the same. The only difference is that the columns to be unpivoted are included via a variable which of course can be changed at run-time without modifying the skeleton of the statement. The statement itself is executed using EXECUTE statement. So the example above would result to

TotalSum
--------
48 

For the second test case, let's just modify the values for the @columns variable  

SQL
...
SET @columns = 'Col2, Col3';
... 

And the result would be

TotalSum
--------
56 

Few things to keep in mind

There are few things that should be observed:

  • Since the SQL statement is embedded inside a string, the syntax isn't verified until the statement is actually executed.
  • The column names are just concatenated to the SQL statement, so if the column names are coming from an outside source (such as parameters to a stored procedure) the input should be verified properly to prevent SQL injections etc. 
  • Even though both examples use two columns, the same code works for unspecified number of columns, including just a single column (have a try only with Col1 Smile | <img src=  )  

The download contains all the statements used in this tip. Have fun. 

History

  • December 28th, 2012: Tip created. 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)