Introduction
Sometimes an aggregate calculation such as SUM
or AVG
is needed on dynamically defined columns. For example, if you have several columns in a table, you may need to calculate a SUM
for columns 1 and 2 or in another situation for columns 2 and 3.
This can be done in several ways, for example using a cursor, fetching all the values and aggregating the needed result. Another way could be to define a condition for each column if the column is included in the calculation or not. Based on that information, do the aggregation and so on.
This tip demonstrates how this task can be accomplished by using UNPIVOT
with dynamically defined set of columns.
Test data
First we need some test data. For that let's create a table.
CREATE TABLE TestData (
RowId int,
Col1 int,
Col2 int,
Col3 int
);
And populate it with some data.
INSERT INTO TestData VALUES (1, 1, 2, 3);
INSERT INTO TestData VALUES (2, 4, 5, 6);
INSERT INTO TestData VALUES (3, 7, 8, 9);
INSERT INTO TestData VALUES (4, 10, 11, 12);
Intermediate results
To understand the solution, let's have a look how the data from separate columns is transferred to rows.
For the first situation, values from columns Col1
and Col2
are needed in a single result column in order to calculate a sum. For this, the following statement can be used.
SELECT td2.RowId, td2.ColHeader, td2.ColValues
FROM (SELECT RowId, Col1, Col2
FROM TestData) td
UNPIVOT (ColValues FOR ColHeader IN (Col1, Col2)) AS td2;
The query above selects the values from both Col1
and Col2
and places them in the result column called ColValues
. The name of the source column is placed in ColHeader
. The RowId
is included to show from which row the data is coming from.
So the result looks like the following
RowId ColHeader ColValues
----- -------- ---------
1 Col1 1
1 Col2 2
2 Col1 4
2 Col2 5
3 Col1 7
3 Col2 8
4 Col1 10
4 Col2 11
The same query to get values from Col2
and Col3
would look like
SELECT td2.RowId, td2.ColHeader, td2.ColValues
FROM (SELECT RowId, Col2, Col3
FROM TestData) td
UNPIVOT (ColValues FOR ColHeader IN (Col2, Col3)) AS td2;
And the result
RowId ColHeader ColValues
----- -------- ---------
1 Col2 2
1 Col3 3
2 Col2 5
2 Col3 6
3 Col2 8
3 Col3 9
4 Col2 11
4 Col3 12
To calculate the aggregate
Now, when we have the intermediate results, let's do the aggregate calculation. In order to keep this in a single query, the queries above are used as inline view in the SQL statement
SELECT SUM(iv.ColValues) AS TotalSum
FROM (SELECT td2.RowId, td2.ColHeader, td2.ColValues
FROM (SELECT RowId, Col1, Col2
FROM TestData) td
UNPIVOT (ColValues FOR ColHeader IN (Col1, Col2)) AS td2
) iv;
So the previous example query is just surrounded with an outer SELECT
statement and the aggregation is done in the outermost statement. The result from the query above is
TotalSum
--------
48
The same query for case 2 (columns Col2 and Col3) would be
SELECT SUM(iv.ColValues) AS TotalSum
FROM (SELECT td2.RowId, td2.ColHeader, td2.ColValues
FROM (SELECT RowId, Col2, Col3
FROM TestData) td
UNPIVOT (ColValues FOR ColHeader IN (Col2, Col3)) AS td2
) iv;
And the result
TotalSum
--------
56
To dynamically select the columns to include in the calculation
The last step of this tip is to include a variable to define which columns are used for the calculation. For the first test case the code could look like the following
DECLARE @columns VARCHAR(1000);
SET @columns = 'Col1, Col2';
EXECUTE ('SELECT SUM(iv.ColValues) AS TotalSum
FROM (SELECT td2.RowId, td2.ColHeader, td2.ColValues
FROM (SELECT RowId, ' + @columns + '
FROM TestData) td
UNPIVOT (ColValues FOR ColHeader IN (' + @columns + ')) AS td2
) iv');
So basically the query is just the same. The only difference is that the columns to be unpivoted are included via a variable which of course can be changed at run-time without modifying the skeleton of the statement. The statement itself is executed using EXECUTE
statement. So the example above would result to
TotalSum
--------
48
For the second test case, let's just modify the values for the @columns
variable
...
SET @columns = 'Col2, Col3';
...
And the result would be
TotalSum
--------
56
Few things to keep in mind
There are few things that should be observed:
- Since the SQL statement is embedded inside a string, the syntax isn't verified until the statement is actually executed.
- The column names are just concatenated to the SQL statement, so if the column names are coming from an outside source (such as parameters to a stored procedure) the input should be verified properly to prevent SQL injections etc.
- Even though both examples use two columns, the same code works for unspecified number of columns, including just a single column (have a try only with
Col1
)
The download contains all the statements used in this tip. Have fun.
History
- December 28th, 2012: Tip created.