Introduction
Overall, formatting the output data from the database server should never be done in SQL. There are a lot of better tools and languages to do the formatting. SQL is primarily designed for data retrieval and manipulation, not for creating fancy output.
Having that said, a question that often arises is, "How can I remove repeating values for a column in my result set?". An easy answer lies in the LAG
function: LAG (Transact-SQL). The LAG
function retrieves the value from the previous row in the result set using a defined order. By using this function, we can compare the value from the previous row to the value on the current row. Let's take an example.
The Code
First, we need a test table and some data. In order to have those, run the following statements:
CREATE Table MyReportData (
Item varchar(100) NOT NULL,
SubItem1 varchar(100) NOT NULL,
SubItem2 varchar(100) NULL
);
GO
INSERT INTO MyReportData (Item, SubItem1, SubItem2) VALUES
('First item', 'Sub item 1', NULL),
('First item', 'Sub item 2', 'Extra data'),
('First item', 'Sub item 2', 'Additional data'),
('First item', 'Sub item 3', NULL),
('Second item', 'Sub item 1', 'Info'),
('Second item', 'Sub item 2', 'Data'),
('Second item', 'Sub item 3', NULL),
('Third item', 'Sub item 1', 'Nonsense'),
('Third item', 'Sub item 1', 'Even more nonsense'),
('Second item', 'Sub item 2', NULL);
GO
Now if we just fetch the data using query like this:
SELECT Item,
SubItem1,
SubItem2
FROM MyReportData
ORDER BY Item,
SubItem1;
We get the following result:
Item SubItem1 SubItem2
First item Sub item 1
First item Sub item 2 Extra data
First item Sub item 2 Additional data
First item Sub item 3
Second item Sub item 1 Info
Second item Sub item 2 Data
Second item Sub item 2
Second item Sub item 3
Third item Sub item 1 Nonsense
Third item Sub item 1 Even more nonsense
If the goal was to remove the repeating values in the first two columns, we can do it by running the following query:
SELECT CASE
WHEN Item = LAG(Item) OVER (ORDER BY Item, SubItem1) THEN NULL
ELSE Item
END AS Item,
CASE
WHEN SubItem1 = LAG(SubItem1) OVER (ORDER BY Item, SubItem1) THEN NULL
ELSE SubItem1
END AS SubItem1,
SubItem2
FROM MyReportData
ORDER BY Item,
SubItem1;
The CASE
structure compares the value from the previous row to the current value. If the value is the same, then return NULL
but if the value differs, then return the value itself. The previous row is defined by ordering the data:
- First by
Item
column - Then by
SubItem1
column
Now the result would be:
Item SubItem1 SubItem2
Additional data
Even more nonsense
Sub item 2 Data
Sub item 2 Extra data
Sub item 3
Sub item 3
First item Sub item 1
Second item Sub item 1 Info
Third item Sub item 1 Nonsense
While the data is correct, we no longer have repeating values, it isn't quite what we expected. The rows are obviously in the wrong order.
So why is this happening? The reason is that I gave an alias name for the first two columns and I used the exact same name as an alias that is the name of the original column. So now the ORDER BY
clause is actually referring to the aliases, not to the original columns. When the statement is executed, the sorting is done with the modified values from the CASE
expressions, not with the original values.
Now when we understand the reason, it's easy to fix this problem. Let's just use some other alias name for columns and ensure that the ordering of the result set still takes place based on the original values. Like this:
SELECT CASE
WHEN Item = LAG(Item) OVER (ORDER BY Item, SubItem1, SubItem2) THEN NULL
ELSE Item
END AS FormattedItem,
CASE
WHEN SubItem1 = LAG(SubItem1) OVER (ORDER BY Item, SubItem1, SubItem2) THEN NULL
ELSE SubItem1
END AS FormattedSubItem1,
SubItem2
FROM MyReportData
ORDER BY Item,
SubItem1;
And now the result is:
FormattedItem FormattedSubItem1 SubItem2
------------- ----------------- --------
First item Sub item 1
Sub item 2 Extra data
Additional data
Sub item 3
Second item Sub item 1 Info
Sub item 2 Data
Sub item 3
Third item Sub item 1 Nonsense
Even more nonsense
To be honest, this isn't a foolproof solution yet. The ordering is done only by first two columns and they may contain the same values. Because of this, the ordering of individual rows having the same data in two first columns may change depending on how the data is fetched and handled by SQL Server. In the real situation, it would be a good idea to add a third, uniquely valued column to the ORDER BY
clauses to define an exact order.
References
History
- 13th October, 2016: Created