Introduction
This articles includes examples of T-SQL capabilities that allow to perform string concatenation more flexibly and effectively through the use of other constructs.
Background
In practice, tasks within the scope of concatenating strings into one come across quite often. It is a pity that the T-SQL standard does not provide for the possibility of using string data inside the aggregating
SUM
function:
Msg 8117, Level 16, State 1, Line 1
Operand data type char is invalid for sum operator.
As a solution for such tasks, there was added a GROUP_CONCAT
function for MySQL, and
LISTAGG
– for Oracle. SQL Server, in turn, has no such built-in functionality.
However, this shouldn't be considered as a disadvantage, since T-SQL capabilities allow to perform string concatenation more flexibly and effectively through the use of other constructs that will be revealed below.
Using the code
Suppose we need to concatenate several strings separated by comma in one, using data from the following table:
IF OBJECT_ID('dbo.Chars', 'U') IS NOT NULL
DROP TABLE dbo.Chars
GO
CREATE TABLE dbo.Chars ([Char] CHAR(1) PRIMARY KEY)
INSERT INTO dbo.Chars ([Char])
VALUES ('A'), ('B'), ('C'), ('F'), ('D')
The most obvious solution for this task is using of a cursor:
DECLARE
@Chars VARCHAR(100)
, @Char CHAR(1)
DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR
SELECT [Char]
FROM dbo.Chars
OPEN cur
FETCH NEXT FROM cur INTO @Char
WHILE @@FETCH_STATUS = 0 BEGIN
SET @Chars = ISNULL(@Chars + ', ' + @Char, @Char)
FETCH NEXT FROM cur INTO @Char
END
CLOSE cur
DEALLOCATE cur
SELECT @Chars
Using of a cursor itself causes reduced efficiency of query execution and, at least, doesn't look very elegant.
To avoid the use of a cursor, the strings can be concatenated via variables assigning:
DECLARE @Chars VARCHAR(100)
SELECT @Chars = ISNULL(@Chars + ', ' +[Char], [Char])
FROM dbo.Chars
SELECT @Chars
On the one hand, the construct is rather simple; on the other hand, its performance on a large fetch leaves much to be desired.
Thus, this method will be insolvent in case when concatenation inside a
SELECT
block is required.
In such cases, XML features are resorted to, using the following construct:
SELECT Chars = STUFF((
SELECT ', ' + [Char]
FROM dbo.Chars
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '')
If we look at the execution plan retrieved in
dbForge Studio for SQL Server, – we can notice high query cost for the value method:
To
avoid this operation, the query can be rewritten using XQuery
properties:
SELECT Chars = STUFF(CAST((
SELECT [text()] = ', ' + [Char]
FROM dbo.Chars
FOR XML PATH(''), TYPE) AS
VARCHAR(100)), 1, 2, '')
As
a result, we will get a very simple and fast execution plan:
In principle, string concatenation by a
column does not cause special difficulties.
A more interesting situation is when
concatenation of several columns is needed. For example, we have the
following table:
IF OBJECT_ID('dbo.EntityValues', 'U') IS NOT NULL
DROP TABLE dbo.EntityValues
GO
CREATE TABLE dbo.EntityValues (
EntityID INT
, Value1 CHAR(1)
, Value2 CHAR(1)
)
CREATE NONCLUSTERED INDEX IX_WorkOut_EntityID
ON dbo.EntityValues (EntityID)
GO
INSERT INTO dbo.EntityValues (EntityID, Value1, Value2)
VALUES
(1, 'A', 'X'),
(2, 'B', 'Y'),
(2, 'C', 'Z'),
(2, 'F', 'H'),
(1, 'D', 'R')
in
which data must be grouped in the following way:
Alternatively, we can copy XML calls,
but then we get re-reading, which may significantly affect the query
execution efficiency:
SELECT
ev.EntityID
, Values1 = STUFF(CAST((
SELECT [text()] = ', ' + ev2.Value1
FROM dbo.EntityValues ev2
WHERE ev2.EntityID = ev.EntityID
FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '')
, Values2 = STUFF(CAST((
SELECT [text()] = ', ' + ev2.Value2
FROM dbo.EntityValues ev2
WHERE ev2.EntityID = ev.EntityID
FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '')
FROM (
SELECT DISTINCT EntityID
FROM dbo.EntityValues
) ev
This can be easily convinced, if to
look at the execution plan:
To
reduce re-reading, a small XML hack can be used:
SELECT
ev.EntityID
, Values1 = STUFF(
REPLACE(
CAST([XML].query('for $a in /a return xs:string($a)')
AS VARCHAR(100)), ' ,', ','), 1, 1, '')
, Values2 = STUFF(
REPLACE(
CAST([XML].query('for $b in /b return xs:string($b)')
AS VARCHAR(100)), ' ,', ','), 1, 1, '')
FROM (
SELECT DISTINCT EntityID
FROM dbo.EntityValues
) ev
CROSS APPLY (
SELECT [XML] = CAST((
SELECT
[a] = ', ' + ev2.Value1
, [b] = ', ' + ev2.Value2
FROM dbo.EntityValues ev2
WHERE ev2.EntityID = ev.EntityID
FOR XML PATH('')
) AS XML)
) t
But this query won't be optimal as well
due to multiple calls of the query method.
Alternative way – using a cursor:
IF
OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL
DROP TABLE #EntityValues
GO
SELECT DISTINCT
EntityID
, Values1 = CAST(NULL AS VARCHAR(100))
, Values2 = CAST(NULL AS VARCHAR(100))
INTO #EntityValues
FROM dbo.EntityValues
DECLARE
@EntityID INT
, @Value1 CHAR(1)
, @Value2 CHAR(1)
DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR
SELECT
EntityID
, Value1
, Value2
FROM dbo.EntityValues
OPEN cur
FETCH NEXT FROM cur INTO
@EntityID
, @Value1
, @Value2
WHILE @@FETCH_STATUS = 0 BEGIN
UPDATE #EntityValues
SET
Values1 = ISNULL(Values1 + ', ' + @Value1, @Value1)
, Values2 = ISNULL(Values2 + ', ' + @Value2, @Value2)
WHERE EntityID = @EntityID
FETCH NEXT FROM cur INTO
@EntityID
, @Value1
, @Value2
END
CLOSE cur
DEALLOCATE cur
SELECT *
FROM #EntityValues
However, as practice has shown, when
working with large ETL packages, the most efficient solution is the
ability to assign variables in the UPDATE
statement:
IF
OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL
DROP TABLE #EntityValues
GO
DECLARE
@Values1 VARCHAR(100)
, @Values2 VARCHAR(100)
SELECT
EntityID
, Value1
, Value2
, RowNum = ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY 1/0)
, Values1 = CAST(NULL AS VARCHAR(100))
, Values2 = CAST(NULL AS VARCHAR(100))
INTO #EntityValues
FROM dbo.EntityValues
UPDATE #EntityValues
SET
@Values1 = Values1 =
CASE WHEN RowNum = 1
THEN Value1
ELSE @Values1 + ', ' + Value1
END
, @Values2 = Values2 =
CASE WHEN RowNum = 1
THEN Value2
ELSE @Values2 + ', ' + Value2
END
SELECT
EntityID
, Values1 = MAX(Values1)
, Values2 = MAX(Values2)
FROM #EntityValues
GROUP BY EntityID
SQL Server has no built-in alternative
to GROUP_CONCAT
and LISTAGG
functions. Nevertheless, this doesn't
prevent performing string concatenation tasks efficiently. The
objective of this article is to show it clearly.