Concatenating Strings with a Custom SQL Aggregate Function

Daniel Miller

4.88/5 (6 votes)

27 Jul 2015CPOL3 min read

28.3K

294

This tip describes the code to create a custom SQL Server aggregate function that concatenates multiple string values into a single delimited value.

Download source - 23.6 KB

Introduction

For the sake of an example, suppose you have a database in which you track employees, departments, and the relationships between employees and departments. The schema might look something like this:

An employee can be assigned to multiple departments, and each department can have multiple employees assigned to it.

It is easy to think of scenarios in which you might want a query that returns a data set that looks like this:

Departments per Employee

Alice	Accounting
Bill	Development, Operations, Research
Dinah	Operations, Research

... or a data set that looks like this:

Employees per Department

Accounting	Alice
Development	Bill
Operations	Bill, Dinah
Quality Assurance
Research	Bill, Dinah

The SELECT statement to return these types of data sets should be simple and the syntax should be intuitive. For example:

SQL

SELECT
    Department.Name AS Department
  , toolbox.ConcatenateText(Employee.Name, ', ') AS Employees
FROM
    Department
    LEFT JOIN Assignment ON Department.ID = Assignment.DepartmentID
    LEFT JOIN Employee ON Assignment.EmployeeID = Employee.ID
GROUP BY
    Department.Name
ORDER BY
    Department.Name

Microsoft SQL Server does not provide an aggregate function to concatenate string values, so you need to create your own.

Background

What is an Aggregate Function?

An aggregate function performs a calculation on a set of values and returns a single value. Except for COUNT, aggregate functions ignore null values. Aggregate functions are frequently used with the GROUP BY clause of the SELECT statement.

For more information on build-in aggregate functions, refer to this article in the MSDN documentation.

You can create a database object inside SQL Server that is programmed in a CLR assembly. Database objects that can leverage the rich programming model provided by the CLR include triggers, stored procedures, functions, aggregate functions, and types. Like the built-in aggregate functions provided in Transact-SQL, user-defined aggregate functions perform a calculation on a set of values and return a single value.

For general information on user-defined aggregate functions, refer to this article in the MSDN documentation.

Using the Code

Download and unzip the attached source code. Open the SQL script named Install.sql and change the variables DatabaseName and AssemblyPath to match your environment. Make sure SQLCMD mode is enabled and then execute the script. The script enables CLR modules in your database, creates a schema named "toolbox", and adds an aggregate function named ConcatenateText.

The aggregate function accepts two parameters:

The name of the input column
The string to be used as a value-separating delimiter

For example:

SQL

SELECT
    Employee.Name AS EmployeeName
  , toolbox.ConcatenateText(Department.Name, ', ') AS DepartmentNames
FROM
    Employee
    INNER JOIN Assignment ON Employee.ID = Assignment.EmployeeID
    INNER JOIN Department ON Assignment.DepartmentID = Department.ID
GROUP BY
    Employee.Name

Here the separator is a comma followed by a blank space, so the aggregate function for a given employee name returns a comma-separated list of department names that looks like this:

Accounting, Engineering, Operations

As an added bonus, the aggregate function ensures your comma-separated output value is sorted and contains no duplicates.

Points of Interest

The C# code for the custom aggregate function is in the Concatenate class. The class is small and relatively simple, containing only about 120 lines of code.

The Accumulate method is the key to how the code works. This method is invoked for each of the values in your data set; it appends the values to a string builder and adds a separator when needed.

public void Accumulate(SqlString value, SqlString separator)
{
    if (!separator.IsNull && separator.Value.Length > 0)
        _separator = separator.Value;

    if (_accumulator.Length > 0)
        _accumulator.Append(_separator);

    if (!value.IsNull && value.Value.Length > 0)
    {
        _accumulator.Append(value.Value);
        IsNull = false;
    }
}

SQL Server invokes the Terminate method after the accumulation is complete. At this stage, the code sorts the array of values and then eliminates duplicates and empty entries.

Shrink ▲

public SqlString Terminate()
{
    if (IsNull)
        return SqlString.Null;

    String result = _accumulator.ToString();

    // Split the string value into an array and sort it.
    String[] array = result.Split(new[] {_separator}, StringSplitOptions.RemoveEmptyEntries);
    Array.Sort(array);

    // Join the array into a single string value (without duplicates) and return it.
    StringBuilder joinedWithoutDuplicates = new StringBuilder();

    for (Int32 i = 0; i < array.Length; i++)
    {
        String currentItem = array[i];

        if (i == 0 || !String.Equals(array[i - 1], currentItem, StringComparison.CurrentCulture))
        {
            if (i != 0)
                joinedWithoutDuplicates.Append(_separator);

            joinedWithoutDuplicates.Append(currentItem);
        }
    }
    
    return new SqlString(joinedWithoutDuplicates.ToString());
}

History

July 27, 2015: Posted the first version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)