Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2008

Configuring SQL Server Accent-Insensitivity

5.00/5 (9 votes)
21 Feb 2012CPOL7 min read 59.6K   389  
This article describes the steps to re-configure SQL Server so that queries return accent-insensitive results. It includes source code (using Microsoft SQL Server 2008 R2 Shared Management Objects) to automate some of the steps in this procedure.

Introduction

Our business applications frequently include functionality to support database queries based on first and last name, which is problematic when names contain accent characters and the database collation is accent-sensitive. Suppose, for example, you need to query the database for users having a first name that starts with "HELEN". Ideally, the result set should include:

HELEN
HELENA
HÉLÈNA
HELENE
HÉLÈNE

However, when your database collation is accent-sensitive, the result set includes only:

HELEN
HELENA
HELENE

Notice the names containing accented characters are excluded from the result set. This is because accented characters are excluded from the query. Unfortunately, this is not the desired result.

In order to include the missing names, you could query for users having a first name that starts with "HELEN" OR "HÉLÈN", but it may be impossible to predict all accent variations that might exist in a data source. More importantly for some, if query parameter values are specified by the end-user, then you can't be certain that a typical user will know how to input accented characters from his or her computer keyboard.

Background

What is collation?

A collation in SQL Server is a defined set of rules for sorting and comparing data. This includes rules for case-sensitivity and accent-sensitivity. When you select a collation (for your server, database, column, or expression), you assign specific characteristics to your data that affect the results of database operations.

When I installed SQL Server 2008 R2, the default collation was SQL_Latin1_General_CP1_CI_AS. The collation name is broken into these parts:

  • SQL = indicates a SQL Server collation name
  • Latin1_General = identifies the alphabet whose sorting rules are applied
  • CP1 = identifies Code Page 1252, which is a character encoding of the Latin alphabet
  • CI = defines case-insensitive comparison rules
  • AS = defines accent-sensitive comparison rules

Solution alternatives

What you need is an accent-insensitive search. There are several possible solutions:

  1. Remove and replace accented characters in the query. For example:
  2. SQL
    select * from users where replace ( replace ( name , 'É' ,'E' ) , 'È' ,'E' ) like 'HELEN%'
  3. Specify an accent-insensitive collation in the query. For example:
  4. SQL
    select * from users where name like 'HELEN%' collate SQL_Latin1_General_CP1_CI_AI
  5. Configure the database so that it is not accent-sensitive; change the collation to SQL_Latin1_General_CP1_CI_AI.

Options 1 and 2 are simplest from a database administrator's perspective, because they require no changes to the database. Option 3 is simplest from a programmer's perspective, because it is a permanent solution, server-wide and database-wide, and it requires no programming changes to applications. When you are installing a new SQL Server instance, then this is easy to do. However, if you have an existing SQL Server that you want to re-configure, then the procedure is more difficult. This article provides a description of the steps you can follow to implement option 3 on an existing server with existing databases.

Steps to change the collation on SQL Server

Here is a high-level summary of the required steps:

  1. Generate a script to re-create all user databases.
  2. Export the data from all user databases.
  3. Drop all user databases.
  4. Rebuild the master database specifying the new collation.
  5. Execute the script to re-create all user databases.
  6. Import the data into all user databases.

It is important to note that you can't simply create a SQL Server backup of a database, then restore it to a server that is configured with a different default collation. Collation settings are column-level attributes, which are restored "as is" from the backup. Therefore, you need to create new tables, with new collation settings, and import the original data into the new tables.

Here it is important to note that you can change the collation on the tables in an existing database without changing the default collation for the server. You can do this using a SQL script (e.g., Change Collation on All Tables.sql), but this could leave unresolved issues with other database objects. Instead it might be better to use a Change Collation utility program. Having said that, the problem with this approach is that it requires you to remember to specify the collation whenever you add new columns to your tables. If you want a "fire and forget" solution, then you will want to change the default collation for your server and for your database - not just the columns in your tables.

Step 1. Generate a SQL script to re-create all user databases

If you have a small number of databases, then you can use the Generate Scripts feature in SQL Server Management Studio. Right-click the database and select Tasks / Generate Scripts. Follow the steps in the wizard to generate a SQL script that creates all of the objects in each user database. To include INSERT statements for your data, click the Advanced button on the Scripting Options step, and select "Schema and data" for the setting named "Types of data to script". Detailed instructions (with screenshots) are available here.

The simple manual approach is not a workable solution if you have multiple servers hosting dozens (or perhaps hundreds) of databases. Furthermore, if you have databases that contain several gigabytes of data, it isn't a viable option to script massive volumes of data into INSERT statements in a SQL script file. Instead, you need a solution to automate the generation of SQL scripts for every database on a given server, and a solution to automate the use of SQL Server's BCP utility to export and import the data quickly.

I created a console application called Siphon to generate a single-file SQL script to re-create each of the user databases on my server. The code for this SQL Server utility program was inspired by Scriptio, an Open-Source project hosted at CodePlex. Siphon is developed using Microsoft Visual Studio 2010, and it targets Microsoft SQL Server 2008 R2 servers. If you are interested, you can download the source code.

Note: In order to get the project running, you will need to install Microsoft SQL Server 2008 R2 Shared Management Objects (SMO). After I installed the SMO class library on my computer, the .NET assemblies were located here - C:\Program Files\Microsoft SQL Server\100\SDK\Assemblies\.

The code to perform this step is relatively simple, and looks like this:

C#
public void ScriptDatabases(String outputPath)
{
    ServerHelper helper = new ServerHelper(_server);
    StringCollection names = helper.GetUserDatabaseNames();

    foreach (String name in names)
    {
        System.Console.WriteLine("Scripting database {0}", name);

        // Create settings for the SQL object creator

        ObjectCreatorSettings settings = CreateObjectCreatorSettings(outputPath, name);

        // Create a script helper

        Database database = _server.Databases[name];
        ScriptHelper scripter = new ScriptHelper(_server, database, settings);

        // Script objects in order (to avoid dependency issues)

        scripter.ScriptDatabase("SQL_Latin1_General_CP1_CI_AI");
        scripter.ScriptSchemas();
        scripter.ScriptAssemblies();
        scripter.ScriptSynonyms();
        scripter.ScriptTables();
        scripter.ScriptIndexes();
        scripter.ScriptAggregates();
        scripter.ScriptFunctions();
        scripter.ScriptViews();
        scripter.ScriptProcedures();
        scripter.ScriptDefaults();
        scripter.ScriptChecks();
        scripter.ScriptForeignKeys();

        // Write the output to a text file

        scripter.WriteToOutputFile(scripter.Script, settings.OutputPath);
    }
}

Step 2. Export data from all user databases

Before you start, you should perform a regular SQL Server backup of your user databases. If something goes wrong when you run the next few steps, then having these "emergency backups" will be critically important.

You can use the SQL Server bulk copy utility (BCP) to export the data in every table (in every user database) to a file. The code to do this is straightforward using SMO and BCP together:

C#
public void ExportDatabases(String folderPath)
{
    ServerHelper helper = new ServerHelper(_server);

    foreach (Database database in helper.Databases)
        ExportDatabase(database, folderPath);
}

public void ExportDatabase(Database database, String folderPath)
{
    ServerHelper helper = new ServerHelper(_server);

    if (!helper.IsUserDatabase(database))
        return;

    foreach (Table table in database.Tables)
    {
        System.Console.WriteLine("\rExporting data for {0}.{1}.{2}", 
                       database.Name, table.Schema, table.Name);
        System.Console.WriteLine("Preparing...");

        String filePath = String.Format(@"{0}\{1}\Tables\{2}.bcp", 
                          folderPath, database.Name, table.Name);
        SetupOutputFile(filePath);

        ExportTable(helper.ServerName, database.Name, table.Schema, table.Name, filePath);
    }
}

Step 3. Drop all user databases

You can drop user databases manually using Management Studio, or you can write a SQL script to do this, or you can automate the procedure using the SMO library:

C#
public void DropDatabases()
{
    ServerHelper helper = new ServerHelper(_server);
    StringCollection names = helper.GetUserDatabaseNames();

    foreach (String name in names)
    {
        Database database = helper.Databases[name];

        System.Console.WriteLine("Dropping database {0}", database.Name);
        database.UserAccess = DatabaseUserAccess.Single;
        database.Alter(TerminationClause.RollbackTransactionsImmediately);
        database.Drop();
    }
}

Step 4. Rebuild the master database specifying the new collation

Note: Microsoft has specific recommendations prior to rebuilding system databases. How closely you follow these recommendations is something you will need to determine for yourself.

Rebuild the master database specifying the new collation in the SQLCOLLATION property of the setup command. For example:

D:\> SETUP /QUIET /ACTION=REBUILDDATABASE 
  /INSTANCENAME=MSSQLSERVER /SQLCOLLATION=SQL_Latin1_General_CP1_CI_AI 
  /SQLSYSADMINACCOUNTS="BUILTIN\Administrators" /SAPWD=MyPassword

Re-run the Maintenance/Repair option from your Microsoft SQL Server 2008 R2 installation media, just to ensure that everything is intact after your system databases are rebuilt.

You can verify that the collation is correct by executing a simple query:

SQL
SELECT
        SERVERPROPERTY('ProductVersion ') AS ProductVersion
       ,SERVERPROPERTY('ProductLevel') AS ProductLevel
       ,SERVERPROPERTY('ResourceVersion') AS ResourceVersion
       ,SERVERPROPERTY('ResourceLastUpdateDateTime') AS ResourceLastUpdateDateTime
       ,SERVERPROPERTY('Collation') AS Collation

Step 5. Execute the script to re-create all user databases

Execute the scripts you created in Step 1. You can perform this step manually using the SQLCMD console application, or you can create a command-line batch file, or you can use the SMO class library:

C#
public void CreateDatabases(String outputPath)
{
    ServerHelper helper = new ServerHelper(_server);
    StringCollection names = GetFolderNames(outputPath);

    foreach (String name in names)
    {
        System.Console.WriteLine("Creating database {0}", name);
        String path = String.Format(@"{0}\{1}\Database.sql", outputPath, name);
        ExecuteScript(helper.ServerName, name, path);
    }
}

private static void ExecuteScript(String server, String database, String path)
{
    Process p = new Process();

    p.StartInfo.UseShellExecute = false;
    p.StartInfo.RedirectStandardOutput = true;
    p.StartInfo.RedirectStandardError = true;
    p.StartInfo.FileName = "sqlcmd";
    p.StartInfo.Arguments = String.Format("-S {0} -E -d temp -i \"{1}\"", server, path);

    p.OutputDataReceived += bcp_ExecuteScript_OutputDataReceived;
    p.ErrorDataReceived += bcp_ExecuteScript_ErrorDataReceived;

    p.Start();

    // Start the asynchronous read of the standart output stream and error stream
    p.BeginOutputReadLine();
    p.BeginErrorReadLine();

    // Wait for the child process to exit
    p.WaitForExit();
}

Step 6. Import data into all user databases

You can use the SQL Server bulk copy utility (BCP) to import data in every table (in every user database) from the export files created in Step 2. The code to do this is straightforward using SMO and BCP together:

C#
public void ImportDatabases(String rootPath)
{
    rootPath = StringHelper.RemoveTrailingSlash(rootPath);

    ServerHelper helper = new ServerHelper(_server);

    foreach (Database database in helper.Databases)
        ImportDatabase(database, rootPath);
}

public void ImportDatabase(Database database, String rootPath)
{
    rootPath = StringHelper.RemoveTrailingSlash(rootPath);

    ServerHelper helper = new ServerHelper(_server);

    if (!helper.IsUserDatabase(database))
        return;

    String databasePath = String.Format(@"{0}\{1}", rootPath, database.Name);
    if (!Directory.Exists(databasePath))
        return;

    foreach (Table table in database.Tables)
    {
        String filePath = String.Format(@"{0}\Tables\{1}.bcp", databasePath, table.Name);
        if (File.Exists(filePath))
        {
            System.Console.WriteLine("\rImporting data for {0}.{1}.{2}", 
                                     database.Name, table.Schema, table.Name);
            System.Console.WriteLine("Preparing...");

            ImportTable(helper.ServerName, database.Name, 
                        table.Schema, table.Name, filePath);
        }
    }
}

Done

All of the databases on your server will now correctly handle accent-insensitive queries. For example, this query will now return all of the desired results:

SQL
select * from users where name like 'HELEN%'

Output:

HELEN
HELENA
HÉLÈNA
HELENE
HÉLÈNE

History

  • Feb 18, 2012 - Improved error detection and handling in the SQL Server utility.
  • Jan 4, 2012 - Completed the first version of the SQL Server utility for the scripting and DBA work described in this article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)