Introduction
Our business applications frequently include functionality to support database queries based on first and last name, which is problematic when names contain
accent characters and the database collation is accent-sensitive. Suppose, for example, you need to query the database for users having a first name that starts
with "HELEN". Ideally, the result set should include:
HELEN
HELENA
HÉLÈNA
HELENE
HÉLÈNE
However, when your database collation is accent-sensitive, the result set includes only:
HELEN
HELENA
HELENE
Notice the names containing accented characters are excluded from the result set.
This is because accented characters are excluded from the query. Unfortunately, this is not the desired result.
In order to include the missing names, you could query for users having a first name that starts with "HELEN" OR "HÉLÈN", but it
may be impossible to predict all accent variations that might exist in a data source. More importantly for some, if query parameter values
are specified by the end-user, then you can't be certain that a typical user will know how to input accented characters from his or her computer keyboard.
Background
What is collation?
A collation in SQL Server is a defined set of rules for sorting and comparing data. This includes rules for case-sensitivity and accent-sensitivity.
When you select a collation (for your server, database, column, or expression), you assign specific characteristics to your data that affect the results of database operations.
When I installed SQL Server 2008 R2, the default collation was SQL_Latin1_General_CP1_CI_AS. The collation name is broken into these parts:
- SQL = indicates a SQL Server collation name
- Latin1_General = identifies the alphabet whose sorting rules are applied
- CP1 = identifies Code Page 1252, which is a character encoding of the Latin alphabet
- CI = defines case-insensitive comparison rules
- AS = defines accent-sensitive comparison rules
Solution alternatives
What you need is an accent-insensitive search. There are several possible solutions:
- Remove and replace accented characters in the query. For example:
select * from users where replace ( replace ( name , 'É' ,'E' ) , 'È' ,'E' ) like 'HELEN%'
- Specify an accent-insensitive collation in the query. For example:
select * from users where name like 'HELEN%' collate SQL_Latin1_General_CP1_CI_AI
- Configure the database so that it is not accent-sensitive; change the collation to SQL_Latin1_General_CP1_CI_AI.
Options 1 and 2 are simplest from a database administrator's perspective, because they require no changes to the database.
Option 3 is simplest from a programmer's perspective, because it is a permanent solution, server-wide and database-wide, and it requires no programming changes
to applications. When you are installing a new SQL Server instance, then this is easy to do. However, if you have an existing SQL Server that you want to re-configure,
then the procedure is more difficult. This article provides a description of the steps you can follow to implement option 3 on an existing server with existing databases.
Steps to change the collation on SQL Server
Here is a high-level summary of the required steps:
- Generate a script to re-create all user databases.
- Export the data from all user databases.
- Drop all user databases.
- Rebuild the master database specifying the new collation.
- Execute the script to re-create all user databases.
- Import the data into all user databases.
It is important to note that you can't simply create a SQL Server backup of a database, then restore it to a server that is configured with a different default collation.
Collation settings are column-level attributes, which are restored "as is" from the backup. Therefore, you need to create new tables, with new collation settings,
and import the original data into the new tables.
Here it is important to note that you can change the collation on the tables in an existing database without changing the default collation for the server.
You can do this using a SQL script (e.g., Change Collation on All Tables.sql),
but this could leave unresolved issues with other database objects. Instead it might be better to use
a Change Collation utility program. Having said that, the problem with this approach is that
it requires you to remember to specify the collation whenever you add new columns to your tables. If you want a "fire and forget" solution, then you will want to change
the default collation for your server and for your database - not just the columns in your tables.
Step 1. Generate a SQL script to re-create all user databases
If you have a small number of databases, then you can use the Generate Scripts feature in SQL Server Management Studio.
Right-click the database and select Tasks / Generate Scripts. Follow the steps in the wizard to generate a SQL script that creates all of the objects in each user database.
To include INSERT
statements for your data, click the Advanced button on the Scripting Options step, and select "Schema and data" for the setting named "Types of data to script".
Detailed instructions (with screenshots) are
available here.
The simple manual approach is not a workable solution if you have multiple servers hosting dozens (or perhaps hundreds) of databases.
Furthermore, if you have databases that contain several gigabytes of data, it isn't a viable option to script massive volumes of data into
INSERT
statements in a SQL script file. Instead, you need a solution to automate the generation of SQL scripts for every database on a given server,
and a solution to automate the use of SQL Server's BCP utility to export and import the data quickly.
I created a console application called Siphon to generate a single-file SQL script to re-create each of the user databases on my server.
The code for this SQL Server utility program was inspired by Scriptio, an Open-Source project hosted
at CodePlex. Siphon is developed using Microsoft Visual Studio 2010, and it targets Microsoft SQL Server 2008 R2 servers.
If you are interested, you can download the source code.
Note: In order to get the project running, you will need to install Microsoft SQL Server 2008
R2 Shared Management Objects (SMO). After I installed the SMO class library on my computer, the .NET assemblies were located
here - C:\Program Files\Microsoft SQL Server\100\SDK\Assemblies\.
The code to perform this step is relatively simple, and looks like this:
public void ScriptDatabases(String outputPath)
{
ServerHelper helper = new ServerHelper(_server);
StringCollection names = helper.GetUserDatabaseNames();
foreach (String name in names)
{
System.Console.WriteLine("Scripting database {0}", name);
ObjectCreatorSettings settings = CreateObjectCreatorSettings(outputPath, name);
Database database = _server.Databases[name];
ScriptHelper scripter = new ScriptHelper(_server, database, settings);
scripter.ScriptDatabase("SQL_Latin1_General_CP1_CI_AI");
scripter.ScriptSchemas();
scripter.ScriptAssemblies();
scripter.ScriptSynonyms();
scripter.ScriptTables();
scripter.ScriptIndexes();
scripter.ScriptAggregates();
scripter.ScriptFunctions();
scripter.ScriptViews();
scripter.ScriptProcedures();
scripter.ScriptDefaults();
scripter.ScriptChecks();
scripter.ScriptForeignKeys();
scripter.WriteToOutputFile(scripter.Script, settings.OutputPath);
}
}
Step 2. Export data from all user databases
Before you start, you should perform a regular SQL Server backup of your user databases. If something goes wrong when you run the next few steps, then having these
"emergency backups" will be critically important.
You can use the SQL Server bulk copy utility (BCP) to export the data in every table (in every user database)
to a file. The code to do this is straightforward using SMO and BCP together:
public void ExportDatabases(String folderPath)
{
ServerHelper helper = new ServerHelper(_server);
foreach (Database database in helper.Databases)
ExportDatabase(database, folderPath);
}
public void ExportDatabase(Database database, String folderPath)
{
ServerHelper helper = new ServerHelper(_server);
if (!helper.IsUserDatabase(database))
return;
foreach (Table table in database.Tables)
{
System.Console.WriteLine("\rExporting data for {0}.{1}.{2}",
database.Name, table.Schema, table.Name);
System.Console.WriteLine("Preparing...");
String filePath = String.Format(@"{0}\{1}\Tables\{2}.bcp",
folderPath, database.Name, table.Name);
SetupOutputFile(filePath);
ExportTable(helper.ServerName, database.Name, table.Schema, table.Name, filePath);
}
}
Step 3. Drop all user databases
You can drop user databases manually using Management Studio, or you can write a SQL script to do this, or you can automate the procedure using the SMO library:
public void DropDatabases()
{
ServerHelper helper = new ServerHelper(_server);
StringCollection names = helper.GetUserDatabaseNames();
foreach (String name in names)
{
Database database = helper.Databases[name];
System.Console.WriteLine("Dropping database {0}", database.Name);
database.UserAccess = DatabaseUserAccess.Single;
database.Alter(TerminationClause.RollbackTransactionsImmediately);
database.Drop();
}
}
Step 4. Rebuild the master database specifying the new collation
Note: Microsoft has specific recommendations prior to rebuilding system databases.
How closely you follow these recommendations is something you will need to determine for yourself.
Rebuild the master database specifying the new collation in the SQLCOLLATION
property of the setup command. For example:
D:\> SETUP /QUIET /ACTION=REBUILDDATABASE
/INSTANCENAME=MSSQLSERVER /SQLCOLLATION=SQL_Latin1_General_CP1_CI_AI
/SQLSYSADMINACCOUNTS="BUILTIN\Administrators" /SAPWD=MyPassword
Re-run the Maintenance/Repair option from your Microsoft SQL Server 2008 R2 installation media, just to ensure that everything is intact after your system databases are rebuilt.
You can verify that the collation is correct by executing a simple query:
SELECT
SERVERPROPERTY('ProductVersion ') AS ProductVersion
,SERVERPROPERTY('ProductLevel') AS ProductLevel
,SERVERPROPERTY('ResourceVersion') AS ResourceVersion
,SERVERPROPERTY('ResourceLastUpdateDateTime') AS ResourceLastUpdateDateTime
,SERVERPROPERTY('Collation') AS Collation
Step 5. Execute the script to re-create all user databases
Execute the scripts you created in Step 1. You can perform this step manually using the SQLCMD console application, or you can create a command-line batch file,
or you can use the SMO class library:
public void CreateDatabases(String outputPath)
{
ServerHelper helper = new ServerHelper(_server);
StringCollection names = GetFolderNames(outputPath);
foreach (String name in names)
{
System.Console.WriteLine("Creating database {0}", name);
String path = String.Format(@"{0}\{1}\Database.sql", outputPath, name);
ExecuteScript(helper.ServerName, name, path);
}
}
private static void ExecuteScript(String server, String database, String path)
{
Process p = new Process();
p.StartInfo.UseShellExecute = false;
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.RedirectStandardError = true;
p.StartInfo.FileName = "sqlcmd";
p.StartInfo.Arguments = String.Format("-S {0} -E -d temp -i \"{1}\"", server, path);
p.OutputDataReceived += bcp_ExecuteScript_OutputDataReceived;
p.ErrorDataReceived += bcp_ExecuteScript_ErrorDataReceived;
p.Start();
p.BeginOutputReadLine();
p.BeginErrorReadLine();
p.WaitForExit();
}
Step 6. Import data into all user databases
You can use the SQL Server bulk copy utility (BCP) to import data in every
table (in every user database) from the export files created in Step 2. The code to do this is straightforward using SMO and BCP together:
public void ImportDatabases(String rootPath)
{
rootPath = StringHelper.RemoveTrailingSlash(rootPath);
ServerHelper helper = new ServerHelper(_server);
foreach (Database database in helper.Databases)
ImportDatabase(database, rootPath);
}
public void ImportDatabase(Database database, String rootPath)
{
rootPath = StringHelper.RemoveTrailingSlash(rootPath);
ServerHelper helper = new ServerHelper(_server);
if (!helper.IsUserDatabase(database))
return;
String databasePath = String.Format(@"{0}\{1}", rootPath, database.Name);
if (!Directory.Exists(databasePath))
return;
foreach (Table table in database.Tables)
{
String filePath = String.Format(@"{0}\Tables\{1}.bcp", databasePath, table.Name);
if (File.Exists(filePath))
{
System.Console.WriteLine("\rImporting data for {0}.{1}.{2}",
database.Name, table.Schema, table.Name);
System.Console.WriteLine("Preparing...");
ImportTable(helper.ServerName, database.Name,
table.Schema, table.Name, filePath);
}
}
}
Done
All of the databases on your server will now correctly handle accent-insensitive queries. For example, this query will now return all of the desired results:
select * from users where name like 'HELEN%'
Output:
HELEN
HELENA
HÉLÈNA
HELENE
HÉLÈNE
History
- Feb 18, 2012 - Improved error detection and handling in the SQL Server utility.
- Jan 4, 2012 - Completed the first version of the SQL Server utility for the scripting and DBA work described in this article.