Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

Solving the collation mix with SAP

3.43/5 (4 votes)
20 Jun 2006CPOL4 min read 2  
Solving the collation mix with SAP (SQL_Latin1_General_CP850_BIN vs. Latin1_General_CI_AS).

Introduction

If you run your SAP installation on SQL Server (does not matter if you use version 2000 or 2005), you have to use the SQL_Latin1_General_CP850_BIN collation (see e.g., http://searchsap.techtarget.com/generic/0,295582,sid21_gci1169892,00.html?bucket=REF). This is not the default collation in SQL Server; it would use Latin1_General_CI_AS if you follow the suggestions of the installer. This leads into problems if you want to correctly transfer data from your SAP database into, e.g., your DWH database running on SQL Server's standard collation via a linked server.

Fortunately, you can solve this problem quite easily. Before we discuss the solutions, let me give you more details about the problem. As a starting point, I execute the following SQL SELECT statement on a SAP database running on SQL_Latin1_General_CP850_BIN:

SQL
select MTEXT, cast(MTEXT as varbinary) from T000

It returns all rows from table T000 and displays the content of the varchar column MTEXT as text as well as in binary format. In my example, I have one row which contains an umlaut (German "Umlaut O" character: Ö): Österr.[...]. If we take a look at the binary format of this text, it contains the following ASCII-codes:

 Ö  s  t  e  r  r  . 
D6 73 74 65 72 72 2E 

If you like, you can check the character codes yourself. You can look up a character map for codepage 850 in Wikipedia at http://en.wikipedia.org/wiki/CP850. When I did that, I was quite surprised that the code for Umlaut O was not correct (it should be 99; it is D6). In fact, D6 is the character code of Umlaut O in codepage 1252 (=Latin1_General_CI_AS; see http://en.wikipedia.org/wiki/CP1252)! It seams that SAP demands codepage 850 but stores values using codepage 1252. So let's see how this affects us when transferring data using a linked server.

In this article, I assume that the destination server is running on SQL Server version 2005. If you create a linked server there, you have two options concerning collations:

  • Use Remote Collation = True or
  • Use Remote Collation = False

Image 1

Create the linked server and try to execute the following query one time with Use Remote Collation set to true and the other time set to false:

SQL
select x1.MTEXT, cast(x1.MTEXT as varbinary)
from SAP01SQL.P01.dbo.T000 x1 
inner join SAP01SQL.P01.dbo.T000 x2 on 
x1.MANDT=x2.MANDT 
and x1.MTEXT=x2.MTEXT 
and x2.MTEXT like '_sterr%'

The result with Use Remote Collation set to false looks like this:

Image 2

As you can see, the result is correct! The Umlaut O is displayed correctly. Why? Simple reason: The setting Use Remote Collation: false tells SQL Server to retrieve the content of the linked server without any codepage conversion. As we saw before, SAP stores ANSI character codes (=codepage 1252); therefore ignoring all conversions leads to the correct result. However, this solution leads to one important disadvantage: SQL Server cannot pass joins and where-clauses to the remote server because the remote server would use its codepage and we have told SQL Server not to use the remote collation. Therefore the content of all tables is completely transferred to the target server and joins, as well as where-clauses are executed on the target server -> poor performance. Here is the execution plan that shows what's happening:

Image 3

Now let's change the setting of Use Remote Collation to true and execute the SELECT statement again. Here is the result:

Image 4

This time the result is wrong. The reason is that SQL Server tries to convert the content of the column in codpage 850 to ANSI (=codpage 1252). Remember that SAP writes ANSI character codes into an ASCII column -> the conversion leads to the wrong result shown above. The advantage is that joins and where-clauses are executed on the SAP server -> good performance:

Image 5

It seems that we have to choose between the correct results with poor performance or wrong results with good performance?! No, there are ways that ensure correct results and good performance.

The first solution is to encapsulate joins and where-clauses on the SAP server in views. You can access this views with Use Remote Collation set to true and you will get the correct results. The joins and where-clauses are executed on the SAP server. As an alternative, you could also use OPENQUERY instead of views.

If you do not want to put business logic on your SAP server but put the logic into the target server, you could change the collation manually. For this, I wrote a small C# assembly that you can use in SQL Server 2005. Here is the code:

C#
using System; 
using System.Text; 
using System.Data; 
using System.Data.SqlClient; 
using System.Data.SqlTypes; 
using Microsoft.SqlServer.Server; 

public partial class UserDefinedFunctions 
{ 
  [Microsoft.SqlServer.Server.SqlFunction] 
  public static SqlString ConvertCodepage( byte[] binString, SqlInt32 len ) 
  { 
    if (len.IsNull) 
      return SqlString.Null; 
    else 
    { 
      Encoding ai = Encoding.GetEncoding(1252); 
      return new SqlString(ai.GetString(binString, 0, len.Value)); 
    } 
  } 
};

You can deploy this function and use it in the SELECT-statement shown before:

SQL
select x1.MTEXT, cast(x1.MTEXT as varbinary), 
  dbo.ConvertCodepage( cast(x1.MTEXT as varbinary), len(x1.MTEXT) ) 
from SAP01SQL.P01.dbo.T000 x1 
inner join SAP01SQL.P01.dbo.T000 x2 on 
x1.MANDT=x2.MANDT 
and x1.MTEXT=x2.MTEXT 
and x2.MTEXT like '_sterr%' 

Check the result and you will see that it works:

Image 6

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)