Mimicking null-safe equal to operator in SQL Server

Wendelius

0.00/5 (No votes)

20 Sep 2011CPOL6 min read

37.7K

This article describes how to mimic the null-safe equal to operator using an UDF in SQL Server.

Download script - 3.81 KB

Introduction

MySQL has a non-standard extension to operators called null-safe equal to operator, <=>. The operator compares two values and if they are identical, returns 1. The difference from the normal equal to operator is that if both of the operands are null, the result is 1 instead of null.

Before jumping into the implementation

So why is null handled differently from other values anyway? ANSI standard defines a null value as unknown. When comparing a null to a null, they are not equal, ever. To be more specific, they are not unequal either, one null is never less than another null and so on. Since the real value of null is unknown, any comparison result is unknown by definition, we simply don't know the result.

From a design point of view, null represents a missing piece of information and it should always be handled that way. Typically, if you need to find records where null matches a null, the best thing is to sit down, put your legs on the table, and figure out the root cause. When found, it should be fixed if possible.

So why write this article at all? I thought about it a lot before publishing this. One reason was to raise discussions about null equality and the other was that there still are cases where this kind of comparison may be justified. One example could be a reporting application where dynamic conditions are used. In such an application, null along any other value may be a valid criteria defined by the user to find records. In such a case, this comparison may simplify the query building process.

Long story short: before using this, think about the situation carefully and if you still are convinced that this is the correct way, then hopefully this article saves some work.

Implementing in SQL Server

Since SQL Server doesn’t have this kind of operator, the typical solution is to use two separate conditions, for example:

SQL

… 
WHERE Column1 = Column2
OR (Column1 IS NULL AND Column2 IS NULL)

Another solution is to create a user defined function which handles the comparison. In order to manage different data types, it’s easiest to use the sql_variant data type for the parameters. Sql_variant can receive most of the other data types except:

timestamp
image
ntext
text
xml
hierarchyid
and CLR UDT

In the simplest form, the function would evaluate the equality of the operands or null values directly, so the function body would look something like:

SQL

CREATE FUNCTION NullSafeEqual_incomplete(
       @left sql_variant, @right sql_variant) RETURNS smallint
AS
BEGIN
    DECLARE @conversionDone smallint = 0;
    DECLARE @returnValue smallint = 0;

    IF (@left IS NULL AND @right IS NULL) BEGIN
        SET @returnValue = 1;
        SET @conversionDone = 1;
    END;

    IF (@left IS NULL AND @right IS NOT NULL) BEGIN
        SET @returnValue = 0;
        SET @conversionDone = 1;
    END;

    IF (@left IS NOT NULL AND @right IS NULL) BEGIN
        SET @returnValue = 0;
        SET @conversionDone = 1;
    END;

    IF (@conversionDone = 0 AND @left = @right) BEGIN 
        SET @returnValue = 1;
        SET @conversionDone = 1;
    END;

    RETURN (@returnValue);
END;

The function first checks if both of the operands are null. If that’s true, 1 is returned. If not it checks if either one of the parameters is null and the other isn’t, and at last, the equality is tested. Now, let’s test this with a simple query with integer values:

SQL

-- Test 1
SELECT dbo.NullSafeEqual_incomplete (1, 1)       AS Comparison1, 
       dbo.NullSafeEqual_incomplete (1, null)    AS Comparison2, 
       dbo.NullSafeEqual_incomplete (null, null) AS Comparison3;

The results are:

Comparison1  Comparison2  Comparison3
-----------  -----------  -----------
1            0            1

So 1 is equal to 1 but also null is equal to null. What if other data types are used or data types are mixed?

SQL

-- Test 2
SELECT dbo.NullSafeEqual_incomplete('A', 'A')  AS Comparison1, 
       dbo.NullSafeEqual_incomplete('A', null) AS Comparison2, 
       dbo.NullSafeEqual_incomplete(1, '1')    AS Comparison3;

Results then are:

Comparison1  Comparison2  Comparison3
-----------  -----------  -----------
1            0            0

Everything else seems fine, but the comparison 3 is incorrect. If a condition 1 = '1' is evaluated, for example, in a WHERE clause, it results to true. However, the function returned 0. This is because the sql_variant values are different. Even though they seemingly contain the same value, it’s a different data type and no automatic conversion takes place. This actually means that the function won’t return correct results if data types are mixed.

SQL Server has a function called SQL_VARIANT_PROPERTY, which returns the actual underlying data type. In order to do this comparison properly, the data needs to be cast to the actual data type and then compared using the same rules that apply when actual data types are used.

This means that each data type has to be checked individually. One way of doing this is to list all the combinations for different data types. So the function can, for example, have a logic like:

SQL

DECLARE @conversionDone smallint = 0;
DECLARE @returnValue smallint = 0;
DECLARE @leftdatatype nvarchar(100) = 
  CONVERT(nvarchar, SQL_VARIANT_PROPERTY(@left, 'BaseType'));
DECLARE @rightdatatype nvarchar(100) = 
  CONVERT(nvarchar, SQL_VARIANT_PROPERTY(@right, 'BaseType'));

IF (@left IS NULL AND @right IS NULL) BEGIN
    SET @returnValue = 1;
    SET @conversionDone = 1;
END;

IF (@left IS NULL AND @right IS NOT NULL) BEGIN
    SET @returnValue = 0;
    SET @conversionDone = 1;
END;

IF (@left IS NOT NULL AND @right IS NULL) BEGIN
    SET @returnValue = 0;
    SET @conversionDone = 1;
END;

IF (@conversionDone = 0 AND @left = @right) BEGIN 
    SET @returnValue = 1;
    SET @conversionDone = 1;
END;

IF (@conversionDone = 0) BEGIN
    -- Left operand is int
    IF (@conversionDone = 0 AND @leftdatatype = N'int') BEGIN
        IF (@conversionDone = 0 AND @rightdatatype = N'binary') BEGIN
            IF CONVERT(int, @left) = CONVERT(binary, @right) BEGIN
                SET @returnValue = 1;
            END;
            SET @conversionDone = 1;
        END;
        IF (@conversionDone = 0 AND @rightdatatype = N'varbinary') BEGIN
            IF CONVERT(int, @left) = CONVERT(varbinary, @right)  BEGIN
                SET @returnValue = 1;
            END;
            SET @conversionDone = 1;
        END;
… etc.

I know the implementation isn't very fancy, but actually this was very easy to generate.

Also note that not all of the possible data type combinations are listed in the function, for example, if equal comparison isn't allowed between the two data types.

Now, let’s run the same test with the complete function:

SQL

-- Test 3 
SELECT dbo.NullSafeEqual('A', 'A')  AS Comparison1, 
       dbo.NullSafeEqual('A', null) AS Comparison2, 
       dbo.NullSafeEqual(1, '1')    AS Comparison3;

The result is:

Comparison1  Comparison2  Comparison3
-----------  -----------  -----------
1            0            1

So now the equality check also works with mixed data types.

The function can receive valid parameter data types for sql_variant but the comparison between these data types isn’t valid and an error has to be raised in such situations. However, RAISERROR cannot be used in functions, so this limitation has been overcome by causing a casting error like the following:

SQL

SET @returnValue = N'Incompatible datatype conversion from ' + 
                   @leftdatatype + N' to ' + @rightdatatype;

Let’s test the error handling. If we execute the following:

SQL

-- Test 4, incompatible data types
SELECT dbo.NullSafeEqual( convert(money,5), NEWID()) AS Comparison1;

An error is generated:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the nvarchar value 
'Incompatible datatype conversion from money to uniqueidentifier' 
to data type smallint.

The run-time error is the conversion from nvarchar to smallint, but the trick here is that the actual error message for incompatible parameters is embedded into the error message; What cannot be converted: 'Incompatible datatype conversion from money to uniqueidentifier'.

Another type of error is a conversion error detected by SQL Server. For example, if a varchar and date are compared, the varchar is converted to a date if possible. If this fails, an error occurs:

SQL

-- Test 5, wrong format, dates
SELECT dbo.NullSafeEqual(CURRENT_TIMESTAMP, 'This is not a date') AS Comparison1;

When the above is run, a conversion error is returned:

Msg 241, Level 16, State 1, Line 2
Conversion failed when converting date and/or time from character string.

So no additional tests in the function are needed to test if the conversion itself succeeds.

Some further investigations

Single table queries

So how can this be used in SQL queries? First, let’s create a small table:

SQL

CREATE TABLE Table1 (
T1_Col1 int not null identity(1,1),
T1_Col2 int,
T1_Col3 int,
CONSTRAINT pk_Table1 PRIMARY KEY (T1_Col1) 
);

And some data:

SQL

INSERT INTO Table1 (T1_Col2, T1_Col3) VALUES (1, 1);
INSERT INTO Table1 (T1_Col2, T1_Col3) VALUES (1, null);
INSERT INTO Table1 (T1_Col2, T1_Col3) VALUES (null, null);
INSERT INTO Table1 (T1_Col2, T1_Col3) VALUES (2, null);

Now if we want to get all the rows where T1_Col2 is equal to T1_Col3, the query would look like:

SQL

SELECT * 
FROM  Table1 t
WHERE dbo.NullSafeEqual(t.T1_Col2, t.T1_Col3) = 1

And the results would be:

T1_Col1  T1_Col2  T1_Col3
-------  -------  -------
1        1        1
3        NULL     NULL

So both rows are found.

Joins

What about joins then? First, let’s create a child table:

SQL

CREATE TABLE Table2 ( 
T2_Col1 int not null identity(1,1),
T2_Col2 int,
CONSTRAINT pk_Table2 PRIMARY KEY (T2_Col1) 
);

And some data again:

SQL

INSERT INTO Table2 (T2_Col2) VALUES (1);
INSERT INTO Table2 (T2_Col2) VALUES (null);
INSERT INTO Table2 (T2_Col2) VALUES (null);
INSERT INTO Table2 (T2_Col2) VALUES (2);

If we want to get matching records from these tables based on T1_Col2 and T2_Col2, the query could look like:

SQL

SELECT *  
FROM Table1 t1, 
     Table2 t2
WHERE dbo.NullSafeEqual(t1.T1_Col2, t2.T2_Col2) = 1
ORDER BY t1.T1_Col1, t2.T2_Col1;

And the results:

T1_Col1  T1_Col2  T1_Col3  T2_Col1  T2_Col2
-------  -------  -------  -------  -------
1        1        1        1        1 
2        1        NULL     1        1 
3        NULL     NULL     2        NULL
3        NULL     NULL     3        NULL
4        2        NULL     4        2

It’s important to notice that Cartesian product is applied both ways, null doesn’t make an exception in this case when the function is used. Row 1 in Table 2 is repeated twice since it has two parents but also Row 3 in Table1 is repeated twice since it has two children.

The query above could also be written using the standard Join syntax:

SQL

SELECT *  
FROM Table1 t1 inner join Table2 t2 
ON dbo.NullSafeEqual(t1.T1_Col2, t2.T2_Col2) = 1
ORDER BY t1.T1_Col1, t2.T2_Col1;

Performance

What about performance then? If we create another test table:

SQL

CREATE TABLE Table3 (
T3_Col1 int not null identity(1,1),
T3_Col2 int
CONSTRAINT pk_Table3 PRIMARY KEY (T3_Col1) 
);

And fill it with 200,000 rows of some random data and a few null rows:

SQL

SET NOCOUNT ON 
DECLARE @counter int = 0;
BEGIN 
    -- Some nulls
    INSERT INTO Table3 (T3_Col2) VALUES (NULL);
    INSERT INTO Table3 (T3_Col2) VALUES (NULL);

    -- non null data
    WHILE (@counter < 200000) BEGIN
        INSERT INTO Table3 (T3_Col2) 
        SELECT RAND( (DATEPART(mm, GETDATE()) * 100000 ) 
                   + (DATEPART(ss, GETDATE()) * 1000 ) 
                   + DATEPART(ms, GETDATE())) * 100000;
        SET @counter = @counter + 1;
    END;

    -- Some nulls
    INSERT INTO Table3 (T3_Col2) VALUES (NULL);
    INSERT INTO Table3 (T3_Col2) VALUES (NULL);
END;

Let’s get a look at some rows:

SQL

SELECT * FROM Table3;

With my test run, the results started like:

T3_Col1  T3_Col2 
-------  -------
1        NULL
2        NULL
3        52343
4        52343
5        52349
6        52349
7        52349
8        52349
9        52349
10       52362

The next step is to create an index on T3_Col2:

SQL

CREATE INDEX X_Table3_T3Col2 ON Table3 (T3_Col2);

If a traditional select is used, for example, to fetch the nulls, the query would look like:

SQL

SELECT * FROM Table3 WHERE T3_Col2 IS NULL;

But more importantly, the execution plan shows that the index is used in order to fetch the data:

Now if the same is executed with the newly created function:

SQL

SELECT * FROM Table3 WHERE dbo.NullSafeEqual(T3_Col2, NULL) = 1;

The results are the same but there’s a big difference in the execution plan:

So actually SQL Server has to read through all the records and pass the value from T3_Col2 to the function in order to know if the condition is true. This has a performance penalty so be careful when using this with larger sets of data.

History

September 20, 2011: Created.
September 20, 2011: Added the "Before jumping into implementation" chapter based on a good discussion with Mel Padden.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)