Let’s SELECT Errors...

Wendelius

4.72/5 (6 votes)

25 Aug 2015CPOL4 min read

18.9K

This post discusses about how to add checks into a SQL statement in order to prevent wrong results in case of possible future changes.

Data problems are often hard to notice when operating with SQL. With other languages like C, C#, Java, etc., information is typically investigated and modified item-by-item. In SQL, all the operations are set based, meaning that a single statement often deals with several rows. For example, a simple SELECT statement investigates all the rows in involved tables, one way or another, and returns the desired results, hopefully...

Syntactical problems are easy to catch, you get an error. But what if your conditions are false, you don’t get an error but the results aren’t what were expected. When creating a program or modifying it, part of the testing is to ensure that the returned results are correct so this way we can deal with the initial situation. However, things get more complicated when the world around changes. What if the database structure is changed in the future in a way that would have an undesired effect on your existing SQL statements?

Let’s take an example. The following script creates an OrderItem table. It’s not a gem what comes to database design but sufficient to pinpoint the problem we’re discussing of:

SQL

CREATE TABLE OrderItem (
   OrderNumber int,
   Product     varchar(100),
   RowType     int CONSTRAINT ck_orderitem_rowtype 
                   CHECK (RowType IN (1,2)),   
                   -- 1=Included, 2=Excluded
   Amount      int,
   ListPrice   decimal
);

Product is the name of the item to be sold, Amount defines the quantity of items to be purchased and ListPrice is the price of a single item. RowType defines if this item is included in the order (value is one) or was it selected but then removed from the order (value of two).

Now, let’s add some data.

SQL

INSERT INTO OrderItem 
(OrderNumber, Product, RowType, Amount, ListPrice)
VALUES (1, 'Squared tyre', 1, 4, 100.0);

INSERT INTO OrderItem 
(OrderNumber, Product, RowType, Amount, ListPrice)
VALUES (1, 'Oil scent, 100 oz', 2, 1, 50.0);

INSERT INTO OrderItem 
(OrderNumber, Product, RowType, Amount, ListPrice)
VALUES (1, 'Haircut', 1, 1, 20.0);

Now, let’s imagine that the application fetches a list of items in the order. In order to show the prices correctly, the SELECT statement needs to take the RowType into account, something like the following:

SQL

SELECT CASE oi.RowType 
          WHEN 1 THEN 'Included into order'
          WHEN 2 THEN 'Excluded from order'
       END As Status,
       oi.Product, 
       oi.Amount,
       CASE oi.RowType 
          WHEN 1 THEN oi.Amount * oi.ListPrice 
                      -- Calculate the price
          WHEN 2 THEN 0  -- Excluded so no price
       END As Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
ORDER BY oi.RowType,
         oi.Product;

The results would be:

Status                Product            Amount   Price
-------------------   -----------------  ------   -----
Included into order   Haircut            1        20
Included into order   Squared tyre       4        400
Excluded from order   Oil scent, 100 oz  1        0

Also, the application could fetch the items to be delivered, with the following query:

SQL

SELECT oi.Product, 
       oi.Amount,
       oi.Amount * oi.ListPrice  AS Price
FROM  OrderItem oi
WHERE oi.OrderNumber = 1
AND   oi.RowType = 1
ORDER BY oi.RowType,
         oi.Product;

And the results would be:

Product        Amount   Price
------------   ------   -----
Haircut        1        20
Squared tyre   4        400

So everything is perfect, until…

One day, an extra row type is defined and added into the database. It was decided that customers buying lot of items were rewarded with a gift included in the purchase, naturally free of charge. The gift is a bunch of flowers which is also sold separately. Because of this, a new RowType 3 was introduced to distinguish free gift items from other items.

The following change was made to the database:

SQL

ALTER TABLE OrderItem 
DROP CONSTRAINT ck_orderitem_rowtype;

ALTER TABLE OrderItem 
ADD CONSTRAINT ck_orderitem_rowtype 
CHECK (RowType IN (1,2,3));   
-- 1=Included, 2=Excluded, 3=Gift

The order 1 we used as an example had such a big value that it would get a gift so let’s add it:

SQL

INSERT INTO OrderItem 
(OrderNumber, Product, RowType, Amount, ListPrice)
VALUES (1, 'Bunch of flowers', 3, 1, 5.0);

Now, let’s see what happens with our original queries. The first one would return:

Status                Product             Amount   Price
-------------------   -----------------   ------   -----
Included into order   Haircut             1        20
Included into order   Squared tyre        4        400
Excluded from order   Oil scent, 100 oz   1        0
NULL                  Bunch of flowers    1        NULL

And the second one:

Product        Amount   Price
------------   ------   -----
Haircut        1        20
Squared tyre   4        400

The first one is more obvious you get NULL values in the result set in places where they are not expected so if someone looks carefully, this may be noticed.

The second one is more dangerous, since the WHERE clause restricted the rows only to known RowTypes the gift isn’t included in the returned set of rows. Just by looking at the data, nothing looks suspicious so you need to know the data behind in order to spot the problem.

If you think about other languages, they have different kinds of mechanisms to handle known and unknown values. For example, a switch structure has a default block that is fired for all values not listed in case statements. This same idea can be applied to some extent to queries. However, in order to ensure that false data doesn’t pass through, we won’t do any default actions but deliberately generate errors.

Consider the following statement. It has a small modification in the CASE structure. RowTypes one and two are handled correctly but if an unknown RowType is spotted, the ELSE branch is executed and a division by zero error is thrown.

SQL

SELECT CASE oi.RowType 
          WHEN 1 THEN 'Included into order'
          WHEN 2 THEN 'Excluded from order'
       END As Status,
       oi.Product, 
       oi.Amount,
       CASE oi.RowType 
          WHEN 1 THEN oi.Amount * oi.ListPrice
          WHEN 2 THEN 0
          ELSE  0/0
       END As Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
ORDER BY oi.RowType,
         oi.Product;

When the statement is run, you get a message like the following:

Msg 8134, Level 16, State 1, Line 86
Divide by zero error encountered.

That would prevent a situation where an unknown RowType could cause false data to be returned. But what about the second one, it didn’t have any CASE structure in the SELECT. And even if a similar CASE would be added to the statement, it wouldn’t work since the WHERE clause takes care that no other values but 1 is returned.

Well, why not add the CASE into the WHERE clause:

SQL

SELECT oi.Product, 
       oi.Amount,
       oi.Amount * oi.ListPrice  AS Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
AND   1 = CASE oi.RowType 
             WHEN 1 THEN 1
             WHEN 2 THEN 0
             ELSE  0/0
          END
ORDER BY oi.RowType,
         oi.Product;

When the statement above is run, again, a division by zero error is returned.

So, after adding the new RowType and fixing the statements, the final versions could look something like these:

SQL

SELECT CASE oi.RowType 
          WHEN 1 THEN 'Included into order'
          WHEN 2 THEN 'Excluded from order'
          WHEN 3 THEN 'Gift'
       END As Status,
       oi.Product, 
       oi.Amount,
       CASE oi.RowType 
          WHEN 1 THEN oi.Amount * oi.ListPrice
          WHEN 2 THEN 0
          WHEN 3 THEN 0
          ELSE  0/0
       END As Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
ORDER BY oi.RowType,
         oi.Product;

And:

SQL

SELECT oi.Product, 
       oi.Amount,
       CASE oi.RowType
          WHEN 1 THEN oi.Amount * oi.ListPrice  
          WHEN 3 THEN 0
          ELSE 0/0
       END AS Price
FROM OrderItem oi
WHERE oi.OrderNumber = 1
AND   1 = CASE oi.RowType 
             WHEN 1 THEN 1
             WHEN 2 THEN 0
             WHEN 3 THEN 1
             ELSE  0/0
          END
ORDER BY oi.RowType,
         oi.Product;

Next time a new RowType is added, both of the statements will react by causing an error and will hopefully get caught already in testing.

So is this the way to always write SELECT statements? No, it isn’t. The modification in the SELECT clause is quite safe and can be used more widely but the second one, a CASE in the WHERE clause is something to be careful with. Complex structures in conditions may lead into a situation where perfectly valid indexes aren’t used because of an expression. So it’s important to always check how the statement is behaving by investigating the execution plan.

Also, it’s not feasible to use this kind of structure in every place. But in critical statements, it may be justified to add deliberate errors in case of unknown values.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)