What is a Null Value?
In databases a common issue is what value or placeholder do you use to represent a missing values. In SQL, this is solved with null. It is used to signify missing or unknown values. The keyword NULL is used to indicate these values. NULL really isn’t a specific value as much as it is an indicator. Don’t think of NULL as similar to zero or blank, it isn’t the same. Zero (0) and blanks “ “, are values.
In most of our beginning lessons we’ve assumed all tables contained data; however, SQL treats missing values differently. It is important to understand how missing values are used and their effect on queries and calculations.
Database Tables and Values
In many situations every row and column will contain data, but there cases where it makes sense for some columns to not contain a value.
Customer
CustomerID | Name | City | State | Age | Sex |
2 | Bob Peterson | New York | NY | 55 | M |
3 | Sue Linkle | Omaha | NE | | F |
4 | Chris | Green | MI | | |
5 | Lori Otterman | Oslo | | | |
6 | Acme Plumbing | Austin | TX | | |
Consider the above Customer table. Several columns contain missing or unknown values. Reasons why this can happen include a value is:
- missing – Perhaps a customer, such as Sue, doesn’t divulge her age to your customer service representative.
- unknown – An employee’s termination date is usually some event in the unforeseen future.
- doesn’t apply – If the customer is a business, then Sex doesn’t apply.
You could argue that for text values you could use blanks, such as one space ‘ ‘, or even an empty value, which is two single quotes ‘’ to represent a missing value. Yet this strategy doesn’t work well for numbers or dates. If the customer’s age is unknown, what numeric value would you use? Would it make sense to use zero or a negative number?
I think doing that causes more confusion and would make it very easy to skew results. For instance, if you were going to calculate the average age of your female customers, the zeros, the ones you were using to represent missing values, would cause the actual average age to be lower than you would expect.
SQL reserves the NULL keyword to denote an unknown or missing value. In SQL NULL’s are treated differently than values. Special consideration is required when comparing null value or using them in expressions.
NULL Value in Comparisons:
When it isn’t possible to specially code your data using “N/A” you can use the special keyword NULL to denote a missing value. NULL is tricky. NULL isn’t a value in the normal sense. For instance no two NULL are equal to each other. Surprisingly NULL = NULL is FALSE!
SQL covers this dilemma though. You can use the IS NULL and IS NOT NULL comparisons to test for NULL values.
For example, the following query from the AdventureWorks2012 database
SELECT ProductID,
Name,
Color
FROM Production.Product
WHERE Color IS NULL
will return all products whose color is missing; whereas
SELECT ProductID,
Name,
Color
FROM Production.Product
WHERE Color IS NOT NULL
will return all products having a color value.
NULLS in Expressions
As you can expect nulls have an adverse effect in expressions. Since NULL denotes an undefined value, its participation in most expressions renders the expression unknown, or NULL, as well.
Check out the following arithmetic expression. The first line evaluates to a number the others to NULL:
- 20 + (5 * 4) = 20 + 20 = 40
- NULL + (5 * 4) = NULL + 20 = NULL
- 20 + (NULL * 4) = 20 + NULL = NULL
NULL has the same effect on text. Expressions:
- ‘Happy’ + ‘ ‘ + ‘Holidays’ = ‘Happy Holidays’
- NULL + ‘ ‘ + ‘Holidays’ = NULL
Try running the following query in the AdventureWorks2012 database:
SELECT ProductID,
Color + ' ' +Name
FROM Production.Product
Did you notice you either get value like “Black Chainring” or NULL? You may expect to see just the product‘s name if the color is missing, but since the color is NULL, the expression is also NULL.
Without help, it would be tough to create expressions, as the NULL’s would “ruin” most results!
In this case you can use the COALESCE function to assist. This function will return the first non-NULL value from a list of parameters. For example
COALESCE(NULL, 'A', 'B')
Returns ‘A’, since it is the first non-NULL value found.
We can use COALESCE to replace a NULL with another value. In this way we can then continue to build an expression that will return a result free of NULL value.
Our improved SQL statement is:
SELECT ProductID,
COALESCE(Color + ' ','') + Name
FROM Production.Product
How does the work? I’ve highlighted some important bits in the statement:
- If color is the value “Black,” then the COALESCE function will return “Black “ with the training space. The product name is then appended to form the full value.
- If the color is NULL, then COALESCE seeks out the first non-null value. This happens to be the empty string ”. Using this trick, it makes it really easy to return either the combination of two columns, or just the second column.
NULL Values in Where Clauses
Where clauses are used to limit the rows returned from a query. Generally only rows meeting the where clause are returned. Rows whose where clause evaluates to FALSE are removed from the result.
In similar fashion if the where clause evaluates to NULL, the row is eliminated.
SELECT ProductID,
Name,
Color
FROM Production.Product
WHERE LEN(COLOR) < 100
Will return all rows where a color is specified. LEN is a function that returns the number of characters in a value; LEN(‘Black’) returns 5. Since this is less than 100 it would be included in the result.
In cases where COLOR is NULL, then LEN(COLOR) returns NULL. Since NULL < 100 is always false, the row is removed from the result.
NULL in Boolean Expressions (Intermediate)
Since NULL represents a missing or unknown value, its effect on Boolean expressions is curious. It status as an unknown value puts a spin on the outcome.
I think the results for AND are what you would expect, but the results you would expect for OR are not. In many cases when a value is unknown, the uncertainty of the value translates through the expression.
Boolean AND Operator
The AND operator returns a TRUE only if all conditions are also TRUE. The following truth table shows all combinations of values the condition (A AND B). Keep in mind we use NULL to denote unknown values.
Condition A | Condition B | Result |
TRUE | FALSE | FALSE |
TRUE | TRUE | TRUE |
TRUE | NULL | NULL |
FALSE | FALSE | FALSE |
FALSE | TRUE | FALSE |
FALSE | NULL | NULL |
NULL | FALSE | NULL |
NULL | TRUE | NULL |
NULL | NULL | NULL |
The one of the inputs being unknown, there is no possibility the expression can be TRUE, as we can’t be certain both inputs are TRUE. Conversely, we don’t know whether the unknown value is FALSE. This leaves to concluding the result, in general is unknown.
Boolean OR Operator
The OR operator returns a TRUE when one or more conditions are also TRUE. Here is the Truth table for the OR operator.
Condition A | Condition B | Result |
TRUE | FALSE | TRUE |
TRUE | TRUE | TRUE |
TRUE | NULL | TRUE |
FALSE | FALSE | FALSE |
FALSE | TRUE | TRUE |
FALSE | NULL | NULL |
NULL | FALSE | NULL |
NULL | TRUE | TRUE |
NULL | NULL | NULL |
You’ll see that in every case one of the conditions is true, so is the end result. This is even the case when the other input is NULL. All that matters is one input is TRUE, the other input, albeit unknown, is irrelevant.
If you have other questions you want answered, then post a comment or tweet me. I’m here to help you. What other topics would you like to know more about?