Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / productivity / SharePoint

Get Ready to Learn SQL Server 21: Using Subqueries in the WHERE Clause

4.60/5 (4 votes)
20 Feb 2015MIT7 min read 11.7K  
In this post, we discuss subqueries in the WHERE clause.

This is the third in a series of articles about subqueries. In this article, we discuss subqueries in the WHERE clause. Other articles discuss their uses in other clauses.

Using Subqueries in the WHERE Clause

It is very common to use subqueries in the WHERE clause. A common use is to test for existence using EXISTS or IN. In some cases, it may make sense to rethink the query and use a JOIN, but you should really study both forms via the query optimizer before making a final decision.

The comparison modifiers ANY and ALL can be used with greater than, less than, or equals operators. Doing so provides a means to compare a single value, such as a column, to one or more results returned from a subquery.

Let’s now explore these in detail.

Exist and Not Exists

The EXISTS condition is used in combination with a subquery. It returns TRUE whenever the subquery returns one or more values.

In its simplest form, the syntax for the EXISTS condition is:

SQL
WHERE EXISTS (sub query)

Suppose we need to return all sales orders written by sales people with sales year to date greater than three million dollars. To do so, we can use the EXISTS clause as shown in this example:

SQL
SELECT SalesOrderID,
       RevisionNumber,
       OrderDate
FROM   Sales.SalesOrderHeader
WHERE  EXISTS (SELECT 1
               FROM   sales.SalesPerson
               WHERE  SalesYTD > 3000000
                      AND SalesOrderHeader.SalesPersonID 
                        = Sales.SalesPerson.BusinessEntityID)

When this SQL executes, the following comparisons are made:

  1. The WHERE clause returns all records where the EXISTS clause is TRUE.
  2. The EXIST clause uses a correlated sub query. The outer query is correlated to the inner query by SalesPersonID.
  3. Only SalesPersons with SalesYTD greater than three million are included in the results.
  4. The EXISTS clause returns TRUE if one or more rows are returned by the sub query.

The EXISTS condition is a membership condition in the sense it only returns TRUE if a result is returned. Conversely, if we want to test for non-membership we can use NOT EXISTS.

NOT EXISTS returns TRUE if zero rows are returned. So, if we want to find all sales orders that were written by salespeople that didn’t have 3,000,000 in year-to-date sales, we can use the following query:

SQL
SELECT SalesOrderID,
       RevisionNumber,
       OrderDate
FROM   Sales.SalesOrderHeader
WHERE  NOT EXISTS (SELECT 1
               FROM   sales.SalesPerson
               WHERE  SalesYTD > 3000000
                      AND SalesOrderHeader.SalesPersonID 
                        = Sales.SalesPerson.BusinessEntityID)

WHAT Happens to NULL?

When the subquery returns a null value what does EXISTS return: NULL, TRUE, or FALSE?

To be honest, I was surprised.

I was sure it would return NULL, but to my surprise I learned it returns TRUE. Therefore, if your subquery returns a NULL value, the EXISTS statement resolves to TRUE. In the following example, all the SalesOrderHeader rows are returned as the WHERE clause essentially resolved to TRUE:

SQL
SELECT SalesOrderID,
       RevisionNumber,
       OrderDate
FROM   Sales.SalesOrderHeader
WHERE  EXISTS (SELECT NULL)

As we study the IN operator, we’ll see this behavior is unique to the EXISTS clause.

IN and NOT IN

We first studied the IN operator back in the lesson, How to Filter Your Query Results. When used in subqueries, the mechanics of the IN and NOT IN clause are the same. Here is a summary from that article.

IN and NOT IN Review

The IN operator is considered a membership type. The membership type allows you to conduct multiple match tests compactly in one statement. For instance, consider if you have a couple spelling variations for the leader of the company such as ‘Owner’, ‘President’, and ‘CEO.’ In cases like this, you could use the in operator to find all matches:

SQL
ContactTitle IN ('CEO', 'Owner', 'President')

The above will math or return turn if the contact title is either ‘CEO’, ‘Owner’, or ‘President.’ To use the IN comparison operator, separate the items which to test for with commas and be sure to enclose them in parenthesis. The full SQL statement for our example is:

SQL
SELECT CompanyName, ContactName, ContactTitle
FROM Customers
WHERE ContactTitle IN ('CEO', 'Owner', 'President');

Note: The above query isn't meant for the Adventure works database.

Using IN with Sub Queries

When used with subqueries, the list of values is replaced with a subquery. The advantage to using a sub query in this case is that it helps to make your queries more data driven and less brittle.

What I mean is you don’t have to hard code values.

If for instance you’re doing a query to find sales order by top sales people, the non-sub query way to use the IN statement is:

SQL
SELECT SalesOrderID,
       OrderDate,
       AccountNumber,
       CustomerID,
       SalesPersonID,
       TotalDue
  FROM Sales.SalesOrderHeader
  WHERE SalesPersonID IN (279, 286, 289)

whose results are

Image 1

But now, since we know about subqueries, we can use the following to obtain the same list:

SQL
SELECT SalesOrderID,
       OrderDate,
       AccountNumber,
       CustomerID,
       SalesPersonID,
       TotalDue
  FROM Sales.SalesOrderHeader
  WHERE SalesPersonID IN (SELECT BusinessEntityID
                            FROM Sales.SalesPerson
                           WHERE Bonus > 5000)

The advantage is that as sales persons sell more or less, the list of sales person IDs returned adjusts.

Just like with other queries, you can create a correlated sub query to be used with the IN clause. Here is the same query we used with the EXIST clause.

It returns all sales orders written by sales people with sales year to date greater than three million dollars, but now we use the IN clause:

SQL
SELECT SalesOrderID,
       RevisionNumber,
       OrderDate
FROM   Sales.SalesOrderHeader
WHERE  SalesPersonID IN (SELECT SalesPerson.BusinessEntityID
               FROM   sales.SalesPerson
               WHERE  SalesYTD > 3000000
                      AND SalesOrderHeader.SalesPersonID 
                        = Sales.SalesPerson.BusinessEntityID)

As IN returns TRUE if the tested value is found in the comparison list, NOT IN returns TRUE if the tested value is not found. Taking the same query from above, we can find all Sales orders that were written by salespeople that didn’t write 3,000,000 in year-to-date sales, we can write the following query:

SQL
SELECT SalesOrderID,
       RevisionNumber,
       OrderDate
FROM   Sales.SalesOrderHeader
WHERE  SalesPersonID NOT IN (SELECT SalesPerson.BusinessEntityID
               FROM   sales.SalesPerson
               WHERE  SalesYTD > 3000000
                      AND SalesOrderHeader.SalesPersonID 
                        = Sales.SalesPerson.BusinessEntityID)

WHAT Happens to NULL with IN?

When the comparison list only contains the NULL value, then any value compared to that list returns false.

For instance:

SQL
SELECT SalesOrderID,
       RevisionNumber,
       OrderDate
FROM   Sales.SalesOrderHeader
WHERE  SalesOrderID  IN (SELECT NULL)

returns zero rows. This is because the IN clause always returns false. Contrast this to EXISTS, which returns TRUE even when the subquery returns NULL.

Comparison Modifiers

Comparison operators such as greater than, less than, equal, and not equal can be modified in interesting ways to enhance comparisons done in conjunction with WHERE clauses.

Rather than using >, which only makes sense when comparing to a single (scalar) value, you can use > ANY or > ALL to compare a column value to a list results returned from sub query.

Using the > ANY Modifier

The comparison operator > ANY means greater than one or more items in the list. This is the same as saying it is greater than MIN value of the list. So the expression...

SQL
Sales > ANY (1000, 2000, 2500)

...returns TRUE if Sales are greater than 1000 as this expression is equivalent to:

SQL
Sales > MIN(1000, 2000, 2500)

Which simplifies to:

SQL
Sales > 1000

Note: You may see some queries using SOME. Queries using SOME return the same result as those using ANY. Simply said > ANY is the same as > SOME.

Let’s do an example using the Adventure works database. We’re going to find all products which may have a high safety stock level. To do so, we’ll look for all products that have a SafetyStockLevel that is greater than the average SafetyStockLevel for various DaysToManufacture.

The query to do this is:

SQL
SELECT ProductID,
       Name,
       SafetyStockLevel,
       DaysToManufacture
FROM   Production.Product
WHERE  SafetyStockLevel > ANY (SELECT   AVG(SafetyStockLevel)
                               FROM     Production.Product
                               GROUP BY DaysToManufacture)

When this subquery is run, it first calculates the Average SafetyStockLevel. This returns a list of numbers. Then for each product row in the outer query SafetyStockLevel is compared. If it is greater than one or more from the list, then include it in the results.

Like me, you may at first think that > ANY is redundant, and not needed. It is equivalent to > MIN(…), right?

What I found out is that though it is equivalent in principle, you can’t use MIN. The statement:

SQL
SELECT ProductID,
       Name,
       SafetyStockLevel,
       DaysToManufacture
FROM   Production.Product
WHERE  SafetyStockLevel > MIN((SELECT   AVG(SafetyStockLevel)
                               FROM     Production.Product
                               GROUP BY DaysToManufacture))

won’t run. It returns the error, “Cannot perform an aggregate function on an expression containing an aggregate or a subquery.”

Using the > ALL Modifier

The > ALL modifier works in a similar fashion except it returns the outer row if its comparison value is greater than every value returned by the inner query.

The comparison operator > ALL means greater than the MAX value of the list. Using the example above, then

SQL
Sales > ALL (1000, 2000, 2500)

is equivalent to:

SQL
Sales > MAX(1000, 2000, 2500)

Which returns TRUE if Sales > 2500

In this example, we’ll return all SalesPeople that have a bonus greater than ALL salespeople whose year-to-date sales were less than a million dollars.

SQL
SELECT p.BusinessEntityID,
       p.FirstName,
       p.LastName,
       s.Bonus,
       s.SalesYTD
FROM   Person.Person AS p
       INNER JOIN Sales.SalesPerson AS s
       ON p.BusinessEntityID = s.BusinessEntityID
WHERE  s.Bonus > ALL (SELECT Bonus
                      FROM   Sales.SalesPerson
                      WHERE  Sales.SalesPerson.SalesYTD 
                             < 1000000)

Summary of Various Comparison Modifiers

You can use comparison modifiers with other operators, such as equals. Use the chart below to get a better understanding through the examples. I’ve listed all the combinations, even those that don’t make too much sense.

When reviewing the example, assume the sub query returns a list of three numbers: 1,2,3.

Image 2

Some combinations of these comparison modifiers are downright goofy. For instance, I can’t imagine using “= ALL” or “<> ANY.” The others make sense, and as we have shown, you can really use MAX or MIN as legal equivalent statements. ANY and ALL do have their places!

Out of all of the items we discussed today, I’ve used EXISTS and NOT EXISTS the most with subqueries. I use IN quite a bit, but usually with a static list, not with subqueries.

License

This article, along with any associated source code and files, is licensed under The MIT License