This is the fifth in a series of articles about subqueries. In this article, we discuss subqueries in the HAVING
clause. Other articles discuss their uses in other clauses.
All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012
database.
Using Subqueries in the HAVING Clause
You can use sub queries in the HAVING
clause to filter out groups of records. Just as the WHERE
clause is used to filter rows of records, the HAVING
clause is used to filter groups. Because of this, it becomes very useful in filtering on aggregate values such as averages, summations, and count.
The power of using a subquery in the HAVING
clause is now you don’t have to hard-code values within the comparisons. You can rely on the subquery’s results to do so for you.
For example, it is now possible to compare the average of a group to the overall average. We’ve always been able to use the average of the group in the HAVING
clause, but we had no way to compute the overall average. Now, using subqueries, this is possible.
In this example, we’re selecting employee job titles having remaining vacation hours greater than the overall average for all employees.
Here is the query written without the subquery:
SELECT JobTitle,
AVG(VacationHours) AS AverageVacationHours
FROM HumanResources.Employee
GROUP BY JobTitle
HAVING AVG(VacationHours) > <span style="color: rgb(255, 0, 0);">50</span>
I’ve highlighted the value in red that will be replaced by a subquery.
Now here is the complete statement including the subquery:
SELECT JobTitle,
AVG(VacationHours) AS AverageVacationHours
FROM HumanResources.Employee
GROUP BY JobTitle
HAVING AVG(VacationHours) > <span style="color: rgb(255, 0, 0);">(SELECT AVG(VacationHours)</span>
<span style="color: rgb(255, 0, 0);"> FROM HumanResources.Employee)</span>
This query is executed as:
- Compute the remaining average vacation hours for all employees (subquery)
- Group records by
JobTitle
and compute the average vacation hours. - Only keep groups whose average vacation hours are greater than the overall average.
Correlated Subqueries in HAVING Clause
As with any other subquery, subqueries in the HAVING
clause can be correlated with fields from the outer query.
Suppose we further group the job titles by marital status and only want to keep those combinations of job titles and martial statuses whose vacation hours are greater than those for their corresponding overall marital status?
In other words, we want to answer a question similar to “do married accountants have, on average, more remaining vacation, than married employees in general?”
One way to find out is to use the following query:
SELECT JobTitle,
MaritalStatus,
AVG(VacationHours)
FROM HumanResources.Employee AS E
GROUP BY JobTitle, MaritalStatus
HAVING AVG(VacationHours) >
(SELECT AVG(VacationHours)
FROM HumanResources.Employee
WHERE HumanResources.Employee. MaritalStatus = E.MaritalStatus)
There are a couple of things to point out. First, notice that I aliased the Employee
as “E
” in the outer query. This allows me to reference the outer table within the inner query.
Also, with the correlated query, only fields used in the GROUP BY
can be used in the inner query. For instance, for kicks and grins, I tried replacing MaritalStatus
with Gender
and got an error.
SELECT JobTitle,
MaritalStatus,
AVG(VacationHours)
FROM HumanResources.Employee AS E
GROUP BY JobTitle, MaritalStatus
HAVING AVG(VacationHours) >
(SELECT AVG(VacationHours)
FROM HumanResources.Employee
WHERE HumanResources.Employee. Gender = E. Gender)
Is a broken query. If you try to run it, you’ll get the following error:
Column ‘HumanResources.Employee.Gender
’ is invalid in the HAVING
clause because it is not contained in either an aggregate function or the GROUP BY
clause.
Summary
One advantage of using a subquery in the HAVING
clause is to avoid hard coding values, such as an overall average, which can change and is easily computed.
As with other queries, it is possible to build correlated subqueries in the HAVING
clause. This can be useful when the subquery is dependent on the outer query’s column values, and may make it easier to initially understand a query; however, care should be taken! As with all SQL, there are usually many ways to write a query to return the same result. If performance is a concern, then use query plans to understand performance and explore alternatives.