In this puzzle, we’re going to learn how to rewrite a subquery using inner joins. Knowing about a subquery versus inner join can help you with interview questions and performance issues. Though subqueries have unique abilities, there are times when it is better to use other SQL constructs such as join
s.
By reading this article, you’ll learn about several types of subqueries, and how each can be switched to another form, such as a join
.
Solving puzzles is a great way to learn SQL. Nothing beats practicing what you’ve learned. Once you have figured out the puzzle, post your answer in the comments so we all can learn from one another.
SQL Puzzle Question
A remedy for query confusion…
A coworker just learned about subqueries and wrote some SQL to retrieve employee names and birthdates from the AdventureWorks
database. The problem is, they want to change it and now it’s hard to read!
Can you help them simplify the following SQL?
SELECT E.HireDate,
(SELECT FirstName
FROM Person.Person P1
WHERE P1.BusinessEntityID = E.BusinessEntityID),
(SELECT LastName
FROM Person.Person P2
WHERE P2.BusinessEntityID = E.BusinessEntityID),
E.BirthDate
FROM HumanResources.Employee E
WHERE (SELECT PersonType
FROM Person.Person T
WHERE T.BusinessEntityID = E.BusinessEntityID) = 'EM'
ORDER BY HireDate,
(SELECT FirstName
FROM Person.Person P1
WHERE P1.BusinessEntityID = E.BusinessEntityID)
What statement would you write to make it easier to read, and perhaps run more efficiently?
Subquery Versus Inner Join Answer
Before we begin, let’s talk about the existing query… what is it?
You’ll see the query combines data from two different tables. It is doing so using subqueries in both the FROM and WHERE clauses. These are highlighted below in bold.
SELECT E.HireDate,
(SELECT FirstName
FROM Person.Person P1
WHERE P1.BusinessEntityID = E.BusinessEntityID),
(SELECT LastName
FROM Person.Person P2
WHERE P2.BusinessEntityID = E.BusinessEntityID),
E.BirthDate
FROM HumanResources.Employee E
WHERE (SELECT PersonType
FROM Person.Person T
WHERE T.BusinessEntityID = E.BusinessEntityID) = 'EM'
ORDER BY HireDate,
(SELECT FirstName
FROM Person.Person P1
WHERE P1.BusinessEntityID = E.BusinessEntityID)
Also, you see that each subquery’s WHERE
clause is restricting the rows returned to those equal to the Employee.BusinessEntityID
. This is what people call a correlated subquery.
I also want to point out that the queries in the FROM
must return a single value (scalar). If they don’t, then an error is thrown.
As you can imagine, this is dangerous, as it can be difficult to guarantee a query return at most one row. I know it was safe in this case I know this as the matching condition occurs between each table’s primary keys.
Subquery versus Inner Join – Converting the Query
If I was writing this query, I would use an INNER JOIN. Here is the query I would write:
SELECT E.HireDate,
P.FirstName,
P.LastName,
E.BirthDate
FROM HumanResources.Employee E
INNER JOIN Person.Person P
ON P.BusinessEntityID = E.BusinessEntityID
WHERE P.PersonType = 'EM'
ORDER BY E.HireDate,
P.FirstName
Which is Easier to Read?
This shouldn’t be much of a debate, the INNER JOIN
is much shorter, and I think to the point. The join
clause speaks for itself. You know it is relating two tables together; whereas, with the subquery, it is so apparent.
The INNER JOIN
version is easier to maintain.
Also, with the subquery method, you’ll see much of the code repeated. This may not seem like a big deal, now, but if you ever have to change the query, it is, as now when you make a change, you need to be sure to make the same change in several locations.
SubQuery or Inner Join? Which is More Efficient?
Here is the query plan for the sub query version:
I’ve highlighted the effect of the four subqueries. For this statement, each query results in a nested loop. These aren’t good.
It means if you have ten rows in two tables each, then you need to on average, iterate through the second table 50 times (100/2) for each row in the first to find a match. This means, instead of a seek taking two or three operations to find a match, the match could take upward of 100 * 50 = 500 seeks to find.
Nested loops are a fact of life, but less is better.
And here is the version for the INNER JOIN
:
SET SHOWPLAN_ALL ON
SELECT E.HireDate,
P.FirstName,
P.LastName,
E.BirthDate
FROM HumanResources.Employee E
INNER JOIN Person.Person P
ON P.BusinessEntityID = E.BusinessEntityID
WHERE P.PersonType = 'EM'
ORDER BY E.HireDate,
P.FirstName
Here, you see there is only one nested loop. For sure, that is better than four.
After observing both the SQL and query plans for each set of statements, you can see that INNER JOIN
is superior in several ways; however, check out that simplified plan!
The query’s true task is to combine columns from two tables; this is what INNER JOIN
S excel at. Sure, there are times subqueries make sense, and can be used to do things you can’t with join
s, but in this case, it doesn’t make sense to use one.