Outer Joins
The series starts with the article Introduction to Database Joins. All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012
database. You can get started using these free tools using my Guide Getting Started Using SQL Server. In this article, we are going to cover outer joins.
An outer join is used to match rows from two tables. Even if there is no match, rows are included. Rows from one of the tables are always included, for the other, when there are no matches, NULL
values are included.
Types of Outer Joins
There are three types of outer join
s:
Left Outer Join
– All rows from the left table are included, unmatched rows from the right are replaced with NULL
values. Right Outer Join
– All rows from the right table are included, unmatched rows from the left are replaced with NULL
values. Full Outer Join
– All rows from both tables are included, NULL
values fill unmatched rows.
Let’s dig a little deeper and explore the left outer join
.
Left Outer Join
Check out the following data model. This is taken from the AdventureWorks2012
database. In this model, there is 1 Person to 0 or 1 Employees.
To construct a list of all Person LastNames
, yet also show JobTitle
if the Person
is an Employee
, we need a way of joining the two tables and include Person
rows in the result, even if they don’t match Employee
.
This type of join is called a left outer join
, as all the rows for the table from the left side of the JOIN
keyword are included regardless of the match. The basic syntax for a left outer join
is:
SELECT columnlist
FROM table
LEFT OUTER JOIN othertable ON join condition
The SQL for the join
in the diagram above is:
SELECT person.Person.BusinessEntityID,
Person.Person.LastName,
HumanResources.Employee.NationalIDNumber,
HumanResources.Employee.JobTitle
FROM person.Person
LEFT OUTER JOIN
HumanResources.Employee
ON person.BusinessEntityID = Employee.BusinessEntityID
Here are the first results from the query:
Notice how there are NULL
values listed in the second row for NationalIDNumber
and JobTitle
. This is because there are no employees matching BusinessEntityID 293
.
Right Outer Join
Let’s take another look at the diagram, but this time, we are doing a right outer join
. As you may have guessed, there isn’t too much difference in the SQL statement between a left outer join
and a right outer join
. The basic syntax for a right outer join
is:
SELECT columnlist
FROM table
RIGHT OUTER JOIN othertable ON join condition
Below is our sample query written as a right outer join
:
SELECT person.Person.BusinessEntityID,
Person.Person.LastName,
HumanResources.Employee.NationalIDNumber,
HumanResources.Employee.JobTitle
FROM person.Person
RIGHT OUTER JOIN
HumanResources.Employee
ON person.BusinessEntityID = Employee.BusinessEntityID
The key difference is that now we are going to return all records from the Person
table, which is the table to the right of the join
keyword. If a matching Employee
record isn’t found, then NULL
will be returned for BusinessEntityID
and LastName
.
Here are the results from the query.
I scrolled through all the results and was surprised to not see any null
values.
Do you know why?
The answer lies in the data model. There is a 0..1 to 1 relationship between Employee
and Person
. This means that for every Employee
, there is one Person
. Given this, for the right join
, there won’t exist any non-matching rows. With this type of relationship, you could have also used an inner join
.
Left versus Right Outer Joins
There is no difference in functionality between a left outer join and a right outer join.
The statement...
SELECT person.Person.BusinessEntityID,
HumanResources.Employee.NationalIDNumber
FROM person.Person
LEFT OUTER JOIN
HumanResources.Employee
ON person.BusinessEntityID = Employee.BusinessEntityID
...returns the same result as:
SELECT person.Person.BusinessEntityID,
HumanResources.Employee.NationalIDNumber
FROM HumanResources.Employee
RIGHT OUTER JOIN
person.Person
ON person.BusinessEntityID = Employee.BusinessEntityID
Of course, this wouldn’t be the case if I had only changed the join from LEFT
to RIGHT
and not switched the table names.
I typically use left outer join
s more than I do right outer join
s. I think this is because when I draw relationships I do so left to right. Also, I traverse tables in my head from left to right.
This then fits in well with SQL as the “left” table is in the FROM
statement.
I’m curious to know what you use. I’m really curious to know whether a right join
seem more intuitive to you if you’re native speaker of Arabic or some other “right to left” language.
Full Outer Join
A full outer join
is the combination of results from a left
and right outer join
. The results returned from this type of join
include all rows from both tables. Where matches occur, values are related. Where matched from either table don’t, then NULL
is returned instead.
The basic syntax for a full outer join
is:
SELECT columnlist
FROM table
FULL OUTER JOIN othertable ON join condition
Let’s take a look at a different portion of the AdventureWork2012
database. This time, we are going to focus on the relationships between SalesOrderHeader
and CurrencyRate
tables.
The model is shown below:
Suppose we want to know all the currencies we can place orders in and which orders were placed in those currencies?
SELECT sales.SalesOrderHeader.AccountNumber,
sales.SalesOrderHeader.OrderDate,
sales.CurrencyRate.ToCurrencyCode,
sales.CurrencyRate.AverageRate
FROM sales.SalesOrderHeader
FULL OUTER JOIN
sales.CurrencyRate
ON sales.CurrencyRate.CurrencyRateID =
sales.SalesOrderHeader.CurrencyRateID
Here is a portion of the results showing where some sales have match to a currency and some that haven’t. The reason there are sales that don’t match is that these are sales in USD.
Further down in the results, you see currencies with no matching sales. This reflects the fact that no sales were made in those currencies.
Note: I was surprised to see USD listed, see row 42463, since I would think a majority of the sales would be in this currency. My thought is that rather than reference the currency rate for these transactions, the SalesOrderHeader
value for CurrencyRateID
was set to null
for all USD transactions. I think this is inconsistent, and isn’t the way I would do it, but it isn’t my database…
Advanced Example
So far, we’ve looked at the three types of outer joins, but haven’t explored some more advanced concepts such as joining multiple table and using more than one condition in our join clauses.
We covered these concepts when we explored inner joins, so what I’ll be showing you, shouldn’t be too new, but I think it still makes sense to review, since in some cases mixing full joins with inner joins may produce unexpected or unintended results.
Let’s turn our focus to the production schema and explore products and categories. Let’s produce a list of all product categories and the product models contained within.
Product
has a one to many relationship with ProductModel
and ProductSubcategory
. Since it lies between these two tables, there is an implicit many to many relationship between ProductModel
and ProductSubcategory
. Because of this, it is a good candidate for outer joins as there may be product models with no assigned products and ProductSubcategory
entries with no product.
To overcome this situation, we will do an outer join to both the ProductModel
and ProductCategory
tables.
Here is the SQL:
SELECT PC.Name AS Category,
PSC.Name AS Subcategory,
PM.Name AS Model,
P.Name AS Product
FROM Production.Product AS P
FULL OUTER JOIN
Production.ProductModel AS PM
ON PM.ProductModelID = P.ProductModelID
FULL OUTER JOIN
Production.ProductSubcategory AS PSC
ON PSC.ProductSubcategoryID = P.ProductSubcategoryID
INNER JOIN
Production.ProductCategory AS PC
ON PC.ProductCategoryID = PSC.ProductCategoryID
ORDER BY PC.Name, PSC.Name
There are several items to note:
- I used table aliases to make the SQL more readable.
- There is more than one
full outer join
clause. - The
ProductCategory
table is also part of an outer join
.
Originally, when I wrote the SQL for this query, I had an inner join between ProductSubcategory
and ProductCategory
, but I wasn’t seeing NULL
values for unmatched records I would expect.
Once I changed the join to a full outer join
, I saw the results I expected. The reason this occurs is subtle.
After checking the data, I confirmed that all categories are assigned subcategories. Given this, you would think an inner join would work; however, consider that as the entire statement is executed and rows are returned, the ProductSubcategoryID
value is NULL
whenever a product fails to match a product subcategory.
Null
values, by definition, aren’t equal to one another, so the inner join fails. Given this, when these values are then matched to ProductCategory
, they aren’t included in the result unless the join to ProductCategory
is an outer join
.
In fact, the join doesn’t have to be a full outer join
, a left join
works just as well:
SELECT PC.Name AS Category,
PSC.Name AS Subcategory,
PM.Name AS Model,
P.Name AS Product
FROM Production.Product AS P
FULL OUTER JOIN
Production.ProductModel AS PM
ON PM.ProductModelID = P.ProductModelID
FULL OUTER JOIN
Production.ProductSubcategory AS PSC
ON PSC.ProductSubcategoryID = P.ProductSubcategoryID
LEFT OUTER JOIN
Production.ProductCategory AS PC
ON PC.ProductCategoryID = PSC.ProductCategoryID
ORDER BY PC.Name, PSC.Name
Uses for Outer Joins
Because outer join
s not only the matching rows but also those that don’t, they are a really good way to find missing entries in tables. This is great when you need to do diagnosis on your database to determine if there are data integrity issues.
For example, suppose we were concerned that we may have some ProductSubcategory
entries that don’t match Categories
. We could test by running the following SQL:
SELECT PSC.Name AS Subcategory
FROM Production.ProductCategory AS PSC
LEFT OUTER JOIN
Production.ProductSubcategory AS PC
ON PC.ProductCategoryID = PSC.ProductCategoryID
WHERE PSC.ProductCategoryID is NULL
The outer join
returns the unmatched row values as NULL
values. The where
clause filters on the non-null
values, leaving only nonmatching Subcategory
names for us to review.
Outer join
s can also be used to ask questions such as:
- “What sales persons have never made a sale?”
- “What products aren’t assigned to a product model?”
- “Which departments don’t have any assigned employees?”
- “List all sales territories not assigned sales people.”