Introduction
One of the biggest issues beginning SQL writers have is being able to write queries that use more than one table. In this series of articles, we are going to show you how to write a query that combines, or joins, data from more than one table. Once you have gone through the examples, you will understand how to write the basic commands to make this happen and why data is separated in the first place.
This first article introduces the concept of joining tables. The focus is going to be more on the type of joins, not necessarily their syntax. The later articles focus on the various types of joins. Through the narrative and examples, you’ll become very comfortable with each one.
In my prior articles, you learned about the need to normalize to make it easier to maintain the data. Though this makes it easier to maintain and update the data, it makes it very inconvenient to view and report information. Typically, the information you need to see has to be cross referenced across several tables for you to see the full picture.
Left in separate tables, it’s tough to put it all together and understand what the data means.
Through the use of join
s, we can stitch the data back together to make it easy for a person to use and understand.
So Why Combine Data?
Before we begin, let’s look into why you have to combine data in the first place. SQLite and other databases such as Microsoft SQL server and MySQL are relational databases. These types of databases make it really easy to create tables of data and a facility to relate (join or combine) the data together.
When databases are developed, care is taken to ensure redundant data is minimized. The databases go through a process called normalization that help reduce each database table to a single meaning or purpose.
For instance, if I had a table containing all the students and their classes, then wanted to change a student’s name, I would have to change it multiple times, once for each class the student enrolled in.
Normalizing, separate the data into a Student
and Classes
table. This makes it really easy to update the student
name, but the price for this is that we have to piece the data back together to answer most of the questions we ask the database.
That is exactly why we need join
s.
Database joins are used to match rows between tables to allow us to stitch the database back together to make it easy to read and use. In most cases, we’re matching a column value from one table with another.
Mechanics of a Join
When broken down, the mechanics of a join
are pretty straightforward. To perform a join
, you need two items: two tables and a join
condition. The tables contain the rows to be combined, and the join
condition the instructions to match rows together.
Take a look at the following Venn diagram. The circles represent the tables and where they overlap rows satisfying the join
condition.
You may be wondering what makes up a join
condition. In many cases, a join
condition is just matching one or more fields in one table to those in another. This type of join
is sometimes called an equijoin since the only type of comparison used is the equals sign.
Join
s aren’t limited to just exact matches such as the equijoin. You’ll see in later articles where it is useful to use other comparison operators such as the greater than sign.
A common join
pattern you’ll come across is joining the primary key of one table to its foreign key. I suppose this is self-evident, but I think it is important to highlight it. Can you think why this is the case?
What process do we use to break up our data?
If you guessed normalization, you are correct. Through that process, we break up dependencies within tables to eliminate update anomalies among other things, but in order to keep relationships, we introduce foreign keys.
Let’s take an example from the sample database. Consider the following data model involving the Employees
and Orders
table. In this model, each employee
can place zero or more orders.
The EmployeeID
is the primary key in the Employees
table and foreign key in the Orders
table. For each employee
, there can be none, one, or perhaps many orders.
Here is a list of all the employees
. To keep it simple, only the LastName
is shown.
In the sample database, you could write the following statement to get these results:
SELECT EmployeeID, LastName
FROM Employees
And here are the Orders
.
You can see this data using this select
statement:
SELECT OrderID, EmployeeID, ShippedDate
FROM Orders
To create a report of employee LastName
and the ShippedDate
of the orders they placed, we need to combine information from both tables. To do so, we would create a join
condition between the two tables on EmployeeID
.
When we work with select
statements involving more than one table, we need a way to keep really clear which field is from which table. This is done by placing the table name in front of the column name. A period is used to separate the two.
Using this convention, the join
condition used is:
Employees.EmployeeID = Orders.EmployeeID
Check the following diagram. We join the table together we are looking for rows where the EmployeeID
matches. So, for every order, where the EmployeeID = 4
, the database will match the Employees
table and match the corresponding row. In this case, that is the employee
whose last name is “Baker
.”
This type of join
is called an inner join
and I’ll go through the explanation in detail in a later article, but if you are dying to try it now, here is the command!
SELECT Employees.LastName, Orders.ShippedDate
FROM Employees
INNER JOIN Orders ON Employees.EmployeeID = Orders.EmployeeID
There are several type of joins we can use to combine tables together. They are introduced in the next section.
Types of Database Joins
The joins we’ll cover in this section are characterized by which rows are returned when the join
condition is met. We already know that an inner joins return rows from each table that meets the join
table, but there are also joins that return rows from the tables even if the join
condition isn’t met. These are called outer join
s.
Inner Joins
Inner join
s return rows where the join
condition is met. This is the most common join
. In its most common example, where a primary key is being matched to a foreign key, only combinations of rows matching the keys are returned.
Outer Joins
Outer join
s are used when, in addition to returning the results of the inner join
, you want to return rows from one table that don’t match in another. In the example above, an employee can have placed zero or more orders. If we only use an inner join
, then our results only include employees that have place orders. Those that haven’t are included in the results.
There may be cases where we want to list all employees regardless of whether they placed an order. In this case, an outer join comes in handy. Regardless of the match with the Orders
table, an employee
record’s selected columns are listed in the results.
There are three types of outer joins: Left
, Right
, and Full outer join
s.
Left Outer Join
– Return all rows from the “left” table, and matching rows from the “right” table. If there are no matches in the right table, return Null
values for those columns. Right Outer Join
– Return all rows from the “right” table, and matching rows from the “left” table. If there are no matches in the left table, return Null
values for those columns. Full Join
– Return all rows from an inner join
, when no match is found, return null
s for that table.
What’s Next?
In the following articles, we dig into the various join types, explore joins involving more than one table, and further explain join conditions, especially what can be done with non-equijoin conditions.
Also, I think it is important to understand what happens under the covers. So as part of this series, we’ll explore the impact joins have on database performance, and why it is important to understand whether indexes can help reduce query times.
More tutorials are to follow! Remember! If you have other questions you want answered, then post a comment or tweet me. I’m here to help you. What other topics would you like to know more about?