Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

A Case Study on How to use Joins, Subqueries, and Unions to Combine Data in SQL

5.00/5 (4 votes)
13 May 2015MIT8 min read 19.2K  
This is a case study on how to use Joins, Subqueries, and Unions to combine data in SQL.

Introduction

A reader recently asked me to help them solve a problem combining data from multiple tables into a single result. What was interesting was that the solution used the three ways to combine data that we have recently covered, namely: JOINS, set operators, and subqueries.

Given the comprehensive nature of this problem, I thought it would be a great case study for you to read.

NOTE: This example is not based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database. However, I strongly recommend you get the tool and start using it. Many of the lessons and articles use it. You can get started using these free tools using my Guide Getting Started Using SQL Server.

The questions asked were:

“I have a main data table that lists events. Each event has a series of fields including duration, reason code and status.

The database has a table of statuses and there is a common field between the main table and this allows me to show the status name rather than the code. 

There are four statuses (ready, delay, spare and down) and there are a range of reasons associated with each status. It is possible for the same code to exist in two statuses (e.g. delay maintenance and down maintenance).

For some reason, the source database has four tables of reasons (one for each status). Each of these has a field which links to the main reason code. What is missing is a status field in those tables.

What I’m looking for is something along the lines of ‘’when the main table record status is ‘delay’, then retrieve the name in the ‘delay’ table which corresponds to the main table delay code. When the status is ‘down’, do the same thing but look up the name in the ‘down’ table.

Any pointers (if you can make any sense of the problem description) are gratefully received.”

So the goal is to create a result that uses descriptive values rather than codes. Let’s walk through the solution together.

Image 1
Case Study Goal

Here is an overall view of the various tables and the matching challenge, which is to “conditionally match” on one of the four “ready” tables depending on the value of Status:

Image 2

Challenges Matching “Conditionally”

Getting the status name is pretty straight forward. For this, we can use an INNER JOIN; however, displaying the Reason Name is tougher.

Once you put your mind to it, you’ll realize that that answer hinges around getting a better grip on the various “reason” tables. If they could be treated uniformly, then we could simply join to them once, avoid duplicate entries, and solve the problem.

When I originally thought about this, I was thinking about seeing if I could use SELECT CASE to solve the problem, and then it hit me! We can do this using UNIONS. If you notice, all the tables have a very similar structure.

Given this, we can create a derived table as the result of combining the four “reason” tables which can then be joined to the original table using both the status code and ready code. This avoids having to “conditionally match” to each of four tables.

In my mind, the answer is broken up into three distinct sub solutions:

  1. Joining to the Status table to get the status name.
  2. Using UNION to combine the various “reason” tables; therefore, avoiding the “conditional match” issue.
  3. Incorporating the result from step two into the final query via a subquery (derived table).

Let’s go through each of these sub problems in order.

1. Using INNER JOIN to Return StatusName

Since we need to obtain the StatusName from the status table corresponding to the status code in events, we can use an INNER JOIN. As you may recall, an Inner join returns rows where the join condition is met. This is the most common join.

In its most common example, where a primary key is being matched to a foreign key, only combinations of rows matching the keys are returned.

This is exactly what you have. The StatusCode is the primary key of the status table and we can use that to match that to the foreign key value StatusCode from Event.

Since status code is unique within the status table, you don’t have to worry about multiple rows being returned by the join.

The following diagram summarizes what is being done:

Image 3

Inner Join to Status Table

Here is the SQL statement used to obtain the status name, I colored the INNER JOIN parts in red.

SQL
SELECT EventID,
       Duration,
       S.Name
FROM Event E
       INNER JOIN Status S
       ON E.StatusCode = S.StatusCode

Of course, there is more to do, we also need to get the reason name, but this is a good start.

2. Utilizing UNION to Combine Reason Table Rows

UNION is called a set operator. The UNION operator is used to combine rows from several tables into a single result. Whereas a join is meant to combine columns from different tables into a single row, the UNION operator is adding rows from each table.

In the following diagram, you can see the four separate “reason” tables.

Image 4

Using UNION to Combine Reason Data

By successfully applying union clauses, we’ll combine all the rows. The other thing you’ll notice is we’re adding a static column which corresponds to the status code. Static column is just a fancy way of saying we’re adding a column that is a fixed value.

By doing this, we can associate each table’s rows with the proper status code. Remember, this was one of our big hurdles; we didn’t have a good way to conditionally match to each of the four tables based on status code.

By adding the status code to the result, we can eventually use both the status and reason codes to match to the correct row.

Here is the code used to create the unified results.

SQL
SELECT 'R', ReasonCode, Name
FROM ReadyReason
UNION
SELECT 'D1', ReasonCode, Name
FROM DelayReason
UNION
SELECT 'S', ReasonCode, Name
FROM SpareReason
UNION
SELECT 'D2', ReasonCode, Name
FROM DownReason

3. Using Subqueries to Include a Derived Table in Final Result

Once the union is created, we are now able to use it to match and pull in the reason names. As you can see in the diagram below, the matching becomes much easier. We no longer have to inspect the status code, then decide which one of the four tables to use before matching on reason code to get the name.

Instead, we can now use a standard INNER JOIN to match both the status code and reason code to the result of the union.

But that does bring up a question. How do we introduce that result into our query?

Fortunately, there is a way to do so using subqueries. A subquery is just a query defined within another. There are many places we can use sub queries, such as in the SELECT or WHERE clause. We can also use them in FROM clause. When used here, they typically refer to derived tables. The result of our union is a derived table.

Derived tables are enclosed in parenthesis, like subqueries, but they are also given a name.

Image 5

UNION in Derived Table

If you look closely at the SQL, you see the UNION result is given the name SR.

You’ll also notice there is a column alias for the static field. This is so we can refer to it in the join condition.  It makes the SQL easier to read and follow.

In the sample below, I’ve color coded the UNION green and its use in the INNER JOIN blue..

SQL
SELECT EventID,
       Duration,
       SR.ReasonName
FROM Event E
       INNER JOIN
       (
        SELECT 'R' as StatusCode, ReasonCode, Name
        FROM ReadyReason
        UNION
        SELECT 'D1', ReasonCode, Name
        FROM DelayReason
        UNION
        SELECT 'S', ReasonCode, Name
        FROM SpareReason
        UNION
        SELECT 'D2', ReasonCode, Name
        FROM DownReason
       ) SR
       ON E.StatusCode = SR.StatusCode AND
       E.ReasonCode = SR.ReasonCode

Remember the form of an inner join is:

SQL
INNER JOIN {table} ON {match conditions}

What we have done is replace {table} with our derived table.

In this case, it works very well.

Final Query

To create the final result, we combine the three sub solutions together. From the section above, you can see that each means to do so is relatively simple. Sure, there is syntax to contend with, but I think overall the ideas are straightforward.

Of course, when it is glopped, it's all tougher it can look overwhelming, but hopefully you have seen that if you take a step-by-step approach to building your SQL, that becomes less so.

Image 6

Final query using INNER JOIN, UNION, and subquery

Here is the text for the final SQL color coded to highlight each sub solution.

The first INNER JOIN is in red, the UNION in Green, and the INNER JOIN using the derived table in blue.

SQL
SELECT EventID,
       Duration,
       S.Name as [Status Name],
       SR.ReasonName as [Reason Name]
FROM Event E
       INNER JOIN Status S
       ON E.StatusCode = S.StatusCode
       INNER JOIN
       (
        SELECT 'R', ReasonCode, Name
        FROM ReadyReason
        UNION
        SELECT 'D1', ReasonCode, Name
        FROM DelayReason
        UNION
        SELECT 'S', ReasonCode, Name
        FROM SpareReason
        UNION
        SELECT 'D2', ReasonCode, Name
        FROM DownReason
       ) SR
       ON E.StatusCode = SR.StatusCode AND
       E.ReasonCode = SR.ReasonCode

This concludes the case study.

I hope you found that helpful. Did you see that, though the end result looked like a mouthful, it was really made up from three simpler concepts?

Many of my students raise questions about this process, that is how to write complex queries such as these.

For instance, they are not sure where to start. This is where I can help. I’ve got some really great ideas to help get you going with confidence so you can start writing queries your first day.

Others wonder how to put it all together. As you can see from the case study, there is a progression. I like to take things step-by-step.

When solving complicated problems, I like to build on success. Did you see how I started with a simple solution, added a step, and continued?

If you like my style of teaching, I would recommend you sign up for my newsletter below so you don’t miss out on any future training opportunities!

License

This article, along with any associated source code and files, is licensed under The MIT License