Contents
Building SQL queries have always been considered the developers' worst nightmare; with all those nested queries and different Joins. In this article, I try to illustrate a somewhat graphical and easy way I use to build nested Join queries, as they are the most common types of queries developers come across. It solved my problems with Join queries, and hopefully, it will solve yours.
The easiest way to explain this is by an example, so I chose the very known Northwind database shipped with Microsoft SQL Server. Shortly, we will see: the sample result we want to achieve, how the "Employee Sales by Country" Stored Procedure in the Northwind database achieves it, and how our query will achieve it.
The result we need can wear many human-expression statements. You will see that you can come up with your own statement, but to narrow things down, I choose the following:
"All Orders, shipped between the StartDate and the EndDate (dates are parameters), showing the OrderId, its Shipping Date, its Subtotal, the Employee's (who made the order) First Name, Last Name, and Country".
A sample of what the result would be is shown here:
The query to achieve such a result is a little bit complex; this should never frighten you and stop you from reading to the end of this article. The whole issue here is about how to make complex statements easy to build, so when you see the statements, just hold your breath and bare.
Northwind database has a ready Stored Procedure that achieves the sample result called "Employee Sales by Country"; my goal of involving it here is to compare our query to it; Complexity and Performance wise.
create procedure "Employee Sales by Country"
@Beginning_Date DateTime,
@Ending_Date DateTime
AS
SELECT
Employees.Country,
Employees.LastName,
Employees.FirstName,
Orders.ShippedDate,
Orders.OrderID,
"Order Subtotals".Subtotal AS SaleAmount
FROM Employees
INNER JOIN
(Orders
INNER JOIN
"Order Subtotals"
ON Orders.OrderID = "Order Subtotals".OrderID
)
ON Employees.EmployeeID = Orders.EmployeeID
WHERE
Orders.ShippedDate Between @Beginning_Date And @Ending_Date
GO
But if you notice, this query already uses the View "Order Subtotals", a ready statement. To show how the query really looks like, let us substitute the "Order Subtotals" View with its real query. The following is the result:
SELECT
Employees.Country,
Employees.LastName,
Employees.FirstName,
Orders.ShippedDate,
Orders.OrderID,
OrderSubtotals.Subtotal AS SaleAmount
FROM Employees INNER JOIN
(Orders INNER JOIN (
SELECT
"Order Details".OrderID,
Sum(CONVERT(money,("Order Details".UnitPrice*
Quantity*(1-Discount)/100))*100) AS Subtotal
FROM "Order Details"
GROUP BY "Order Details".OrderID
) OrderSubtotals ON Orders.OrderID = OrderSubtotals.OrderID)
ON Employees.EmployeeID = Orders.EmployeeID
WHERE Orders.ShippedDate Between @Beginning_Date And @Ending_Date
Executing the above code in the Query Analyzer generates the expected result (you only have to substitute the parameters with their proper values).
Now, it's our turn to create our query, the goal is to achieve the same result. So, let's follow the steps and see if we will succeed.
First, we specify which tables (along with their needed columns) we need to come up with the wanted result (we should not forget to get the needed Primary keys and the foreign keys by which we will connect to the other tables). I guess this is pretty easy and straightforward.
- Orders:
- OrderId (PK)
- EmployeeId (FK to connect to Employees)
- ShippedDate
- OrderDetails
- OrderId (FK to connect to Orders)
- UnitPrice
- Quantity
- Discount
- SubTotal (a result column calculated by the columns: UnitPrice, Quantity, and Discount)
- Employees
- EmployeeId (PK)
- FirstName
- LastName
- Country
After we have determined all the tables and their columns needed, we draw a map of queries that shows us the path to the final result; the best way to describe this is by imagining it as group of creaks joining each other at some point till they form a large river at the end.
We start with the most nucleic queries (that have the least dependency on others), and then we build over it till we have the final result. The map consists of blocks (queries) and lines (to form Joins).
The first nucleic query that comes to my mind is the Orders' Subtotals query; this query is composed of just one table; it makes calculations on some of its fields, adding a new column aliased by "Subtotal":
As you may have noticed, I labeled the query with Q1 since we will keep building on it (adding creeks). The only queries to be given alias names are: the first query (Q1), and all the queries that are a result of a Join; any intrinsic query (other than the first one) won't be given a name.
Now, we need to append to this query the closest one to it; for me, I would join it with Order information (the Shipped Date, and the Employee Id who made the order) through the OrderId field, which resembles:
The results:
Everything is great for now, not much left; we only need to join this with the Employee's information (First Name, Last Name, and Country) through the EmployeeID, that resembles:
The results:
By this, we have the complete query map, and we are ready to start creating the queries.
Now, for each joint, we apply the following simple rule till we come up with the final result (refer to the map to know what is intrinsic and what is Qx):
Qx+1=
SELECT
intrinsic.fields,
Qx.fields
FROM intrinsic
JOIN
( body of Qx ) as Qx
ON Qx.[FK] = intrinsic.[PK]
and here we start applying the above rule:
Q1 (base query) =
SELECT
"Order Details".OrderID,
Sum(CONVERT(money,("Order Details".UnitPrice*Quantity*
(1-Discount)/100))*100) AS Subtotal
FROM "Order Details"
GROUP BY "Order Details".OrderID
Q2 =
SELECT
Orders.OrderID OrderId,
Orders.EmployeeID,
Orders.ShippedDate,
Q1.Subtotal
FROM
Orders
JOIN
(
SELECT "Order Details".OrderID, Sum(CONVERT(money,
("Order Details".UnitPrice*Quantity*(1-Discount)/100))*100) AS Subtotal
FROM "Order Details"
GROUP BY "Order Details".OrderID
) AS Q1 ON Orders.OrderId = Q1.OrderID
Now, we do the same for the final result.
Q3 (final ) =
SELECT
Employees.EmployeeID,
Employees.Country,
Employees.LastName,
Employees.FirstName,
Q2.OrderId,
Q2.ShippedDate,
Q2.Subtotal
FROM
Employees
JOIN
(
SELECT
Orders.OrderID OrderId,
Orders.EmployeeID,
Orders.ShippedDate,
Q1.Subtotal
FROM
Orders
JOIN(
SELECT
"Order Details".OrderID,
Sum(CONVERT(money,("Order Details".UnitPrice*
Quantity*(1-Discount)/100))*100) AS Subtotal
FROM
"Order Details"
GROUP BY "Order Details".OrderID
) Q1 ON Orders.OrderId = Q1.OrderID
) AS Q2 ON Employees.EmployeeId = Q2.EmployeeId
WHERE Q2.ShippedDate BETWEEN '1/1/1900' AND '1/1/2006'
To this point, we are finished; the query is done.
But, just to finalize this; comparing the query we came up with to Northwind's "Employee Sales by Country" Stored Procedure, we will notice:
- Complexity: our query has more "
SELECT
" lines within; more to write indeed, but more organized. - Performance: I made a quick comparison between the two in the Query Analyzer, and the Execution Plan for the two was as follows:
Northwinds SP "Employee Sales by Country"
Our query
Showing an exact match of the Execution Plan.
We have seen in this article how to build the next Join Queries in just three easy steps:
- Specify the involved tables and their columns
- Draw the map
- Create the queries
Further enhancement to this can be done, and with more thinking...an active developer can create a piece of software that automates this process, making the developers' lives easier.