In today’s lesson, you’re going to learn about grouping results returned from your queries using the GROUP BY
clause.
The objectives of today’s lesson are to:
- Learn to how to group results using
GROUP BY
- Use aggregate functions to perform calculations
- Understand how to filter groups using the
HAVING
clause
GROUP BY Clause
The SQL GROUP BY
Clause is used to output a row across specified column values. It is typically used in conjunction with aggregate functions such as SUM
or Count
to summarize values. In SQL, groups are unique combinations of fields. Rather than returning every row in a table, when values are grouped, only the unique combinations are returned.
The GROUP BY
Clause is added to the SQL Statement after the WHERE
Clause. Here is an example where we are listing OrderID
, excluding quantities greater than 100
.
SELECT OrderID
FROM OrderDetails
WHERE Quantity <= 100
GROUP BY OrderID;
There are a couple of things to note. First, the columns we want to summarize are listed, separated by commas, in the GROUP BY
clause. Second, this same list of columns must be listed in the select
statement; otherwise the statement fails.
When this statement is run, not every filtered row is returned. Only unique combinations of OrderID
are included in the result. This statement is very similar to:
SELECT DISTINCT OrderID
FROM OrderDetails
WHERE Quantity <= 100;
But there is a key difference. The DISTINCT
modifier stops at outputting unique combination of rows, whereas with the GROUP BY
statement, we can calculate values based on the underlying filtered rows for each unique combination.
In other words, using our example, with the GROUP BY
, we can calculate the number or OrderDetails
per order as follows:
SELECT OrderID, COUNT(OrderID) as NumOrderDetails
FROM OrderDetails
GROUP BY OrderID;
COUNT
is an example of an aggregate function, these are what really give the GROUP BY
statement its special value.
Aggregate Functions
Some functions, such as SUM
, are used to perform calculations on a group of rows, these are called aggregate functions. In most cases, these functions operate on a group of values which are defined using the GROUP BY
clause. When there isn’t a GROUP BY
clause, it is generally understood the aggregate function applies to all filtered results.
Some of the most common aggregate functions include:
AVG (expression) | Calculate the average of the expression |
COUNT (expression) | Count occurrences of non-null values returned by the expression |
COUNT (*) | Counts all rows in the specified table |
MIN (expression) | Finds the minimum expression value |
MAX (expression) | Finds the maximum expression value |
SUM (expression) | Calculate the sum of the expression |
These functions can be used on their own on in conjunction with the GROUP BY
clause. On their own, they operate across the entire table; however, when used with GROUP BY
, their calculations are “reset” each time the grouping changes. In this manner, they act as subtotals.
General Syntax of an Aggregate Function
When using the aggregate function, you can either compute the result on all values or distinct values. For instance, to count all OrderDetails
records, we could use the expression:
SELECT COUNT(OrderID)
FROM OrderDetails;
To count the distinct of orders making up the details, we would use the following:
SELECT COUNT(DISTINCT OrderID)
FROM OrderDetails;
Using Aggregate Functions with GROUP BY
AVG and SUM
The SUM
function totals up the values returned, in a similar fashion AVG
calculates the average.
Let’s see if we can calculate the total order amount from the OrderDetails
. From previous lessons, we know how to calculate the total amount for each detail as:
SELECT OrderID, ProductID, UnitPrice * Quantity as TotalPrice
FROM OrderDetails;
Since we can apply aggregate function to expressions, we can set up a grouping on OrderID
to calculate the total price per order as:
SELECT OrderID, SUM(UnitPrice * Quantity) as TotalPrice
FROM OrderDetails
GROUP BY OrderID;
We can even sort by the total to get the top orders first:
SELECT OrderID, SUM(UnitPrice * Quantity) as TotalPrice
FROM OrderDetails
GROUP BY OrderID
ORDER BY TotalPrice DESC;
In a similar fashion, we can calculate the average order detail amount as:
SELECT OrderID,
AVG(UnitPrice * Quantity) as AverageOrderAmount
FROM OrderDetails
GROUP BY OrderID;
For the curious, since an average is calculated as the sum of the sample divided by the sample count, then using AVG
in the above statement is the same as:
SELECT OrderID,
SUM(UnitPrice * Quantity) / COUNT(OrderID) as AverageOrderAmount
FROM OrderDetails
GROUP BY OrderID;
We covered a lot in this section. Here are some key points to remember:
- An aggregate function can evaluate an expression such as
SUM(A + B)
- You should alias aggregate functions, so the column names are meaningful
- When working with aggregate functions and
GROUP BY
, it is sometimes easier to think about the details first, that is write a simple SELECT
statement, inspect the results, then add in the fancy stuff.
COUNT
The COUNT
function is used when you need to know how many records exist in a table or within a group. COUNT(*)
will count every record in the grouping; whereas COUNT(expression)
counts every record where expression’s result isn’t null
. You can also use Distinct
with COUNT
to find the number of unique values within a group.
To find the number of OrderDetail
Lines per order:
SELECT OrderID, COUNT(OrderDetailID)
FROM OrderDetails
GROUP BY OrderID;
To find the number of unique orders per product:
SELECT ProductID, COUNT(DISTINCT OrderID)
FROM OrderDetails
GROUP BY ProductID;
MIN and MAX
Use MIN
and MAX
to find the smallest and largest values, respectively, within a table or group.
For example, to find the smallest and largest product quantities ordered within an order
, try:
SELECT OrderID,
MIN(Quantity) as MinQuantity,
MAX(Quantity) as MaxQuantity
FROM OrderDetails
GROUP BY OrderID;
You can also find the MIN
or MAX
value of a calculation. Here, we find the highest product amount ordered within a product:
SELECT OrderID,
MAX(UnitPrice * Quantity) as MaxAmount
FROM OrderDetails
GROUP BY OrderID;
HAVING Clause
The HAVING
clause is used to filter groups according to the results of the aggregate functions. This makes it possible to solve problems such as select all orders that have more than two order detail lines.
That example looks like:
SELECT OrderID, COUNT(OrderDetailID)
FROM OrderDetails
GROUP BY OrderID
HAVING COUNT(OrderDetailID) > 2;
If we wanted to find all orders greater than $1000
, we would write:
SELECT OrderID,
SUM(UnitPrice * Quantity) as TotalPrice
FROM OrderDetails
GROUP BY OrderID
HAVING TotalPrice > 1000
ORDER BY TotalPrice DESC;
This query is the same as one from the previous section with the addition of the HAVING
clause. We could have written the HAVING
clause as:
HAVING SUM(UnitPrice * Quantity) > 1000
But, since the column was already aliased, we used it instead.
To hammer home HAVING
, I want to show one last example. Here you’ll see the HAVING
statement includes an aggregate function that isn’t in the SELECT
list.
SELECT OrderID,
SUM(UnitPrice * Quantity) as TotalPrice
FROM OrderDetails
GROUP BY OrderID
HAVING AVG(UnitPrice * Quantity) > 500
ORDER BY TotalPrice DESC;
In the above query, we’re getting the total price for orders where the average OrderDetail
amount is greater than $500.00
.
Final Statement about HAVING
Though they perform a similar function, there is a key distinction between the WHERE
clause and HAVING
. The WHERE
clause filter individual records; whereas, the HAVING
clause filters on the groups.
To keep it straight in my head, I like to think of the WHERE
clause doing its work before any groupings take place, and then the HAVING
clause taking over after the groups are formed.
Exercises
It’s important to practice! Use the sample database to answer these questions.
- What is the average quantity ordered in the
OrderDetails
table? - Display the
Min
, Max
, and Average Quantity
ordered for each product in OrderDetails
. - Return total sales, by product for all orders, but only include products included on 7 or more
OrderDetails
.
Answers to Exercises
Congratulations! You just learned how to use the GROUP BY
and HAVING
clauses to summarize and filter on summarized information. More tutorials are to follow! Remember! I want to remind you all that if you have other questions you want answered, then post a comment or tweet me.
I’m here to help you. What other topics would you like to know more about?