Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

SQL GROUP BY Guide

0.00/5 (No votes)
7 Mar 2022CPOL5 min read 3.7K  
How to use SQL GROUP BY Clause is to consolidate like values into a single row
This post is a guide to the Group By clause in SQL. It can be used to consolidate like values into a single row.

Use the SQL GROUP BY Clause is to consolidate like values into a single row.  The group is a set of columns. The group by returns a single row from one or more within the query having the same column values. Its main purpose is this work alongside functions, such as SUM or COUNT, and provides a means to summarize values.

What is a SQL Group?

In SQL, groups are unique column value combinations. When a query has a GROUP BY, rather than returning every row that meets the filter condition, values are first grouped together. The rows returned are the unique combinations within the columns.

The overall syntax for GROUP BY is:

SQL
SELECT   colum1, column2, column3, .. 
FROM     table1 
GROUP BY column1, column2, column3,… 

For the lesson, we’ll focus on the HumanResources.Employee table.

In this example, I’ve ordered the table by JobTitle and Gender.

Image 1

Here, you can see the beginnings of the group. The final step is to add a SQL GROUP BY clause to the statement to create the grouping.

SQL
SELECT JobTitle, MaritalStatus, COUNT(*) NumberEmployees
FROM HumanResources.Employee
GROUP BY JobTitle, MaritalStatus
ORDER BY JobTitle, MaritalStatus

Whose result you see here:

Image 2

I want you to notice several things:

I removed columns that were part of the group or summarization, such as COUNT(*). Only columns listed in the GROUP BY are allowed in the select. In addition to these, you are able to also use aggregation functions, such as SUM, COUNT, MIN, MAX and AVG.

I kept the ORDER BY in the statement since GROUP BY doesn’t guarantee the groups are sorted.

Now that you have a good grasp over how it works, let’s look at some more items related to GROUP BY.

SQL GROUP BY with Sum

Let’s do some summarizations! For a change, we’ll summarize SalesOrderDetail records. For each SalesOrder, let’s calculate the OrderTotal, the Average SalesOrderDetail Line Total for a SalesOrder, and the number of SalesOrderDetail lines within an order.

We’ll use the group by with SUM, AVG, and COUNT functions to help.

SQL
SELECT SalesOrderID     
     ,SUM(LineTotal) AS OrderTotal
     ,AVG(LineTotal) AS AverageLineTotal
     ,COUNT(LineTotal) AS NumberOfLines
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderIDORDER BY SalesOrderID;

Since there are one or more lines per Sales Order, we’ll group by the SalesOrderID. Check out the results:

Image 3

Here are some tips to think about when using these functions. You can:

  1. Use more than one function in your SQL.
  2. Include the functions within Expressions.
  3. Alias the expression.

You can also create new expressions using aggregate functions. To demonstrate this, let’s calculate the Average Line Total knowing the Average = the Sum of Line Totals / the Count.

Here is the query to make it so:

SQL
SELECT SalesOrderID
     ,SUM(LineTotal) AS OrderTotal
     ,COUNT(LineTotal) AS NumberOfLines
     ,SUM(LineTotal) / COUNT(LineTotal) AS AverageLineTotal
     ,AVG(LineTotal) AverageFunctionLineTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
ORDER BY SalesOrderID;

And the results. Notice how the “calculated” AverageLineTotal equal the Function’s value.

Image 4

Ordering Grouping Results

Groups aren’t returned in sorted order. In order to do this, you need to include the ORDER BY clause. Add the ORDER BY clause to the end of the SQL statement.

Modify this statement so it sorts the groups by SalesOrderID and CarrierTrackingNumber.

SQL
/* Answer */
SELECT SalesOrderID, CarrierTrackingNumber, COUNT(1) as NumberofLines 
FROM Sales.SalesOrderDetail 
GROUP BY SalesOrderID, CarrierTrackingNumber 
ORDER BY SalesOrderID, CarrierTrackingNumber 

HAVING and GROUP BY

The HAVING clause filters groups created using GROUP BY. The HAVING clause determines whether a group is included within the result.

Unlike the WHERE clause which filters individual records, the HAVING clause filter groups; however, keep in mind that SQL containing both a WHERE and HAVING processes both!

In this case, the WHERE clause is first processed, and those records filtered. The HAVING is then applied to groups as specified in the GROUP BY.

Typically HAVING is used with GROUP BY, but if a GROUP BY clause isn’t included, then HAVING operates on the entire result set as one implicit group.

Consider our previous example. Here, we calculate the OrderTotals.

SQL
SELECT SalesOrderID     
     ,SUM(LineTotal) AS OrderTotal
     ,AVG(LineTotal) AS AverageLineTotal
     ,COUNT(LineTotal) AS NumberOfLines
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
ORDER BY SalesOrderID;

Suppose we need to show large orders, those totalling more than $100,000.00. How can we do so?

Use the HAVING clause to filter on the SUM(LineTotal).

Here is the query to show large orders:

SQL
SELECT SalesOrderID     
     ,SUM(LineTotal) AS OrderTotal
     ,AVG(LineTotal) AS AverageLineTotal
     ,COUNT(LineTotal) AS NumberOfLines
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID;

What’s nice about HAVING

HAVING clause’s key advantage is its ability to filter GROUPS using aggregate functions. This is something you cannot do within a SELECT statement.

Hint: If you are having a hard time understand HAVING and WHERE’s role within SELECT, I recommend understanding the SELECT statement order of execution first.

Image 5

GROUP BY with Multiple Tables

GROUP BY can also include columns from multiple tables. In the following query, let’s calculate the Sales by ProductName. To do so, we'll join SalesOrderDetail to Product. We’ll join on product.

SQL
SELECT p.Name ProductName,
    SUM(s.OrderQTY) TotalQuantitySold,
    SUM(s.LineTotal) TotalSales,
    SUM(s.OrderQTY * p.StandardCost) TotalStandardCost,
    SUM(s.LineTotal) - SUM(s.OrderQTY * p.StandardCost) GrossMargin
FROM Sales.SalesOrderDetail s
    INNER JOIN Production.Product p ON  s.ProductID = p.ProductID
GROUP BY p.Name
ORDER BY p.Name

Check out how columns from the two tables participate within the TotalStandardCost calculation.

Image 6

Sort by SUM()

Using the same example as above, notice I can sort by one of the totals. In this case, we’ll sort by TotalQuantitySold.

SQL
SELECT p.Name ProductName,
    SUM(s.OrderQTY) TotalQuantitySold,
    SUM(s.LineTotal) TotalSales,
    SUM(s.OrderQTY * p.StandardCost) TotalStandardCost,
    SUM(s.LineTotal) - SUM(s.OrderQTY * p.StandardCost) GrossMargin
FROM Sales.SalesOrderDetail s
    INNER JOIN Production.Product p ON  s.ProductID = p.ProductID
GROUP BY p.Name
ORDER BY SUM(s.OrderQTY)

Since this column is aliased, let’s make the query easier to read, by sorting by the alias name.

Image 7

GROUP BY with an Expression

You can group on an expression. Grouping is allowed as long as the same expression is within the SELECT list.

SQL
SELECT Year(OrderDate) OrderYear, Sum(TotalDue) YearlySales
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate)
ORDER BY OrderYear

In this example, let’s group and summarize SalesOrder totals by year

Image 8

And there is nothing stopping us from grouping one or more expressions! Let’s further group sales by Month.

SQL
SELECT Year(OrderDate) OrderYear,
       Month(OrderDate) OrderMonth,
       SUM (TotalDue) MonthlySales
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate), MONTH(OrderDate)
ORDER BY OrderYear, OrderMonth

Before I forget, check out how I sorted the results. I used the column alias to order rather than repeating the Expression. Both are valid, but it sure is nice to keep our expression to a minimum! I think it’s easier to read. In case you’re wondering, this is all possible due to SQL Order of Operations.

Here are the results. I color coded the columns to better show you the multiple expressions. Like the previous example, each expression used within the GROUP BY is also listed as a column.

Image 9

SQL GROUP BY versus DISTINCT

In some ways, GROUP BY is similar to using the DISTINCT Clause.

SQL
SELECT DISTINCT JobTitle, MaritalStatus
FROM HumanResources.Employee
ORDER BY JobTitle, MaritalStatus

Returns the same results as this SQL using SQL GROUP BY:

SQL
SELECT JobTitle, MaritalStatus
FROM HumanResources.Employee
GROUP BY JobTitle, MaritalStatus
ORDER BY JobTitle, MaritalStatus

The key difference between DISTINCT and GROUP BY is that when using GROUP BY, you can also summarize values.

Here, we included the COUNT() function to determine how many rows make up each group:

SQL
/* Answer */
SELECT SalesOrderID, CarrierTrackingNumber, COUNT(1) as NumberofLines 
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID, CarrierTrackingNumber;

Conclusion

When using SQL GROUP BY, keep these points in mind:

  • The SQL GROUP BY does not guarantee the groups are returned in sorted order.
  • You can use aggregate functions, such as SUM and COUNT to summarize data.
  • The non summarized columns in your SELECT list must match those listed in the GROUP BY.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)