In SQL, a window function refers to a function, such as sum or average, which acts upon a result set’s rows relative to the current row.
There are a lot of details to cover, but you’ll see we cover those in later articles. For now, we’ll dig into the how window functions work and really get to know more about frames.
All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks database. You can get started using these free tools using my Guide Getting Started Using SQL Server.
Introduction to SQL Window Functions
Window functions are, for the most part, familiar. What makes them special is they are set up to operate on a set of rows relative to the query processor’s current row. This set is called the window frame and it is set up using the OVER clause.
The OVER clause can get tricky, as you introduce ORDER BY
and PARTITION BY
parameters, but we’ll do this one at a time. By the end of this article you’ll see how each of these affects the window function calculation, and be ready to tackle other analytic functions and partition options.
The following diagram pulls together all the concepts we’ll go over in this article. As you read subsequent sections, I encourage you to come back to this illustration, so you understand how it fits into the overall picture.
SQL Window Functions Overview
SQL OVER Clause
As we mentioned before, it is the OVER
clause that makes an aggregate a SQL window function. Here you’ll see a simple window function to calculate to total SalesYTD
for all sales persons.
SELECT BusinessEntityID
,TerritoryID
,SalesYTD
,SUM(SalesYTD) OVER() as TotalSalesYTD
FROM Sales.SalesPerson S
Which when run outputs:
SQL Window Function Results
Notice the TotalSalesYTD
is the same value as if we had run the query
SELECT SUM(SalesYTD)
FROM Sales.SalesPerson
In fact, if we wished, we could have created this same result using this subquery:
SELECT BusinessEntityID
,TerritoryID
,SalesYTD
,(SELECT SUM(SalesYTD)
FROM Sales.SalesPerson) as TotalSalesYTD
FROM Sales.SalesPerson S
However, you’ll see as we delve into SQL window functions, their capabilities for aggregating data quickly outpace what’s achieved using subqueries alone.
SQL OVER with PARTITION BY
The SQL PARTITION BY divides the rows into groups by which the window function calculates its result. The window function is computed across these rows.
It is similar to how the GROUP BY
clause works; however, the result of the GROUP BY
clause is a summary row. With a window function, the original row remains intact. The partition is logical, and used only for calculation purposes.
Here is a continuation of our previous example. In this case, we’ve partitioned the result by TerritoryID
.
SELECT BusinessEntityID
,TerritoryID
,SalesYTD
,SUM(SalesYTD) OVER(PARTITION BY TerritoryID) as TotalSalesYTD
FROM Sales.SalesPerson
Notice that TotalSalesYTD
now represents a TotalSalesYTD
for each Territory
.
Could we have done this same query using a subquery? Sure! We could have used a correlated sub query similar to the one below
SELECT BusinessEntityID
,TerritoryID
,SalesYTD
,(SELECT SUM(SalesYTD)
FROM Sales.SalesPerson
WHERE COALESCE(TerritoryID, -1) = COALESCE(S.TerritoryID, -1)
) AS TotalSalesYTD
FROM Sales.SalesPerson AS S
ORDER BY TerritoryID;
But as you can see the subquery is becoming more cumbersome to write; whereas, the SQL window function is more elegant.
You may have figured this out, but if no partition is specified, then the entire result is considered the partition. This is important to know as we move on to the next topic, which has to do with how we order rows within the partition for calculation.
SQL OVER with ORDER BY
When the ORDER BY clause is used within the window, it sets up the order rows are processed within the partition. Keep in mind, that if no PARTITION BY
is specified, the partition includes every row.
All rows from the start of the partition to the current row or rows equal to the current rows ORDER BY
value are considered part of a window frame. Many window functions, such as SUM, operate within this frame, meaning, it becomes straight forward to setup running totals, averages, etc. according to the window’s ORDER BY.
Let’s look at an example, where we’ll calculate a running total or TotalSalesYTD
by SalesPerson
. We’ll do this by ordering the window by the sales person’s BusinessEntityID
:
SELECT BusinessEntityID
,TerritoryID
,SalesYTD
,SUM(SalesYTD) OVER(ORDER BY BusinessEntityID) as TotalSalesYTD
FROM Sales.SalesPerson
Here is the result
Window Function Results with OVER
Notice the the TotalSalesYTD
gets progressively larger and eventually equals the TotalSalesYTD
previously calculated.
Sorting Window Function Results
So now you may be wondering what happens if we try to order the results by something different than BusinessEntityID
?
Will ordering by TerritoryID
ruin the running total calculation?
Actually it won’t! The reason being the SQL window function’s calculations occur during the SELECT
phase of order of execution, which is before the ordering of the result. If you’re unsure of this, then check out this side-by-side comparison:
SQL Window Function and Sorted Results
Notice, that though the results on the left are sorted differently than those on the right, the TotalSalesYTD
corresponding to each BusinessEntityID
remain the same. You can see where I point this out for two cases (see highlighted rows).
The key take away is that when SQL window functions perform their calculations they are doing so on a logical ordering of rows, which is independent of the final result. These rows are referred to as a frame, as they are a subset of the window.
Using Two Window Functions
Let’s continue the example by combining our window functions together into a single query. Below you’ll see where you have two window functions: one which computer the grand total, and another the running total:
SELECT BusinessEntityID
,TerritoryID
,SalesYTD
,SUM(SalesYTD) OVER(ORDER BY BusinessEntityID)
as RunningTotalSalesYTD
,SUM(SalesYTD) OVER() as GrandTotalSalesYTD
FROM Sales.SalesPerson
ORDER BY BusinessEntityID
This illustrates that you can have two frames within the same query.
Two Window Functions
The RunningTotalSalesYTD
’s frames is defined by the ordering of the current row via BusinessEntityID
. The GrandTotalSalesYTD
frame, since there is no ORDER BY
, covers the entire partition.
Now that we have a better understanding of SQL window function concepts, let’s see how PARTITION BY
and ORDER BY
work together within the OVER
clause.
SQL OVER with PARTITION BY and ORDER BY
Using PARTION BY
and ORDER BY
in concert within the OVER
clause provides a powerful combination. Doing so, allows us to create frames to perform calculations such as, running totals, within groups.
Consider the query used to calculate a running total, by sales person, within each sales territory.
SELECT BusinessEntityID
,TerritoryID
,SalesYTD
,SUM(SalesYTD) OVER(PARTITION BY TerritoryID ORDER BY BusinessEntityID)
as RunningTotalSalesYTDTerritory
FROM Sales.SalesPerson
In the results you can see the running total resets when a new TerritoryID
(partition) is encountered.
ORDER BY and Partition BY
Let’s wrap up by going back over some of the key concepts.
SQL Window Function Key Concepts
Here are some of the key concepts we went over in this article. I know there was a lot covered, but it is important you understand the concepts and jargon before you move on.
OVER — This is your signal the preceding function is a window function. Items contained within the OVER
clause, such as ORDER BY
or PARTITION BY
dictate how the window is framed. Meaning, how is the window function computed.
ORDER BY — it’s what makes a function become a running total. It determines the order the rows are processed to the function within the frame.
PARTITION – It is what makes the groupings. The partion can be one or more columns. One a new partition ends the window function result is reset.
Some observations:
If you don’t have an ORDER BY then you get a total not a running total.
You can have different windows in the same query. This makes it really easy to include running totals by day, month and year for example!
One key point to remember is the frame defined by the order by and partition by are completely separated from the sort order. It has nothing to do with the windows you define within your window functions. They don’t affect the results, as the result are first calculated.