Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

SQL Window Functions

0.00/5 (No votes)
7 Mar 2022CPOL6 min read 5.1K  
The purpose of this article is to introduce you to SQL window functions and some of the high-level concepts.

In SQL, a window function refers to a function, such as sum or average, which acts upon a result set’s rows relative to the current row.

There are a lot of details to cover, but you’ll see we cover those in later articles. For now, we’ll dig into the how window functions work and really get to know more about frames.

All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks database. You can get started using these free tools using my Guide Getting Started Using SQL Server.

Introduction to SQL Window Functions

Window functions are, for the most part, familiar. What makes them special is they are set up to operate on a set of rows relative to the query processor’s current row. This set is called the window frame and it is set up using the OVER clause.

The OVER clause can get tricky, as you introduce ORDER BY and PARTITION BY parameters, but we’ll do this one at a time. By the end of this article you’ll see how each of these affects the window function calculation, and be ready to tackle other analytic functions and partition options.

The following diagram pulls together all the concepts we’ll go over in this article. As you read subsequent sections, I encourage you to come back to this illustration, so you understand how it fits into the overall picture.

Image 1

SQL Window Functions Overview

SQL OVER Clause

As we mentioned before, it is the OVER clause that makes an aggregate a SQL window function. Here you’ll see a simple window function to calculate to total SalesYTD for all sales persons.

SQL
SELECT BusinessEntityID
      ,TerritoryID     
      ,SalesYTD     
      ,SUM(SalesYTD) OVER() as TotalSalesYTD
FROM  Sales.SalesPerson S

Which when run outputs:

Window Function Results

SQL Window Function Results

Notice the TotalSalesYTD is the same value as if we had run the query

SQL
SELECT SUM(SalesYTD)
FROM   Sales.SalesPerson

In fact, if we wished, we could have created this same result using this subquery:

SQL
SELECT BusinessEntityID     
      ,TerritoryID     
      ,SalesYTD     
      ,(SELECT SUM(SalesYTD) 
          FROM Sales.SalesPerson) as TotalSalesYTD
FROM Sales.SalesPerson S

However, you’ll see as we delve into SQL window functions, their capabilities for aggregating data quickly outpace what’s achieved using subqueries alone.

SQL OVER with PARTITION BY

The SQL PARTITION BY divides the rows into groups by which the window function calculates its result. The window function is computed across these rows.

It is similar to how the GROUP BY clause works; however, the result of the GROUP BY clause is a summary row. With a window function, the original row remains intact. The partition is logical, and used only for calculation purposes.

Here is a continuation of our previous example. In this case, we’ve partitioned the result by TerritoryID.   

SQL
SELECT BusinessEntityID     
      ,TerritoryID     
      ,SalesYTD     
      ,SUM(SalesYTD) OVER(PARTITION BY TerritoryID) as TotalSalesYTD
FROM   Sales.SalesPerson

Notice that TotalSalesYTD now represents a TotalSalesYTD for each Territory.

Could we have done this same query using a subquery? Sure! We could have used a correlated sub query similar to the one below

SQL
SELECT BusinessEntityID
        ,TerritoryID
        ,SalesYTD
        ,(SELECT SUM(SalesYTD)
          FROM   Sales.SalesPerson
          WHERE  COALESCE(TerritoryID, -1) = COALESCE(S.TerritoryID, -1)
         ) AS TotalSalesYTD
FROM Sales.SalesPerson AS S
ORDER BY TerritoryID;

But as you can see the subquery is becoming more cumbersome to write; whereas, the SQL window function is more elegant.

You may have figured this out, but if no partition is specified, then the entire result is considered the partition. This is important to know as we move on to the next topic, which has to do with how we order rows within the partition for calculation.

SQL OVER with ORDER BY

When the ORDER BY clause is used within the window, it sets up the order rows are processed within the partition. Keep in mind, that if no PARTITION BY is specified, the partition includes every row.

All rows from the start of the partition to the current row or rows equal to the current rows ORDER BY value are considered part of a window frame. Many window functions, such as SUM, operate within this frame, meaning, it becomes straight forward to setup running totals, averages, etc. according to the window’s ORDER BY.

Let’s look at an example, where we’ll calculate a running total or TotalSalesYTD by SalesPerson. We’ll do this by ordering the window by the sales person’s BusinessEntityID:

SQL
SELECT BusinessEntityID     
      ,TerritoryID     
      ,SalesYTD     
      ,SUM(SalesYTD) OVER(ORDER BY BusinessEntityID) as TotalSalesYTD
FROM   Sales.SalesPerson

Here is the result

Image 3

Window Function Results with OVER

Notice the the TotalSalesYTD gets progressively larger and eventually equals the TotalSalesYTD previously calculated.

Sorting Window Function Results

So now you may be wondering what happens if we try to order the results by something different than BusinessEntityID?

Image 4

Will ordering by TerritoryID ruin the running total calculation?

Actually it won’t! The reason being the SQL window function’s calculations occur during the SELECT phase of order of execution, which is before the ordering of the result. If you’re unsure of this, then check out this side-by-side comparison:

Image 5

SQL Window Function and Sorted Results

Notice, that though the results on the left are sorted differently than those on the right, the TotalSalesYTD corresponding to each BusinessEntityID remain the same. You can see where I point this out for two cases (see highlighted rows).

The key take away is that when SQL window functions perform their calculations they are doing so on a logical ordering of rows, which is independent of the final result. These rows are referred to as a frame, as they are a subset of the window.

Using Two Window Functions

Let’s continue the example by combining our window functions together into a single query. Below you’ll see where you have two window functions:  one which computer the grand total, and another the running total:

SQL
SELECT BusinessEntityID     
      ,TerritoryID     
      ,SalesYTD     
      ,SUM(SalesYTD) OVER(ORDER BY BusinessEntityID)
       as RunningTotalSalesYTD     
      ,SUM(SalesYTD) OVER() as GrandTotalSalesYTD
FROM   Sales.SalesPerson
ORDER BY BusinessEntityID

This illustrates that you can have two frames within the same query.

Image 6

Two Window Functions

The RunningTotalSalesYTD’s frames is defined by the ordering of the current row via BusinessEntityID. The GrandTotalSalesYTD frame, since there is no ORDER BY, covers the entire partition.

Now that we have a better understanding of SQL window function concepts, let’s see how PARTITION BY and ORDER BY work together within the OVER clause.

SQL OVER with PARTITION BY and ORDER BY

Using PARTION BY and ORDER BY in concert within the OVER clause provides a powerful combination. Doing so, allows us to create frames to perform calculations such as, running totals, within groups.

Consider the query used to calculate a running total, by sales person, within each sales territory.

SQL
SELECT BusinessEntityID     
      ,TerritoryID     
      ,SalesYTD     
      ,SUM(SalesYTD) OVER(PARTITION BY TerritoryID                           ORDER BY BusinessEntityID)
        as RunningTotalSalesYTDTerritory
FROM  Sales.SalesPerson

In the results you can see the running total resets when a new TerritoryID (partition) is encountered.

Image 7

ORDER BY and Partition BY

Let’s wrap up by going back over some of the key concepts.

SQL Window Function Key Concepts

Here are some of the key concepts we went over in this article. I know there was a lot covered, but it is important you understand the concepts and jargon before you move on.

OVER — This is your signal the preceding function is a window function. Items contained within the OVER clause, such as ORDER BY or PARTITION BY dictate how the window is framed. Meaning, how is the window function computed.

ORDER BY — it’s what makes a function become a running total. It determines the order the rows are processed to the function within the frame.

PARTITION – It is what makes the groupings. The partion can be one or more columns. One a new partition ends the window function result is reset.

Some observations:  

If you don’t have an ORDER BY then you get a total not a running total.

You can have different windows in the same query. This makes it really easy to include running totals by day, month and year for example!

One key point to remember is the frame defined by the order by and partition by are completely separated from the sort order. It has nothing to do with the windows you define within your window functions. They don’t affect the results, as the result are first calculated.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)