There are several ways to calculate a running total in SQL. In this article, we will cover two methods: Joins, and Window Functions.
We’ll first look at how to calculate the running total using an INNER JOIN
. By doing so, you’ll not only learn more about join
conditions, but see how to take the result and summarize it, to get the running total.
Once you’ve seen how to do it “old school,” we’ll use the OVER
clause to calculate running totals using a window
function. This method is newer and more concise to use.
All the examples for this lesson are based on Microsoft SQL Server Management Studio and the WideWorldImporters
database. You can get started using these free tools using my Guide Get Started Using SQL Server 2016.
What is a Running Total?
Our goal is to calculate a running total that resets whenever the TransactionDate
changes. We’ll total the TransactionAmount
. For each subsequent invoice within the transaction date, the RunningTotal
should equal the prior InvoiceID
’s running total plus the current TransactionAmount
.
You can see this in action in the following example. The Running Total for Invoice 3, is the prior RunningTotal
of 3110.75
plus the Invoice 3’s Transaction Amount of 103.50
.
Calculate A Running Total in SQL using an INNER JOIN
We’ll first calculate the running total using INNER JOIN
s. This method reveals more of the mechanics of calculating a running total than using PARTITION
. As such, it gives you another opportunity to understand INNER JOINS
and apply those concepts to another use case.
We solve this problem in three steps:
- Get rows for the running total
- Setup details for the running total using
inner join
s - Calculate the running total by summarizing data.
Let’s get started!
Step 1 – Get Rows for Running Total
To calculate the running total, we’ll query the CustomerTransactions
table. We’ll include the InvoiceID
, TransactionDate
, and TransactionAmount
in our result. Of course, the running total is calculated from the TransactionAmount
.
Here is the query to get the basic data.
SELECT InvoiceID
,TransactionDate
,TransactionAmount
FROM Sales.CustomerTransactions
WHERE TransactionTypeID = 1
ORDER BY TransactionDate
Here is the data we’ll be working with.
This step really is meant to get you acquainted with the basic information. You don’t really need to do it, but I sometimes like to run the basic query just to see the data and make sure there aren’t any anomalies or special situations I need to accommodate.
Step 2 – Setup Details for Running Total using Inner Joins
In this step, we’ll get the details setup so we can calculate the running total. We’ll do this by getting, for each InvoiceID
, the transaction amount and all transaction amounts before it.
To do this, we’ll join the CustomerTransactions
table to itself.
If we do this with no join
condition, we would get every combination of transactions, this is not what we want.
To ensure we get the proper combination of rows from each table, we’ll add two join
conditions. One to get each invoice and those prior to it (Bold).
The second ensures we only include invoices on the same transaction date (Italics).
SELECT T1.InvoiceID
,T2.InvoiceID
,T1.TransactionDate
,T1.TransactionAmount
,T2.TransactionAmount
FROM Sales.CustomerTransactions T1
INNER JOIN Sales.CustomerTransactions T2
ON T1.InvoiceID >= T2.InvoiceID
AND T1.TransactionDate = T2.TransactionDate
WHERE T1.TransactionTypeID = 1
ORDER BY T1.InvoiceID, T1.TransactionAmount
Let’s see how this operates.
The easiest condition to understand is where we match TransactionDate
. This ensures the invoices match have a common transaction date. If this was the only join
we did, we would be calculating a sub total for all transactions within a date.
Since we want to calculate the running total, we need to somehow obtain for each InvoiceID
the TransactionAmount
for the invoice and all invoices before it. In other words, return all matching rows where the invoice is greater than or equal to the corresponding invoices we are trying to total.
If you look at the result above, you’ll see that for each invoice listed in the first column (T1.InvoiceID
), It is greater than or equal to InvoiceID
s in the second column (T2.InvoiceID
).
This is a result of the join
condition T1.InvoiceID >= T2.InvoiceID
.
The result of this join
and the join
conditions is that we now have the raw materials to calculate the running total.
Notice how the first, third, and fourth columns repeat. We can use this to our advantage to summarize the result to arrive at the running total.
Step 3 – Calculate Running Total by Summarizing Rows
With the detailed information at hand, the final step is to summarize the rows. Doing so allows us to calculate the running totals.
Here is the query we use to perform the summary:
SELECT T1.InvoiceID
,T1.TransactionDate
,T1.TransactionAmount
,Sum(T2.TransactionAmount) RunningTotal
FROM Sales.CustomerTransactions T1
INNER JOIN Sales.CustomerTransactions T2
ON T1.InvoiceID >= T2.InvoiceID
AND T1.TransactionDate = T2.TransactionDate
WHERE T1.TransactionTypeID = 1
GROUP BY T1.InvoiceID
,T1.TransactionDate
,T1.TransactionAmount
ORDER BY T1.InvoiceID
,T1.TransactionAmount
Notice how we group by T1.InvoiceID
, T1.TransactionDate
, and T1.TransactionAmount
. These are the values which were repeated in our detailed data in step 2.
The Running Total is derived from T2.TransactionAmount
. Recall these values are TransactionAmount
from all Invoices prior to the invoice displayed. In other words, the invoice displayed is greater than or equal to them.
This allows us to build up a Running total.
Each subsequent invoice in the list is calculating its RunningTotal
value by summing up all TransactionAmount
from its Invoice and those prior to it.
Now that you have seen a traditional way to arrive at the running total, and perhaps gained a greater appreciation of how to use join
s and join
conditions to solve it, let’s look at one of the newer features of SQL, partitions, and see how they can be used to achieve the same result.
Calculate A Running Total in SQL using an OVER Clause
The OVER
clause is a very powerful statement. It allows you to define a set of rows, within a result set that an operation affects.
Much like OFFSET
and FETCH
allow us to retrieve a specific range of rows from a result set, the OVER
clause allows us to do a similar operation, relative to the current row, for a specific column.
Using OVER, we can define a window over a specified set of rows, to which we can apply functions, such as sum
.
For you to understand the concept, we’ll break this down into two steps:
- Partition data using the
OVER
clause. - Order Partitions with
Order
.
Let’s roll.
Step 1 – Partition Data using OVER Clause
When we say we want to create a running total for all Invoices within a TransactionDate
, we want to partition our data by TransactionDate
. To partition the data, we can use the over clause.
In the following statement, notice we SUM
the TransactionAmount
and after the SUM
, there is an OVER
clause.
Also notice there is no GROUP BY
clause. This is surprising, typically aggregate functions, such as SUM
, require a GROUP BY
clause; why is this the case?
Since we are using the OVER
clause, the SUM
is considered a window function – it operates upon any rows defined in the OVER
clause.
Here is the window function we’ll use:
SUM(TransactionAmount) OVER(PARTITION BY TransactionDate) RunningTotal
What make this a windows function is the OVER
clause. Checkout the part PARTITION BY
TransactionDate
. This means that the SUM
operates on all rows with the same TransactionDate
. This defines the window of rows the SUM
function affects.
Here is the query so far:
SELECT InvoiceID
,TransactionDate
,TransactionAmount
,SUM(TransactionAmount) OVER(PARTITION BY TransactionDate) RunningTotal
FROM Sales.CustomerTransactions T1
WHERE TransactionTypeID = 1
ORDER BY InvoiceID
,TransactionAmount
Step 2 – Order Partitions with Order BY
Up to this point, we have partitioned the data and are able to calculate a subtotal for all TransactionAmount
values within a TransactionDate
. The next step is to now calculate the subtotal.
To do this, we can use ORDER BY
within the OVER
clause to define the “scope” of the window function. The ORDER BY
specified the logical order the window function operates.
Here is the window function we’ll use:
SUM(TransactionAmount) OVER(PARTITION BY TransactionDate ORDER BY InvoiceID) RunningTotal
The difference between this window function and that from the first step, is ORDER BY InvoiceID
. This specifies the logical order to process within the partition.
Without the ORDER BY
, the logical order is to wait until we are at the end of the window to calculate the sum. With the ORDER BY
specified, the logical order is to calculate a sum for each row including previous TransactionAmount
values within the window.
SELECT InvoiceID
,TransactionDate
,TransactionAmount
,SUM(TransactionAmount) OVER(PARTITION BY TransactionDate ORDER BY InvoiceID) RunningTotal
FROM Sales.CustomerTransactions T1
WHERE TransactionTypeID = 1
ORDER BY InvoiceID
,TransactionAmount
Here is the result from running the query.
When you ran this query, did you notice how much faster it ran than the one using INNER JOIN
s? I was surprised. I know the INNER JOIN
operation consumes a lot of resources as the combinations or rows become large, but I would have thought it would have the same case for the solution using OVER
.
I would encourage you to look at each query’s query plan. You’ll start to learn quite a bit about SQL when you start to do this.