MDX query. Industrial version.
Scope
Article covers: MDX query, recursive formula, ratio, TopSum(), Filter(), Order(), Sum(), Head(), Rank(), IIF()
.
Introduction
TopSum(set, threshold, sorting and thresholding expression) Returns: set
TopSum()
selects the top tuples of a set whose sum meets a threshold, i.e. returns the subset of set, after sorting it, such that the sum of the cells (or numeric value expression, if supplied) is at least value. This function always breaks the hierarchy.
Thus, it accumulates and sorts using the same value, value which is passed as third parameter.
A standard function to order set by one thing and accumulate another is absent. This article provides an approach how to do that.
The side product is accumulated sum formula.
Background
This article doesn't assume any prior experience with MDX and Microsoft Analysis Services. But you should be able to run a query in SQL Server Analysis Services query window and use on-line MSDN documentation.
Sample
We make a sample using standard datawarehouse 'Adventure Works DW' to not bore ourselves by downloading/restoring/deploying of sample database/datawarehouse. Generally, 'Adventure Works DW' is installed on a computer where Business Intelligence projects are developed as test area, although it is always available for downloading on the Internet.
We get a query to get the total product cost and respective gross revenue as a base.
With Member [Measures].[Gross Revenue] as
'([Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost])
/[Measures].[Internet Total Product Cost]'
, FORMAT_STRING = '0.00%'
select
{[Measures].[Internet Total Product Cost], [Measures].[Gross Revenue]} on COLUMNS
, [Product].[Product].[Product].Members on ROWS
from [Adventure Works]
Then let's suppose that [Internet Total Product Cost] is cost of product to buy, and [Gross Revenue] is criteria to make decision about what product to buy first (most criteria value indicates most urgent buying). And each day we have got some cash limit to spend on purchasing. Such a situation is quite possible in procurement department, for instance. So the task is to determine this list of products to buy from the whole amount of requests.
MDX Query
The ideal solution is to order set, then get sum of all members from first till current in this set, and then compare with threshold.
With Member [Measures].[Gross Revenue] as
'[Measures].[Internet Sales Amount]/[Measures].[Internet Total Product Cost]-1'
, FORMAT_STRING = '0.00%'
Set [Ordered by Gross Revenue] as
'Order([Product].[Product].[Product].Members, [Measures].[Gross Revenue], DESC)'
Set [TopByThreshold] as
'Filter (
[Ordered By Gross Revenue] as [SA],
Sum (
Head (
[SA],
Rank (
[SA].Current,
[SA]
)
),
[Measures].[Internet Total Product Cost]
) = 2500000
)'
select
{[Measures].[Internet Total Product Cost], [Measures].[Gross Revenue]} on COLUMNS
, [TopByThreshold] on ROWS
from [Adventure Works]
At first, we sort product by [Measures].[Gross Revenue]
in descending order, from most necessary to least.
Set [Ordered by Gross Revenue] as
'Order([Product].[Product].[Product].Members, [Measures].[Gross Revenue], DESC)'
Then we select interesting products using determined ordered set and calculated accumulated sum of products cost.
Filter (
[Ordered By Gross Revenue] as [SA],
"Accumulated product cost" = 2500000
)
What is going on in the summarize part of the query?
We get rank (number) of current product in our ordered by criteria set.
Rank (
[SA].Current,
[SA]
)
By the way, this function is the 2nd level of nesting and at that point, the named sets are not available, therefore we have to determine an alias to send context of sorted set to lower levels. Instead, we caught error message 'The CURRENT function cannot be invoked in this context because the 'Ordered By Gross Revenue' set is not in the scope.' This problem is solved with aid of alias ([SA] - set alias). Additionally, alias 'keeps' all properties of set, including current position.
We get all top products from first in ordered set till current with the aid of Head
function. Rank indicates last interesting product by its number.
Head (
[SA],
"Quantity of interesting products at current point"
)
Finally we summarize products accumulated sum of selected products.
Sum (
"All products from one with highest criteria to current",
[Measures].[Internet Total Product Cost]
)
The query above is a quick metaphrase. That's just a proof of solution existence. Its execution takes 3 seconds. The next step is optimization.
Industrial Version
The main point of performance loss is 'accumulative sum'. Let's accelerate it using recursive member.
Member [Measures].[Accumulated Cost] as
'IIF([Product].[Product].CurrentMember IS [Ordered by Gross Revenue].Item(0)
,
[Measures].[Internet Total Product Cost],
[Measures].[Internet Total Product Cost] +
([Measures].[Accumulated Cost], [Ordered by Gross Revenue].Item(
Rank([Product].[Product].CurrentMember, [Ordered by Gross Revenue]) - 2).Item(0)
)
)'
The logic is: if this is the first member in a sorted set ([Ordered by Gross Revenue].Item(0)
then we suppose first product cost as accumulative sum of interesting products cost, else we take cost of current product plus accumulative sum of previous to current in sorted set product. We use long notation:
[Ordered by Gross Revenue].Item(
Rank([Product].[Product].CurrentMember, [Ordered by Gross Revenue]) - 2).Item(0)
to refer to the previous product instead of the more readable and shorter one:
[Product].[Product].CurrentMember.PrevMember
because .PrevMember
refers to previous member in [Product].[Product].Members
set instead of [Ordered by Gross Revenue]
set.
Rank
returns one-based index, .Item()
takes zero-based index instead. Thus, we subtract 2 to refer to the previous member (product).
Finally, the query takes view as follows:
With
Member [Measures].[Gross Revenue] as
'[Measures].[Internet Sales Amount]/[Measures].[Internet Total Product Cost]-1'
, FORMAT_STRING = '0.00%'
Set [Ordered by Gross Revenue] as
'Order([Product].[Product].[Product].Members, [Measures].[Gross Revenue], DESC)'
Member [Measures].[Accumulated Cost] as
'IIF([Product].[Product].CurrentMember IS [Ordered by Gross Revenue].Item(0)
,
[Measures].[Internet Total Product Cost],
[Measures].[Internet Total Product Cost] +
([Measures].[Accumulated Cost], [Ordered by Gross Revenue].Item(
Rank([Product].[Product].CurrentMember, [Ordered by Gross Revenue]) - 2).Item(0)
)
)'
Set [TopUntilThreshold] as
'Filter (
[Ordered By Gross Revenue],
[Measures].[Accumulated Cost] = 2500000
)'
select
{[Measures].[Internet Total Product Cost], [Measures].[Gross Revenue]} on COLUMNS
, [TopUntilThreshold] on ROWS
from [Adventure Works]
Performance of this version is less than one second. It is acceptable.
Instead of Conclusion
TopSum()
does not support accumulating by value differently of sorting value and vice versa. But, honestly, cases where this is necessary are rare. I spent more time to find an example than to solve this problem.
History
- 19th November, 2009: Initial post