Custom Aggregates in Oracle

Wendelius

4.91/5 (8 votes)

19 Mar 2011CPOL5 min read

39.2K

154

How to create custom aggregates in Oracle using PL/SQL

Download source - 2.12 KB

Introduction

Even though Oracle has a wide variety of aggregates, not everything is implemented. For example, a basic product is lacking from the list. Oracle Data Cartridge helps to extend the functionality in the database and allows to create for example custom aggregates using PL/SQL. This article shows few simple examples of custom aggregates.

The First Aggregate, Product

Schema for Aggregates (Optional)

Before creating the aggregates, I chose to create a new schema. This is not mandatory but I felt that it would simplify administration if the types and functions that reside in a separate schema:

SQL

--------------------------------------------
-- Schema for aggregates
--------------------------------------------
CREATE USER CustomAggregates
   IDENTIFIED BY Calculations
   DEFAULT TABLESPACE Users
   TEMPORARY TABLESPACE TEMP;
 
GRANT CONNECT TO CustomAggregates;
GRANT RESOURCE TO CustomAggregates;

When running the user creation, change the default and temporary tablespaces suitable to your environment.

PL/SQL Implementation

Creating an aggregate requires to create a type including the header and the body. The header includes the variables needed to store intermediate values in the context and the functions which Oracle will call at runtime. The mandatory functions are:

ODCIAggregateInitialize: This function is called to create a new context for a set of values. The purpose of this static method is to return a new, initialized instance for the type.
ODCIAggregateIterate: Each value is passed to this method which is responsible for making the necessary calculations, etc.
ODCIAggregateMerge: This method is used when two contexts are merged into a single one. Merging happens if the original set of values are iterated using several different contexts.
ODCIAggregateTerminate: When all values have been handled and contexts are merged, this function returns the result.

Merging is best explained with a small (simplified) diagram:

The query processor may divide a set of values to a smaller subsets. When the aggregation is done, each group has its own instance of the type to handle the subset. Each instance is first initialized and the iteration is done for each value in the set. After iterations have been done, the context is merged to another context. Finally, when all the contexts have been merged, the aggregation is terminated and the result is returned to the consumer. For this reason, the aggregate must be designed to support operating on partial sets at iteration time.

The type definition for calculating the product looks like the following:

SQL

--------------------------------------------
-- Product
--------------------------------------------
-- Type definition
CREATE OR REPLACE TYPE CustomAggregates.ProductType AS OBJECT (
   mProduct    NUMBER,   -- product of elements
   mElements   NUMBER,   -- number of elements
     
   STATIC FUNCTION ODCIAggregateInitialize 
      (sctx        IN OUT CustomAggregates.ProductType)
      RETURN NUMBER,
   MEMBER FUNCTION ODCIAggregateIterate    
      (self        IN OUT CustomAggregates.ProductType,
       value       IN     NUMBER)       
      RETURN NUMBER,
   MEMBER FUNCTION ODCIAggregateMerge      
      (self        IN OUT CustomAggregates.ProductType,
       ctx2        IN     CustomAggregates.ProductType)
      RETURN NUMBER,
   MEMBER FUNCTION ODCIAggregateTerminate  
      (self        IN     CustomAggregates.ProductType,
      returnValue OUT    NUMBER, 
      flags       IN     NUMBER)
      RETURN NUMBER
);

The header defines two variables used for product calculation and all the mandatory functions. The body implements the logic:

SQL

-- Type implementation
CREATE OR REPLACE TYPE BODY CustomAggregates.ProductType IS
   STATIC FUNCTION ODCIAggregateInitialize 
      ( sctx        IN OUT CustomAggregates.ProductType) 
      RETURN NUMBER IS
   BEGIN
      sctx := ProductType( TO_NUMBER(NULL), 0);
      RETURN ODCIConst.Success;
   END;
   MEMBER FUNCTION ODCIAggregateIterate    
      ( self        IN OUT CustomAggregates.ProductType, 
        value       IN     NUMBER)                      
      RETURN NUMBER IS
   BEGIN
      IF (self.mElements = 0) THEN
         self.mProduct := value;
      ELSE
         self.mProduct := self.mProduct * value;
      END IF;
      self.mElements := self.mElements + 1;
      RETURN ODCIConst.Success;
   END;
   MEMBER FUNCTION ODCIAggregateMerge      
      ( self        IN OUT CustomAggregates.ProductType, 
        ctx2        IN     CustomAggregates.ProductType) 
      RETURN NUMBER IS
   BEGIN
     IF (self.mElements = 0) THEN
        -- no elements in this set, product is the product in the set to be merged
        self.mProduct := ctx2.mProduct;
     ELSIF (ctx2.mElements = 0) THEN
        -- no elements in ctx2 so let the product be as it is
        NULL;
     ELSE
        -- multiply products
        self.mProduct  := self.mProduct * ctx2.mProduct;
     END IF;
     self.mElements := self.mElements + ctx2.mElements;
     RETURN ODCIConst.Success;
   END;
   MEMBER FUNCTION ODCIAggregateTerminate  
      ( self        IN     CustomAggregates.ProductType, 
        returnValue OUT    NUMBER, 
        flags       IN     NUMBER)                      RETURN NUMBER IS
   BEGIN
      returnValue := self.mProduct;
      return ODCIConst.Success;
   END;
END;

In the ODCIAggregateInitialize, a new instance of ProductType is created. The mProduct variable is initialized with a null and the mElements with 0.

All the methods return either ODCIConst.Success or ODCIConst.Error depending on whether the operation has succeeded. The ODCIAggregateIterate method multiplies each new value with the existing product and increases the number of elements by 1. ODCIAggregateMerge multiplies products from both contexts if they have had values, otherwise it chooses the product from the context having values. And the ODCIAggregateTerminate function simply sets the return value.

The last step is to create the aggregate itself with CREATE FUNCTION statement:

SQL

-- Aggregate function
CREATE OR REPLACE FUNCTION CustomAggregates.Product (value NUMBER) RETURN NUMBER 
   PARALLEL_ENABLE AGGREGATE USING CustomAggregates.ProductType;

The function is defined with PARALLEL_ENABLE to let the optimizer know that calculation can be done in parallel. The AGGREGATE USING clause defines the type which implements the aggregate.

Let’s test this:

SQL

-- Test run 1
SELECT CustomAggregates.Product(a.Value) AS Result
FROM ( SELECT 4 AS Value FROM DUAL UNION ALL
       SELECT 2 AS Value FROM DUAL UNION ALL
       SELECT 5 AS Value FROM DUAL) a;

The result is:

RESULT
----------
40

Well, that was somewhat expectable. What happens if there’s a null in the values.

SQL

-- Test run 2, null included
SELECT CustomAggregates.Product(a.Value) AS Result
FROM ( SELECT 4    AS Value FROM DUAL UNION ALL
       SELECT 2    AS Value FROM DUAL UNION ALL
       SELECT NULL AS Value FROM DUAL UNION ALL
       SELECT 5    AS Value FROM DUAL) a;

The result is the same:

RESULT
----------
40

As the ANSI standard defines that nulls are ignored in aggregates such as SUM, MIN etc., they are not sent to the ODCIAggregateIterate at all. This is why nulls were not handled differently in the function. Note that this behaviour is different in Oracle 11g where nulls are actually passed to custom aggregates.

Let's have a final test with an empty set:

SQL

-- Test run 3, empty set
SELECT CustomAggregates.Product(a.Value) AS Result
FROM ( SELECT 1 AS Value FROM DUAL WHERE 1=0) a;

The result is null:

RESULT
----------

Few More Aggregates, Geometric Mean and Harmonic Mean

These aggregates are very similar as the product. Geometric mean is defined:

Now, since this calculation cannot be done in whole in iteration (in this form), we calculate the product in iteration and merge steps and the final result is calculated in the terminate function. So the difference is in the ODCIAggregateTerminate function:

SQL

MEMBER FUNCTION ODCIAggregateTerminate
   ( self        IN     CustomAggregates.GeometricMeanType,
     returnValue OUT    NUMBER,
     flags       IN     NUMBER)
   RETURN NUMBER IS
BEGIN
   IF ( self.mElements = 0) THEN
      returnValue := NULL;
   ELSE
      returnValue := POWER(self.mProduct, 1 / self.mElements);
   END IF;
   return ODCIConst.Success;
END;

Harmonic mean is a bit different:

So, now instead of calculating the product, we summarize (1/value):

SQL

self.mSum := self.mSum + (1 / value);

Obviously, the merge only summarizes both contexts and the terminate returns:

SQL

returnValue := self.mElements / self.mSum;

Run the types and functions for both means from the script and then let’s test both of these:

SQL

-- Test run 1
SELECT CustomAggregates.GeometricMean(a.Value) AS Result
FROM ( SELECT 34 AS Value FROM DUAL UNION ALL
       SELECT 27 AS Value FROM DUAL UNION ALL
       SELECT 45 AS Value FROM DUAL UNION ALL
       SELECT 55 AS Value FROM DUAL UNION ALL
       SELECT 22 AS Value FROM DUAL UNION ALL
       SELECT 34 AS Value FROM DUAL) a;

Results:

RESULT
----------
34,54511

And then the harmonic mean:

SQL

-- Test run 1
SELECT CustomAggregates.HarmonicMean(a.Value) AS Result
FROM ( SELECT 34 AS Value FROM DUAL UNION ALL
       SELECT 27 AS Value FROM DUAL UNION ALL
       SELECT 45 AS Value FROM DUAL UNION ALL
       SELECT 55 AS Value FROM DUAL UNION ALL
       SELECT 22 AS Value FROM DUAL UNION ALL
       SELECT 34 AS Value FROM DUAL) a;

Is giving:

RESULT
----------
33,0179837

Analytic Functions

Custom aggregates can be used with analytic clauses. For example, if we divide the previous data to two different categories and we want to have the geometric mean for each category, the query looks like:

SQL

-- Test run 2, analytic functions
SELECT DISTINCT
       CustomAggregates.GeometricMean(a.Value) 
          OVER (PARTITION BY a.Cat) AS Result
FROM ( SELECT 1 AS Cat, 34 AS Value FROM DUAL UNION ALL
       SELECT 1 AS Cat, 27 AS Value FROM DUAL UNION ALL
       SELECT 1 AS Cat, 45 AS Value FROM DUAL UNION ALL
       SELECT 2 AS Cat, 55 AS Value FROM DUAL UNION ALL
       SELECT 2 AS Cat, 22 AS Value FROM DUAL UNION ALL
       SELECT 2 AS Cat, 34 AS Value FROM DUAL) a;

When this is run, the result is:

RESULT
----------
34,5213758
34,5688606

Using Other Datatypes as Parameters, Words-aggregate

Number is not the only datatype that can be used when creating aggregates. The last example is an aggregate which lists all the distinct words in varchar2 fields.

The implementation is:

SQL

-- Type implementation for Words
CREATE OR REPLACE TYPE BODY CustomAggregates.WordsType IS
   STATIC FUNCTION ODCIAggregateInitialize 
      ( sctx        IN OUT CustomAggregates.WordsType) 
      RETURN NUMBER IS
   BEGIN
      sctx := WordsType( ';', ';');
      RETURN ODCIConst.Success;
   END;
   MEMBER FUNCTION ODCIAggregateIterate    
      ( self        IN OUT CustomAggregates.WordsType, 
        value       IN     VARCHAR2)                   RETURN NUMBER IS
      nStart    NUMBER := 1;
      nPosition NUMBER := 1;
      sPortion  VARCHAR2(32767);
   BEGIN
      -- loop the string and search for delimiters
      WHILE nPosition <= LENGTH(value) LOOP
         IF SUBSTR(value, nPosition, 1) 
         IN (' ', '.', ';', '/', ':', ',', '!', '?', '(', ')') THEN
               sPortion := SUBSTR(value, nStart, nPosition - nStart + 1);
               sPortion := LTRIM( 
                  RTRIM( sPortion, ' ,.-;:_?=)(/&%¤#"!'), ' ,.-;:_?=)(/&%¤#"!');
               IF LENGTH(sPortion) > 0 THEN
                  IF INSTR(mWords, self.mListDelimiter 
                          || sPortion || self.mListDelimiter)= 0 THEN
                     mWords := mWords || sPortion || self.mListDelimiter;
                  END IF;
               END IF;
               nStart := nPosition + 1;
         END IF;
         nPosition := nPosition + 1;
      END LOOP;
      sPortion := SUBSTR(value, nStart, nPosition - nStart + 1);
      sPortion := LTRIM( RTRIM( sPortion, ' ,.-;:_?=)(/&%¤#"!'), ' ,.-;:_?=)(/&%¤#"!');
      IF LENGTH(sPortion) > 0 THEN
         IF INSTR(mWords, self.mListDelimiter || sPortion || _
		self.mListDelimiter) = 0 THEN
            mWords := mWords || sPortion || self.mListDelimiter;
         END IF;
      END IF;
      RETURN ODCIConst.Success;
   END;
   MEMBER FUNCTION ODCIAggregateMerge      
      ( self        IN OUT CustomAggregates.WordsType, 
        ctx2        IN     CustomAggregates.WordsType) 
      RETURN NUMBER IS
   BEGIN
     RETURN self.ODCIAggregateIterate(ctx2.mWords);
   END;
   MEMBER FUNCTION ODCIAggregateTerminate  
      ( self        IN     CustomAggregates.WordsType, 
        returnValue OUT    VARCHAR2, 
        flags       IN     NUMBER)                     
      RETURN NUMBER IS
   BEGIN
      returnValue := RTRIM(LTRIM_
	(self.mWords, self.mListDelimiter), self.mListDelimiter);
      return ODCIConst.Success;
   END;
END;

Basically, the iterate function goes char by char and searches for delimiters. If a word is found, it’s checked against the existing word list and if it’s not present, it will be added. The result of this aggregate is a list of words delimited by semicolon (;).

To test this, let’s take a few simple character strings:

SQL

-- Test run 1, case sensitive
SELECT CustomAggregates.Words(a.Value) AS Result
FROM ( SELECT 'This is the first string'      AS Value FROM DUAL UNION ALL
       SELECT 'And this is the second string' AS Value FROM DUAL) a;

The result is:

RESULT
--------------------------------------------------------------------------------
This;is;the;first;string;And;this;second

So the result is case sensitive since the word This is listed twice. If we want to get the list case insensitively, execute:

SQL

-- Test run 2, case insensitive
SELECT CustomAggregates.Words(LOWER(a.Value)) AS Result
FROM ( SELECT 'This is the first string'      AS Value FROM DUAL UNION ALL
       SELECT 'And this is the second string' AS Value FROM DUAL) a;

And the result is:

RESULT
--------------------------------------------------------------------------------
this;is;the;first;string;and;second

And finally with a little more complex input:

SQL

-- Test run 3, Sentences
SELECT CustomAggregates.Words(LOWER(a.Value)) AS Result
FROM ( SELECT 'This is the first sentence. And the second: Is this'    AS Value 
          FROM DUAL UNION ALL
       SELECT '"quote" from somewhere; And the second sentence again!' AS Value 
          FROM DUAL UNION ALL
       SELECT 'Cursing #!#%# not allowed :)'                           AS Value 
         FROM DUAL) a;

Now the result is:

RESULT
--------------------------------------------------------------------------------
this;is;the;first;sentence;and;second;quote;from;somewhere;again;cursing;not;allowed

Final Words

That’s it for now and I hope you found something new. I’d be grateful if you would have the extra time for comments and votes. Thank you.

History

19^th March, 2011: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)