Generic Memoization Infrastructure

Evgeny Zavalkovsky

4.79/5 (17 votes)

1 Mar 2016CPOL5 min read

19.2K

154

`Memoization' of a computation result makes computation faster by trading space for time. Here you will see extremely simple and easy to use Memoization Infrastructure.

Download source - 1.6 KB

Introduction

Memoization - In computing, memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again © Wikipedia (http://en.wikipedia.org/wiki/Memoization).

Memoization is an essential part of Dynamic Programming. The idea is simple - once you computed something, store the result, and next time, when you will be asked to compute the same, just use the result. Here we need to clarify two basic conditions: Computation result is absolutely deterministic, and depends solely on input parameters.

Pseudo Code

Usually, simple idea may be realized by the simple design and simple implementation. In case of memorization, that's also truth. In Wikipedia (http://en.wikipedia.org/wiki/Memoization), we even may find a pseudo code that implements automatic memoization.

C++

function memoized-call (F is a function object parameter)
    if F has no attached array values then
       allocate an associative array called values;
       attach values to F;
    end if;

    if F.values[arguments] is empty then
       F.values[arguments] = F(arguments);
    end if;

    return F.values[arguments];
 end function

But what about implementation in C++? Things are much more complex here, because in a real life, we are operating with real types. Associative array looks very simple when you write pseudo code. However, when you write C++ code, you have to know the types. Type of the Key and type of the Value. In our case, Key is a set of all parameters that are used as an input in computation, and Value is a computation result.

So, how will we build the infrastructure for Memoization? The answer is almost immediate - we will use Templates. The straightforward way is just to provide list of the types, that will consist of Type of the computation result (Value in our associative array) and one, or more Types of input parameters (Key in our associative array). Another "nice to have" feature is single API for memoized and not memoized version. So far, so good, but API for our infrastructure, will require to provide all these types, that in my opinion, will be a little bit annoying and non user friendly. Moreover, we will need to provide, as an additional parameter, function that will implement the original computation logic (to support single API feature).

May we do better? We will try :) And we will succeed!

API and Example

Let's assume that we have some template class Memoize that should be used like in the following example.

Here we have struct MagicCircle with some method Func, that gets as parameters Coordinates and String, and after very long and complex mathematical calculations return Radius and Color. Don't try to find any logic in the math calculations, that's not the point here. We just want to demonstrate, using that example, with a non trivial input types (std::pair and std::string) and non trivial return type (some struct), how easy is memoization infrastructure usage.

C++

#pragma once

#include <cstdint>
#include <utility>    
#include <string>
#include <limits>
#include <random>

#include "Memoize.h"

enum class Color { RED, GREEN, BLUE };
using Coordinates = std::pair < uint32_t, uint32_t >;

struct MagicColorResult
{
    uint32_t radius;
    Color color;
};

struct MagicCircle : public Memoize< MagicCircle >
{
    static MagicColorResult Func(Coordinates coord, std::string str)
    {
        // Do not try to understand internals of this method
        // That just represents some long and complex calculations.
        
        /************************************************************/
        /**/ MagicColorResult res;
        /**/
        /*  LONG AND COMPLEX MATHEMATICAL CALCULATIONS .....
        /*     ................................................
        /*     ................................................
        /*     ................................................
        /*     ................................................
        /*     ................................................
        /*     ................................................
        */
        /**/ std::random_device rd;
        /**/ std::mt19937 gen(rd());
        /**/ std::uniform_int_distribution<uint32_t>
        /**/ dis(std::numeric_limits<uint32_t>::lowest(),
        /**/     std::numeric_limits<uint32_t>::max()
        /**/    );
        /**/
        /*   After long and complex math we set the resut; */
        /**/ res.radius = dis(gen);
        /**/ res.color  = Color::GREEN;
        /************************************************************/

        return res;
    }
};

So, let's see what we have here:

In definition of struct MagicCircle, we inherit from class Memoize, when as a template parameter we provide MagicCircle itself.
We implement static method Func, that gets some(many) parameters, and returns computation result of some type.

Those are all requirements for Memoization Infrastructure usage. To inherit from class Memoize (and provide template parameter), and to implement static method Func.

That's it.

We don't care what and how many parameters Func will get as an input, and we don't care what the output type is.

Now, let's look at the usage:

MagicColorResult res1 = Memoize<MagicCircle>::Func(Coordinates{7,2}, std::string(" Original Dummy "));
MagicColorResult res2 = Memoize<MagicCircle>::Func(Coordinates{7,2}, std::string(" Another Dummy  "));
MagicColorResult res3 = Memoize<MagicCircle>::Func(Coordinates{7,2}, std::string(" Original Dummy "));

For calling MagicCircle computation, using Memoization infrastrucutre, we just need to call not the MagicCircle::Func( ... ) method itself, but the Memoize<MagicCircle>::Func( ... ). And the magic will happened: For the two first calls, with the "first time coming" parameters actual calculation will be executed. But for the third call, with parameters identical to the first call, no computation will be performed, and memoized result will be returned.

Implementation

That's the time to move to the actual implementation of Memoize.

C++

#pragma once

#include <tuple>
#include <map>

template < typename T > struct Memoize
{
    template <typename... Args> static auto Func(Args&&... args) 
                          -> decltype(T::Func(std::forward<Args>(args)...))
    {
        using INPUT = std::tuple<Args...>;

        using RESULT = decltype(T::Func(std::forward<Args>(args)...));

        static std::map< INPUT, RESULT > repository;

        auto key = std::make_tuple(std::forward<Args>(args)...);

        auto it = repository.find(key);

        if (it != repository.end())
        {
            return it->second;
        }

        else
        {
            auto result = T::Func(std::forward<Args>(args)...);

            repository[key] = result;

            return result;
        }
    }
};

As we saw, Memoize gets, as a template parameter, class with a single requirement - this class must have static method Func.

Internally, Memoize define his own static method Func, with input parameters type(s), and output parameter type, that are identical to the original Func, from the template parameter class. To achieve that, we use Variadic Template (C++11) for input parameters, and decltype (C++11) for output type. No type is passed explicitly. Compiler does this job for us - all types( input and output ) for Memoize::Func are deducted by compiler from derived class Func definition.

The implementation itself is pretty similar to the pseudo code. We store results of calculation, indexed by combination of input parameters, in associative array (std::map in our case). On Func call, we look if we already calculated exactly the same parameters, by looking in associative array indexed by combination of input parameters. If we found the result, we just return that, w/o actual calculation. If we didn't - we call the original Func, which processes the actual computation, and stores the result.

The only detail that may be interested here is indexing itself. For indexing, we use std::tuple, based on input types. Just to remind, order operation on tuple is defined as lexicographic order on tuple types. So it is important that every particular type from input parameters will have comparison operator.

Recursive Example

As I wrote, Memoization is an essential part of Dynamic Programming. Here is an example how to use Memoize infrastructure for Dynamic Programming Factorial computation.

C++

#pragma once

#include <cstdint>

#include "Memoize.h"

class Factorial : public Memoize<Factorial>
{
public:
    static uint32_t Func(int i)
    {
        if (i == 0)
        {
            return 1;
        }

        return i * Memoize::Func(i - 1);
    }
};

Please note that for recursive call, we should call Memoize::Func implementation of Func method, and Func implementation from Factorial class itself.

And external usage is similar to the previous example:

C++

Memoize<Factorial>::Func(10);
Memoize<Factorial>::Func(11);

For the second call, only one calculation, 10*11, will be executed.

Remarks

As we see, some advanced C++11 features have been used in this implementation. Please be aware that not all compilers have a full C++11 support. This code has been implemented and tested using MSVC 2014 CTP compiler.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)