Generic Lazy Evaluation in C++

Ali AslRousta

4.69/5 (16 votes)

19 Nov 2013CPOL6 min read

51.8K

206

How to implement a generic lazy evaluation class using C++11 features.

Download lazy.zip - 1 KB

Introduction

It's been a long time I was busy coding scientific computation toolsets for academic purposes. For research matters and fast coding, C# and F# made a great choice of implementation language. They are powerful and really fast and back-boned by the great .NET Framework. Parallelization is neat and clean, and there are a bunch of free components for data visualization. But a while ago, the problem arose. Customer was willing to order a scientific analysis software doing tons and tons of vector computations in a single iteration, and performance was a significant requirement. So I inclined to the elder of performance, C++, and its recently born child, C++11 standard. Everything was going well. As I expected, the most challenging matter was memory allocation, since beside memory leaks, sequences of news and deletes can cause a big descent. Memory re-use and pre-allocation came handy and decreased running time significantly. But there was something missing and I knew it: do not compute what's not needed and compute once and only once.

Background

Throughout this article, due to simplicity, I'm using an elementary example, vector class, but the reader must be aware of more complex problems. My first implementation of vector class was something like this:

C++

class vector {
  private:
    float _x, _y;
  public:
    vector(float x, float x) : _x(x), _y(y) { }
    float  get_x() const { return _x; }
    float  get_y() const { return _y; } 
    float  get_length() const { return sqrtf(_x * _x + _y * _y); }
    vector get_unit() const { return *this / get_length(); }
    // Some operator overloading for vector computations.
};

This is a simple implementation, but as can be seen, there's a problem: get_length and get_unit methods may be called multiple times for a single vector object during computations, and so, their values are computed over and over again. Since vector class is immutable, the easy solution is pre-computing values. So, I changed the vector class as follows:

C++

class vector {
  private:
    float _x, _y;
    float _length; 
    vector _unit; 
    void precompute() {
      _length = sqrtf(_x * _x + _y * _y);
      _unit = *this / _length; 
    }  
  public: 
    vector(float x, float y) : _x(x), _y(y) {
      precompute(); 
    }
    float   get_x() const { return _x; }
    float   get_y() const { return _y; }
    float   get_length() const { return _length; }
    vector& get_unit() const { return _unit; }
    // Some operator overloading for vector computations.
};

Now it works and it's much better. But there goes a question: is it necessary to compute all of them at once, and specially, at the class construction process? However, how can we automate pre-computing process and make the code more cleaner?

At this point, I was thinking of lazy evaluation in C# language and Lazy<T> class. But there's no such feature in C++ and not even in Standard Template Library. I found some close implementation of lazy evaluation in boost library, but I chose to not use such a big library for a small feature, and then, I decided to implement a small one myself.

Lazy Evaluation

I needed an equivalent for .NET Lazy class, which I was used to, and I needed it to be generic. As I thought, it was not that hard and I quickly coded lazy<T> class:

C++

template<typename T>
class lazy {
  public:
    lazy() : m_initiator(default_initiator), m_initialized(false) { }
    lazy(std::function<T()> initiator) : m_initiator(initiator), m_initialized(false) { }
    T& getValue() {
      if (!m_initialized) {
        m_value = m_initiator();
        m_initialized = true;
      }
      return m_value;
    }
    operator T() {
      return getValue();
    }
    T& operator* () {
      return getValue();
    }
    lazy<T>& operator= (const lazy<T>& other) {
      m_initiator   = other.m_initiator;
      m_initialized = false;
      return *this;
    }
  private:
    static T default_initiator() {
      throw std::runtime_error("No lazy evaluator given.");
    }
    std::function<T()> m_initiator;
    T    m_value;
    bool m_initialized;
};

This way, it let me define a lazily evaluated variable using a lambda expression as its evaluator:

C++

auto pi        = lazy<float>([]() { return 3.141592; });
auto area      = *pi * r * r;
auto perimeter = *pi * 2 * r;

The value of pi is not computed since area is computed. And perimeter uses its previously computed value and it's not computed again. Finally, I simply rewrote the vector class using lazy class.

C++

class vector {
  private:
    float _x, _y;
    lazy<float>  _length; 
    lazy<vector> _unit; 
    void initialize() {
      _length = lazy<float>([this]() { return sqrtf(_x * _x + _y * _y); });
      _unit   = lazy<vector>([this]() { return *this / _length; });
    }  
  public: 
    vector(float x, float y) : _x(x), _y(y) {
      initialize(); 
    }
    float   get_x() const { return _x; }
    float   get_y() const { return _y; }
    float   get_length()  { return *_length; }
    vector& get_unit()    { return *_unit; }
    // Some operator overloading for vector computations.
};

Note that you cannot use const keyword for get_length and get_unit methods, cause their values are changed during first access. It was very clean and beautiful and was something I was after. I was really proud of myself like a hero, but suddenly my hooray didn't last long. When I tried to compile it by Visual Studio build tool, the compiler gave me C2079 error: 'lazy<T>' uses undefined class 'vector'.

I was really confused. From my C# background, anything was done right, and the error itself did not give me enough information to solve it.

Solution

It took me some time of struggling with the code and I was really upset. I, even, decided to fall back and use the trivial plan through the entire project. I took the last chance and hopefully found the source of the issue by a hint given in a post in stackoverflow.com site. Can you see it? Well, it's not that trivial. It has something to do with stack allocation of variables and dependencies.

I really prefer static (stack) allocation on dynamic allocation in C++, and I assure you, new and delete are evils. So, I prefer to define member variables and class instances as automatic variables, not pointers. Now, let's see what happens. C++ compilers allocate an automatic class instance and its members as a whole on stack, and thus, they need some information about class size in bytes at compile time. As of vector class, let's compute its size. vector class has four members: two of type float, one of type lazy<float>, and one of type lazy<vector>. The size of two float members are well-known. For lazy<float> type, we further need to compute a generic lazy<T> class with T defined as to be float. That's not a problem, too. Now, we have lazy<vector> class left. Here's where the problem goes. The size of lazy<vector> depends on vector, and vice versa, the size of vector depends on lazy<vector>. So we have a double dependency here which simply goes to an infinite recursion. It's now easy to understand why languages like C# and Java do not face such a problem, because class instances in managed languages are always reference types and dynamically allocated.

When the problem is well-defined, the solution is easy. What Microsoft offers is to simply use pointers and dynamic allocation, and that means more news and deletes which I was not after. So, I chose the hard way, and needed to break one of the dependency lines by breaking vector class. What follows is the result:

C++

class __vector {
  public: 
    __vector(float x, float y) : m_x(x), m_y(y) { initialize(); } 
    float get_x() const { return m_x; }  
    float get_y() const { return m_y; } 
    float get_length() { return *m_length; } 
    // Some operator overloading for vector computations. 
  private: 
    void initialize() { 
      m_length = lazy<float>([this]() { return sqrtf(m_x * m_x + m_y * m_y); });
    }
    float       m_x, m_y;
    lazy<float> m_length;
};
         
class vector : public __vector {  
  public: 
    vector(float x, float y) : __vector(x, y) { initialize(); }  
    vector get_unit() { return vector(*m_unit); }
    // Some operator overloading for vector computations. 
  private: 
    void initialize() { 
      m_unit = lazy<__vector>([this]() { return *this / get_length(); }); 
    } 
    lazy<__vector> m_unit; 
};

Now, both lazy<vector> and vector classes depend on __vector class to compute their size. This code compiles and works successfully as expected.

Well, it's not as clean as I want, but, it's simple enough to consider, specially when you get used to it. By the way, do not blame C++ and remember: no pain, no gain!

Thread Safety

The above implementation of lazy<T> class is good a for single-threaded application. Due to overcome thread-safety issues, some minor changes have to be made. Here, I've used the STL mutex class for controlling access to private members.

C++

lazy<T>& operator= (const lazy<T>& other) { 
  m_lock.lock();
  m_initiator   = other.m_initiator;
  m_initialized = false;
  m_lock.unlock();
  return *this;
}

C++

T& get_value() {
  if (!m_initialized) {
    m_lock.lock();
    if (!m_initialized) {
      m_value       = m_initiator();
      m_initialized = true;
    }
    m_lock.unlock();
  }
  return m_value;
}

Points of Interest

Always listen to your heart. Never do anything in a way that you don't like or feel that it's wrong. Hear professionals' voice, but don't be afraid of overtaking cliches. Using open source components is the right way to do things, but one cannot learn walking before falling many times. These are the key points that make you a valuable programmer after becoming a good one.

I hope that this article is useful for you, but, always think of bigger problems. Feel free to share your opinions. Critics are most welcome.

History

First version written on November 14, 2013
Bug fixed thanks to Stefan_Lang on November 19, 2013
Thread Safety section added on November 19, 2013
Some typos and bugs fixed on November 20, 2013

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)