
Cachalot DB as a Distributed Cache with Unique Features

Dan Ionescu (USINESOFT)


16 Apr 2019

LINQ extensions that let you describe cached data and query it safely

Introduction

This article is about using Cachalot DB as a distributed cache (without persistence).

For a general presentation of Cachalot DB, see Part 1, Part 2 and Part 3.

Cachalot DB is a fully open-source project available here:

https://github.com/usinesoft/Cachalot

The latest release, including full documentation, is available here:

https://github.com/usinesoft/Cachalot/releases/latest

Serving Single Objects From a Cache

The most frequent use case for a distributed cache is to store objects identified by one or more unique keys.

A database holds the persistent data. When an object is accessed, we first try to get it from the cache; if it is not there, we load it from the database. Usually, an object loaded from the database is also stored in the cache for later use.

Item = cache.TryGet(itemKey)
If Item found 
     return Item
Else
     Item = database.Load(itemKey)
     cache.Put(Item)
     return Item

By using this simple algorithm, the cache is progressively filled with data and its “hit ratio” improves over time.
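The algorithm above is commonly called the cache-aside (or lazy-loading) pattern. The following self-contained C# sketch illustrates it and counts hits and misses so the hit ratio can be observed. It is not Cachalot code: an in-process Dictionary stands in for the distributed cache, and the loadFromDatabase delegate is a hypothetical stand-in for the database.

```csharp
using System;
using System.Collections.Generic;

// Minimal cache-aside sketch. The Dictionary is an in-process stand-in for the
// distributed cache and loadFromDatabase a hypothetical stand-in for the database.
public sealed class CacheAside<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _cache = new();
    private readonly Func<TKey, TValue> _loadFromDatabase;

    public CacheAside(Func<TKey, TValue> loadFromDatabase) => _loadFromDatabase = loadFromDatabase;

    public int Hits { get; private set; }
    public int Misses { get; private set; }

    // Fraction of requests served directly from the cache.
    public double HitRatio => Hits + Misses == 0 ? 0 : (double)Hits / (Hits + Misses);

    public TValue Get(TKey key)
    {
        // 1. Try the cache first.
        if (_cache.TryGetValue(key, out var item))
        {
            Hits++;
            return item;
        }

        // 2. On a miss, load from the database and keep the item for later requests.
        Misses++;
        item = _loadFromDatabase(key);
        _cache[key] = item;
        return item;
    }
}
```

When the same keys are requested repeatedly, Misses is bounded by the number of distinct keys while Hits keeps growing, which is why the hit ratio improves over time.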

This usage pattern is usually combined with an “eviction policy” to avoid excessive memory consumption: when a threshold is reached (in terms of either memory usage or object count), some objects are removed from the cache.

The most frequently used eviction policy is “Least Recently Used” (LRU). Every time an object is accessed in the cache, its associated timestamp is updated; when eviction is triggered, the objects with the oldest timestamps are removed first.
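To make the LRU mechanics concrete, here is a minimal single-process C# sketch; it is not Cachalot's implementation. Instead of storing a timestamp per object, it keeps the objects in a linked list ordered by last access, which yields the same eviction order. The limit and evictionBatch parameters are assumptions chosen to echo the limit and batch size passed to Cachalot's ConfigEviction; in this sketch eviction triggers once the limit is exceeded.

```csharp
using System.Collections.Generic;

// LRU sketch with a count limit and batch eviction (illustrative, not Cachalot code).
// The linked list replaces per-object timestamps: head = most recently used.
public sealed class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _limit;
    private readonly int _evictionBatch;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _index = new();
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new();

    public LruCache(int limit, int evictionBatch)
    {
        _limit = limit;
        _evictionBatch = evictionBatch;
    }

    public int Count => _index.Count;

    public bool TryGet(TKey key, out TValue value)
    {
        if (_index.TryGetValue(key, out var node))
        {
            // An access "refreshes the timestamp": move the item to the front.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default!;
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        if (_index.TryGetValue(key, out var existing))
        {
            _order.Remove(existing);
            _index.Remove(key);
        }
        _index[key] = _order.AddFirst((key, value));

        // Limit exceeded: remove a whole batch of least recently used items (list tail).
        if (_index.Count > _limit)
        {
            for (int i = 0; i < _evictionBatch && _order.Last is not null; i++)
            {
                var oldest = _order.Last;
                _order.RemoveLast();
                _index.Remove(oldest.Value.Key);
            }
        }
    }
}
```

Both operations run in constant time: the dictionary locates the node, and the linked list moves or removes it without scanning.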

Using Cachalot as a distributed cache of this type is very easy.

First, disable persistence (by default it is enabled). On every node in the cluster, there is a small configuration file called node_config.json. It usually looks like this:

{
  "IsPersistent": true,
  "ClusterName": "test",
  "TcpPort": 6666,
  "DataPath": "root" 
}

To switch a cluster to pure cache mode, simply set IsPersistent to false on all the nodes. DataPath will be ignored in this case.
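For example, on every node of a pure cache cluster, the configuration file above becomes the following (only IsPersistent changes; DataPath is left in place but ignored):

```json
{
  "IsPersistent": false,
  "ClusterName": "test",
  "TcpPort": 6666,
  "DataPath": "root"
}
```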

Example of client code with LRU eviction activated:

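public class TradeProvider
{
    private Connector _connector;

    public void Startup(ClientConfig config)
    {
        _connector = new Connector(config);
        var trades = _connector.DataSource<Trade>();
        // remove 500 items every time the limit of 500_000 is reached
        trades.ConfigEviction(EvictionType.LessRecentlyUsed, 500_000, 500);
    }

    public Trade GetTrade(int id)
    {
        var trades = _connector.DataSource<Trade>();

        // try the cache first
        var fromCache = trades[id];
        if (fromCache != null)
        {
            return fromCache;
        }

        // cache miss: load from the database, then store the result in the cache
        var trade = GetTradeFromDatabase(id);
        trades.Put(trade);
        return trade;
    }

    public void Shutdown()
    {
        _connector.Dispose();
    }

    // Placeholder for the application's own data access code (not part of Cachalot)
    private Trade GetTradeFromDatabase(int id)
    {
        throw new System.NotImplementedException("load the trade from your database here");
    }
}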

Eviction is configured by data type. Each data type can have a specific eviction policy (or none).

Every decent distributed cache on the market can do this. But Cachalot can do much more.

Serving Complex Queries From a Cache

The single-object access mode is useful in many real-world cases, such as storing session information for websites, partially filled forms, blog articles and much more.

But sometimes, we need to retrieve a collection of objects from a cache with a SQL-like query.

And we would like the cache to return a result only if it can guarantee that all the data concerned by the query is available in the cache.

The obvious issue here is: how do we know whether all the data is available in the cache?

First Case: All Data in the Database Is Loaded Into the Cache

In the simplest (but not the most frequent) case, we can guarantee that all the data in the database is also in the cache. This requires enough RAM to hold all the data in the database.

The cache is either preloaded by an external component (for example, each morning) or it is lazily loaded when we first access it.

Two new methods support this use case:

A LINQ extension, OnlyIfComplete. When inserted into a LINQ query pipeline, it changes the behavior of the data source: the query returns a result (an IEnumerable) only if all the data is available, and throws an exception otherwise.
A new DataSource method, DeclareFullyLoaded, which declares that all the data for a given data type is available in the cache.

Here is a code example extracted from a unit test:

var dataSource = connector.DataSource<ProductEvent>();
dataSource.PutMany(events);

// here an exception will be thrown
Assert.Throws<CacheException>(() =>     
     dataSource.Where(e => e.EventType == "FIXING").OnlyIfComplete().ToList()
);

// declare that all data is available
dataSource.DeclareFullyLoaded();

// here it works fine
var fixings = dataSource.Where(e => e.EventType == "FIXING").OnlyIfComplete().ToList();
Assert.Greater(fixings.Count, 0);

// declare that data is not available again
dataSource.DeclareFullyLoaded(false);

// an exception will be thrown again
Assert.Throws<CacheException>(() =>     
     dataSource.Where(e => e.EventType == "FIXING").OnlyIfComplete().ToList()
);
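The contract behind OnlyIfComplete and DeclareFullyLoaded can be modelled with a toy, single-process C# class. This illustrates the semantics only and is not Cachalot's implementation: ToyDataSource, WhereOnlyIfComplete and IncompleteDataException are hypothetical names, the last one standing in for Cachalot's CacheException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical exception standing in for Cachalot's CacheException.
public sealed class IncompleteDataException : Exception
{
    public IncompleteDataException(string message) : base(message) { }
}

// Toy model of the "fully loaded" flag: a complete-only query either returns
// every matching item or fails loudly; it never returns a partial result.
public sealed class ToyDataSource<T>
{
    private readonly List<T> _items = new();
    private bool _fullyLoaded;

    public void PutMany(IEnumerable<T> items) => _items.AddRange(items);

    public void DeclareFullyLoaded(bool fullyLoaded = true) => _fullyLoaded = fullyLoaded;

    // Counterpart of Where(predicate).OnlyIfComplete().ToList() in this toy model.
    public List<T> WhereOnlyIfComplete(Func<T, bool> predicate)
    {
        if (!_fullyLoaded)
            throw new IncompleteDataException("The cache cannot guarantee that it holds all the data.");
        return _items.Where(predicate).ToList();
    }
}
```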

Second Case: A Subset of the Database Is Loaded Into the Cache

For this use case, Cachalot provides an inventive solution:

Describe the preloaded data as a query (expressed as a LINQ expression)
When data is queried from the cache, determine whether the query is a subset of the preloaded data

The two methods involved in this process are:

The same OnlyIfComplete LINQ extension
The DeclareLoadedDomain method of the DataSource class. Its parameter is a LINQ expression that defines a subdomain of the global data.
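The article does not describe how Cachalot decides that a query is a subset of the loaded domain. The following C# sketch shows one simple, conservative way such a check could work, under stated assumptions: every condition is reduced to the form lower <= property <= upper (equality being lower == upper; strict comparisons are not modelled), the query is a single conjunction (AND), and the domain is a disjunction (OR) of conjunctions. A query is accepted if it implies one of the domain's terms. Condition and Containment are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One condition on one property: Lower <= value <= Upper (a null bound means unbounded).
// Equality is Lower == Upper. Strict comparisons (<, >) are not modelled in this sketch.
public sealed record Condition(string Property, IComparable? Lower, IComparable? Upper)
{
    public static Condition Eq(string property, IComparable value) => new(property, value, value);
    public static Condition AtLeast(string property, IComparable value) => new(property, value, null);
    public static Condition AtMost(string property, IComparable value) => new(property, null, value);

    // True if every value allowed by this condition is also allowed by 'other'
    // (both are assumed to be on the same property).
    public bool Implies(Condition other) =>
        (other.Lower is null || (Lower is not null && Lower.CompareTo(other.Lower) >= 0)) &&
        (other.Upper is null || (Upper is not null && Upper.CompareTo(other.Upper) <= 0));
}

public static class Containment
{
    // A conjunctive query implies a conjunctive domain term if each condition of the term
    // is implied by at least one query condition on the same property.
    public static bool Implies(IReadOnlyList<Condition> query, IReadOnlyList<Condition> domainTerm) =>
        domainTerm.All(d => query.Any(q => q.Property == d.Property && q.Implies(d)));

    // The domain is an OR of AND-terms: the query is accepted if it implies any one term.
    public static bool IsSubsetOf(IReadOnlyList<Condition> query,
                                  IReadOnlyList<IReadOnlyList<Condition>> domain) =>
        domain.Any(term => Implies(query, term));
}
```

With the Airbnb domain of the next example encoded as two terms, Town == "Paris" and Town == "Nice", the query Town == "Paris" && Rooms >= 2 is accepted through the first term, while CountryCode == "FR" && Rooms == 2 is rejected because nothing in it constrains Town. The check is sound but incomplete: it never accepts a query that reaches outside the domain, but it may reject one that is covered only by the union of several terms (for example 1 <= Rooms <= 5 against Rooms <= 3 || Rooms >= 3). For a cache this is the safe direction: a false rejection costs a database round trip, whereas a false acceptance would silently return incomplete data.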

Example 1: For a rental site like Airbnb, we would like to keep in the cache all the properties located in the most visited cities.

homes.DeclareLoadedDomain(h=>h.Town == "Paris" || h.Town == "Nice");

Then, this query will succeed as it is a subset of the specified domain.

var result = homes.Where( h => h.Town == "Paris" && h.Rooms >= 2)
  .OnlyIfComplete().ToList();

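But this one will throw an exception: the domain covers only Paris and Nice, so the cache cannot guarantee that it holds every two-room property in France.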

result = homes.Where(h => h.CountryCode == "FR" && h.Rooms == 2)
    .OnlyIfComplete().ToList();

If we omit the call to OnlyIfComplete, it will simply return the elements in the cache that match the query.

Example 2: In a trading system, we want to cache all live trades (maturity date >= today) and all trades created during the last year (trade date > one year ago).

var oneYearAgo = DateTime.Today.AddYears(-1);
var today = DateTime.Today;

trades.DeclareLoadedDomain(
 t=>t.MaturityDate >= today || t.TradeDate > oneYearAgo
);

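These queries will succeed because they are subsets of the declared domain: the first asks only for trades dated yesterday, which is within the last year, and the second only for trades maturing today, which are still alive.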

var res = trades.Where(
   t => t.IsDestroyed == false && t.TradeDate == DateTime.Today.AddDays(-1)
).OnlyIfComplete().ToList();

res = trades.Where(
   t => t.IsDestroyed == false && t.MaturityDate == DateTime.Today
).OnlyIfComplete().ToList();

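But this one will throw an exception: it constrains neither the maturity date nor the trade date, so it may match old, matured trades that were never loaded into the cache.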

trades.Where(
    t => t.IsDestroyed == false && t.Portfolio == "SW-EUR"
).OnlyIfComplete().ToList();

Domain declaration and eviction policy are, of course, mutually exclusive for a given data type: automatic eviction would make the declared domain incomplete.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
