Introduction
This article is about using Cachalot DB as a distributed cache (without persistence).
For a general presentation of Cachalot DB, see Part 1, Part 2, Part 3.
Cachalot DB is a fully open-source project available here:
Latest release including full documentation:
The most frequent use case for a distributed cache is to store objects identified by one or more unique keys.
A database contains the persistent data. When an object is accessed, we first try to get it from the cache and, if it is not available, load it from the database. Usually, an object loaded from the database is also stored in the cache for later use.
Item = cache.TryGet(itemKey)
If Item found
    return Item
Else
    Item = database.Load(itemKey)
    cache.Put(Item)
    return Item
With this simple algorithm, the cache is progressively filled with data and its “hit ratio” (the proportion of accesses served directly from the cache) improves over time.
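The algorithm above can be sketched in C# as a small, single-process class. This is an illustration under stated assumptions, not Cachalot code: a Dictionary stands in for the distributed cache, a delegate stands in for database.Load, and the class name CacheAside and its Hits/Misses counters are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Single-process sketch of the cache-aside algorithm (hypothetical class, not Cachalot code).
// The Dictionary plays the role of the cache; the delegate plays the role of database.Load.
public sealed class CacheAside<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _cache = new();
    private readonly Func<TKey, TValue> _loadFromDatabase;

    public int Hits { get; private set; }
    public int Misses { get; private set; }

    public CacheAside(Func<TKey, TValue> loadFromDatabase) => _loadFromDatabase = loadFromDatabase;

    public TValue Get(TKey key)
    {
        // 1. Try the cache first.
        if (_cache.TryGetValue(key, out var item))
        {
            Hits++;
            return item;
        }

        // 2. On a miss, load from the database and keep a copy for later use.
        Misses++;
        item = _loadFromDatabase(key);
        _cache[key] = item;
        return item;
    }

    // Fraction of accesses served from the cache so far.
    public double HitRatio => Hits + Misses == 0 ? 0.0 : (double)Hits / (Hits + Misses);
}
```

Requesting keys 1, 2, 1, 1 triggers two database loads and two cache hits, so HitRatio is 0.5 at that point; it keeps rising as already-cached keys are requested again.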
This cache usage is usually associated with an “eviction policy” to avoid excessive memory consumption. When a threshold is reached (either in terms of memory usage or object count), some of the objects from the cache are removed.
The most frequently used eviction policy is “Least Recently Used” (LRU): every time an object is accessed in the cache, its associated timestamp is updated, and when eviction is triggered, the objects with the oldest timestamps are removed.
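Here is a minimal sketch of the timestamp-based LRU policy described above, assuming a threshold on object count (not on memory usage): each access stamps the entry with a logical clock value, and when the count exceeds a limit, a batch of entries with the oldest stamps is removed. LruCache, limit and evictCount are hypothetical names used only for this illustration; production caches usually keep entries in a linked list ordered by recency rather than sorting on eviction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of timestamp-based LRU eviction with an object-count threshold.
// Hypothetical class, not Cachalot's implementation.
public sealed class LruCache<TKey, TValue> where TKey : notnull
{
    // Each entry keeps its value and the logical time of its last access.
    private readonly Dictionary<TKey, (TValue Value, long Stamp)> _entries = new();
    private readonly int _limit;      // eviction is triggered above this object count
    private readonly int _evictCount; // number of least recently used entries removed per eviction
    private long _clock;              // logical clock, incremented on every access

    public LruCache(int limit, int evictCount)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException(nameof(limit));
        if (evictCount <= 0) throw new ArgumentOutOfRangeException(nameof(evictCount));
        _limit = limit;
        _evictCount = evictCount;
    }

    public int Count => _entries.Count;

    public void Put(TKey key, TValue value)
    {
        _entries[key] = (value, ++_clock);
        if (_entries.Count > _limit)
            Evict();
    }

    public bool TryGet(TKey key, out TValue value)
    {
        if (_entries.TryGetValue(key, out var entry))
        {
            // Refresh the timestamp: this entry becomes the most recently used.
            _entries[key] = (entry.Value, ++_clock);
            value = entry.Value;
            return true;
        }
        value = default!;
        return false;
    }

    private void Evict()
    {
        // Materialize the oldest keys first, then remove them from the dictionary.
        var oldest = _entries.OrderBy(e => e.Value.Stamp)
                             .Take(_evictCount)
                             .Select(e => e.Key)
                             .ToList();
        foreach (var key in oldest)
            _entries.Remove(key);
    }
}
```

With limit 3 and evictCount 2, putting a, b and c, reading a, then putting d removes b and c (the two oldest stamps) and keeps a and d.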
Using Cachalot as a distributed cache of this type is very easy.
First, disable persistence (by default it is enabled). On every node in the cluster, there is a small configuration file called node_config.json. It usually looks like this:
{
    "IsPersistent": true,
    "ClusterName": "test",
    "TcpPort": 6666,
    "DataPath": "root"
}
To switch a cluster to pure cache mode, simply set IsPersistent to false on all the nodes. DataPath is ignored in this case.
Example of client code with LRU eviction activated:
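public class TradeProvider
{
    private Connector _connector;
    public void Startup(ClientConfig config)
    {
        _connector = new Connector(config);
        // activate LRU eviction for the Trade type
        // (the numeric arguments set the eviction limits; see ConfigEviction in the Cachalot documentation)
        var trades = _connector.DataSource<Trade>();
        trades.ConfigEviction(EvictionType.LessRecentlyUsed, 500_000, 500);
    }
    public Trade GetTrade(int id)
    {
        var trades = _connector.DataSource<Trade>();
        // try the cache first (lookup by primary key)
        var fromCache = trades[id];
        if (fromCache != null)
        {
            return fromCache;
        }
        // cache miss: load from the database, then store in the cache for later use
        // (GetTradeFromDatabase is the application's own data access method, not shown here)
        var trade = GetTradeFromDatabase(id);
        trades.Put(trade);
        return trade;
    }
    public void Shutdown()
    {
        _connector.Dispose();
    }
}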
Eviction is configured by data type. Each data type can have a specific eviction policy (or none).
Every decent distributed cache on the market can do this. But Cachalot can do much more.
The single-object access mode is useful in some real-world cases, such as storing session information for websites, partially filled forms, blog articles and much more.
But sometimes, we need to retrieve a collection of objects from a cache with a SQL-like query.
And we would like the cache to return a result only if it can guarantee that all the data concerned by the query is available in the cache.
The obvious issue here is: How do we know if all data is available in the cache?
In the simplest (but not the most frequent) case, we can guarantee that all the data in the database is also in the cache. This requires enough RAM to hold all the data in the database.
The cache is either preloaded by an external component (for example, each morning) or it is lazily loaded when we first access it.
Two new methods are available in the DataSource class to manage this use case:
- A LINQ extension, OnlyIfComplete. When inserted in a LINQ command pipeline, it modifies the behavior of the data source: the query returns an IEnumerable only if all the data is available, and throws an exception otherwise.
- A method that declares that all the data is available for a given data type: DeclareFullyLoaded (a member of the DataSource class).
Here is a code example extracted from a unit test:
var dataSource = connector.DataSource<ProductEvent>();
dataSource.PutMany(events);
// the data has not been declared complete yet: OnlyIfComplete throws
Assert.Throws<CacheException>(() =>
    dataSource.Where(e => e.EventType == "FIXING").OnlyIfComplete().ToList()
);
// once all the data is declared available, the same query succeeds
dataSource.DeclareFullyLoaded();
var fixings = dataSource.Where(e => e.EventType == "FIXING").OnlyIfComplete().ToList();
Assert.Greater(fixings.Count, 0);
// DeclareFullyLoaded(false) revokes the declaration
dataSource.DeclareFullyLoaded(false);
Assert.Throws<CacheException>(() =>
    dataSource.Where(e => e.EventType == "FIXING").OnlyIfComplete().ToList()
);
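The contract exercised by this test can be modeled in a few lines. The sketch below is a toy, in-process model, not Cachalot's implementation: ToySource, MarkFullyLoaded, QueryOnlyIfComplete and IncompleteDataException are hypothetical names that only mirror the behavior of DeclareFullyLoaded, OnlyIfComplete and CacheException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of the "fully loaded" contract (hypothetical names, not Cachalot code).
public sealed class IncompleteDataException : Exception
{
    public IncompleteDataException(string message) : base(message) { }
}

public sealed class ToySource<T>
{
    private readonly List<T> _items = new();
    private bool _fullyLoaded; // false until the application declares the data complete

    public void PutMany(IEnumerable<T> items) => _items.AddRange(items);

    // Declare (or revoke, with false) that every object of type T is in the cache.
    public void MarkFullyLoaded(bool fullyLoaded = true) => _fullyLoaded = fullyLoaded;

    // Best-effort query: returns whatever matches among the cached objects.
    public List<T> Query(Func<T, bool> predicate) => _items.Where(predicate).ToList();

    // Strict query: answers only if the cached data is declared complete.
    public List<T> QueryOnlyIfComplete(Func<T, bool> predicate)
    {
        if (!_fullyLoaded)
            throw new IncompleteDataException($"{typeof(T).Name} is not declared fully loaded");
        return Query(predicate);
    }
}
```

The design keeps the completeness flag separate from the data itself: the cache cannot infer completeness on its own, so the component that loads the data is responsible for declaring it.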
In the more frequent case, only a subset of the data is stored in the cache. For this use case, Cachalot provides an inventive solution:
- Describe the preloaded data as a query (expressed as a LINQ expression)
- When data is queried from the cache, determine whether the query is a subset of the preloaded data
The two methods of the DataSource class involved in this process are:
- the same OnlyIfComplete LINQ extension
- the DeclareLoadedDomain method. Its parameter is a LINQ expression that defines a subdomain of the global data.
Example 1: For a rental site like Airbnb, we would like to store in the cache all the properties in the most visited cities.
homes.DeclareLoadedDomain(h=>h.Town == "Paris" || h.Town == "Nice");
Then, this query will succeed as it is a subset of the specified domain.
var result = homes.Where( h => h.Town == "Paris" && h.Rooms >= 2)
.OnlyIfComplete().ToList();
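But this one will throw an exception, because the cache does not guarantee that it holds the properties in French towns other than Paris and Nice.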
result = homes.Where(h => h.CountryCode == "FR" && h.Rooms == 2)
.OnlyIfComplete().ToList();
If we omit the call to OnlyIfComplete, the query simply returns the elements in the cache that match it.
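How can a cache decide that one query is a subset of another? The article does not describe Cachalot's actual algorithm, so the sketch below is a deliberately simplified model that handles only Example 1. It assumes the domain is a set of allowed values for one property (Town in {Paris, Nice}) and that a query is an AND of equality constraints, passed as a property-to-value dictionary. Non-equality conditions such as Rooms >= 2 are left out of the model because AND-ed conditions can only narrow the result. EqualityDomain and Covers are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Simplified containment check for a domain of the form "Property in {v1, v2, ...}".
// Assumption of this sketch (not Cachalot's algorithm): the query is an AND of
// equality constraints; extra AND-ed conditions only shrink the result, so the
// query is covered as soon as it pins the domain property to an allowed value.
public sealed class EqualityDomain
{
    private readonly string _property;
    private readonly HashSet<string> _allowed;

    public EqualityDomain(string property, params string[] allowedValues)
    {
        _property = property;
        _allowed = new HashSet<string>(allowedValues);
    }

    public bool Covers(IReadOnlyDictionary<string, string> equalities) =>
        equalities.TryGetValue(_property, out var value) && _allowed.Contains(value);
}
```

A query on Town == "Paris" is covered, while a query on CountryCode == "FR" pins no Town value and is rejected, matching the two outcomes of Example 1.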
Example 2: In a trading system, we want to cache all the trades that are alive (maturity date >= today) and all the ones created in the last year (trade date > one year ago).
var oneYearAgo = DateTime.Today.AddYears(-1);
var today = DateTime.Today;
trades.DeclareLoadedDomain(
t=>t.MaturityDate >= today || t.TradeDate > oneYearAgo
);
Then these queries will succeed as they are subsets of the specified domain.
var res = trades.Where(
t=>t.IsDestroyed == false && t.TradeDate == DateTime.Today.AddDays(-1)
).OnlyIfComplete().ToList();
res = trades.Where(
t => t.IsDestroyed == false && t.MaturityDate == DateTime.Today
).OnlyIfComplete().ToList();
But this one will throw an exception, because a filter on the portfolio alone does not restrict the result to live or recent trades.
trades.Where(
t => t.IsDestroyed == false && t.Portfolio == "SW-EUR"
).OnlyIfComplete().ToList();
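A simplified check for the date-based domain of Example 2 can be sketched as follows, again as an assumption rather than Cachalot's algorithm: the domain is an OR of lower bounds on date properties, and an AND-only query is covered when it pins one of those properties, by equality, to a date that satisfies the matching bound. Conditions such as IsDestroyed == false only narrow the result, so they are left out. LowerBound, DateDomain and Covers are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One disjunct of the domain, e.g. "MaturityDate >= today" or "TradeDate > oneYearAgo".
public sealed record LowerBound(string Property, DateTime Bound, bool Inclusive)
{
    public bool Admits(DateTime value) => Inclusive ? value >= Bound : value > Bound;
}

// Domain = OR of lower bounds (simplified model, not Cachalot's algorithm).
public sealed class DateDomain
{
    private readonly List<LowerBound> _anyOf;

    public DateDomain(params LowerBound[] anyOf) => _anyOf = anyOf.ToList();

    // `equalities` holds the date equality constraints of an AND-only query.
    // If the query pins a property to a date admitted by one disjunct, every
    // result row satisfies that disjunct, so the query lies inside the domain.
    public bool Covers(IReadOnlyDictionary<string, DateTime> equalities) =>
        _anyOf.Any(b => equalities.TryGetValue(b.Property, out var v) && b.Admits(v));
}
```

With today set to 1 June 2024, the queries pinning TradeDate to yesterday or MaturityDate to today are covered, while the portfolio query pins no date property and is rejected, which matches the behavior described above.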
Domain declaration and eviction policy are, of course, mutually exclusive on a data type: automatic eviction could remove objects that belong to the declared domain and make the data incomplete.
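This rule can be enforced at configuration time. The sketch below uses hypothetical types (EvictionPolicy, TypeCacheSettings), not Cachalot's API, to show one way a per-type configuration could reject the combination whichever setting comes first.

```csharp
using System;

// Hypothetical per-type settings (not Cachalot's API) enforcing the rule that a
// data type has either an eviction policy or a declared loaded domain, never both.
public enum EvictionPolicy { None, LeastRecentlyUsed }

public sealed class TypeCacheSettings
{
    private string _loadedDomain = string.Empty; // description of the declared domain, empty if none

    public EvictionPolicy Eviction { get; private set; } = EvictionPolicy.None;
    public bool HasLoadedDomain => _loadedDomain.Length > 0;

    public void SetEviction(EvictionPolicy policy)
    {
        // Eviction could remove objects from the declared domain and break completeness.
        if (policy != EvictionPolicy.None && HasLoadedDomain)
            throw new InvalidOperationException("A type with a declared domain cannot use eviction.");
        Eviction = policy;
    }

    public void SetLoadedDomain(string description)
    {
        if (string.IsNullOrWhiteSpace(description))
            throw new ArgumentException("A domain description is required.", nameof(description));
        if (Eviction != EvictionPolicy.None)
            throw new InvalidOperationException("A type with an eviction policy cannot declare a loaded domain.");
        _loadedDomain = description;
    }
}
```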