Introduction
A LRU Cache is a key-value based data container that is constrained by size and/or age, removing the least recently used objects first. This algorithm requires keeping track of the most recent time each object is accessed, which can be expensive to ensure the algorithm always discards the least recently used item.
Other LRU Algorithms
Most LRU Caching algorithms consist of two parts: a dictionary and a list. The dictionary guarantees quick access to your data, and the list, ordered by age of the objects, controls the lifespan of objects and determines which objects are to be removed first.
A simple LRU Cache implementation uses a doubly linked list; adding new items to the head, removing items from the tail, and moving any existing items to the head when referenced (touched). This algorithm is good for single threaded applications but becomes very slow in a multi-threaded environment. In a multi-threaded app, every time the linked list is modified, it must be locked. This requires thread locks on insert, read, and delete operations, causing the slowness.
Another algorithm is to mark each item’s age with a timestamp (DateTime
or incremented Integer
) value when touched. When the cache becomes full, the list of items must be sorted and truncated. This keeps insert and read operations very fast because no locks are required, but makes delete painfully slow (normally O(N * log N), for sorting).
Overview
The algorithm I choose, divides items into time slices I call AgeBag
s. Items are added to the current AgeBag
until the bag becomes full or the designated time span expires, then that AgeBag
is closed and the next AgeBag
becomes current. Items that are touched are marked with an index of the current bag. When the cache gets too full/old, the oldest AgeBag
is emptied, moving any nodes that have been touched to the correct AgeBag
and removing the rest of the nodes in the bag. This allows reads to be performed without locking, and makes deletes much faster because only items that are in the oldest AgeBag
are ever moved and no real sorting is ever performed.
In this system, even though the exact order of items is not maintained, this is still a true LRU Cache because everything that remains in the oldest bag (after moves are made for the touched items) is removed at the same time. Therefore, no item remains in the cache that is older than the removed items.
My implementation of LRUCache
contains two distinct sections which are coded as subclasses. The LifespanMgr
class tracks the item usage and determines which items to remove when the cache gets full/old. The Index
class provides Dictionary
key/value access to all objects stored in the cache. The Index
has delegates to calculate the key of any item added to the cache, and to load an item by key/value from an external source.
Instead of storing item nodes in the Index
, the Index
holds WeakReference
objects that point at the item nodes. By doing this, the Index
does not prevent items from being garbage collected. This also allows old items to be removed from the AgeBag
without having to remove the items from the index. Once a Node
is removed from the LifespanMgr
, it becomes eligible for garbage collection. The Node
is not removed from the Index
es immediately. If an Index
retrieves the node prior to garbage collection, it is reinserted into the current AgeBag
’s node list. If it has already been garbage collected, a new object gets loaded. If the Index
size exceeds twice the cache’s capacity, the Index
is cleared and rebuilt.
Locking
Before I get into the code, I should provide a brief explanation of all the locking mechanisms used by the LRUCache
:
The first and simplest is the Interlocked
class which ensures that increment and decrement operations occur without context switching. This is all that is required to make increment and decrement operations thread safe, and is much faster than other locking methods.
The next locking mechanism is the Monitor
class. This class is called whenever you use the C# lock
keyword. One method of note is Monitor.TryEnter(obj)
, which immediately returns a bool
specifying if a lock was acquired. Unlike the standard Enter()
method, after the first thread gets the lock, all other threads will skip the operation instead of waiting for the first thread to complete. This is good for cases where speed is more important than the action being locked. It is also ideal for cases when the locked code only needs to be performed by one thread.
if(Monitor.TryEnter(obj))
try {
}
finally {
Monitor.Exit(obj);
}
The last locking mechanism used is ReaderWriterLock
. At any given time, it allows either concurrent read access for multiple threads, or write access for a single thread. A ReaderWriterLock
provides better throughput than Monitor
because of its concurrent read access. Because using ReaderWriter
locks can get a little tricky, I use a wrapper class that simplifies locking a provided delegate.
Code Review
Because the LifespanMgr
is the most interesting part of the code, I will spend most of the remaining article discussing it; however, the code is highly commented so you shouldn’t have any trouble following the rest of the code.
public void Touch()
{
if( _value != null && ageBag != _mgr._currentBag )
{
if( ageBag == null )
lock( _mgr )
if( ageBag == null )
{
next = _mgr._currentBag.first;
_mgr._currentBag.first = this;
Interlocked.Increment( ref _mgr._owner._curCount );
}
ageBag = _mgr._currentBag;
Interlocked.Increment( ref _mgr._currentSize );
}
_mgr.CheckValid();
}
Each time an indexed item is added or referenced, LifespanMgr.Node.Touch()
is called which (re)inserts the node if needed, and points the node's ageBag
variable at the current AgeBag
. Also, in the touch
method, LifespanMgr.CheckValid()
is called.
public void CheckValid()
{
DateTime now = DateTime.Now;
if( (_currentSize > _bagItemLimit || now > _nextValidCheck)
&& Monitor.TryEnter( this ) )
try
{
if( (_currentSize > _bagItemLimit || now > _nextValidCheck) )
{
if( _current > 1000000 || (_owner._isValid != null
&& !_owner._isValid()) )
_owner.Clear();
else
CleanUp( now );
}
}
finally
{
Monitor.Exit( this );
}
}
CheckValid
only performs an action if the AgeBag
gets full or the designated time span expires. When either of those cases are met, an optional IsValid()
delegate is called which checks the health of the overall cache. If the cache is invalid or out of date, all items in the cache are removed and items will be reloaded the next time an index accesses them. If IsValid
returns true
, the LifespanMgr.Cleanup()
method is called.
public void CleanUp( DateTime now )
{
if( _current != _oldest )
lock( this )
{
DateTime maxAge = now.Subtract( _maxAge );
DateTime minAge = now.Subtract( _minAge );
int itemsToRemove = _owner._curCount - _owner._capacity;
AgeBag bag = _bags[_oldest % _size];
while( _current != _oldest && (_current-_oldest>_size - 5
|| bag.startTime < maxAge || (itemsToRemove > 0
&& bag.stopTime > minAge)) )
{
Node node = bag.first;
bag.first = null;
while( node != null )
{
Node next = node.next;
node.next = null;
if( node.Value != null && node.ageBag != null )
if( node.ageBag == bag )
{
++itemsToRemove;
node.ageBag = null;
Interlocked.Decrement( ref _owner._curCount );
}
else
{
node.next = node.ageBag.first;
node.ageBag.first = node;
}
node = next;
}
bag = _bags[(++_oldest) % _size];
}
OpenCurrentBag( now, ++_current );
CheckIndexValid();
}
}
Cleanup
does the grunt work of moving touched items out of the last bag and deleting the rest. It also calls CheckIndexValid()
to see if indexes need to be recreated, and OpenCurrentBag()
which sets up the next current AgeBag
.
Usage
The UserCache.cs file contains an example of how the cache can be used:
public UserData FindByUserID( int userid )
{
return _findByUserID[userid];
}
public UserData FindByUserName( string username )
{
return _findByUserName[username];
}
private UserCache() : base( 10000, TimeSpan.FromMinutes( 1 ),
TimeSpan.FromHours( 1 ), null )
{
_isValid = IsDataValid;
_findByUserID = AddIndex<int>( "UserID", delegate( UserData user )
{ return user.UserID; }, LoadFromUserID );
_findByUserName = AddIndex<string>( "UserName", delegate( UserData user )
{ return user.UserName; }, LoadFromUserName );
IsDataValid();
}
Summary
This implementation of LRUCache
attempts to provide a fast, reliable access to recently used data in a multithreaded environment.