
Cachalot DB - Very Fast Transactional Database for .NET Applications - Part 2

16 Apr 2019
More on data manipulation

Introduction

This is the second part of a series concerning Cachalot DB. The first part can be found here.

Compressing Object Data

The business objects are stored internally in a type-agnostic format. Index fields are stored as Int64 or string, and all the object data is stored as UTF-8 encoded JSON. The process of transforming a .NET object into the internal format is called “packing”. Packing is done client-side; the server only uses the indexes and manipulates the object as raw data. It has no dependency on the concrete .NET datatype.
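The packing step can be pictured with a minimal sketch. This is not Cachalot DB's actual implementation: `PackedObject`, `Packer` and the properties of this `Home` class are hypothetical names introduced for illustration, and System.Text.Json stands in for whatever serializer the library really uses.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical business type (its properties are not from the article).
public class Home
{
    public int Id { get; set; }
    public string Town { get; set; } = "";
    public int Rooms { get; set; }
}

// Hypothetical type-agnostic form: index values are Int64 or string,
// the whole object travels as UTF-8 encoded JSON.
public sealed class PackedObject
{
    public long PrimaryKey { get; init; }
    public Dictionary<string, object> IndexValues { get; } = new();
    public byte[] Json { get; init; } = Array.Empty<byte>();
}

public static class Packer
{
    // Client-side packing: extract the index values, serialize the object to JSON.
    public static PackedObject Pack(Home home)
    {
        var packed = new PackedObject
        {
            PrimaryKey = home.Id,
            Json = JsonSerializer.SerializeToUtf8Bytes(home)
        };
        packed.IndexValues["Town"] = home.Town;          // string index
        packed.IndexValues["Rooms"] = (long)home.Rooms;  // Int64 index
        return packed;
    }

    // Client-side unpacking: only here is the concrete .NET type needed.
    public static Home Unpack(PackedObject packed) =>
        JsonSerializer.Deserialize<Home>(packed.Json)!;
}

public static class PackingDemo
{
    public static void Main()
    {
        var packed = Packer.Pack(new Home { Id = 1, Town = "Paris", Rooms = 3 });
        Console.WriteLine(Encoding.UTF8.GetString(packed.Json));
        Console.WriteLine(Packer.Unpack(packed).Town);
    }
}
```

In this sketch, a server would only ever need `PrimaryKey`, `IndexValues` and the opaque `Json` bytes, which is why it can stay independent of the `Home` type.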

By default, the object data is not compressed, but for objects that take more than a few kilobytes, compression may be very useful. For an object that takes 10 KB in JSON, the compression ratio is around 1:10 (the compressed data is roughly a tenth of the original size).

To enable compression, add a single attribute to the business data type.

[Storage(compressed: true)]
public class Home
{
    // ...
}

Using compressed objects is transparent for the client code. However, it has an impact on packing time, and packing is done on the client. When objects are retrieved, they are unpacked on the client as well (which may imply decompression).

In conclusion, compression may be very useful starting with medium-sized objects, if you are ready to pay a small price, client-side only, for data insertion and retrieval.
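To get a feel for the trade-off, here is a minimal sketch that compresses a JSON payload and compares sizes. The article does not say which algorithm Cachalot DB uses, so `GZipStream` is an assumption used as a stand-in, and the payload is made up. The ratio actually obtained depends heavily on how repetitive the data is.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.Json;

public static class CompressionSketch
{
    public static byte[] Compress(byte[] data)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            gzip.Write(data, 0, data.Length);
        } // disposing the GZipStream flushes the final compressed block
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] data)
    {
        using var input = new MemoryStream(data);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }

    public static void Main()
    {
        // Hypothetical payload: a repetitive list, typical of business data.
        var rooms = Enumerable.Range(0, 200)
            .Select(i => new { Name = "Room " + i, Surface = 12 + i % 5, HasWindow = true });

        byte[] json = JsonSerializer.SerializeToUtf8Bytes(rooms);
        byte[] compressed = Compress(json);

        Console.WriteLine($"JSON: {json.Length} bytes, compressed: {compressed.Length} bytes");
        Console.WriteLine($"Round trip intact: {Decompress(compressed).SequenceEqual(json)}");
    }
}
```

Both `Compress` and `Decompress` would run on the client in the model described above, which is where the extra CPU time is spent.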

Storing Polymorphic Collections in the Database

Polymorphic collections are natively managed. Type information is stored internally in the JSON and it is used to deserialize the proper concrete type.

Here is a small example from a trading system.

In order to store a collection of events, all the required indexes must be exposed on the base type.

Null values are perfectly acceptable for index fields, so the base type can expose indexed properties that only make sense for a specific child type.

public abstract class ProductEvent
{
    [PrimaryKey(KeyDataType.IntKey)]
    public int Id { get; set; }

    [Index(KeyDataType.StringKey)]
    public abstract string EventType { get; }

    [Index(KeyDataType.IntKey, ordered: true)]
    public DateTime EventDate { get; set; }

    [Index(KeyDataType.IntKey, ordered: true)]
    public DateTime ValueDate { get; set; }

    // ...
}

public abstract class NegotiatedProductEvent : ProductEvent
{
    // ...
}

public class IncreaseDecrease : NegotiatedProductEvent
{
    // ...
    public override string EventType => "IncreaseDecrease";
}

The following code retrieves a collection of concrete events from a DataSource typed with the abstract base class.

var events = connector.DataSource<ProductEvent>();

var increaseEvents = events
    .Where(evt => evt.EventType == "IncreaseDecrease" &&
                  evt.EventDate == DateTime.Today)
    .Cast<IncreaseDecrease>();
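The idea of storing type information inside the JSON can be illustrated independently of Cachalot DB. The sketch below uses the polymorphism attributes of System.Text.Json (available from .NET 7); this is an assumption made for illustration, not the serializer configuration Cachalot DB uses. The classes are cut-down copies of the ones above without the index attributes, and the `Delta` property is hypothetical.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// The discriminator is written into the JSON (as "$type" by default)
// and read back to pick the concrete type.
[JsonDerivedType(typeof(IncreaseDecrease), "IncreaseDecrease")]
public abstract class ProductEvent
{
    public int Id { get; set; }
    public abstract string EventType { get; }
    public DateTime EventDate { get; set; }
}

public abstract class NegotiatedProductEvent : ProductEvent
{
}

public class IncreaseDecrease : NegotiatedProductEvent
{
    public override string EventType => "IncreaseDecrease";
    public decimal Delta { get; set; } // hypothetical property
}

public static class PolymorphismSketch
{
    // Serialize and deserialize through the abstract base type only.
    public static ProductEvent RoundTrip(ProductEvent evt)
    {
        string json = JsonSerializer.Serialize<ProductEvent>(evt);
        return JsonSerializer.Deserialize<ProductEvent>(json)!;
    }

    public static void Main()
    {
        ProductEvent restored = RoundTrip(
            new IncreaseDecrease { Id = 1, EventDate = DateTime.Today, Delta = 50m });

        // The static type is ProductEvent, the runtime type is the concrete one.
        Console.WriteLine(restored.GetType().Name);
    }
}
```

Because the concrete type is rebuilt from the stored JSON, a cast to the child type on the client side is safe as long as the query filters on the matching event type.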

Conditional Operations and “Optimistic Synchronization”

A normal “put” operation adds an object or updates an existing one using the primary key as object identity.

Two more advanced use cases are also supported:

  1. Add an object only if it is not already there, and report whether it was really added
  2. Update an existing object only if the current version in the database satisfies a condition

The first one is available through the TryAdd operation on the DataSource class. If the object is already there, it is not modified and TryAdd returns false. The existence test and the insertion are executed as a single atomic operation: the object cannot be updated or deleted by another client in between.

That can be useful for data initialization, creating singleton objects, distributed locks, etc.
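The "exactly one caller wins" guarantee is what makes this operation usable for singletons and locks. The sketch below shows the same semantics with the standard `ConcurrentDictionary.TryAdd` method as an in-memory stand-in; it does not call Cachalot DB, and `RaceForKey` is a hypothetical helper.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class TryAddSketch
{
    // In-memory stand-in for a collection keyed by primary key.
    private static readonly ConcurrentDictionary<int, string> Store = new();

    // Several contenders try to create the same object concurrently.
    // Returns how many of them really added it.
    public static int RaceForKey(int key, int contenders)
    {
        var results = new bool[contenders];
        Parallel.For(0, contenders, i =>
        {
            // The existence test and the insertion are one atomic step.
            results[i] = Store.TryAdd(key, "owner-" + i);
        });
        return results.Count(added => added);
    }

    public static void Main()
    {
        // For a key that is not yet in the store, exactly one contender wins.
        Console.WriteLine(RaceForKey(1, 8));
    }
}
```

The contender that gets `true` back knows it created the object (or acquired the lock); all the others know that someone else did.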

The second use case is especially useful for, but not limited to, the implementation of “optimistic synchronization”. The UpdateIf method on the DataSource class implements it.

If we need to be sure that nobody else modified an object while we were editing it (manually or algorithmically), there are two possibilities:

  • Lock the object during the edit operation. This is not the best option for a modern distributed system. A distributed lock is not suitable for massively parallel processing, and if it is not released automatically (due to a client or network failure), manual intervention by an administrator is required.
  • Use “optimistic synchronization”: do not lock, but require that, when saving the modified object, the one in the database has not changed since it was loaded. Otherwise, the operation fails, and we must retry (load + edit + save). This can be achieved in different ways:
    • Having a version on an object. When we save version n+1, we require that the object in the database is still at version n. In Cachalot DB, if n is the version of the object as it was loaded, the syntax is items.UpdateIf(item, i => i.Version == n)
    • Having a timestamp on an object. When we save a modified object, we require that the timestamp of the version in the database is identical to the one the object had before our update.
var oldTimestamp = item.Timestamp;
item.Timestamp = DateTime.Now;
items.UpdateIf(item, i => i.Timestamp == oldTimestamp);
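The load + edit + save retry cycle with a version number can be sketched as follows. `InMemorySource` is a hypothetical in-memory stand-in, not the Cachalot DB DataSource class; in particular, having `UpdateIf` return `false` when the condition is not met is an assumption of this sketch, since the text only says that the operation fails. `Account` and `Deposit` are made-up names as well.

```csharp
using System;
using System.Collections.Generic;

public class Account
{
    public int Id { get; set; }
    public decimal Balance { get; set; }
    public int Version { get; set; }
}

// Hypothetical stand-in: only the "update if the stored copy
// satisfies a condition" behaviour is modelled.
public class InMemorySource
{
    private readonly Dictionary<int, Account> _items = new();
    private readonly object _sync = new();

    public void Put(Account item)
    {
        lock (_sync) _items[item.Id] = Clone(item);
    }

    public Account Load(int id)
    {
        lock (_sync) return Clone(_items[id]);
    }

    // The test and the write happen under one lock, so they are atomic.
    public bool UpdateIf(Account item, Func<Account, bool> condition)
    {
        lock (_sync)
        {
            if (!_items.TryGetValue(item.Id, out var stored) || !condition(stored))
                return false;
            _items[item.Id] = Clone(item);
            return true;
        }
    }

    private static Account Clone(Account a) =>
        new Account { Id = a.Id, Balance = a.Balance, Version = a.Version };
}

public static class OptimisticSketch
{
    // Load + edit + save, retried until no concurrent change is detected.
    // Returns the number of attempts that were needed.
    public static int Deposit(InMemorySource source, int id, decimal amount, int maxAttempts = 10)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var item = source.Load(id);
            int loadedVersion = item.Version;   // version n

            item.Balance += amount;
            item.Version = loadedVersion + 1;   // we save version n+1

            // Succeeds only if the stored object is still at version n.
            if (source.UpdateIf(item, stored => stored.Version == loadedVersion))
                return attempt;
        }
        throw new InvalidOperationException("Too many concurrent modifications.");
    }

    public static void Main()
    {
        var source = new InMemorySource();
        source.Put(new Account { Id = 1, Balance = 100m, Version = 1 });

        Deposit(source, 1, 25m);

        var saved = source.Load(1);
        Console.WriteLine($"{saved.Balance} (version {saved.Version})");
    }
}
```

Capping the number of attempts is a design choice of the sketch: under heavy contention it is usually better to fail visibly than to loop forever.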

This can be even more useful when committing multiple object modifications in a transaction. If the condition is not satisfied for one object, the whole transaction is rolled back.

More on transactions in Part 3.

The fully open source code is available at:

Precompiled binaries and full documentation are available at:

The client code is available as a NuGet package at nuget.org.

To install: Install-Package Cachalot.Client

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
