
What's New in RavenDB 4.2

Kamran A

23 May 2019

Learn about the new RavenDB 4.2 features like cluster-wide ACID transactions, distributed counters, the experimental graph API, and more.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

RavenDB 4.2 is a major milestone in the development of RavenDB, and it brings some exciting new features focused on managing large amounts of data at cluster scale.

In this article, I'll cover the flagship features: graph queries let you use the Raven Query Language (RQL) to query the relationships between documents, cluster-wide transactions bring distributed ACID guarantees to your data, and distributed counters enable massive-scale counting scenarios like Reddit-style voting.

Finally, JavaScript indexes are generally available and you can revert to previous revisions of documents without going offline.

Graph Query API

Perhaps the most exciting new feature is the Graph Query API, which allows you to perform petabyte-scale aggregation. What does "petabyte scale" even mean? It means working with extremely large datasets that have many relationships between data points. For example, healthcare systems that deal with patient data and symptom findings must track the relationships between many variables that contribute to different conditions, their solutions, and the body.

But it's hard to showcase queries over such a large domain, so we'll use a simpler one you are probably familiar with: issue tracking.

Below is a visualization of a secret, permissions-based project in which issues can be assigned to members of different groups:

We'll see how we can issue graph queries in RavenDB against this model to efficiently find answers to some important questions.

The issue document is defined like this:

{
  "Name": "Design a logo for the project",
  "Users": [
    "users/2944"
  ],
  "Groups": [
    "groups/project-x"
  ]
}

The first question we want to answer is the simplest: Which issues does Sunny (users/2944) have access to?

Graph queries in RavenDB combine the native Raven Query Language (RQL) with syntax inspired by Neo4j's Cypher graph query language. The query that answers the question above looks like this:

with {   from Users where id() = "users/2944" } as u
match    (Issues as i)-[Users]->(u)
select   i.Name as Issue, u.Name as User

If you are not familiar with the Cypher syntax, let's break down this bit:

(Issues as i)-[Users]->(u)

(Issues as i) is RQL for aliasing the Issues collection as i so we can use it in the select projection. The next bit, -[Users]->, is Cypher-style syntax for an edge that follows a property of the document, here i.Users. The pattern then ends with the node that edge must point to: our (u) alias from the with clause, which represents Sunny.

In English, the way to read the match statement would be, "Match the Issues (i) that contain the ID of Sunny (u) in their Users property collection". This can be expressed just as easily with a traditional collection query:

from Issues as i
where i.Users in ("users/2944")
select i.Name as Issue
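To make the semantics of (Issues as i)-[Users]->(u) concrete, here is a small Python model of the same match over in-memory dictionaries. It is a conceptual sketch, not how RavenDB evaluates graph queries; the field names mirror the issue document above, while the issues/1 document ID and the function name are hypothetical.

```python
# Conceptual model of: match (Issues as i)-[Users]->(u)
# Hypothetical in-memory data; field names mirror the issue document above,
# but "issues/1" is an invented ID.
issues = {
    "issues/1": {
        "Name": "Design a logo for the project",
        "Users": ["users/2944"],
        "Groups": ["groups/project-x"],
    },
}


def issues_for_user(user_id):
    """Return the names of issues whose Users array contains user_id."""
    # Following the [Users] edge means reading each ID in i.Users and
    # keeping the issue when one of them is the target node (u).
    return [issue["Name"] for issue in issues.values() if user_id in issue["Users"]]


print(issues_for_user("users/2944"))  # ['Design a logo for the project']
```

This is exactly what the collection query computes as well, which is why the first question doesn't really need a graph query.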

That is straightforward and easily answered. What is tougher to answer, and where things really start to get fun, is the question "What issues does Max have access to?" Max is not referenced directly by a document key on the issue document itself. Instead, he is a member of a group, which the issue references.

This was not possible in previous versions of RavenDB, but now, with the Graph API, it's just as easy to express as the first query:

with {   from Users where Name = "Max" } as u
match    (Issues as i)-[Groups]->(Groups as g)<-[Groups]-(u)
select   i.Name as Issue, u.Name as User

We are now using more sophisticated syntax. The match expression reads like "Match issues that share groups with Max." The ->(Groups as g)<- segment ties the issue on the left to the user on the right through a group (g) that appears in the Groups property of both documents.

Since Max is part of the project-x group, and the issue is tied to that group as well, the issue is returned in the query.
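The shared-group pattern boils down to a set intersection: an issue matches when at least one group in i.Groups is also in u.Groups. A minimal Python sketch of that idea, assuming user documents carry a Groups array (which the <-[Groups]-(u) edge implies); "users/max" is a hypothetical stand-in for Max's real document ID, and the function name is invented for illustration.

```python
# Conceptual model of:
#   match (Issues as i)-[Groups]->(Groups as g)<-[Groups]-(u)
# Hypothetical data: "users/max" stands in for Max's real document ID.
issues = {
    "issues/1": {"Name": "Design a logo for the project",
                 "Users": ["users/2944"], "Groups": ["groups/project-x"]},
}
users = {
    "users/max": {"Name": "Max", "Groups": ["groups/project-x"]},
}


def issues_via_shared_group(user_id):
    """Return (issue, user) name pairs for issues that share a group with the user."""
    user = users[user_id]
    result = []
    for issue in issues.values():
        # g must be reachable from both sides: via i.Groups and via u.Groups
        if set(issue["Groups"]) & set(user["Groups"]):
            result.append((issue["Name"], user["Name"]))
    return result


print(issues_via_shared_group("users/max"))
# [('Design a logo for the project', 'Max')]
```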

What about Nati (users/341), who is a member of team-nati rather than of project-x itself? How would we return the issues he has access to, like issue 4335? We need to look at issues whose group is once removed from his.

with {   from Users where id() = "users/341" } as u
match    (Issues as i)-[Groups]->(Groups as midpoint)
            -[Parents]->(Groups as g)<-[Groups]-(u)
select   i.Name as Issue, u.Name as User

This query's match expression travels one relationship further: from each issue's group (midpoint), it follows the Group.Parents relationship to the parent groups (g), and it keeps the issues for which Nati is a member of one of those parent groups.

This upward traversal doesn't need to be hardcoded one level at a time like this. Instead, we can use the recursive operator to follow the Parents relationship as many levels up as needed:

with {   from Users where id() = $uid } as u
match    (Issues as i)-[Groups]->(Groups as direct)
            -recursive (0, all) { [Parents]->(Groups) }->
                (Groups as g)<-[Groups]-(u)
select   i.Name as Issue, u.Name as User

To the left of the (Groups as g) expression, we use recursion to walk up from each of the issue's groups through their parents. The (0, all) arguments allow zero hops, so the issue's direct group counts too, and follow all paths in the graph. To the right, we match the user's groups against the groups reached by that walk.

We also parameterized the query with $uid so it works for any user. If we start from Nati, the query returns issue 4335, since the path from the issue's group, project-x, to Nati is project-x -> team-nati -> Nati.

If we search for Phoebe, she also has access to the issue, since her path is project-x -> execs -> board -> Phoebe.

What about Snoopy? Will she have access according to our query? No. Notice that the query above only walks up through a group's parents, and Snoopy's group, r-n-d, is a child of execs rather than a parent like board. So according to our query, Snoopy does not have access.
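The three answers above can be checked with a small Python model of the recursive match. The group hierarchy is reconstructed from the membership paths described in the text, with parent links pointing in the direction the query follows them; the IDs for Phoebe, Snoopy, and issue 4335 are hypothetical. Note that this sketch only answers "is some user group reachable?" and does not enumerate every path the way (0, all) does.

```python
# Conceptual model of:
#   match (Issues as i)-[Groups]->(Groups as direct)
#         -recursive (0, all) { [Parents]->(Groups) }->
#         (Groups as g)<-[Groups]-(u)
# Hypothetical data reconstructed from the paths described in the text.
group_parents = {
    "groups/project-x": ["groups/team-nati", "groups/execs"],
    "groups/execs": ["groups/board"],
    "groups/r-n-d": ["groups/execs"],
    "groups/team-nati": [],
    "groups/board": [],
}
user_groups = {
    "users/341": ["groups/team-nati"],  # Nati
    "users/phoebe": ["groups/board"],   # hypothetical ID
    "users/snoopy": ["groups/r-n-d"],   # hypothetical ID
}
issue_groups = {"issues/4335": ["groups/project-x"]}  # assumed ID


def reachable_upwards(start_groups):
    """All groups reachable by following Parents zero or more times."""
    seen = set()
    stack = list(start_groups)  # zero hops: the direct groups count too
    while stack:
        group = stack.pop()
        if group in seen:
            continue  # guards against cycles in the hierarchy
        seen.add(group)
        stack.extend(group_parents.get(group, []))
    return seen


def issues_for(user_id):
    mine = set(user_groups[user_id])
    return sorted(issue for issue, groups in issue_groups.items()
                  if reachable_upwards(groups) & mine)


for uid in ("users/341", "users/phoebe", "users/snoopy"):
    print(uid, issues_for(uid))
# users/341 ['issues/4335']
# users/phoebe ['issues/4335']
# users/snoopy []
```

Snoopy drops out because r-n-d is never reached when walking upwards from project-x; it would only appear if the walk also followed child groups.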

If you want to learn more, check out the blog series on how the graph API was brought to life and the graph API documentation.

Cluster-wide ACID transactions

Another major feature that is new in 4.2 is cluster-wide ACID transactions. RavenDB has always supported single-node ACID transactions. When you save a document to a cluster, the default mode is a multi-master model in which the document is saved on at least one node and then replicated to the other nodes. This is desirable in most situations because writes succeed as long as a node is available to accept them. However, this model has error conditions that can cause conflicts, such as two nodes accepting writes to the same document concurrently, which makes it hard to guarantee that data is consistent across the cluster.

Since 3.0, RavenDB has offered WaitForReplicationAfterSaveChanges(), which makes the save wait until the write has been replicated to at least one other node before confirming it. With cluster-wide transactions, by contrast, the transaction will not go through unless a majority of the nodes confirm the write.
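Waiting for a majority is what makes the stronger guarantee possible: any two majorities of the same cluster share at least one node, which lets a consensus protocol such as Raft keep a single agreed order of committed writes. The same arithmetic shows the availability cost, since a cluster-wide write needs a majority to be reachable. A small Python sketch of that standard quorum math (not RavenDB client code; the function names are hypothetical):

```python
def majority(cluster_size):
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many nodes can be unreachable while a majority can still respond."""
    return cluster_size - majority(cluster_size)


# Any two majorities of the same cluster overlap, because 2 * majority(n) > n.
assert all(2 * majority(n) > n for n in range(1, 10))

for n in (1, 3, 5):
    print(n, majority(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 5 3 2
```

This is also why odd cluster sizes are the usual choice: a 4-node cluster needs 3 confirmations and still tolerates only one unreachable node, just like a 3-node cluster.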

As a familiar example, let's take a look at validating a new user's email address and ensuring it is unique:

public bool IsEmailUnique(string email) {

  using (var session = store.OpenSession()) {

    // Try loading the document reserved for this email
    var existingEmail = session.Load<Email>(
      $"Emails/{email}");

    // No document means the email has not been taken (on this node)
    return existingEmail == null;
  }
}

Since document key-based operations like Load are transactional, we can use document keys as a unique constraint. In this case, we store an Email document whose key contains the email address. If the document exists, we know the email has been taken.

This type of lookup works well for single-node clusters because there is only one database and no replication happening between nodes. However, in a multi-node cluster, two nodes may still be syncing (imagine two nodes creating the same Email document at the exact same moment). In that case, this code would return true on both nodes, and the same email would be reserved twice.
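The failure is a classic check-then-act race across replicas. The Python sketch below replays it deterministically with two plain dictionaries standing in for two nodes that have not replicated to each other yet; the email address, user IDs, and function names are hypothetical.

```python
# Two replicas that have not yet replicated to each other.
node_a = {}
node_b = {}


def is_email_unique(node, email):
    # Mirrors IsEmailUnique: unique if no Emails/{email} document exists locally
    return f"Emails/{email}" not in node


def reserve(node, email, user_id):
    """Check-then-act: only this node's local state is consulted."""
    if is_email_unique(node, email):
        node[f"Emails/{email}"] = user_id
        return True
    return False


# Two sign-ups arrive at the same moment, each handled by a different node.
ok_a = reserve(node_a, "sunny@example.com", "users/1")
ok_b = reserve(node_b, "sunny@example.com", "users/2")
print(ok_a, ok_b)  # True True -> the same email was reserved twice
```

Once replication catches up, the two nodes hold conflicting versions of the same document; that is the situation cluster-wide transactions are designed to prevent.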

As of RavenDB 4.2, you can issue cluster-wide transactions, which reach consensus across the cluster before completing. This guarantees consistency at the cost of availability and of extra latency from the additional network round-trips. RavenDB implements cluster transactions using the Raft consensus algorithm.

Compare-Exchange Document Store Operations

For this example where we need to check whether a value exists in the cluster, we can use the GetCompareExchangeValueOperation document store operation:

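using System.Threading.Tasks;
using Raven.Client.Documents.Operations.CompareExchange;

public async Task<bool> IsEmailUniqueAsync(string email) {

    // Ask the cluster whether a compare-exchange value
    // has already been reserved for this email
    var existingEmail = await store.Operations.SendAsync(
      new GetCompareExchangeValueOperation<string>($"Emails/{email}"));

    // A missing value means the email has not been reserved yet
    return existingEmail?.Value == null;
}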

Why aren't we using a session object? Because this is a RavenDB document store operation. Compare-exchange store operations happen outside the scope of a session transaction: a session transaction only takes place on a single node, while a compare-exchange operation is a distributed, cluster-wide operation coordinated across all of the cluster's nodes.
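Compare-exchange values carry an index, and a write only succeeds when the caller's expected index matches the current one; an expected index of 0 means "create only if the key doesn't exist yet". The Python class below is a toy, single-process model of those semantics, with one dictionary standing in for the state the cluster agrees on. The class and method names are hypothetical and do not mirror the RavenDB client API.

```python
class CompareExchangeStore:
    """Toy model of compare-exchange semantics (not the RavenDB client).

    A single dict stands in for the state that the real cluster agrees on.
    """

    def __init__(self):
        self._items = {}      # key -> (index, value)
        self._last_index = 0  # grows with every successful write

    def get(self, key):
        """Return (index, value), or None when the key does not exist."""
        return self._items.get(key)

    def put(self, key, value, expected_index):
        """Write only if the key's current index equals expected_index.

        expected_index == 0 means "create only if the key is absent".
        Returns True on success, False if another writer got there first.
        """
        current = self._items.get(key)
        current_index = current[0] if current else 0
        if current_index != expected_index:
            return False
        self._last_index += 1
        self._items[key] = (self._last_index, value)
        return True


store = CompareExchangeStore()
print(store.get("Emails/sunny@example.com"))                  # None -> email is free
first = store.put("Emails/sunny@example.com", "users/1", 0)   # create-only write
second = store.put("Emails/sunny@example.com", "users/2", 0)  # loses: key now exists
print(first, second)                                          # True False
print(store.get("Emails/sunny@example.com"))                  # (1, 'users/1')
```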

How would this CompareExchangeValue get created in the first place? We can create it when we save our new user.

Cluster-wide Session Transaction Mode

To enable cluster-wide session transactions, you must set TransactionMode to TransactionMode.ClusterWide in the SessionOptions you pass when opening a new session.

When creating a new user, we will attempt to reserve the user's email address using a compare-exchange value.

When cluster-wide transaction mode is enabled for a session, RavenDB ensures that a majority of the nodes in the cluster have accepted and persisted the transaction (fsync to disk) before it completes. This uses the Raft consensus algorithm, which you can see step by step in this visualization.

using Raven.Client.Documents.Session;
using Raven.Client.Exceptions;

public void CreateUser(User user) 
{
  // Assumes the User class exposes the address as an Email property
  var email = user.Email;

  using (var session = store.OpenSession(
    new SessionOptions() 
    {
      TransactionMode = TransactionMode.ClusterWide
    })) 
  {
    // Store the user, which will populate the `User.Id`
    // identifier from RavenDB
    session.Store(user);

    // Now reserve this email for this user.
    // The compare-exchange value is only created on SaveChanges
    session.Advanced.ClusterTransaction
      .CreateCompareExchangeValue($"Emails/{email}", user.Id);

    try 
    {
      // Commit the user and the reservation as one cluster-wide transaction
      session.SaveChanges();
    } 
    catch (ConcurrencyException) 
    {
      // The email is already reserved; nothing from this
      // transaction was committed (application-defined exception)
      throw new EmailAlreadyInUseException(email);
    }
  }
}

Here we use session.Advanced.ClusterTransaction.CreateCompareExchangeValue to create a compare-exchange value whose key is the string Emails/{email}. The value is the stored user's document ID, which we can use to look up users by email in the future. It's worth noting that the value can be any object of type T; it doesn't have to be a value type.

Note: Calling this method on the session does not create the compare-exchange value immediately but rather waits until session.SaveChanges is called, similar to how session.Store works.

Since we are using the cluster-wide transaction mode, if the cluster detects a concurrency conflict on the compare-exchange value when SaveChanges is called (for example, because another client already reserved the email), RavenDB throws a ConcurrencyException, which we catch to handle the conflict.
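The important property here is all-or-nothing: either the user document and the email reservation are both committed, or neither is. The Python sketch below is a toy model of that behavior, staging changes until save_changes() and validating every compare-exchange key before applying anything. ToyClusterSession and ConcurrencyError are hypothetical stand-ins, not RavenDB types, and the real validation happens inside the cluster rather than in the client.

```python
class ConcurrencyError(Exception):
    """Stands in for RavenDB's ConcurrencyException in this sketch."""


class ToyClusterSession:
    """Hypothetical model of a cluster-wide session: nothing is written until
    save_changes(), and then either every staged change applies or none does."""

    def __init__(self, documents, cmpxchg):
        self._documents = documents  # shared "cluster" document state
        self._cmpxchg = cmpxchg      # shared compare-exchange state
        self._pending_docs = {}
        self._pending_keys = {}

    def store(self, doc_id, doc):
        self._pending_docs[doc_id] = doc

    def create_compare_exchange_value(self, key, value):
        self._pending_keys[key] = value

    def save_changes(self):
        # Validate everything first, so a conflict leaves no partial writes.
        for key in self._pending_keys:
            if key in self._cmpxchg:
                raise ConcurrencyError(f"{key} already exists")
        self._documents.update(self._pending_docs)
        self._cmpxchg.update(self._pending_keys)


documents, cmpxchg = {}, {}

first = ToyClusterSession(documents, cmpxchg)
first.store("users/1", {"Email": "sunny@example.com"})
first.create_compare_exchange_value("Emails/sunny@example.com", "users/1")
first.save_changes()

second = ToyClusterSession(documents, cmpxchg)
second.store("users/2", {"Email": "sunny@example.com"})
second.create_compare_exchange_value("Emails/sunny@example.com", "users/2")
try:
    second.save_changes()
except ConcurrencyError as err:
    print("rejected:", err)  # rejected: Emails/sunny@example.com already exists

print(sorted(documents))  # ['users/1'] -> users/2 was never written
```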

You can learn more about the mechanics of cluster-wide transactions and about using the session's cluster transaction API.

Distributed counters

In RavenDB 4.2, distributed counters make massive-scale "incrementing counter" scenarios possible with minimal overhead: updating a counter doesn't write the full document to disk for every request, and concurrent updates made on different nodes are resolved automatically. Imagine a site like Reddit, where tens of millions of users vote on links every day. When a handsome Corgi photo surfaces, it skyrockets to the top of /r/aww (as it should) within minutes, collecting thousands of votes. Reddit has hundreds of thousands of posts per day and an average of 58 million votes per day.
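To see how concurrent updates on different nodes can be merged without conflicts, here is a textbook PN-counter CRDT in Python: each node tracks only its own increments and decrements, and merging takes the per-node maximum. This is a conceptual model of the idea, not a description of RavenDB's internal counter storage; the class and node names are hypothetical.

```python
class PNCounter:
    """Conceptual PN-counter CRDT: per-node increment and decrement totals.

    Each node only changes its own entries, and merging takes the per-node
    maximum, so replicas converge regardless of the order of merges.
    """

    def __init__(self, node_id):
        self.node_id = node_id
        self.incs = {}  # node -> total increments made on that node
        self.decs = {}  # node -> total decrements made on that node

    def increment(self, delta=1):
        if delta >= 0:
            self.incs[self.node_id] = self.incs.get(self.node_id, 0) + delta
        else:
            self.decs[self.node_id] = self.decs.get(self.node_id, 0) - delta

    def merge(self, other):
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())


# Two nodes accept votes concurrently, then exchange state.
a, b = PNCounter("A"), PNCounter("B")
a.increment(); a.increment()     # two upvotes on node A
b.increment(); b.increment(-1)   # one upvote and one downvote on node B
a.merge(b); b.merge(a)
print(a.value(), b.value())      # 2 2
```

Because merging is idempotent and order-independent, replaying the same replication message twice does not double-count votes.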

Let's go ahead and simulate posting a new Handsome Corgi image, ready to absorb the world's karma within a few minutes:

void CreatePost(string title, string url, string postedByUserId) {
  using (var session = store.OpenSession()) {
    var post = new Post() {
      Title = title,
      Url = url,
      PostedBy = postedByUserId
    };

    // Store the document to generate its ID
    // and begin tracking it in the Session
    session.Store(post);

    // Get the counter references stored for this new document
    var postCounters = session.CountersFor(post);

    // Seed with 1 upvote
    postCounters.Increment("Votes");

    // Save the changes
    session.SaveChanges();
  }
}

Here we've created our new post, and we use the new CountersFor session API to manage distributed counters for it. Counters are not stored in the document itself, so incrementing or decrementing them doesn't require rewriting the document.

Now that we've seeded the Votes counter and posted our image, we can start accepting upvotes and downvotes from the world, say through the following two methods:

void Upvote(string postId) {
  using (var session = store.OpenSession()) {
    var postCounters = session.CountersFor(postId);

    postCounters.Increment("Votes");

    session.SaveChanges();
  }
}

void Downvote(string postId) {
  using (var session = store.OpenSession()) {
    var postCounters = session.CountersFor(postId);

    // You monster!!
    postCounters.Increment("Votes", -1);

    session.SaveChanges();
  }
}

In the code above, we have two methods to upvote and downvote a post using a single Votes counter. You could choose to track upvotes and downvotes in separate counters, but a single counter is enough because you can decrement it by passing a negative delta. Each call sends a delta rather than reading the total and writing it back, and since counters are distributed across the cluster, this handles massive-scale scenarios without race conditions between concurrent increments and decrements. Since counter values are managed separately from documents, updating counters is a low-cost, high-performance operation.
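Why does sending a delta avoid races? If two clients instead read the total, add one, and write it back, the second write can overwrite the first. The Python sketch below replays that interleaving deterministically next to a delta-based update; it is a conceptual illustration rather than the RavenDB client, and the function names are hypothetical.

```python
def read_modify_write(votes):
    """Two clients each read the total, add 1, and write the total back."""
    seen_by_client_1 = votes["Votes"]
    seen_by_client_2 = votes["Votes"]      # read before client 1 writes
    votes["Votes"] = seen_by_client_1 + 1
    votes["Votes"] = seen_by_client_2 + 1  # overwrites client 1's vote


def apply_deltas(votes, deltas):
    """The server applies each client's delta to whatever the current total is."""
    for delta in deltas:
        votes["Votes"] += delta


rmw = {"Votes": 1}
read_modify_write(rmw)

delta_based = {"Votes": 1}
apply_deltas(delta_based, [1, 1])  # two concurrent "+1" requests

print(rmw["Votes"], delta_based["Votes"])  # 2 3 (one vote lost vs. none lost)
```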

RavenDB also allows you to update many counters in a single request by sending a CounterBatchOperation, for even more demanding high-performance scenarios.

We are now accepting votes on our Corgi post, so after an hour we decide to come back and bask in the glorious karma we've definitely received. RavenDB allows you to include counters while loading documents, so a single request to the database fetches everything the UI needs for rendering:

public PostWithVotes GetPost(string postId) {
  using (var session = store.OpenSession()) {

    // Include Votes counter so the data will be fetched
    // in a single request
    var post = session.Load<Post>(postId, includeBuilder =>
      includeBuilder.IncludeCounter("Votes"));

    // Loads the counters from the session (does not result in another
    // request to RavenDB)
    var postCounters = session.CountersFor(post);

    // Gets the accumulated value of the counter
    // (a long?, which is null if the counter does not exist)
    var votes = postCounters.Get("Votes");

    return new PostWithVotes() {
      Id = post.Id,
      Title = post.Title,
      Url = post.Url,
      PostedBy = post.PostedBy,
      Votes = votes
    };
  }
}

Counters offer even more, such as projecting counter values in queries, indexing counters by name, and using the Changes API to push real-time updates to clients.

What else is new?

JavaScript indexes are generally available

JavaScript indexes are no longer experimental and now have first-class support alongside the existing support for C# indexes.

JavaScript indexes are another way to express indexes and may provide an alternative way to implement more complex modeling logic, using JavaScript instead of C#'s LINQ syntax. JavaScript indexes also support referencing external scripts, such as Lodash or Moment.js, to bring in additional global functions.

For an in-depth example, see my previous article on Data Modeling with Indexes in RavenDB where I feature JavaScript index examples. You can also read the JavaScript Indexes documentation for more information.

Revert revisions using time-travel restore

In 4.2, you can now revert documents to previous revisions without any downtime (an "online" operation). In traditional databases that offer a restore feature, you must take the database offline, impacting users. Instead, RavenDB leverages the existing revisions feature to let you restore to a snapshot in time while keeping read access to the database.
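Conceptually, reverting to a point in time means picking, for each document, the latest revision created at or before that moment. The Python sketch below shows only that selection step on a hypothetical revision history; it is not RavenDB's implementation, and it does not model what the real revert does with documents that had no revision yet at the chosen time (the function simply returns None for them).

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical revision history for one document, oldest first.
revisions = [
    (datetime(2019, 5, 1, 9, 0), {"Title": "Draft"}),
    (datetime(2019, 5, 2, 9, 0), {"Title": "Published"}),
    (datetime(2019, 5, 3, 9, 0), {"Title": "Accidentally wiped"}),
]


def revision_at(history, point_in_time):
    """Return the latest revision created at or before point_in_time,
    or None if the document had no revision yet at that moment."""
    timestamps = [created for created, _ in history]
    i = bisect_right(timestamps, point_in_time)  # first revision after the point
    return history[i - 1][1] if i > 0 else None


print(revision_at(revisions, datetime(2019, 5, 2, 12, 0)))  # {'Title': 'Published'}
print(revision_at(revisions, datetime(2019, 4, 30)))        # None
```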

You can access this feature under Database Settings > Document Revisions.

To learn more you can read about how the point-in-time restore is performed and consult the Revert Revisions documentation.

Conclusion

In this article we explored several of the major new features RavenDB 4.2 has shipped with and touched on several more. If you want to get started, you can download the new version now or head over to the Learn RavenDB site!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves.