Introduction
RavenDB 4.2 is a major milestone in the development of RavenDB and it brought some exciting new features focused on managing large amounts of data at cluster-scale.
In this article, I'll cover the flagship features: Graph queries offer a way to use Raven Query Language (RQL) to perform graph queries against documents, cluster-wide transactions bring distributed ACID guarantees to your data, and distributed counters enable massive-scale counter scenarios like Reddit-style voting.
Finally, JavaScript indexes are generally available and you can revert to previous revisions of documents without going offline.
Graph Query API
Perhaps the most exciting new feature is the Graph Query API, which allows you to perform petabyte-scale aggregation. What does "petabyte scale" even mean? It means working with extremely large datasets with many relationships between data points. For example, healthcare systems that deal with patient data and symptom finding must track relationships between the many variables that contribute to different conditions, solutions, and the body.
But it's hard to showcase queries of such a large domain so we'll use a simpler one you are probably familiar with: issue tracking.
Below is a visualization of a secret, permissions-based project that might have issues being assigned to members of different groups:
We'll see how we can issue graph queries in RavenDB against this model to efficiently find answers to some important questions.
The issue document is defined like this:
{
    "Name": "Design a logo for the project",
    "Users": [
        "users/2944"
    ],
    "Groups": [
        "groups/project-x"
    ]
}
The first question we want to answer is the simplest: How many issues does Sunny (users/2944) have access to?
Graph queries in RavenDB combine the native Raven Query Language (RQL) with syntax inspired by the Neo4j Cypher graph query language. The query to answer the question above will look like this:
with { from Users where id() = "users/2944" } as u
match (Issues as i)-[Users]->(u)
select i.Name as Issue, u.Name as User
If you are not familiar with the Cypher syntax, let's break down this bit:
(Issues as i)-[Users]->(u)
(Issues as i) is RQL for aliasing the Issues collection to use in the select projection. The next bit, -[Users]->, is Cypher-style syntax representing a direct property relationship, "(i).Users", and it ends with the value to find in the collection, our (u) alias above, which represents Sunny.
In English, the way to read the match statement would be, "Match the Issues (i) that contain the ID of Sunny (u) in their Users property collection". This can be expressed just as easily with a traditional collection query:
from Issues as i
where i.Users in ("users/2944")
select i.Name as Issue
That is straightforward and easily answered. What is tougher to answer, and where things start to get really fun, is to ask: What issues does Max have access to? Max is not referenced directly by document key on the issue document itself. Instead, he is a member of a group, which is referenced by the issue.
This was not possible in previous versions of RavenDB but now with the Graph API, it's just as easy to express as the first query:
with { from Users where id() = "users/2944" } as u
match (Issues as i)-[Groups]->(Groups as g)<-[Groups]-(u)
select i.Name as Issue, u.Name as User
We are now using more sophisticated syntax. The match expression reads like "Match issues that share groups with Max." The ->(Groups as g)<- denotes a tie from the left-hand Issue to the right-hand User through the Groups property.
Since Max is part of the project-x group, and the issue is tied to that group as well, the issue is returned in the query.
What about for Nati, who is under a sub-group team-nati? How would we return issues he has access to, like issue 4335? We need to look at issues with a group once-removed.
with { from Users where id() = "users/341" } as u
match (Issues as i)-[Groups]->(Groups as midpoint)
-[Parents]->(Groups as g)<-[Groups]-(u)
select i.Name as Issue, u.Name as User
This query's match expression travels one relationship higher, through the Group.Parents relationship, to find parent groups that Nati is in and what their issues are.
This pattern of traveling upwards does not need to be hardcoded like this for single-level deep relationships. We can use the recursive helper to find the root group from a user:
with { from Users where id() = $uid } as u
match (Issues as i)-[Groups]->(Groups as direct)
-recursive (0, all) { [Parents]->(Groups) }->
(Groups as g)<-[Groups]-(u)
select i.Name as Issue, u.Name as User
On the left-hand side of the (Groups as g) expression, we are leveraging recursion to find the issue's groups' parents, where we allow empty parents and follow all paths in the graph. From the right-hand side, we find the user's groups that match the groups reached from the left.
We also parameterized the query to find any user $uid. If we start from Nati, the query will return issue 4335 since the path to Nati from the issue's group, project-x, is project-x -> team-nati -> Nati.
If we search for Phoebe, she will also have access to the issue since her membership is project-x -> execs -> board -> Phoebe.
What about Snoopy? Will she have access according to our query? No. In the query above notice we only search a group's parents, and Snoopy's group r-n-d is a child of execs, not a parent like board. So according to our query, Snoopy will not have access.
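To make the traversal rule concrete, here is a minimal in-memory sketch of what the recursive match does. It is not how RavenDB executes the query. The Parents edges and group memberships are assumptions reconstructed from the paths quoted above, and the AccessSketch class and its member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AccessSketch
{
    // Group -> its parent groups (the Parents property). Assumed from the paths in the text.
    static readonly Dictionary<string, string[]> Parents = new Dictionary<string, string[]>
    {
        ["groups/project-x"] = new[] { "groups/team-nati", "groups/execs" },
        ["groups/execs"] = new[] { "groups/board" },
        ["groups/r-n-d"] = new[] { "groups/execs" },
        ["groups/team-nati"] = new string[0],
        ["groups/board"] = new string[0],
    };

    // User -> groups the user belongs to (the Groups property). Also assumed.
    static readonly Dictionary<string, string[]> UserGroups = new Dictionary<string, string[]>
    {
        ["Max"] = new[] { "groups/project-x" },
        ["Nati"] = new[] { "groups/team-nati" },
        ["Phoebe"] = new[] { "groups/board" },
        ["Snoopy"] = new[] { "groups/r-n-d" },
    };

    // Starts at the issue's groups (zero hops) and follows every Parents edge upwards.
    public static HashSet<string> ReachableGroups(IEnumerable<string> issueGroups)
    {
        var seen = new HashSet<string>();
        var pending = new Stack<string>(issueGroups);
        while (pending.Count > 0)
        {
            var group = pending.Pop();
            if (!seen.Add(group))
                continue; // already visited, avoids looping on cycles
            if (Parents.TryGetValue(group, out var parents))
                foreach (var parent in parents)
                    pending.Push(parent);
        }
        return seen;
    }

    // A user matches when any of their groups is among the reachable groups.
    public static bool HasAccess(string user, IEnumerable<string> issueGroups)
    {
        var reachable = ReachableGroups(issueGroups);
        return UserGroups.TryGetValue(user, out var groups) && groups.Any(reachable.Contains);
    }
}
```

Under these assumptions, an issue in groups/project-x is visible to Max, Nati, and Phoebe but not to Snoopy, because r-n-d is only reachable by walking down from execs, and the walk only goes up.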
If you want to learn more, check out the blog series on how the graph API was brought to life and the graph API documentation.
Cluster-wide ACID transactions
Another major feature that is new in 4.2 is cluster-wide ACID transactions. RavenDB has always supported single-node ACID transactions. When you save a document to a cluster, the default mode is a multi-master model where the document is saved on at least one node and then replicated to the other nodes. This is desirable in most situations where you need to have successful writes. However, this type of model has some error conditions that can cause conflicts, making it hard to guarantee data is consistent within a cluster scenario.
Since 3.0, RavenDB has offered WaitForReplicationAfterSaveChanges(), which ensures that the write is replicated at least once before confirming the transaction. With cluster-wide transactions, the save will not go through unless a majority of nodes confirm the write.
As a familiar example, let's take a look at validating a new user's email address and ensuring it is unique:
public bool IsEmailUnique(string email) {
    using (var session = store.OpenSession()) {
        // Load by document key; null means no Email document exists yet
        var existingEmail = session.Load<Email>(
            $"Emails/{email}");
        return existingEmail == null;
    }
}
Since document key-based operations like Load are transactional, we can leverage document keys as the unique constraint. In this case, we store an Email document with the email as the key. If the document exists, we know the email has been taken.
This type of lookup works well for single-node clusters because there is only a single database and no replication happening between multiple nodes. However, in a multi-node cluster there can be conditions under which two nodes may still be syncing (imagine two nodes creating the Email document at the same exact moment). In that case this code would return true on both nodes, resulting in a duplicate email being reserved.
As of RavenDB 4.2, you can issue cluster-wide transactions that ensure consistency before completing. This guarantees robustness and resiliency at the cost of availability, due to the added latency and round-trips. RavenDB implements cluster transactions using the Raft consensus algorithm.
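The "majority of nodes" requirement is simple arithmetic, and a small sketch shows how it relates cluster size to fault tolerance. The QuorumSketch class and its method names are hypothetical helpers for illustration only; they are not part of the RavenDB client.

```csharp
using System;

public static class QuorumSketch
{
    // Smallest number of nodes that forms a strict majority of the cluster.
    public static int Majority(int clusterSize)
    {
        if (clusterSize < 1)
            throw new ArgumentOutOfRangeException(nameof(clusterSize));
        return clusterSize / 2 + 1; // integer division rounds down
    }

    // How many nodes can be unreachable while a majority can still confirm a write.
    public static int TolerableFailures(int clusterSize)
    {
        return clusterSize - Majority(clusterSize);
    }
}
```

For a three-node cluster, Majority(3) is 2 and TolerableFailures(3) is 1. A four-node cluster needs 3 confirmations yet still tolerates only 1 failure, which is one reason odd cluster sizes are a common choice.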
Compare-Exchange Document Store Operations
For this example where we need to check whether a value exists in the cluster, we can use the GetCompareExchangeValueOperation document store operation:
public bool IsEmailUnique(string email) {
    // Send returns null when no compare-exchange value exists for the key
    var existingEmail = store.Operations.Send(
        new GetCompareExchangeValueOperation<string>($"Emails/{email}"));
    return existingEmail?.Value == null;
}
Why aren't we using a session object? This is using a RavenDB store operation. Compare-exchange store operations happen outside the scope of a session transaction since a session transaction only takes place on a single node while the distributed cluster-wide compare-exchange operation takes place on all cluster nodes.
How would this CompareExchangeValue get created in the first place? We can create it when we save our new user.
Cluster-wide Session Transaction Mode
To enable cluster-wide session transactions you must set the TransactionMode to TransactionMode.ClusterWide when opening a new Session.
When creating a new user, we will attempt to reserve the user's email address using a compare-exchange value.
When cluster-wide transaction mode is enabled for this session, RavenDB ensures that the majority of nodes in the cluster have fully accepted and persisted the document (fsync to disk) before the transaction completes. The Raft consensus algorithm is used, which you can see step-by-step in this visualization.
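public void CreateUser(User user)
{
    // Assumes the User class exposes the address as an Email property
    var email = user.Email;
    using (var session = store.OpenSession(
        new SessionOptions()
        {
            TransactionMode = TransactionMode.ClusterWide
        }))
    {
        session.Store(user);
        // Reserve the email cluster-wide; the value points back at the user document
        session.Advanced.ClusterTransaction
            .CreateCompareExchangeValue($"Emails/{email}", user.Id);
        try
        {
            session.SaveChanges();
        }
        catch (ConcurrencyException)
        {
            // Another client reserved this email first
            throw new EmailAlreadyInUseException(email);
        }
    }
}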
Here we are using session.Advanced.ClusterTransaction.CreateCompareExchangeValue to create the compare-exchange value where the key is the string Emails/{email}. The compare-exchange value is the stored user document ID reference, which we can use to do user lookups by email in the future. It's worth noting the value can be any object T; it doesn't have to be a value type.
Note: Calling this method on the session does not create the compare-exchange value immediately but rather waits until session.SaveChanges is called, similar to how session.Store works.
Since we are using the cluster-wide transaction mode, RavenDB will throw an error when SaveChanges is called if the cluster detects a concurrency conflict with the compare-exchange value (another client may have attempted to reserve the email). In this case, RavenDB will throw a ConcurrencyException that we can use to handle the conflict.
You can learn more about how the mechanics of cluster-wide transactions work and about using the Session cluster API.
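The "first writer wins" behavior described above can be modeled in a single process. This is only an analogy for the guarantee a compare-exchange key provides; it says nothing about how Raft achieves it across nodes. The EmailReservationSketch class and its members are hypothetical names.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public sealed class EmailReservationSketch
{
    private readonly ConcurrentDictionary<string, string> _values =
        new ConcurrentDictionary<string, string>();

    // Mirrors "create only if the key does not exist yet": returns false on conflict.
    public bool TryReserve(string email, string userId)
    {
        return _values.TryAdd($"Emails/{email}", userId);
    }

    // Looks up the user ID stored for an email, or null when it is not reserved.
    public string FindUserId(string email)
    {
        return _values.TryGetValue($"Emails/{email}", out var userId) ? userId : null;
    }

    // Simulates several clients racing to reserve the same email; returns the number of winners.
    public static int RaceForEmail(EmailReservationSketch store, string email, int clients)
    {
        var attempts = Enumerable.Range(1, clients)
            .Select(i => Task.Run(() => store.TryReserve(email, $"users/{i}")))
            .ToArray();
        Task.WaitAll(attempts);
        return attempts.Count(t => t.Result);
    }
}
```

However many clients race, exactly one reservation succeeds. The losers correspond to the sessions that would see a ConcurrencyException from SaveChanges in the cluster-wide example.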
Distributed counters
In RavenDB 4.2, distributed counters make massive-scale "incrementing counter" scenarios possible with minimal overhead. They avoid writing the full document to disk for every request and resolve distributed concurrent updates automatically. Imagine a site like Reddit where tens of millions of users can vote on links every day. When a handsome Corgi photo surfaces, it skyrockets to the top of /r/aww (as it should) within minutes with thousands of votes. Reddit has hundreds of thousands of posts per day with an average of 58 million votes per day.
Let's go ahead and simulate posting a new Handsome Corgi image, ready to absorb the world's karma within a few minutes:
void CreatePost(string title, string url, string postedByUserId) {
    using (var session = store.OpenSession()) {
        var post = new Post() {
            Title = title,
            Url = url,
            PostedBy = postedByUserId
        };
        session.Store(post);
        var postCounters = session.CountersFor(post);
        postCounters.Increment("Votes");
        session.SaveChanges();
    }
}
Here we've created our new link and we are using the new CountersFor session API to manage distributed counters for the new document. Counters are not stored on the document itself so incrementing/decrementing them avoids locking the document.
Now that we've seeded the upvotes counter and posted our image, we can start accepting up and down votes from the world. Let's say we do that through the following two methods:
void Upvote(string postId) {
    using (var session = store.OpenSession()) {
        var postCounters = session.CountersFor(postId);
        postCounters.Increment("Votes");
        session.SaveChanges();
    }
}
void Downvote(string postId) {
    using (var session = store.OpenSession()) {
        var postCounters = session.CountersFor(postId);
        postCounters.Increment("Votes", -1);
        session.SaveChanges();
    }
}
In the code above we have two methods to upvote and downvote a post using a single Votes counter. You could choose to track upvotes and downvotes separately in their own counters, but a single counter is enough here because you can decrement it by passing a negative number. Since counters are distributed across the cluster, this can handle massive-scale scenarios while avoiding race conditions when incrementing and decrementing the counters. Since counter values are managed separately from documents, updating counters is a low-cost, high-performance operation.
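One way to picture why concurrent votes do not conflict is a counter where each node keeps its own partial value and a read sums them. The sketch below is a simplified model built on that assumption, not RavenDB's implementation; the CounterSketch class and the node tags are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class CounterSketch
{
    // One partial value per node; nodes never overwrite each other's entry.
    private readonly Dictionary<string, long> _perNode = new Dictionary<string, long>();

    // Applies a delta on the node that received the request (negative to decrement).
    public void Increment(string nodeTag, long delta = 1)
    {
        _perNode.TryGetValue(nodeTag, out var current); // current is 0 for a new node
        _perNode[nodeTag] = current + delta;
    }

    // The value a reader sees: the sum of every node's partial value.
    public long Total => _perNode.Values.Sum();
}
```

For example, two upvotes handled by node "A", one upvote on node "B", and one downvote on node "C" give a Total of 2, regardless of the order in which the nodes learn about each other's values.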
RavenDB also allows you to batch update counters in a single operation by sending a CounterBatchOperation command for even more advanced high-performance scenarios.
We are now accepting votes on our Corgi post, so after an hour we decide to come back and bask in the glorious karma we've definitely received. RavenDB allows you to Include counters while loading documents, making it simple to use a single request to the database to pass data to the UI for rendering:
public PostWithVotes GetPost(string postId) {
    using (var session = store.OpenSession()) {
        // Load the post and include its Votes counter in the same request
        var post = session.Load<Post>(postId, includeBuilder =>
            includeBuilder.IncludeCounter("Votes"));
        var postCounters = session.CountersFor(post);
        var votes = postCounters.Get("Votes");
        return new PostWithVotes() {
            Id = post.Id,
            Title = post.Title,
            Url = post.Url,
            PostedBy = post.PostedBy,
            Votes = votes
        };
    }
}
Counters offer more, such as projecting counter values in queries, indexing counters by name, and using the Changes API to push real-time updates to clients.
What else is new?
JavaScript indexes are generally available
JavaScript indexes are no longer experimental and now have first-class support alongside the existing support for C# indexes.
JavaScript indexes are another way to express indexes and may provide an alternative way to perform more complex modeling logic using JavaScript versus C#'s LINQ syntax. JavaScript indexes also support referencing external scripts like Lodash or Moment.js to bring in additional global functions.
For an in-depth example, see my previous article on Data Modeling with Indexes in RavenDB where I feature JavaScript index examples. You can also read the JavaScript Indexes documentation for more information.
Revert revisions using time-travel restore
In 4.2, you can now revert documents to previous revisions without any downtime (an "online" operation). In traditional databases that offer a restore feature, you must take the database offline, which impacts users. Instead, RavenDB leverages the existing revisions feature to allow you to restore to a snapshot in time while maintaining read access to the database.
This feature can be accessed under Database Settings > Document Revisions.
To learn more you can read about how the point-in-time restore is performed and consult the Revert Revisions documentation.
Conclusion
In this article we explored several of the major new features RavenDB 4.2 has shipped with and touched on several more. If you want to get started, you can download the new version now or head over to the Learn RavenDB site!