In this article, I share my experience of using Cassandra together with Microsoft technologies.
Introduction
After a long time coding in the Microsoft technology stack and dealing with relational databases, the time came for me, too, to look at the bright side of NoSQL databases. The choice of Cassandra was fortuitous: I never researched NoSQL databases or compared their performance or features. Cassandra simply came first. So I started to play with it, and after some time of self-study and investigation I decided to share my experience of using it together with Microsoft technologies. I know that Cassandra is a daughter of Priam, Hecuba, Apache and Facebook and was predestined to run on Unix, Linux and the like, but I wanted it on Windows...
I hope that my humble exercises will be useful for those who want to combine Cassandra with Microsoft technologies.
Background
Before you continue reading, you should understand what Cassandra, Angular and ASP.NET Web API are. If you do not, get to know DataStax, Cassandra, ASP.NET Web API 2 and Angular first. DataStax distributes Cassandra for Windows and has its own tools for querying and building data.
Using the Code
OK, now that we have studied all the installations and understand what a cluster, keyspace, column family and aggregate key are, as well as what Anti-Entropy and Read Repair, memtables and bloom filters are, we also have to know what databinding, controllers, scopes, services, dependency injection, filters, modules and directives are. Ah, and of course we should be familiar with .NET, C# and ASP.NET MVC. But even if we have missed something, it's all right, we'll learn it along the way.
Let's start!
After installing DataStax DevCenter 1.5, you will see the newly added applications in the application list. This is what I see on my Windows 8:
Run the DataStax DevCenter application. To me, it has an Eclipse look and feel.
First of all, we need to create a new connection in the Connections pane in the upper left corner of DevCenter. Right-click on it and select Open Connection. In the window that opens, enter a connection name in the Connection name input and 127.0.0.1 in Contact hosts. In the future we will add more than one host, but for now one host is enough. Click Add to the right of the Contact hosts input area. The native protocol port is 9042. Save the form:
If everything is correct, you will connect to the Cassandra database and see a new connection added to the Connections pane.
Now that everything is OK, let's plunge into the captivating world of creating keyspaces and column families. A keyspace is the outermost container for data in Cassandra, so you can think of it as a database in the world of SQL Server, Oracle or MySQL. In the same way that a database is a container for tables, a keyspace is a container for column families, and you can think of column families as tables.
On installation, DataStax creates a DataStax folder in Program Files on your PC. Go there, then to DevCenter, then to examples, and you will find six CQL files.
CQL stands for Cassandra Query Language. It looks like SQL, but it is not the same. The difference comes from the different ways the two kinds of databases store and retrieve data: one is relational and the other is a key-value container. So forget about JOIN, GROUP BY and FOREIGN KEY.
One of the files is called videodb-schema. Let's open it. The first row says:
CREATE KEYSPACE videodb WITH REPLICATION =
{ 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
This command creates the keyspace videodb, as we can guess. 'class' is the name of the replica placement strategy. SimpleStrategy is meant for a single data center; if you need more than one data center, choose NetworkTopologyStrategy. We do not, so our class is SimpleStrategy.
Replication factor is a parameter that determines how many nodes in your cluster store copies of the data. For example, if the replication factor is set to 2, every piece of data is stored as two copies on different nodes. As common sense dictates, the replication factor should not be greater than the number of nodes in the cluster: you cannot store 10 replicas of data when you only have 8 nodes available. If you try, requests whose consistency level needs more replicas than there are live nodes will fail. We have one node, so our replication factor is 1.
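To make the two strategies concrete, here is a minimal C# sketch that only builds the CREATE KEYSPACE statement text for each of them. The class KeyspaceCql and the data center names dc1 and dc2 are made up for illustration; real data center names depend on how your cluster is configured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: it builds CQL text only and does not talk to Cassandra.
public static class KeyspaceCql
{
    // One data center: a single replication_factor for the whole cluster.
    public static string Simple(string keyspace, int replicationFactor)
    {
        return "CREATE KEYSPACE " + keyspace + " WITH REPLICATION = " +
               "{ 'class' : 'SimpleStrategy', 'replication_factor' : " +
               replicationFactor + " };";
    }

    // Several data centers: one replica count per data center name.
    public static string NetworkTopology(string keyspace,
        IEnumerable<KeyValuePair<string, int>> replicasPerDataCenter)
    {
        var parts = replicasPerDataCenter
            .Select(dc => "'" + dc.Key + "' : " + dc.Value);
        return "CREATE KEYSPACE " + keyspace + " WITH REPLICATION = " +
               "{ 'class' : 'NetworkTopologyStrategy', " +
               string.Join(", ", parts) + " };";
    }

    public static void Main()
    {
        Console.WriteLine(Simple("videodb", 1));
        Console.WriteLine(NetworkTopology("videodb", new[]
        {
            new KeyValuePair<string, int>("dc1", 3), // assumed data center names
            new KeyValuePair<string, int>("dc2", 2)
        }));
    }
}
```

The first line printed is the same statement as in videodb-schema. The second shows the shape of the multi data center form, which we do not need for our single node.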
After running this command in the big central pane of DataStax DevCenter, we will see the new keyspace added to the existing default keyspaces in the schema pane in the upper right:
Now we can select our new keyspace and work with it.
Don't forget to select your connection.
OK, now we are ready to create tables. Ah, sorry, column families in our new NoSQL world.
Go to our videodb-schema file, copy all the create table commands to the central pane of DevCenter and press the execute CQL script button (white triangle on green background).
Now we'll expand videodb schema:
All the column families that are called tables can be seen in the schema pane.
Now that we have a database schema, let's add some data.
Open the videodb-inserts file, copy its content and press the execute button.
To check that everything is correct, ask Cassandra:
select * from videos;
The result should look like this:
For working with Cassandra, we have another, more lightweight tool: Cassandra CQL Shell. You can use it for all the tasks described above. Just open it:
Run:
use videodb;
and you are connected to Cassandra. Now it's ready and listening to your commands. Let's ask her something.
select * from users;
The result should be like this:
Now we have a database with data. It's time for coding!
In this part, we will create a simple console application that connects to Cassandra and runs some basic queries. After that, we'll proceed to the announced Angular and MVC API.
Open Visual Studio and create a console application project. Go to NuGet Package Manager and browse for CassandraCSharpDriver by DataStax.
Install it and make sure that the Cassandra reference was added to your project.
Create a CassandraEngine class and add the following to it:
using System.Configuration; // needs a reference to System.Configuration
using Cassandra;

public class CassandraEngine
{
    private Cluster cluster;
    private ISession session;

    public CassandraEngine()
    {
        SetCluster();
    }

    // Builds the cluster object once and reuses it afterwards.
    private void SetCluster()
    {
        if (cluster == null)
        {
            cluster = Connect();
        }
    }

    // Returns the shared session, connecting on first use.
    public ISession GetSession()
    {
        if (cluster == null)
        {
            SetCluster();
            session = cluster.Connect();
        }
        else if (session == null)
        {
            session = cluster.Connect();
        }
        return session;
    }

    private Cluster Connect()
    {
        // Comma-separated list of contact points from the config file.
        string[] nodes = GetAppSetting("cassandraNodes").Split(',');
        QueryOptions queryOptions = new QueryOptions()
            .SetConsistencyLevel(ConsistencyLevel.One);
        return Cluster.Builder()
            .AddContactPoints(nodes)
            .WithDefaultKeyspace("videodb")
            .WithQueryOptions(queryOptions)
            .Build();
    }

    private string GetAppSetting(string key)
    {
        return ConfigurationManager.AppSettings[key];
    }

    // Public so that callers can release the connections when they are done.
    public void Close()
    {
        cluster.Shutdown();
    }
}
Credentials and additional nodes are left for future use. Later, we will see how to connect to Cassandra with a username and password, create an additional node on another computer and work with a Cassandra cluster that has two nodes. I don't think there are reasons to use a Cassandra cluster that runs on one machine, but it fits our training purposes.
Our config file is very simple. For now, it has only one key:
<appSettings>
<add key="cassandraNodes" value="127.0.0.1"/>
</appSettings>
In the Connect method, we get the node IPs from the config file:
string[] nodes = GetAppSetting("cassandraNodes").Split(',');
In the next parts we will add, as I've mentioned, more nodes, and then an array of nodes will make sense. For now, we have only one node, which runs on localhost. Remember the connection we created for it above in DataStax DevCenter?
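Once the setting holds several nodes, a plain Split(',') keeps any spaces that were typed after the commas. A minimal sketch of a more forgiving parser could look like this; the NodeList class and the second address are my assumptions for illustration, not part of the project above.

```csharp
using System;
using System.Linq;

// Hypothetical helper for reading the cassandraNodes setting.
public static class NodeList
{
    // Splits a comma-separated setting such as "10.0.0.1, 10.0.0.2"
    // into clean host names, dropping blanks and surrounding spaces.
    public static string[] Parse(string setting)
    {
        if (string.IsNullOrWhiteSpace(setting))
            throw new ArgumentException("cassandraNodes setting is empty");

        return setting
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(node => node.Trim())
            .Where(node => node.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Assumed addresses, for illustration only.
        foreach (var node in Parse("127.0.0.1, 192.168.0.12,"))
            Console.WriteLine(node);
    }
}
```

The result can be passed to AddContactPoints in the same way as the array in the Connect method.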
After that, we set the consistency level. The consistency level says how strictly we want a row to be synchronized across the Cassandra nodes, and it can be one of about a dozen options. For a write, it determines on how many replicas the write must succeed before Cassandra says OK to the client application. In our case, it is set to One, which means that after writing to the commit log and memtable of at least one node, the write is counted as successful.
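To get a feeling for the numbers, here is a small sketch that computes how many replicas have to answer for a few of the levels. It is plain arithmetic, not driver code: the ConsistencyMath class is hypothetical, and it covers only five of the levels, leaving out the data center aware ones.

```csharp
using System;

// Hypothetical helper: plain arithmetic, it does not use the Cassandra driver.
public static class ConsistencyMath
{
    // Number of replicas that must acknowledge a request for the given
    // level name and replication factor.
    public static int RequiredReplicas(string level, int replicationFactor)
    {
        switch (level.ToUpperInvariant())
        {
            case "ONE": return 1;
            case "TWO": return 2;
            case "THREE": return 3;
            case "QUORUM": return replicationFactor / 2 + 1; // integer division
            case "ALL": return replicationFactor;
            default: throw new ArgumentException("Unsupported level: " + level);
        }
    }

    public static void Main()
    {
        // Our single node setup: replication factor 1.
        Console.WriteLine(RequiredReplicas("ONE", 1));    // prints 1
        // A three replica keyspace: a quorum is a majority.
        Console.WriteLine(RequiredReplicas("QUORUM", 3)); // prints 2
        Console.WriteLine(RequiredReplicas("ALL", 3));    // prints 3
    }
}
```

With one node and a replication factor of 1, every one of these levels comes down to a single replica, which is why One is enough for us.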
The next command builds the cluster with the node IPs, credentials (if we have any), the default keyspace and the consistency level mentioned above.
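For the record, here is a minimal sketch of what such a builder call could look like once credentials come into play. The class name SecureClusterFactory and the userName and password parameters are my placeholders. WithCredentials only makes sense when password authentication is enabled on the cluster, which is not the case in our default local installation.

```csharp
using Cassandra;

// Hypothetical factory; it needs the CassandraCSharpDriver package.
public static class SecureClusterFactory
{
    // userName and password are placeholders for real credentials.
    public static Cluster Build(string[] nodes, string userName, string password)
    {
        return Cluster.Builder()
            .AddContactPoints(nodes)
            .WithPort(9042) // native protocol port, as entered in DevCenter
            .WithCredentials(userName, password)
            .WithDefaultKeyspace("videodb")
            .WithQueryOptions(new QueryOptions()
                .SetConsistencyLevel(ConsistencyLevel.One))
            .Build();
    }
}
```

Building the cluster does not open a connection yet. That happens when Connect is called on the result, as in GetSession.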
Now, go to our Program class.
using System;
using System.Collections.Generic;
using Cassandra;
using Cassandra.Mapping;

class Program
{
    protected static ISession session;
    protected static IMapper mapper; // not used yet, reserved for the next parts

    static void Main(string[] args)
    {
        CassandraEngine engine = new CassandraEngine();
        session = engine.GetSession();
        GetUsers();
        Console.ReadLine();
    }

    public static void GetUsers()
    {
        var rows = session.Execute("SELECT * FROM videodb.users;");
        foreach (var row in rows)
        {
            Console.WriteLine("\n");
            Console.WriteLine(row.GetValue(row.GetColumn("username").Type, 0) == null ?
                "" : row.GetValue(row.GetColumn("username").Type, 0).ToString());
            Console.WriteLine(row.GetValue(row.GetColumn("created_date").Type, 1) == null ?
                "" : row.GetValue(row.GetColumn("created_date").Type, 1).ToString());
            // email is a CQL list: join its items instead of printing the collection's type name
            var emails = row.GetValue(row.GetColumn("email").Type, 2) as IEnumerable<string>;
            Console.WriteLine(emails == null ? "" : string.Join(", ", emails));
            Console.WriteLine(row.GetValue(row.GetColumn("firstname").Type, 3) == null ?
                "" : row.GetValue(row.GetColumn("firstname").Type, 3).ToString());
            Console.WriteLine(row.GetValue(row.GetColumn("lastname").Type, 4) == null ?
                "" : row.GetValue(row.GetColumn("lastname").Type, 4).ToString());
            Console.WriteLine(row.GetValue(row.GetColumn("password").Type, 5) == null ?
                "" : row.GetValue(row.GetColumn("password").Type, 5).ToString());
            Console.WriteLine("===========================================");
        }
    }
}
Here, we are initializing our CassandraEngine, getting the session and selecting users.
Because the default keyspace was set when the cluster was built, the query does not have to say which keyspace to use. But generally, you can name the keyspace in the select command like this:
use videodb; SELECT * FROM videodb.users;
We are not restricted to one keyspace per application. In our example, I am using one keyspace, but there may be more than one.
Interestingly, when you use SELECT * FROM, the columns are returned not in the order you declared them in your CREATE TABLE command but in alphabetical order, except for the primary key column, which comes first with index zero.
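If you want to work out the indexes without running the query, the rule just described can be sketched in a few lines. The ColumnOrder class is hypothetical, and the declared column order passed in Main is my assumption about the users table. The sketch only covers a table with a single primary key column, like ours.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: predicts column positions, it does not query Cassandra.
public static class ColumnOrder
{
    // Applies the rule described above: the key column first,
    // then all other columns in alphabetical order.
    public static List<string> Expected(string keyColumn, IEnumerable<string> declaredColumns)
    {
        var rest = declaredColumns
            .Where(c => c != keyColumn)
            .OrderBy(c => c, StringComparer.Ordinal);
        return new[] { keyColumn }.Concat(rest).ToList();
    }

    public static void Main()
    {
        // Assumed declaration order of the users table.
        var declared = new[] { "username", "firstname", "lastname",
                               "email", "password", "created_date" };
        var ordered = Expected("username", declared);
        for (int i = 0; i < ordered.Count; i++)
            Console.WriteLine(i + ": " + ordered[i]);
    }
}
```

For the users table this puts username at index 0, followed by created_date, email, firstname, lastname and password.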
I am using the method public object GetValue(Type type, int index), which accepts a type and a column index. So if I want to get the first name of the user, I should know the column type and the column index. First name has index 3, after username, created_date and email.
Another point is CQL data types and how they convert to .NET types. If you remember from the schema listing, the email column of our users column family is a list of varchars. A Cassandra list is converted in .NET to an ordered collection of elements, so it can be treated as a collection of strings in the code.
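If keeping track of indexes and types feels fragile, the driver also offers a generic GetValue<T> that takes the column name. The sketch below is one possible way to read the same rows with it. The UserPrinter class is my own naming, and I read the email list as IEnumerable<string> on the assumption that this works regardless of the concrete collection type your driver version returns.

```csharp
using System;
using System.Collections.Generic;
using Cassandra;

// Hypothetical helper; it needs the CassandraCSharpDriver package and a live session.
public static class UserPrinter
{
    public static void PrintUsers(ISession session)
    {
        RowSet rows = session.Execute("SELECT * FROM videodb.users;");
        foreach (Row row in rows)
        {
            // Generic accessor by column name: no index or Type lookup needed.
            string username = row.GetValue<string>("username");
            string firstName = row.IsNull("firstname")
                ? "" : row.GetValue<string>("firstname");

            // A CQL list<varchar> can be read as a sequence of strings.
            var emails = row.IsNull("email")
                ? new string[0]
                : row.GetValue<IEnumerable<string>>("email");

            Console.WriteLine(username + " (" + firstName + "): " +
                              string.Join(", ", emails));
        }
    }
}
```

Calling UserPrinter.PrintUsers(session) from Main would print one line per user, without depending on the order in which the columns come back.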
That's all! Let's run our application and see the output:
Very simple, isn't it?
Next time, we will create user-defined functions, user-defined types and tuples. We will try to receive data from Cassandra as .NET objects, insert JSON data and start with our API.
Thank you for reading!
History
- 24th February, 2016: Initial version