Introduction
This article highlights the latest developments in both the open-source MongoDB document database and the official open-source C# driver. The piece has been updated to reflect version 4.0.2 of the database and version 2.7 of the C# driver.
Overview of Document Databases.
Document databases store information relating to a record in a contiguous blob of data known as a document. A document’s structure usually follows the JSON format and consists of a series of key-value pairs. Unlike the schema of relational databases, the document’s structure does not reference empty fields. This flexible arrangement allows fields to be added and removed with ease. What’s more, there is no need to rummage about in various tables when trying to assemble the data; it’s all there in one solid block. The downside of all this is that document databases tend to be bulky. But, now that disk drives are in the bargain basement, the trade-off between speed of access and storage costs has shifted in favour of speed, and that has given rise to the increased use of document databases. The Large Hadron Collider at CERN uses a document database but that's not why it keeps breaking down.
Cloud-Based Server for MongoDB.
There is a free cloud deployment of MongoDB available. The sandbox database plan provides 512 MB of storage and is an easy way to test drive the database. It’s not time-limited, but some commands are not supported. It's a good idea to watch the introductory videos first so that the appropriate options can be selected.
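Connecting from C# works exactly as it does for a local server; only the connection string changes. The exact string is shown in the provider's dashboard, but a typical (entirely hypothetical) example looks like this:
// Hypothetical credentials and host - substitute the values shown
// in your provider's dashboard
const string connectionString =
    "mongodb://dbUser:dbPassword@ds012345.example.com:27017/test";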
Desktop Installation of MongoDB.
The MongoDB binaries can be downloaded from here. The Community Server is the free desktop application. Installation is straightforward using the default options, but it’s not quite so easy to install in a directory other than the default; how to successfully change the default directory is detailed in the sample application. You may have to change your firewall settings to allow ‘mongod’ and ‘MongoDB Database Server’ through it. The best place to look for any installation errors is the log file at “C:\Program Files\MongoDB\Server\4.0\log\mongod.log”; the pop-up error messages can be misleading.
The Database Structure
The basic structure for storing data fields is the BsonElement. It’s a simple key-value pair. The Key contains a field name and the Value its value. The Value can itself be a BsonElement, so they can be nested, Russian doll style. Records are stored as documents. A Document is a collection of BsonElements. Here is an example document.
{
    _id : 50e5c04c0ea09d153c919473,
    Age : 43,
    Cars : { 0 : "Humber", 1 : "Riley" },
    Forename : "Rhys",
    Lastname : "Richards"
}
Not every record needs to contain every field. The only required field is the _id, and fields can be added at a future date without having to change the existing records. In this example, the Cars field is an array. Its Value field contains a nested Document. The elements in the nested Document are key-value pairs: the key is the array index number and the value is the name of the car.
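To make this concrete, here is a minimal sketch (assuming the MongoDB.Bson namespace is imported) of how the example document above could be composed by hand; the driver performs the equivalent mapping automatically when it serializes a class.
var document = new BsonDocument
{
    { "Age", 43 },
    { "Cars", new BsonArray { "Humber", "Riley" } }, // serialized with index keys, as shown above
    { "Forename", "Rhys" },
    { "Lastname", "Richards" }
};
// Each entry above is a BsonElement; one can also be added explicitly
document.Add(new BsonElement("_id", ObjectId.GenerateNewId()));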
The C# driver.
The C# .NET driver is available as the NuGet package, MongoDB.Driver. The driver is used to interface your code to a Mongo database. It can serialize data classes to the database without the need for special attributes. All that's required is a unique Id. This is usually of type ObjectId (in the MongoDB.Bson namespace), a 12-byte time-stamped value which is automatically assigned by MongoDB. You can use a GUID instead, but it needs to be mapped to a string. The reason for this is that a GUID is usually stored as binary data and the driver’s Aggregation Framework has problems digesting binary data. I get the same sort of trouble with cucumber sandwiches. The driver now employs the Task-based Asynchronous Pattern to prevent calls to the server blocking the user interface thread. This means that console applications need to be provided with a user-interface-type thread so that async methods can return to the same thread that they were called on; the default behaviour is to use a threadpool thread. There are lots of examples of how to use the driver in the driver's documentation Quick Tour section and in the Reference section.
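The simplest way to call the async driver methods from a console application is to block on one top-level method (continuations then run on threadpool threads); a common pattern, not taken from the sample application, is:
public static void Main(string[] args)
{
    // Block the console's main thread until all the async driver calls complete
    MainAsync().GetAwaiter().GetResult();
}

private static async Task MainAsync()
{
    // async driver calls go here
}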
Connecting to the database.
The connection string for the default deployment of a desktop server is simply mongodb://localhost. This sort of deployment is only suitable for testing the capabilities of MongoDB as it is insecure. Here is the code for accessing a database named test.
const string connectionString = "mongodb://localhost";
var client = new MongoClient(connectionString);
IMongoDatabase database = client.GetDatabase("test");
A new database will be created if it does not already exist.
Accessing collections.
Documents with a similar structure are arranged as named collections of data in the database. The driver’s IMongoCollection<T> object acts as a proxy for a database’s collection. The following code shows how to access a collection, named 'entities', of type ClubMember.
IMongoCollection<ClubMember> collection = database.GetCollection<ClubMember>("entities");
The IMongoCollection object can then be passed to async methods to enable CRUD operations to be carried out.
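As a rough sketch of what those CRUD operations look like (the ClubMember class is defined later in the article; the method names are the standard 2.x driver ones):
public async Task BasicCrudAsync(IMongoCollection<ClubMember> collection)
{
    var member = new ClubMember { Forename = "Rhys", Lastname = "Richards", Age = 43 };
    await collection.InsertOneAsync(member);                                             // Create
    ClubMember found = await collection.Find(m => m.Id == member.Id).FirstOrDefaultAsync(); // Read
    found.Age = 44;
    await collection.ReplaceOneAsync(m => m.Id == member.Id, found);                     // Update
    await collection.DeleteOneAsync(m => m.Id == member.Id);                             // Delete
}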
Indexes.
MongoDB indexes use a B-tree data structure. A query generally uses just one index, and the query optimiser chooses the most appropriate index for the task. The following code builds an index to sort data based on the Lastname property, then by the Forename, sorted A-Z, and finally by the Age property, oldest to youngest.
IndexKeysDefinition<ClubMember> keys =
    Builders<ClubMember>.IndexKeys.Ascending("Lastname").Ascending("Forename").Descending("Age");
var options = new CreateIndexOptions { Name = "MyIndex" };
var indexModel = new CreateIndexModel<ClubMember>(keys, options);
await collection.Indexes.CreateOneAsync(indexModel);
This index is great for searching on Lastname; on Lastname, Forename; or on Lastname, Forename, Age. It is not useful for sorting on Forename or Age, or any combination of the two. The default behaviour is for indexes to be updated when the data is saved, as this helps to prevent concurrency problems.
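A quick way to confirm that the index has been created is to list the collection's indexes; a minimal sketch:
using (IAsyncCursor<BsonDocument> cursor = await collection.Indexes.ListAsync())
{
    foreach (BsonDocument index in await cursor.ToListAsync())
    {
        Console.WriteLine(index["name"]); // prints _id_ and MyIndex
    }
}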
Querying Data Using Linq.
This is done by referencing the collection’s AsQueryable method before writing the Linq statements. All the usual methods are available. Here are a couple of examples.
public async Task EnumerateClubMembersAsync(IMongoCollection<ClubMember> collection)
{
    Console.WriteLine("Starting EnumerateClubMembersAsync");
    List<ClubMember> membershipList =
        await collection.AsQueryable()
            .OrderBy(p => p.Lastname)
            .ThenBy(p => p.Forename)
            .ToListAsync();
    Console.WriteLine("Finished EnumerateClubMembersAsync");
    Console.WriteLine("List of ClubMembers in collection ...");
    foreach (ClubMember clubMember in membershipList)
    {
        ConsoleHelper.PrintClubMemberToConsole(clubMember);
    }
}
public async Task OrderedFindSelectingAnonymousTypeAsync(IMongoCollection<ClubMember> collection)
{
    Console.WriteLine("Starting OrderedFindSelectingAnonymousTypeAsync");
    var names =
        await collection.AsQueryable()
            .Where(p => p.Lastname.StartsWith("R") && p.Forename.EndsWith("an"))
            .OrderBy(p => p.Lastname)
            .ThenBy(p => p.Forename)
            .Select(p => new { p.Forename, p.Lastname })
            .ToListAsync();
    Console.WriteLine("Finished OrderedFindSelectingAnonymousTypeAsync");
    Console.WriteLine("Members with Lastname starting with 'R' and Forename ending with 'an'");
    foreach (var name in names)
    {
        Console.WriteLine(name.Lastname + " " + name.Forename);
    }
}
In these examples, a List<T> is returned from the query in order to output the results consecutively, but it is better, where possible, to enumerate the query result using the ForEachAsync(Action<T>) method. This avoids having to hold the whole of the query result in memory. Something like this:
public async Task FindUsingForEachAsync(IMongoCollection<ClubMember> collection)
{
    Console.WriteLine("Starting FindUsingForEachAsync");
    var builder = Builders<ClubMember>.Filter;
    var filter =
        builder.Or(
            builder.Eq("Lastname", "Rees"),
            builder.Eq("Lastname", "Jones"));
    await collection.Find(filter)
        .ForEachAsync(c => DoSomeAction(c));
    Console.WriteLine("Finished FindUsingForEachAsync");
}

private void DoSomeAction(ClubMember c)
{
}
Querying Data By Using Filters.
The driver uses filters or, more correctly, filter definitions in order to filter data. The filters are constructed by first instantiating a FilterDefinition builder using the Builders<T> helper class. The required FilterDefinition is then constructed by calling one of the builder's filter-constructing methods, after first passing the appropriate parameters to it. Here are some examples.
public async Task FindUsingFilterDefinitionBuilder1Async(IMongoCollection<ClubMember> collection)
{
    Console.WriteLine("Starting FindUsingFilterDefinitionBuilder1Async");
    DateTime cutOffDate = DateTime.Now.AddYears(-5);
    var builder = Builders<ClubMember>.Filter;
    var filterDefinition = builder.Gt("MembershipDate", cutOffDate.ToUniversalTime());
    List<ClubMember> membersList =
        await collection.Find(filterDefinition)
            .SortBy(c => c.Lastname)
            .ThenBy(c => c.Forename)
            .ToListAsync();
    Console.WriteLine("Finished FindUsingFilterDefinitionBuilder1Async");
    Console.WriteLine("\r\nMembers who have joined in the last 5 years ...");
    foreach (ClubMember clubMember in membersList)
    {
        ConsoleHelper.PrintClubMemberToConsole(clubMember);
    }
}
public async Task FindUsingFilterDefinitionBuilder2Async(IMongoCollection<ClubMember> collection)
{
    Console.WriteLine("Starting FindUsingFilterDefinitionBuilder2Async");
    var builder = Builders<ClubMember>.Filter;
    var filter =
        builder.Or(
            builder.Eq("Lastname", "Rees"),
            builder.Eq("Lastname", "Jones"));
    IEnumerable<ClubMember> jonesReesList =
        await collection.Find(filter)
            .SortBy(c => c.Lastname)
            .ThenBy(c => c.Forename)
            .ThenByDescending(c => c.Age)
            .ToListAsync();
    Console.WriteLine("Finished FindUsingFilterDefinitionBuilder2Async");
    Console.WriteLine("Members named Jones or Rees ...");
    foreach (ClubMember clubMember in jonesReesList)
    {
        ConsoleHelper.PrintClubMemberToConsole(clubMember);
    }
    Console.WriteLine("...........");
}
Querying Data Using The Aggregation Framework.
The Aggregation Framework is used to collect and collate data from various documents in the database. The aggregation is achieved by passing a collection along a pipeline where various pipeline operations are performed consecutively to produce a result. It’s an oven-ready chicken type of production line - there is less product at the end, but it is more fit for purpose. Aggregation is performed by calling the collection’s Aggregate method with an array of documents that detail the various pipeline operations.
Aggregation Example.
In this example, there is a document database collection consisting of the members of a vintage car club. Each document is a serialized version of the following ClubMember class.
public class ClubMember
{
    #region Public Properties
    public int Age { get; set; }
    public List<string> Cars { get; set; }
    public string Forename { get; set; }
    public ObjectId Id { get; set; }
    public string Lastname { get; set; }
    public DateTime MembershipDate { get; set; }
    #endregion
}
The ClubMember class has an array named Cars that holds the names of the vintage cars owned by the member. The aim of the aggregation is to produce an ordered, distinct list of owners who have joined in the last five years for each type of car in the collection.
Step 1 Match Operation.
The match operation selects only the members that have joined in the last five years. Here's the code.
var utcTime5yearsago = DateTime.Now.AddYears(-5).ToUniversalTime();
var matchMembershipDateOperation = new BsonDocument
{
    {
        "$match",
        new BsonDocument { { "MembershipDate", new BsonDocument { { "$gte", utcTime5yearsago } } } }
    }
};
As you can see, the code ends up with more braces than an orthodontist, but at least IntelliSense assists when you are writing it. The keyword $gte indicates a greater-than-or-equal query.
Step 2 Unwind Operation.
Unwind operations modify documents that contain a specified array. For each element within the array, a document identical to the original is created. The value of the array field is then changed to be equal to that of the single element. So a document with the following structure
_id:700, Lastname: "Evans", Cars: ["MG","Austin","Humber"]
becomes three documents
_id:700, Lastname: "Evans", Cars: "MG"
_id:700, Lastname: "Evans", Cars: "Austin"
_id:700, Lastname: "Evans", Cars: "Humber"
If there are two or more identical elements, say Evans has two MGs, then duplicate documents will be produced. Unwinding an array makes its members accessible to other aggregation operations.
var unwindCarsOperation = new BsonDocument { { "$unwind", "$Cars" } };
Step 3 Group Operation.
Define an operation to group the documents by car type. Each consecutive operation does not act on the original documents but on the documents produced by the previous operation. The only fields available are those present as a result of the previous pipeline operation; you cannot go back and pinch a field from the original documents. The $ sign is used in two ways: firstly, to indicate a keyword and, secondly, to differentiate field names from field values. For example, Age is a field name; $Age is the value of the Age field.
var groupByCarTypeOperation = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            { "_id", new BsonDocument { { "Car", "$Cars" } } },
            {
                "Owners",
                new BsonDocument
                {
                    {
                        "$addToSet",
                        new BsonDocument
                        {
                            { "_id", "$_id" },
                            { "Lastname", "$Lastname" },
                            { "Forename", "$Forename" },
                            { "Age", "$Age" },
                            { "MembershipDate", "$MembershipDate" }
                        }
                    }
                }
            }
        }
    }
};
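For the example data, the group stage emits one document per car type, shaped roughly like this:
{
    _id : { Car : "MG" },
    Owners : [ { _id : ..., Lastname : "Evans", Forename : ..., Age : ..., MembershipDate : ... }, ... ]
}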
Step 4 Project Operation.
The _id field resulting from the previous operation is a BsonElement consisting of both the field name and its Value. It would be better to drop the field name and just use the Value. The following Project operation does that.
var projectMakeOfCarOperation = new BsonDocument
{
    {
        "$project",
        new BsonDocument
        {
            { "_id", 0 },
            { "MakeOfCar", "$_id.Car" },
            { "Owners", 1 }
        }
    }
};
Step 5 Sort Operation.
Define an operation to Sort the documents by car type.
var sortCarsOperation = new BsonDocument { { "$sort", new BsonDocument { { "MakeOfCar", 1 } } } };
The number 1 means perform an ascending sort; -1 indicates a descending sort.
Step 6 Run the Aggregation and output the result.
var pipeline = new[]
{
    matchMembershipDateOperation,
    unwindCarsOperation,
    groupByCarTypeOperation,
    projectMakeOfCarOperation,
    sortCarsOperation
};
var carStatsList = await collection.Aggregate<CarStat>(pipeline).ToListAsync();
Console.WriteLine("Finished AggregateOwnersByCarManufacturerAsync");
The results are returned as a List<CarStat>.
public class CarStat
{
    #region Public Properties
    public string MakeOfCar { get; set; }
    public BsonDocument[] Owners { get; set; }
    #endregion
}
The results are enumerated like this:
Console.WriteLine("\r\nMembers grouped by Car Marque");
foreach (CarStat stat in carStatsList)
{
Console.WriteLine("\n\rCar Marque : {0}\n\r", stat.MakeOfCar);
IEnumerable<ClubMember> clubMembers =
stat.Owners.ToArray()
.Select(d => BsonSerializer.Deserialize<ClubMember>(d))
.OrderBy(c => c.Lastname)
.ThenBy(c => c.Forename)
.ThenBy(c => c.Age)
.Select(c => c);
foreach (ClubMember clubMember in clubMembers)
{
ConsoleHelper.PrintClubMemberToConsole(clubMember);
}
}
The sample application has an aggregation example that performs various calculations on the data set such as Count, Min, Max and Total. But these sorts of aggregations, where there are not a lot of projections, are probably best carried out using Linq.
List<FamilyStat> familyStats =
    await collection.Aggregate()
        .Group(
            x => x.Lastname,
            g => new FamilyStat
            {
                FamilyName = g.Key,
                TotalAge = g.Sum(x => x.Age),
                MinAge = g.Min(x => x.Age),
                MaxAge = g.Max(x => x.Age),
                Count = g.Count()
            })
        .SortBy(x => x.FamilyName)
        .ToListAsync();
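The FamilyStat class is not shown in the excerpt above; a matching definition would be something like this:
public class FamilyStat
{
    public int Count { get; set; }
    public string FamilyName { get; set; }
    public int MaxAge { get; set; }
    public int MinAge { get; set; }
    public int TotalAge { get; set; }
}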
Querying Data Using Map Reduce.
MapReduce is a heavy-duty method used for batch processing large amounts of data. In this example, the ages for every person with the same Lastname are totalled. There are two functions to be defined: the map function outputs a key/value pair for each document; the reduce function groups data by the key and performs some sort of mathematical function on the values.
var map = new BsonJavaScript(@"function()
{
    // Associate each Lastname property with the Age value
    emit(this.Lastname, this.Age);
}");
The reduce function returns, for every Lastname, the Lastname as the key and the sum of the ages for every person with the same Lastname as the value.
var reduce = new BsonJavaScript(@"function(lastName, ages)
{
    return Array.sum(ages);
}");
The output of one batch can be combined with that of another batch and fed back through the reducer. In this example, the results for one batch of documents are output to a collection named ResultsCollection on the server. If the collection already contains data, the new batch of data and the collection's data will be reduced, and the total age for each Lastname will be updated and saved in the collection.
var options = new MapReduceOptions<ClubMember, BsonDocument>
{
    OutputOptions = MapReduceOutputOptions.Reduce("ResultsCollection")
};
var resultAsBsonDocumentList = await collection.MapReduce(map, reduce, options).ToListAsync();
Console.WriteLine("The total age for every member of each family is ....");
var reduction =
    resultAsBsonDocumentList.Select(
        doc => new { family = doc["_id"].AsString, age = (int)doc["value"].AsDouble });
foreach (var anon in reduction)
{
    Console.WriteLine("{0} Family Total Age {1}", anon.family, anon.age);
}
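Each document in ResultsCollection has the standard map-reduce output shape, with the key in _id and the reduced value in value, for example:
{ "_id" : "Jones", "value" : 112.0 }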
GridFS.
GridFS is a means of storing and retrieving files that exceed the BsonDocument size limit of 16 MB. Instead of storing a file in a single document, GridFS divides a file into chunks and stores each of the chunks as a separate document. GridFS uses two collections to store files: one collection stores the file chunks, and the other holds GridFSFileInfo documents that describe how each file is stored and can include additional metadata. The chunk size is about 255 KB. The idea here is that smaller chunks of data can be stored more efficiently and consume less memory when being processed than large files. It’s generally not a good idea to store binary data in the main document as it takes up space that is best used by more meaningful data. Uploading data into GridFS is now more complicated than it used to be because of the introduction of asynchronous streams.
public async Task UploadDemoAsync(IMongoCollection<ClubMember> collection)
{
    Console.WriteLine("Starting GridFSDemo");
    IMongoDatabase database = collection.Database;
    const string filePath = @"C:\temp\mars996.png";
    const string fileName = @"mars996";
    var photoMetadata = new BsonDocument
    {
        { "Category", "Astronomy" },
        { "SubGroup", "Planet" },
        { "ImageWidth", 640 },
        { "ImageHeight", 480 }
    };
    var uploadOptions = new GridFSUploadOptions { Metadata = photoMetadata };
    try
    {
        await UploadFileAsync(database, filePath, fileName, uploadOptions);
    }
    catch (Exception e)
    {
        Console.WriteLine("***GridFS Error " + e.Message);
    }
}
public async Task UploadFileAsync(
    IMongoDatabase database,
    string filePath,
    string fileName,
    GridFSUploadOptions uploadOptions = null)
{
    var gridFsBucket = new GridFSBucket(database);
    using (FileStream sourceStream = File.Open(filePath, FileMode.Open))
    using (GridFSUploadStream destinationStream =
        await gridFsBucket.OpenUploadStreamAsync(fileName, uploadOptions))
    {
        await sourceStream.CopyToAsync(destinationStream);
        await destinationStream.CloseAsync();
    }
}
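If you don't need control over the upload stream itself, the bucket can do the copying for you; a minimal sketch using the driver's UploadFromStreamAsync method:
public async Task UploadFileSimpleAsync(
    IMongoDatabase database,
    string filePath,
    string fileName,
    GridFSUploadOptions uploadOptions = null)
{
    var gridFsBucket = new GridFSBucket(database);
    using (FileStream sourceStream = File.Open(filePath, FileMode.Open))
    {
        // Returns the ObjectId assigned to the newly stored file
        ObjectId id = await gridFsBucket.UploadFromStreamAsync(fileName, sourceStream, uploadOptions);
    }
}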
An index can be created based on the GridFSFileInfo metadata.
public async Task DemoCreateIndexAsync(IMongoDatabase database, string indexName)
{
    IMongoCollection<GridFSFileInfo> filesCollection = database.GetCollection<GridFSFileInfo>("fs.files");
    // The metadata field is stored as 'metadata' (lowercase) in the fs.files collection,
    // so the index keys must use that name to match the queries below
    IndexKeysDefinition<GridFSFileInfo> keys =
        Builders<GridFSFileInfo>.IndexKeys.Ascending("metadata.Category").Ascending("metadata.SubGroup");
    var options = new CreateIndexOptions { Name = indexName };
    var indexModel = new CreateIndexModel<GridFSFileInfo>(keys, options);
    await filesCollection.Indexes.CreateOneAsync(indexModel);
}
The index can be used to find a subgroup of the stored files. The files can then be downloaded from the database.
public async Task DemoDownloadFilesAsync(IMongoDatabase database)
{
    IMongoCollection<GridFSFileInfo> filesCollection = database.GetCollection<GridFSFileInfo>("fs.files");
    List<GridFSFileInfo> fileInfos = await DemoFindFilesAsync(filesCollection);
    foreach (GridFSFileInfo gridFsFileInfo in fileInfos)
    {
        Console.WriteLine("Found file {0} Length {1}", gridFsFileInfo.Filename, gridFsFileInfo.Length);
        try
        {
            // Write each file to a local path derived from its stored name
            // (the demo file was uploaded as a png image)
            string filePath = Path.Combine(@"C:\temp", gridFsFileInfo.Filename + ".png");
            await DemoDownloadFileAsync(database, filePath, gridFsFileInfo.Filename);
        }
        catch (Exception e)
        {
            Console.WriteLine("***GridFS Error " + e.Message);
        }
    }
}
public async Task<List<GridFSFileInfo>> DemoFindFilesAsync(IMongoCollection<GridFSFileInfo> filesCollection)
{
    FilterDefinitionBuilder<GridFSFileInfo> builder = Builders<GridFSFileInfo>.Filter;
    FilterDefinition<GridFSFileInfo> filter = builder.Eq("metadata.Category", "Astronomy")
                                              & builder.Eq("metadata.SubGroup", "Planet");
    return await filesCollection.Find(filter).ToListAsync();
}
public async Task DemoDownloadFileAsync(IMongoDatabase database, string filePath, string fileName)
{
    var gridFsBucket = new GridFSBucket(database);
    using (GridFSDownloadStream<ObjectId> sourceStream =
        await gridFsBucket.OpenDownloadStreamByNameAsync(fileName))
    using (FileStream destinationStream = File.Open(filePath, FileMode.Create))
    {
        await sourceStream.CopyToAsync(destinationStream);
    }
}
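For files small enough to buffer in memory, there is also a one-line alternative:
var gridFsBucket = new GridFSBucket(database);
// Loads the whole file into memory, so only suitable for smaller files
byte[] fileBytes = await gridFsBucket.DownloadAsBytesByNameAsync(fileName);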
MongoDB Replica Sets.
A replica set is a cluster of MongoDB instances that replicate amongst one another so that they all store the same data. One server is the primary and receives all the writes from clients. The others are secondary members and replicate from the primary asynchronously. The clever bit is that, when a primary goes down, one of the secondary members takes over and becomes the new primary. This takes place totally transparently to the users and ensures continuity of service. Replica sets have other advantages: it is easy to back up the data, and databases with a lot of read requests can reduce the load on the primary by reading from a secondary. You cannot rely on any one instance being the primary, as the primary is determined by the members of the replica set at run time.
Installing a Replica Set as a Windows Service.
This example installs a replica set consisting of one primary and two secondary instances. The instances will be named MongoDB0, MongoDB1, MongoDB2. They will use IP address localhost and listen on ports 27017, 27018 and 27019 respectively. The replica set name is myReplSet. The ports should be allowed through the Windows firewall for both input and output, so you may need to add new Inbound and Outbound Rules. To allow access to the ports, enter wf.msc in a command window. In the pop-up window, select Inbound Rules, in the 'Actions' column select 'New Rule', then 'Port', and enter the details. Repeat the procedure for Outbound Rules. In a practical deployment, the three servers would be on different machines within the network, but here they are all on the same machine.
Step 1 Housekeeping Tasks.
Three new data folders named MongoDB0, MongoDB1, MongoDB2 need to be added, and any instance of MongoDB that may already be running has to be removed. In this example, the service name to be removed is MongoDB. Open a command prompt in administrator mode and copy and paste the following:
cd "C:\Program Files\MongoDB\Server\4.0\bin"
mongod.exe --serviceName MongoDB --remove
md "C:\Program Files\MongoDB\Server\4.0\data\MongoDB0"
md "C:\Program Files\MongoDB\Server\4.0\data\MongoDB1"
md "C:\Program Files\MongoDB\Server\4.0\data\MongoDB2"
Step 2 Install three new service instances.
The best way to do this is to have three configuration files, one for each instance. The format of these files is very similar. Here is the config file for MongoDB0, repSetDb0.cfg.
# repSetDb0.cfg
storage:
    dbPath: "C:/Program Files/MongoDB/Server/4.0/data/MongoDB0"
systemLog:
    destination: file
    path: "C:/Program Files/MongoDB/Server/4.0/log/MongoDB0.log"
    logAppend: true
    timeStampFormat: iso8601-utc
replication:
    replSetName: "myReplSet"
net:
    port: 27017
The config files are included in the sample code bundle; basically, you change the port, dbPath and log path for each instance. Store the config files in the MongoDB\Server\4.0\bin directory and enter the following commands from a command prompt in that directory.
mongod.exe --config "C:\Program Files\MongoDB\Server\4.0\bin\repSetDb0.cfg" --serviceName MongoDB0 --serviceDisplayName MongoDB0 --install
mongod.exe --config "C:\Program Files\MongoDB\Server\4.0\bin\repSetDb1.cfg" --serviceName MongoDB1 --serviceDisplayName MongoDB1 --install
mongod.exe --config "C:\Program Files\MongoDB\Server\4.0\bin\repSetDb2.cfg" --serviceName MongoDB2 --serviceDisplayName MongoDB2 --install
Check the log files to confirm all is well and enter the following commands to start the services.
net start MongoDB0
net start MongoDB1
net start MongoDB2
services.msc
In the services management window, right click on each of the three instances and change 'Startup type' to 'Automatic (Delayed Start)'.
Step 3 Configure the Replica Set.
To configure the replica set, you need to use the mongo shell. Make sure you are in the MongoDB\Server\4.0\bin directory, then copy and paste the following:
mongo MongoDB0
use admin
config = { _id : "myReplSet", members : [ { _id : 0, host : "localhost:27017" },
           { _id : 1, host : "localhost:27018" }, { _id : 2, host : "localhost:27019" } ] }
rs.initiate(config)
exit
When the installation is complete, you can see the status of your replica set by entering rs.status() in the mongo shell. To connect to the replica set with the C# driver, use this connection string.
const string connectionString =
"mongodb://localhost/?replicaSet=myReplSet&readPreference=primary";
Conclusion
There is much more to MongoDB than is detailed in this article, but the hope is that there is enough information here for you to begin exploring the capabilities of this open-source software. The database configurations given here are insecure and are only suitable for testing purposes. MongoDB has a lot of security features; if you are interested in implementing them, please see the sequel to this article, Using Encryption and Authentication to Secure MongoDB.