Introduction
In this post, we are going to discuss a solution for replacing a SQL Server based search with one based on MongoDB. We will build upon the Footloose Financial Services ASP.NET MVC application that was introduced in earlier posts: http://www.codeproject.com/Articles/770341/Embedded-Application-Identity-Part-Basic-Identit.
MongoDB is an implementation of an object-oriented document database which is a flavor of NoSQL databases. NoSQL is an alternative to the traditional relational DMBS that solves several limitations of relational databases although usually at the cost of normalization or referential integrity at the DBMS level. Document databases are a type of NoSQL databases that pair a key with a complex data structure known as a document. A document can contain one or more key-value pairs. NoSQL databases have simpler structures and do not have the overhead of enforcing referential integrity. They are more scalable than relational databases and provide superior performance for searching.
For the rest of the article, we will be referring to the Footloose Financial Services Visual Studio 2013 project at the following github address: https://github.com/pcarrasco23/FootlooseFinancialServices.
We will be using the code in the commit labeled "Modifications for MongoDB".
MongoDB Installation and Server Architecture
MongoDB should be installed on a server on a network that is accessible by the web server that hosts the ASP.NET application. On my personal installation I installed MongoDB on Ubuntu 14.0.4 LTS 64-bit running in a VMWarePlayer virtual that is on the same virtual network as that of the Windows domain controller and IIS servers.
The MongoDB online manual (http://docs.mongodb.org/manual) has pretty good step-by-step instructions for installing MongoDB on your computer.
Because this particular MongoDB instance is not on the same server as the ASP.NET web server, we will need to configure the instance to allow remote connections. By default, the MongoDB server process will only allow connections from the local loopback. On Linux, the MongoDB server configuration file is /etc/mongodb.conf. Open the file and comment the line that specifies the bind_ip
value. This value will filter by IP address the remote connections that MongoDB will permit.
#bind_ip = 127.0.0.1
#port = 27017
For now, by commenting out the bind_ip
value, we will permit all remote connections to the MongoDB server. In a subsequent article, I will discuss how to add security to our MongoDB instance including specifying the IP address of the ASP.NET web server in the bind_ip
value.
Data Architecture
In the Visual Studio project, we will be changing the back-end data source on the client search page from SQL-Server to MongoDB. However the very first step is deciding on the schema of the document in the MongoDB database. We want to improve performance of the search significantly so that even complex searches with large resultsets return data from the server with sub-second response times. In the SQL Server database, a client (person) has several attributes such as name and email address. This person can have one or more phone numbers, one or more addresses, and one or more accounts. All of this data exists across multiple tables. For the person search we are only concerned with the client's main phone number, address, and account number. Therefore to take advantage of MongoDB's strength (which is searching attributes within a flat document), we will compress each person's hierarchical data into a single record in the MongoDB document. In place of the hierarchical schema of the relational database, we will have a flat document as illustrated below.
This flatted document in a MongoDB database will provide improved search performance for our client search page. This will be done in the ETL process which will be discussed in more detail later.
MongoDB C# Driver and the Service Layer
The MongoDB C# driver can be downloaded from within Visual Studio using NuGet. With the C# driver, you can connect to a MongoDb instance and manipulate data within a document store using C# and LINQ.
The next step after deciding on the schema is to create a physical model based on the schema of the document. In my case, I created a PersonDocument
in PersonDocument.cs that I will use as a DTO (Data transfer object) between the service layer code and the MongoDB C# driver. The driver will serialize this object into the BSON format of the MongoDB document store.
public class PersonDocument
{
public int PersonID { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public string EmailAddress { get; set; }
public string PhoneNumber { get; set; }
public string StreetAddress { get; set; }
public string City { get; set; }
public string County { get; set; }
public string State { get; set; }
public string Zip { get; set; }
}
Then I implemented the Unit of Work pattern for the MongoDB database and the Repository<T>
interface for MongoDB documents in FootlooseFSDocUnitOfWork.cs and DocumentRepository.cs respectively so that client code can perform CRUD operations on the Person
document store in the MongoDB database using the familiar Repository
interface.
The Search service method uses the IQueryable
interface for searching for Person
documents in the SearchPersonDocument
method of FootlooseFSService.cs.
IQueryable<PersonDocument> personsQueryable = unitOfWork.Persons.GetQueryable();
The ETL process uses the AddBatch
method of the DocumentRepository
to add records to the Person
document store.
var docUnitOfWork = new FootlooseFSDocUnitOfWork();
docUnitOfWork.Persons.AddBatch(personDocuments);
Going back to the unit of work class for the MongoDB database, there are a couple of application configuration keys that are necessary for any client (ASP.NET web and ETL process) that uses the MongoDB unit of work and DocumentRepository
. For the ASP.NET web application, they would be in the web.config and for the ETL stand-alone program they would be in App.config. The MongoDBConnectionString
key contains the MongoDB connection string which is essentially the IP address or DNS of the MongoDB host computer on the network. The MongoDBDatabaseName
is the name of the MongoDB database that contains one or more document stores.
<add key="MongoDBConectionString" value="mongodb://192.168.1.6" />
<add key="MongoDBDatabaseName" value="footloosefs"/>
The unit of work class for MongoDB has a private MongoDatabase
variable that contains the connection to the database and connection initialization code where we utilize the configuration values. The DocRepository<PersonDocument>
object has the code to perform CRUD operations on the person document store. In the Init
method, we indicate the fields of PersonDocument
that we want serialized to the person MongoDB document. By default, MongoDB will create an object ID field in each document to serve as the primary key unless we indicate otherwise. The MapIdProperty
field allows us to specify a field in the DTO as the primary key as I have done with the PersonID
field.
public class FootlooseFSDocUnitOfWork
{
private MongoDatabase _database;
protected DocRepository<PersonDocument> _persons;
public FootlooseFSDocUnitOfWork()
{
var connectionString = ConfigurationManager.AppSettings["MongoDBConectionString"];
var client = new MongoClient(connectionString);
var server = client.GetServer();
var databaseName = ConfigurationManager.AppSettings["MongoDBDatabaseName"];
_database = server.GetDatabase(databaseName);
}
public IRepository<PersonDocument> Persons
{
get
{
if (_persons == null)
_persons = new DocRepository<PersonDocument>(_database, "persons");
return _persons;
}
}
public static void Init()
{
BsonClassMap.RegisterClassMap<PersonDocument>(cm =>
{
cm.MapIdProperty(p => p.PersonID);
cm.MapProperty(p => p.FirstName);
cm.MapProperty(p => p.LastName);
cm.MapProperty(p => p.EmailAddress);
cm.MapProperty(p => p.PhoneNumber);
cm.MapProperty(p => p.StreetAddress);
cm.MapProperty(p => p.City);
cm.MapProperty(p => p.County);
cm.MapProperty(p => p.State);
cm.MapProperty(p => p.Zip);
});
}
}
ETL Process
Now that we have decided on the layout of the document and add a MongoDB interface to the service layer, we need to populate the MongoDB person document collection with the corresponding data from SQL Server. ETL stands for Extract-Transform-Load. The FootlooseFSDocDBETL
project contains code for the process that will extract the necessary data from the Person
and associated tables (PersonAccount
, PersonAddress
, Phone
, Address
, Account
, Login
) for each person (represented by a record in dbo.Person
) and transform the data into a single object and load the object into the Person
document store in the MongoDB data source.
Next Steps
After running the program in the FootlooseFSDocDBETL
project, we will have a fully loaded Person
document against which our ASP.NET application can run queries. If you have gone this far, you will note that the performance of the MongoDB queries are significantly faster than the SQL Server queries. With a Person
document store with 1 million records, searches, paging, and sorting perform within a second. Now imagine 100 or possibly thousands of users searching at the same time! MongoDB can be scaled horizontally to meet the needs of the user demand very easily.
The next step - which is not part of this article - would be to create a process that updates the Person
document store in MongoDB whenever a Person
or associated record is updated in SQL through the application. This would most likely involve a message with the update written to a queue (such as MSMQ) that is picked up by a Windows service polling the queue and updates the Person
document store.