
Paging in MongoDB – How to Actually Avoid Poor Performance?

22 Sep 2017
A few insights on how to paginate results in MongoDB more efficiently and keep query performance high

What is the best way, performance-wise, to paginate results in MongoDB, especially when you also want the total number of results?

You can find the solution, together with the data files, at https://github.com/fpetru/WebApiQueryMongoDb.

Update 23 Sep 2017

Solution converted to .NET Core 2.0, using Visual Studio 2017.

Where to Start?

To answer these questions, let's start from the datasets defined in my earlier article, How to search good places to travel (MongoDb, LINQ & .NET Core). That article was a quick introduction to loading big chunks of data and then retrieving values using WebApi and LINQ. Here, I start from that project and extend it with more details on paging the query results.

Topics Covered

  • Paging query results with skip and limit
  • Paging query results using last position
  • MongoDB BsonId
  • Paging using the MongoDB .NET Driver

To Install

Here are all the things needed to be installed:

See the Results

Here are a few steps to have the solution ready, and see the results immediately:

  1. Download the project.
  2. Run the import.bat file from the Data folder (available on GitHub). This creates the database (TravelDb) and fills two collections.
  3. Open the solution with Visual Studio 2017 and check the connection settings in appsettings.json.
  4. Run the solution.

If you have any issues installing MongoDB, setting up the databases, or understanding the project structure, please review my earlier article.

Paging Results using cursor.skip() and cursor.limit()

If you do a Google search, this is usually the first method suggested for paginating query results in MongoDB. It is straightforward, but also expensive in terms of performance: the server has to walk from the beginning of the collection or index every time to reach the offset (skip) position before it can start returning the results you need.

For example:

db.Cities.find().skip(5200).limit(10);

The server needs to walk over the first 5200 items in the Cities collection and only then return the next 10. Because of skip(), the cost grows with the offset, so this approach doesn't scale well.
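To make that cost concrete, here is a small in-memory sketch. It uses plain LINQ to Objects, not the MongoDB driver, and simply counts how many items have to be produced to serve an offset-based page. SkipLimitDemo and CountingSource are hypothetical names used only for this illustration; a real server walks index entries or documents rather than integers, but the work grows with the offset in the same way.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipLimitDemo
{
    // Number of source items enumerated by the most recent GetPage call.
    public static int Walked;

    // Yields 0..total-1 and counts every item handed to the consumer.
    private static IEnumerable<int> CountingSource(int total)
    {
        for (int i = 0; i < total; i++)
        {
            Walked++;
            yield return i;
        }
    }

    // Offset-based page: everything before 'skip' is produced and thrown away.
    public static List<int> GetPage(int total, int skip, int limit)
    {
        Walked = 0;
        return CountingSource(total).Skip(skip).Take(limit).ToList();
    }
}
```

A call such as SkipLimitDemo.GetPage(10000, 5200, 10) returns ten items, but SkipLimitDemo.Walked shows that the 5200 skipped items had to be walked first.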

Paging using the Last Position

To be faster, we should search and retrieve the results starting from the last retrieved item. As an example, let's assume we need to find all the cities in France with a population of at least 15,000 inhabitants.

Following this method, the initial request to retrieve the first 200 records would be:

LINQ Format

We first get an IQueryable over the collection with AsQueryable:

var _client = new MongoClient(settings.Value.ConnectionString);
var _database = _client.GetDatabase(settings.Value.Database);
// exposed as _context.CitiesLinq in the queries that follow
var citiesLinq = _database.GetCollection<City>("Cities").AsQueryable<City>();

and then we run the actual query:

var query = _context.CitiesLinq
                .Where(x => x.CountryCode == "FR"
                            && x.Population >= 15000)
                .OrderByDescending(x => x.Id)
                .Take(200);
List<City> cityList = await query.ToListAsync();

Each subsequent query starts from the last retrieved Id. Because the results are ordered by the BsonId in descending order, the next page contains the most recent records created on the server before that Id.

var query = _context.CitiesLinq
                .Where(x => x.CountryCode == "FR"
                         && x.Population >= 15000
                         && x.Id < ObjectId.Parse("58fc8ae631a8a6f8d000f9c3"))
                .OrderByDescending(x => x.Id)
                .Take(200);
List<City> cityList = await query.ToListAsync();
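The whole loop (fetch a page, remember its last Id, ask for the items before it) can be sketched in memory as follows. This is only an illustration under stated assumptions: CityRow is a simplified, hypothetical stand-in for the City class, its Id is a long instead of an ObjectId so the sketch runs without the MongoDB driver, and KeysetPagingDemo is not part of the sample project.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the article's City class (assumption: long Id).
public class CityRow
{
    public long Id { get; set; }
    public string CountryCode { get; set; } = "";
    public int Population { get; set; }
}

public static class KeysetPagingDemo
{
    // One page: the same filter every time, plus "Id < lastId" after the first page.
    public static List<CityRow> GetPage(IEnumerable<CityRow> cities, long? lastId, int pageSize)
    {
        var query = cities.Where(x => x.CountryCode == "FR" && x.Population >= 15000);
        if (lastId.HasValue)
        {
            long boundary = lastId.Value;
            query = query.Where(x => x.Id < boundary);
        }
        return query.OrderByDescending(x => x.Id).Take(pageSize).ToList();
    }

    // Walks all pages by feeding the last Id of each page into the next request.
    public static List<CityRow> GetAll(IEnumerable<CityRow> cities, int pageSize)
    {
        var all = new List<CityRow>();
        long? lastId = null;
        while (true)
        {
            var page = GetPage(cities, lastId, pageSize);
            if (page.Count == 0) break;
            all.AddRange(page);
            lastId = page[page.Count - 1].Id;
        }
        return all;
    }
}
```

Because every page is defined relative to the last Id rather than to an offset, no item is returned twice and nothing has to be skipped over again.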

Mongo's ID

In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. It is immutable and may be of any type other than an array. Common choices are an ObjectId (the default), a natural unique identifier if one is available, or an auto-incrementing number.

The default ObjectId type,

[BsonId]
public ObjectId Id { get; set; }

brings further advantages, such as exposing the date and time when the record was added to the database. Furthermore, sorting by ObjectId in descending order returns the most recently added entities first.

cityList.Select(x => new
    {
        BSonId = x.Id.ToString(), // unique hexadecimal number
        Timestamp = x.Id.Timestamp,
        ServerUpdatedOn = x.Id.CreationTime
        /* include other members */
    });
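To show where that timestamp comes from, here is a minimal sketch that decodes it by hand, without the driver: the first 4 bytes (8 hexadecimal characters) of an ObjectId store the creation time as seconds since the Unix epoch. ObjectIdTimestampDemo and its methods are hypothetical helpers written for this illustration; in real code the driver's Timestamp and CreationTime properties shown above are the natural choice.

```csharp
using System;
using System.Globalization;

public static class ObjectIdTimestampDemo
{
    // Reads the leading 8 hex characters as a big-endian count of
    // seconds since the Unix epoch.
    public static long GetUnixSeconds(string objectIdHex)
    {
        if (objectIdHex == null || objectIdHex.Length != 24)
            throw new ArgumentException("An ObjectId is 24 hexadecimal characters.", nameof(objectIdHex));

        return uint.Parse(objectIdHex.Substring(0, 8), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    // Converts the embedded seconds into a UTC DateTime.
    public static DateTime GetCreationTimeUtc(string objectIdHex)
    {
        return DateTimeOffset.FromUnixTimeSeconds(GetUnixSeconds(objectIdHex)).UtcDateTime;
    }
}
```

For the Id used in the paging query earlier, 58fc8ae631a8a6f8d000f9c3, this gives 1492945638 seconds, that is 2017-04-23 11:07:18 UTC.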

Returning Fewer Elements

The City class has 20 members, but it makes sense to return only the properties we actually need. This reduces the amount of data transferred from the server.

cityList.Select(x => new
    {
        BSonId = x.Id.ToString(), // unique hexadecimal number
        x.Name,
        x.AlternateNames,
        x.Latitude,
        x.Longitude,
        x.Timezone,
        ServerUpdatedOn = x.Id.CreationTime
    });

Indexes in MongoDB – A Few Details

We would rarely need to get data in the exact order of the MongoDB internal ids (_id) without any filters (just using find()). In most cases, we retrieve data using filters and then sort the results. For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.

How Do We Add An Index?

Using RoboMongo, we create the index directly on the server:

db.Cities.createIndex( { CountryCode: 1, Population: 1 } );

How Do We Check That Our Query Is Actually Using the Index?

Running the query with the explain command returns details on index usage:

db.Cities.find({ CountryCode: "FR", Population : { $gt: 15000 }}).explain();

Is There a Way to See the Actual Query Behind the MongoDB LINQ Statement?

The only way I could find was the GetExecutionModel() method. It provides detailed information, but the inner elements are not easily accessible.

query.GetExecutionModel();

Using the debugger, we can inspect these elements as well as the full query actually sent to MongoDB.

We can then copy that query, execute it against MongoDB using the RoboMongo tool, and see the details of the execution plan.

Non LINQ Way – Using MongoDb .NET Driver

LINQ is slightly slower than using the driver API directly, because it adds a layer of abstraction over the query. In return, that abstraction lets you swap MongoDB for another data source (MS SQL Server, Oracle, MySQL, etc.) without many code changes.

Even so, newer versions of the MongoDB .NET Driver have greatly simplified the way we filter and run queries. The fluent interface (IFindFluent) closely resembles the LINQ way of writing code.

var filterBuilder = Builders<City>.Filter;
// Lt (strictly less than) keeps the last item of the previous page from being returned again
var filter = filterBuilder.Eq(x => x.CountryCode, "FR")
             & filterBuilder.Gte(x => x.Population, 10000)
             & filterBuilder.Lt(x => x.Id, ObjectId.Parse("58fc8ae631a8a6f8d000f9c3"));

return await _context.Cities.Find(filter)
                            .SortByDescending(p => p.Id)
                            .Limit(200)
                            .ToListAsync();

where _context.Cities returns the collection, obtained with:

_database.GetCollection<City>("Cities");

Implementation

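Wrapping up, here is my proposal for the paginate function. OR predicates are supported by MongoDB, but it is usually hard for the query optimizer to predict the disjoint sets from the two sides of the OR, so avoiding them whenever possible is a known query optimization trick. That is why GetConditions builds a separate AND-only expression for each case (with or without a last Id) instead of a single expression containing an OR.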

// Builds the where clause. The Id condition is added only
// when a valid last Id has been supplied.
private Expression<Func<City, bool>> GetConditions(string countryCode,
                                                   string lastBsonId,
                                                   int minPopulation = 0)
{
    Expression<Func<City, bool>> conditions
        = (x => x.CountryCode == countryCode
                && x.Population >= minPopulation);

    ObjectId id;
    if (!string.IsNullOrEmpty(lastBsonId) && ObjectId.TryParse(lastBsonId, out id))
    {
        conditions = (x => x.CountryCode == countryCode
                        && x.Population >= minPopulation
                        && x.Id < id);
    }

    return conditions;
}

public async Task<object> GetCitiesLinq(string countryCode, 
										string lastBsonId, 
										int minPopulation = 0)
{
    try
    {
        var items = await _context.CitiesLinq
                            .Where(GetConditions(countryCode, lastBsonId, minPopulation))
                            .OrderByDescending(x => x.Id)
                            .Take(200)
                            .ToListAsync();

        // select just few elements
        var returnItems = items.Select(x => new
                            {
                                BsonId = x.Id.ToString(),
                                Timestamp = x.Id.Timestamp,
                                ServerUpdatedOn = x.Id.CreationTime,
                                x.Name,
                                x.CountryCode,
                                x.Population
                            });

        int countItems = await _context.CitiesLinq
                            .Where(GetConditions(countryCode, "", minPopulation))
                            .CountAsync();

        return new
            {
                count = countItems,
                items = returnItems
            };
    }
    catch (Exception)
    {
        // log or manage the exception, then rethrow without resetting the stack trace
        throw;
    }
}

and in the controller:

[NoCache]
[HttpGet]
public async Task<object> Get(string countryCode, int? population, string lastId)
{
	return await _travelItemRepository
					.GetCitiesLinq(countryCode, lastId, population ?? 0);
}

The initial request (sample):

http://localhost:61612/api/city?countryCode=FR&population=10000

followed by other requests where we specify the last retrieved Id:

http://localhost:61612/api/city?countryCode=FR&population=10000&lastId=58fc8ae631a8a6f8d00101f9
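On the client side, the only difference between the first and the following requests is the optional lastId parameter. A minimal sketch of how a caller might assemble these URLs is shown here; CityPageUrlDemo and Build are hypothetical names, and the base address is simply the sample one used above.

```csharp
using System;
using System.Globalization;

public static class CityPageUrlDemo
{
    // Builds the request URL for one page; lastId is left out for the first page.
    public static string Build(string baseUrl, string countryCode, int population, string lastId)
    {
        var url = baseUrl
            + "?countryCode=" + Uri.EscapeDataString(countryCode)
            + "&population=" + population.ToString(CultureInfo.InvariantCulture);

        if (!string.IsNullOrEmpty(lastId))
        {
            url += "&lastId=" + Uri.EscapeDataString(lastId);
        }

        return url;
    }
}
```

The caller would pass an empty lastId for the first page, then the BsonId of the last item received for every following page, and stop when a page comes back empty.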

At the End

I hope this helps, and please let me know if you need this to be extended or if you have any questions.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.