What is the best way, performance-wise, to paginate results in MongoDB, especially when you also want to get the total number of results?
You can find the solution, together with the data files, at https://github.com/fpetru/WebApiQueryMongoDb.
Update 23 Sep 2017
Solution converted to .NET Core 2.0, using Visual Studio 2017.
Where to Start?
To answer these questions, let's start from the datasets defined in my earlier article, How to search good places to travel (MongoDb, LINQ & .NET Core). That article was a quick introduction to loading big chunks of data and then retrieving values using WebApi and LINQ. Here, I will start from that project and extend it with more details on paging the query results.
Topics Covered
- Paging query results with skip and limit
- Paging query results using last position
- MongoDb BSonId
- Paging using MongoDb .NET Driver
To Install
Here are all the things needed to be installed:
See the Results
Here are a few steps to have the solution ready, and see the results immediately:
- Download the project.
- Run the import.bat file from the Data folder (available in GitHub). This will create the database (TravelDb) and fill two collections.
- Open the solution with Visual Studio 2017 and check the connection settings in appsettings.json.
- Run the solution.
If you have any issues installing MongoDb, setting up the databases, or understanding the project structure, please review my earlier article.
Paging Results using cursor.skip() and cursor.limit()
If you do a Google search, this is usually the first method presented for paginating query results in MongoDB. It is straightforward, but also expensive in terms of performance: the server has to walk from the beginning of the collection or index each time to reach the offset (skip) position before it can begin returning the results you need.
For example:
db.Cities.find().skip(5200).limit(10);
The server will need to walk over the first 5200 items in the Cities collection, and then return the next 10. This doesn't scale well, because the cost of the skip() command grows with the offset.
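For reference, the same skip/limit approach can be written with the .NET driver. This is only a minimal sketch: SkipLimitPaging, SkipFor, GetPageAsync and the trimmed-down City class are hypothetical names introduced for illustration and are not part of the project. It mainly shows how a page number turns into the offset the server has to walk past.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

// Trimmed-down stand-in for the article's City class (assumption: only two fields).
[BsonIgnoreExtraElements]
public class City
{
    [BsonId]
    public ObjectId Id { get; set; }
    public string Name { get; set; }
}

public static class SkipLimitPaging
{
    // Number of documents the server must walk past for a 1-based page number.
    public static int SkipFor(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (pageNumber - 1) * pageSize;
    }

    // Fluent equivalent of db.Cities.find().skip(n).limit(pageSize).
    public static Task<List<City>> GetPageAsync(
        IMongoCollection<City> cities, int pageNumber, int pageSize)
    {
        return cities.Find(FilterDefinition<City>.Empty)
                     .Skip(SkipFor(pageNumber, pageSize))
                     .Limit(pageSize)
                     .ToListAsync();
    }
}
```

With a page size of 10, page 521 gives SkipFor(521, 10) = 5200, which is the offset used in the shell example above. The deeper the page, the larger the offset the server has to discard.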
Paging using the Last Position
To be faster, we should search and retrieve the details starting from the last retrieved item. As an example, let's assume we need to find all the cities in France with a population of at least 15,000 inhabitants.
Following this method, the initial request to retrieve the first 200 records would be:
LINQ Format
We first retrieve the AsQueryable interface:
var _client = new MongoClient(settings.Value.ConnectionString);
var _database = _client.GetDatabase(settings.Value.Database);
var _context = _database.GetCollection<City>("Cities").AsQueryable<City>();
and then we run the actual query:
query = _context.CitiesLinq
    .Where(x => x.CountryCode == "FR"
             && x.Population >= 15000)
    .OrderByDescending(x => x.Id)
    .Take(200);
List<City> cityList = await query.ToListAsync();
The subsequent queries would start from the last retrieved Id. Ordering by BSonId, we retrieve the most recent records created on the server before the last Id.
query = _context.CitiesLinq
    .Where(x => x.CountryCode == "FR"
             && x.Population >= 15000
             && x.Id < ObjectId.Parse("58fc8ae631a8a6f8d000f9c3"))
    .OrderByDescending(x => x.Id)
    .Take(200);
List<City> cityList = await query.ToListAsync();
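To make the chaining of requests explicit, here is a small sketch that runs entirely in memory with plain LINQ, so no server is involved. Row, NextPage and AllPages are hypothetical names, and integer ids stand in for ObjectIds. The only point is to show how the last Id of one page becomes the starting position of the next.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Row
{
    public int Id { get; set; }          // stand-in for the ObjectId (assumption)
    public int Population { get; set; }
}

public static class LastPositionPaging
{
    // Returns one page, newest first, strictly below lastId (null means first page).
    public static List<Row> NextPage(IEnumerable<Row> source, int? lastId,
                                     int minPopulation, int pageSize)
    {
        return source
            .Where(x => x.Population >= minPopulation
                     && (lastId == null || x.Id < lastId.Value))
            .OrderByDescending(x => x.Id)
            .Take(pageSize)
            .ToList();
    }

    // Walks every page by feeding the last Id of each page into the next request.
    public static List<List<Row>> AllPages(IEnumerable<Row> source,
                                           int minPopulation, int pageSize)
    {
        var pages = new List<List<Row>>();
        int? lastId = null;
        while (true)
        {
            var page = NextPage(source, lastId, minPopulation, pageSize);
            if (page.Count == 0) break;           // nothing left below lastId
            pages.Add(page);
            lastId = page[page.Count - 1].Id;     // smallest Id returned so far
        }
        return pages;
    }
}
```

Because every request filters on the Id instead of skipping rows, each page costs roughly the same, no matter how deep into the result set we are.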
Mongo's ID
In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. It is immutable, and may be of any type other than an array. By default it is a MongoDb ObjectId, but it can also be a natural unique identifier, if one is available, or just an auto-incrementing number.
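The default ObjectId type,
[BsonId]
public ObjectId Id { get; set; }
brings further advantages, such as having available the date and timestamp of when the record was added to the database. Furthermore, because an ObjectId starts with that timestamp, sorting by ObjectId in descending order returns the entities most recently added to the MongoDb collection first (the order is only approximate for ids generated within the same second or by different clients).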
cityList.Select(x => new
{
    BSonId = x.Id.ToString(),
    Timestamp = x.Id.Timestamp,
    ServerUpdatedOn = x.Id.CreationTime
});
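To see where these values come from, the timestamp can also be decoded by hand. The sketch below uses only the base class library. ObjectIdTime and CreationTimeOf are hypothetical names, and the helper approximates what the driver's Timestamp and CreationTime properties expose; it is not the driver's implementation.

```csharp
using System;
using System.Globalization;

public static class ObjectIdTime
{
    // The first 4 bytes (8 hex characters) of an ObjectId hold the creation
    // time as seconds since the Unix epoch.
    public static DateTimeOffset CreationTimeOf(string objectIdHex)
    {
        if (objectIdHex == null || objectIdHex.Length != 24)
            throw new ArgumentException("Expected a 24-character hex string.",
                                        nameof(objectIdHex));

        long seconds = long.Parse(objectIdHex.Substring(0, 8),
                                  NumberStyles.HexNumber,
                                  CultureInfo.InvariantCulture);
        return DateTimeOffset.FromUnixTimeSeconds(seconds);
    }
}
```

For the Id used in the query above, 58fc8ae631a8a6f8d000f9c3, the prefix 58fc8ae6 decodes to 1492945638 seconds, which is 2017-04-23 11:07:18 UTC.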
Returning Fewer Elements
While the class City has 20 members, it makes sense to return just the properties we actually need. This would reduce the amount of data transferred from the server.
cityList.Select(x => new
{
    BSonId = x.Id.ToString(),
    x.Name,
    x.AlternateNames,
    x.Latitude,
    x.Longitude,
    x.Timezone,
    ServerUpdatedOn = x.Id.CreationTime
});
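Note that a Select over an already loaded list trims the response of the Web API, while the full documents have still been read from MongoDB. If we also want to limit what MongoDB itself returns, one option is a projection in the driver's fluent API. The following is a sketch under assumptions: CitySummary, CityProjection, GetSummariesAsync and the trimmed-down City class are hypothetical, and the driver is expected to translate the projection expression into a server-side field selection.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

// Trimmed-down stand-in for the article's City class (assumption).
[BsonIgnoreExtraElements]
public class City
{
    [BsonId]
    public ObjectId Id { get; set; }
    public string Name { get; set; }
    public string CountryCode { get; set; }
    public int Population { get; set; }
}

// Hypothetical slim shape holding only the fields we want back.
public class CitySummary
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
    public int Population { get; set; }
}

public static class CityProjection
{
    public static Task<List<CitySummary>> GetSummariesAsync(
        IMongoCollection<City> cities, string countryCode)
    {
        return cities.Find(x => x.CountryCode == countryCode)
                     .SortByDescending(x => x.Id)
                     .Limit(200)
                     // Only the fields referenced here are requested.
                     .Project(x => new CitySummary
                     {
                         Id = x.Id,
                         Name = x.Name,
                         Population = x.Population
                     })
                     .ToListAsync();
    }
}
```

Whether the extra type is worth it depends on how wide the documents are. For a class with 20 members and pages of 200 items, the saving can be noticeable.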
Indexes in MongoDB – Few Details
We rarely need to get data in the exact order of the MongoDB internal ids (_id) without any filters (just using find()). In most cases, we retrieve data using filters and then sort the results. For queries that include a sort operation that cannot use an index, the server must load all the matching documents in memory to perform the sort before returning any results.
How Do We Add An Index?
Using RoboMongo, we create the index directly on the server:
db.Cities.createIndex( { CountryCode: 1, Population: 1 } );
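The same index can also be created from code, which is handy when the database is set up by the application itself. This is a sketch that assumes a driver version offering the CreateIndexModel overload of CreateOneAsync (2.7 or later). CityIndexes, CountryPopulationKeys, EnsureAsync and the trimmed-down City class are hypothetical names.

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

// Trimmed-down stand-in for the article's City class (assumption).
[BsonIgnoreExtraElements]
public class City
{
    [BsonId]
    public ObjectId Id { get; set; }
    public string CountryCode { get; set; }
    public int Population { get; set; }
}

public static class CityIndexes
{
    // Same keys as db.Cities.createIndex({ CountryCode: 1, Population: 1 }).
    public static IndexKeysDefinition<City> CountryPopulationKeys()
    {
        return Builders<City>.IndexKeys
            .Ascending(x => x.CountryCode)
            .Ascending(x => x.Population);
    }

    // Creates the index if it does not exist yet and returns its name.
    public static Task<string> EnsureAsync(IMongoCollection<City> cities)
    {
        var model = new CreateIndexModel<City>(CountryPopulationKeys());
        return cities.Indexes.CreateOneAsync(model);
    }
}
```

Creating an index that already exists with the same definition is a no-op on the server, so calling EnsureAsync at application startup should be safe.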
How Do We Check That Our Query Is Actually Using the Index?
Running a query with the explain command returns details on index usage:
db.Cities.find({ CountryCode: "FR", Population : { $gt: 15000 }}).explain();
Is There a Way to See the Actual Query Behind the MongoDB LINQ Statement?
The only way I could find to do this was via the GetExecutionModel() method. It provides detailed information, but the inner elements are not easily accessible.
query.GetExecutionModel();
Using the debugger, we can see these elements, as well as the full query actually sent to MongoDb.
Then, we can take the query, execute it against MongoDb using the RoboMongo tool, and see the details of the execution plan.
Non LINQ Way – Using MongoDb .NET Driver
LINQ is slightly slower than using the direct API, as it adds a layer of abstraction to the query. This abstraction would allow you to easily swap MongoDB for another data source (MS SQL Server / Oracle / MySQL, etc.) without many code changes, at the price of a slight performance hit.
Even so, newer versions of the MongoDB .NET Driver have greatly simplified the way we filter and run queries. The fluent interface (IFindFluent) closely resembles the LINQ way of writing code.
var filterBuilder = Builders<City>.Filter;
var filter = filterBuilder.Eq(x => x.CountryCode, "FR")
    & filterBuilder.Gte(x => x.Population, 10000)
    // Lt (not Lte), so the last item of the previous page is not returned again
    & filterBuilder.Lt(x => x.Id, ObjectId.Parse("58fc8ae631a8a6f8d000f9c3"));
return await _context.Cities.Find(filter)
    .SortByDescending(p => p.Id)
    .Limit(200)
    .ToListAsync();
where _context is defined as:
var _context = _database.GetCollection<City>("Cities");
Implementation
Wrapping up, here is my proposal for the paginate function. OR predicates are supported by MongoDb, but it is usually hard for the query optimizer to predict the disjoint sets from the two sides of the OR. Trying to avoid them whenever possible is a known trick for query optimization, which is why the conditions below are built as two separate AND expressions rather than a single expression containing an OR.
private Expression<Func<City, bool>> GetConditions(string countryCode,
                                                   string lastBsonId,
                                                   int minPopulation = 0)
{
    // First page: filter by country and population only.
    Expression<Func<City, bool>> conditions
        = (x => x.CountryCode == countryCode
             && x.Population >= minPopulation);

    // Following pages: also restrict to ids below the last one already returned.
    ObjectId id;
    if (!string.IsNullOrEmpty(lastBsonId) && ObjectId.TryParse(lastBsonId, out id))
    {
        conditions = (x => x.CountryCode == countryCode
                        && x.Population >= minPopulation
                        && x.Id < id);
    }
    return conditions;
}
public async Task<object> GetCitiesLinq(string countryCode,
                                        string lastBsonId,
                                        int minPopulation = 0)
{
    try
    {
        // Current page: up to 200 items below the last retrieved Id.
        var items = await _context.CitiesLinq
            .Where(GetConditions(countryCode, lastBsonId, minPopulation))
            .OrderByDescending(x => x.Id)
            .Take(200)
            .ToListAsync();
        var returnItems = items.Select(x => new
        {
            BsonId = x.Id.ToString(),
            Timestamp = x.Id.Timestamp,
            ServerUpdatedOn = x.Id.CreationTime,
            x.Name,
            x.CountryCode,
            x.Population
        });
        // Total count: same filters, but without the paging position.
        int countItems = await _context.CitiesLinq
            .Where(GetConditions(countryCode, "", minPopulation))
            .CountAsync();
        return new
        {
            count = countItems,
            items = returnItems
        };
    }
    catch (Exception)
    {
        // Log here if needed; rethrow without resetting the stack trace.
        throw;
    }
}
and in the controller:
[NoCache]
[HttpGet]
public async Task<object> Get(string countryCode, int? population, string lastId)
{
return await _travelItemRepository
.GetCitiesLinq(countryCode, lastId, population ?? 0);
}
The initial request (sample):
http://localhost:61612/api/city?countryCode=FR&population=10000
followed by other requests where we specify the last retrieved Id:
http://localhost:61612/api/city?countryCode=FR&population=10000&lastId=58fc8ae631a8a6f8d00101f9
At the End
I hope this helps, and please let me know if you need this to be extended or if you have any questions.