
Adding new documents to a collection in MongoDB

12 Apr 2015
How to add new documents to a collection more efficiently

Introduction

This tip covers the basics of moving data into a MongoDB database, using two approaches:

1) Insert

2) Bulk Insert

For part #1:

To insert a document into a collection, we can use the following command:

> db.tasks.insert({"user" : "Coldsky", "finished" :  1, "unfinished" : 10})

Because the document has no "_id" field, one is added to it before it is stored in MongoDB.

Note that MongoDB ensures "_id" is unique and indexes it automatically, which makes lookups by "_id" fast.
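The generated "_id" is an ObjectId: a 12-byte value whose first 4 bytes are its creation time in seconds since the Unix epoch, stored big-endian. In the shell, ObjectId("...").getTimestamp() returns that time directly. The plain-JavaScript sketch below, with the hypothetical helper name objectIdTime, only illustrates where the timestamp sits in the 24-character hex form; it is not a replacement for the shell method.

```javascript
// Hypothetical helper: recover the creation time embedded in an ObjectId,
// given its 24-character hex string (the text inside ObjectId("...")).
function objectIdTime(hex) {
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
        throw new Error("expected a 24-character hex ObjectId");
    }
    // The first 8 hex digits (4 bytes) are seconds since 1970-01-01 UTC.
    var seconds = parseInt(hex.substring(0, 8), 16);
    return new Date(seconds * 1000);
}

// Example: the leading bytes 507f1f77 encode 1350508407 seconds.
var created = objectIdTime("507f1f77bcf86cd799439011");
// created.toISOString() → "2012-10-17T21:13:27.000Z"
```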

Most of us probably wonder how fast the insert method is, so let's insert one million documents into a collection and measure how long it takes. I created a function to do that:

> var insertT = function() {
    var start = (new Date()).getTime();
    // one insert() call, and one round trip to the server, per document
    for (var i = 0; i < 1000000; i++) {
        db.task1.insert({"user" : "Coldsky", "finished" : i, "unfinished" : 1000000 - i});
    }
    var end = (new Date()).getTime();
    var diff = end - start;
    print("Insert 1M documents took " + diff + "ms");
}
> insertT()
Insert 1M documents took 409438ms

That is roughly 2,440 documents per second (1,000,000 documents in about 409 seconds), which is too slow to tolerate when loading a lot of data. Fortunately, bulk insert can help here; let's look at it in part #2.

For part #2:

What is bulk insert? Bulk insert lets you pass an array of documents to a collection in a single call. This reduces the number of round trips to the MongoDB server, so performance improves considerably compared with inserting documents one at a time.
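The core of a bulk insert is grouping documents into arrays and sending each array in one call. The sketch below, using a hypothetical helper named chunk, splits an array into consecutive batches of at most size elements. It runs without a server, so it only shows the batching logic; in the shell, each batch would then be passed to a single insert() call, as the test function in this part does.

```javascript
// Hypothetical helper: split `items` into consecutive batches of at most `size`.
// Each batch is what a single bulk insert call would receive.
function chunk(items, size) {
    if (!(size >= 1) || Math.floor(size) !== size) {
        throw new Error("size must be a positive integer");
    }
    var batches = [];
    for (var start = 0; start < items.length; start += size) {
        // slice() returns a shorter final batch automatically.
        batches.push(items.slice(start, start + size));
    }
    return batches;
}

// Example: 5 task documents in batches of 2 gives batch sizes 2, 2, 1.
var docs = [];
for (var i = 0; i < 5; i++) {
    docs.push({"user" : "Coldsky", "finished" : i, "unfinished" : 5 - i});
}
var batches = chunk(docs, 2);
// In the mongo shell: batches.forEach(function (b) { db.task2.insert(b); });
```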

I also created a function for the performance test:

> var InsertTime = function() {
    var start = (new Date()).getTime();
    // 10 batches of 100,000 documents each
    for (var i = 0; i < 10; i++) {
        var tasks = new Array();
        for (var j = 0; j < 100000; j++) {
            tasks[j] = {"user" : "Coldsky", "finished" : i*100000 + j, "unfinished" : 1000000 - i*100000 - j};
        }
        // pass the whole array to a single insert() call
        db.task2.insert(tasks);
    }
    var end = (new Date()).getTime();
    var diff = end - start;
    print("Insert 1M documents took " + diff + "ms");
}

> InsertTime()
Insert 1M documents took 56691ms

Here we used 10 bulk inserts of 100,000 documents each to insert one million documents. The insert rate is about 17,640 documents per second, roughly 7 times that of the normal insert (56,691 ms versus 409,438 ms).
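The rates and the speedup follow directly from the two timings. A minimal sketch of that arithmetic, with the hypothetical helper name docsPerSecond and the millisecond figures taken from the runs above:

```javascript
// Documents per second for `count` documents inserted in `ms` milliseconds.
function docsPerSecond(count, ms) {
    return count / (ms / 1000);
}

// Timings reported above for 1,000,000 documents.
var singleRate = docsPerSecond(1000000, 409438); // ≈ 2442.4 docs/s
var bulkRate = docsPerSecond(1000000, 56691);    // ≈ 17639.5 docs/s
var speedup = 409438 / 56691;                     // ≈ 7.22
```

The exact figures will of course differ on other hardware; only the ratio between the two approaches is the point.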

So when inserting large amounts of data, we should consider bulk insert first. Here is a generic version of the bulk insert:

> var InsertTime = function(n, m) {
    var start = (new Date()).getTime();
    var full = (n - n%m) / m;                 // number of full batches of m documents
    for (var i = 0; i < full; i++) {
        var tasks = new Array();
        for (var j = 0; j < m; j++) {
            tasks[j] = {"user" : "Coldsky", "finished" : i*m + j, "unfinished" : n - i*m - j};
        }
        db.taskMgr.insert(tasks);
    }
    // the remaining n%m documents, if any; no call is made for an empty batch
    if (n%m > 0) {
        var rest = new Array();
        for (var k = n - n%m; k < n; k++) {
            rest.push({"user" : "Coldsky", "finished" : k, "unfinished" : n - k});
        }
        db.taskMgr.insert(rest);
    }
    var end = (new Date()).getTime();
    var diff = end - start;
    print("Insert " + n + " documents took " + diff + "ms");
}
> InsertTime(1000000, 100000)

Here n is the total number of documents you want to insert, and m is the number of documents in a single bulk insert.
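The loop bounds in the generic method come from integer division: (n - n%m)/m full batches of m documents, and the remaining n%m documents form one last, smaller batch, which is only needed when n is not a multiple of m. The sketch below, using a hypothetical helper named batchSizes, computes just that plan, which makes it easy to check the arithmetic before starting a long insert.

```javascript
// Hypothetical helper: the sizes of the bulk inserts needed for n documents
// with at most m documents per insert.
function batchSizes(n, m) {
    var full = (n - n % m) / m;     // number of full batches of m documents
    var sizes = [];
    for (var i = 0; i < full; i++) {
        sizes.push(m);
    }
    if (n % m > 0) {
        sizes.push(n % m);          // one final partial batch, only if needed
    }
    return sizes;
}

// batchSizes(1000000, 100000) → ten batches of 100000
// batchSizes(250, 100)        → [100, 100, 50]
```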

Also note that the total size of the m documents shouldn't exceed 48 MB; otherwise the MongoDB driver splits the bulk insert into multiple inserts of at most 48 MB each.

To check whether the m documents exceed 48 MB, we can use the Object.bsonsize() method in the shell: just pass the array containing the m documents as the parameter.
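Object.bsonsize() is the reliable way to measure a batch in the shell. As a rough back-of-the-envelope check, the sketch below estimates the BSON size of a flat document from the BSON layout (a 4-byte length, one element per field, a trailing zero byte) and derives how many such documents fit under a limit. It assumes values are only strings or numbers, with numbers counted as 8-byte doubles as the mongo shell stores them by default, and it takes 48 MB as 48,000,000 bytes. It ignores the "_id" added at insert time and the overhead of the array itself, so leave headroom. The names utf8Bytes, estimateBsonSize and maxDocsPerBatch are hypothetical.

```javascript
// UTF-8 byte length of a string: each %XX escape from encodeURIComponent is one byte.
function utf8Bytes(s) {
    return encodeURIComponent(s).replace(/%[0-9A-F]{2}/g, "x").length;
}

// Rough BSON size, in bytes, of a flat document with string or number values only.
function estimateBsonSize(doc) {
    var size = 4 + 1;                            // int32 total length + trailing 0x00
    for (var key in doc) {
        if (!Object.prototype.hasOwnProperty.call(doc, key)) continue;
        var value = doc[key];
        size += 1 + utf8Bytes(key) + 1;          // type byte + key as a C string
        if (typeof value === "string") {
            size += 4 + utf8Bytes(value) + 1;    // int32 length + bytes + 0x00
        } else if (typeof value === "number") {
            size += 8;                           // 64-bit double
        } else {
            throw new Error("unsupported value type for key " + key);
        }
    }
    return size;
}

// How many documents of `docBytes` bytes fit under `limitBytes`.
function maxDocsPerBatch(docBytes, limitBytes) {
    return Math.floor(limitBytes / docBytes);
}

var example = {"user" : "Coldsky", "finished" : 1, "unfinished" : 10};
var bytes = estimateBsonSize(example);           // 61
var m = maxDocsPerBatch(bytes, 48000000);        // 786885
```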

That's the end of this tip. Thanks for reading, and feel free to contact me if you have any questions.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
