Introduction
This tip covers the basics of moving data into a database, including the following:
1) Insert
2) Bulk Insert
For part #1:
To insert a document into a collection, we can use the following command:
> db.tasks.insert({"user" : "Coldsky", "finished" : 1, "unfinished" : 10})
This will add an "_id" field to the document (if it doesn't already have one) and then store it in MongoDB.
Please note that MongoDB ensures the "_id" value is unique, and it is indexed as well, which helps query performance.
Most of us probably wonder about the performance of the insert method, so let's insert one million documents into a collection and measure how long it takes. I have created a function to do that, see below:
> var insertT = function() {
      var start = (new Date()).getTime();
      for (var i = 0; i < 1000000; i++) {
          db.task1.insert({"user" : "Coldsky", "finished" : i, "unfinished" : 1000000 - i});
      }
      var end = (new Date()).getTime();
      var diff = end - start;
      print("Insert 1M documents took " + diff + "ms");
  }
> insertT()
Insert 1M documents took 409438ms
That works out to roughly 2,400 documents per second, which is too slow for many workloads. Fortunately, bulk insert can help in this situation; let's take a look in part #2.
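The throughput figure follows directly from the measured time. As a quick sanity-check sketch (plain JavaScript, runnable outside the mongo shell):

```javascript
// Compute insert throughput from the timing printed by insertT() above.
var docs = 1000000;
var elapsedMs = 409438;                        // measured time from the run above
var rate = Math.round(docs / (elapsedMs / 1000));
console.log(rate + " documents/second");       // 2442 documents/second
```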
For part #2:
What is a bulk insert? A bulk insert lets you pass an array of documents to a collection in a single call. This reduces the number of round trips to the MongoDB server, so performance improves considerably compared with the normal single-document insert.
I have also created a function for this performance test, see below:
> var InsertTime = function() {
      var start = (new Date()).getTime();
      for (var i = 0; i < 10; i++) {
          var tasks = new Array();
          for (var j = 0; j < 100000; j++) {
              tasks[j] = {"user" : "Coldsky", "finished" : i*100000 + j, "unfinished" : 1000000 - i*100000 - j};
          }
          db.task2.insert(tasks);
      }
      var end = (new Date()).getTime();
      var diff = end - start;
      print("Insert 1M documents took " + diff + "ms");
  }
> InsertTime()
Insert 1M documents took 56691ms
We can see that we used 10 bulk inserts to load one million documents, and the insert rate is about 17,600 documents per second, roughly 7 times faster than the normal insert.
So when we do massive inserts, we should consider bulk insert first. Here is a generic function for bulk insert:
> var bulkInsert = function(n, m) {
      var start = (new Date()).getTime();
      // insert the full batches of m documents each
      for (var i = 0; i < (n - n % m) / m; i++) {
          var tasks = new Array();
          for (var j = 0; j < m; j++) {
              tasks[j] = {"user" : "Coldsky", "finished" : i*m + j, "unfinished" : n - i*m - j};
          }
          db.taskMgr.insert(tasks);
      }
      // insert the remaining n % m documents, if any
      var tasks = new Array();
      for (var i = n - n % m; i < n; i++) {
          tasks[i - (n - n % m)] = {"user" : "Coldsky", "finished" : i, "unfinished" : n - i};
      }
      if (tasks.length > 0) {
          db.taskMgr.insert(tasks);
      }
      var end = (new Date()).getTime();
      var diff = end - start;
      print("Insert " + n + " documents took " + diff + "ms");
  }
Here n is the total number of documents you want to insert into the database, while m is the number of documents in a single bulk insert.
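The batching arithmetic itself is easy to test outside the shell. Below is a minimal sketch in plain JavaScript (makeBatches is a hypothetical helper, not part of MongoDB) that splits n documents into (n - n % m) / m full batches plus one remainder batch, mirroring the loop structure above:

```javascript
// Split n task documents into batches of at most m for bulk insert.
// Returns an array of batches; each batch is an array of documents.
function makeBatches(n, m) {
    var batches = [];
    var fullBatches = (n - n % m) / m;      // number of complete batches
    for (var i = 0; i < fullBatches; i++) {
        var batch = [];
        for (var j = 0; j < m; j++) {
            var k = i * m + j;
            batch.push({user: "Coldsky", finished: k, unfinished: n - k});
        }
        batches.push(batch);
    }
    // the remaining n % m documents, if any
    var rest = [];
    for (var k = n - n % m; k < n; k++) {
        rest.push({user: "Coldsky", finished: k, unfinished: n - k});
    }
    if (rest.length > 0) {
        batches.push(rest);
    }
    return batches;
}

// Example: 10 documents in batches of 3 give batch sizes 3, 3, 3, 1.
var sizes = makeBatches(10, 3).map(function (b) { return b.length; });
console.log(sizes.join(","));               // 3,3,3,1
```

In the mongo shell you would then pass each batch to a single insert call, as the function above does.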
Also note that the total size of the m documents in one batch shouldn't exceed 48 MB, or the MongoDB driver will split the bulk insert into multiple 48 MB bulk inserts.
To check whether the m documents exceed 48 MB, we can use the Object.bsonsize() method in the mongo shell, passing in the array that contains the m documents.
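As a rough rule of thumb, you can derive a safe batch size from a measured per-document size. The sketch below assumes avgDocBytes comes from sampling a typical document with Object.bsonsize() in the mongo shell; maxBatchSize is a hypothetical helper, not a MongoDB API:

```javascript
// Largest batch size m that keeps one batch under the 48 MB driver limit.
// avgDocBytes: measured BSON size of a typical document (an assumption here,
// e.g. obtained via Object.bsonsize() on a sample document in the shell).
function maxBatchSize(avgDocBytes) {
    var LIMIT = 48 * 1024 * 1024;   // the 48 MB limit mentioned above
    return Math.floor(LIMIT / avgDocBytes);
}

// Example: documents of ~100 bytes allow batches of about half a million.
console.log(maxBatchSize(100));     // 503316
```

In practice you would leave some headroom below this figure, since document sizes vary across a collection.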
That's the end of this tip. Thanks for reading, and feel free to contact me if you have any questions.