
Removing documents from a collection in MongoDB

12 Apr 2015
How to remove documents from a collection more efficiently

Introduction

This tip covers the basics of removing data from the database, including the following:

1) remove

2) drop

For part #1:

To remove the documents from a collection:

> db.task1.remove({})

This removes every document in the task1 collection, but note that it does not remove the collection itself: all of its metadata still exists.

Removing every document in a collection is uncommon, though. Most of the time we remove only the documents that match specific criteria:

> db.task1.remove({"user" : "Coldsky"})

Then only the documents that match the criteria are removed.

Be careful with the remove method, because removed data cannot be recovered.
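
Because a remove cannot be undone, it can help to check how many documents a criteria object matches before deleting anything. The sketch below is one possible way to do that. `previewThenRemove` and its `confirm` flag are hypothetical names, not part of MongoDB, and the sketch assumes the collection handle exposes `count(criteria)` as the legacy mongo shell does (newer shells offer `countDocuments` instead).

```javascript
// Hypothetical helper (not part of MongoDB): report how many documents
// match `criteria`, and remove them only when `confirm` is true.
// Assumes `coll` has count(criteria) and remove(criteria), as a legacy
// mongo shell collection such as db.task1 does.
function previewThenRemove(coll, criteria, confirm) {
    var matched = coll.count(criteria);   // counting deletes nothing
    if (confirm === true) {
        coll.remove(criteria);            // irreversible from here on
    }
    return matched;
}

// In the shell, a dry run followed by the real removal might look like:
//   previewThenRemove(db.task1, {"user" : "Coldsky"}, false)
//   previewThenRemove(db.task1, {"user" : "Coldsky"}, true)
```

The dry run returns the match count without touching the data, so a mistyped criteria object shows up as an unexpected number before anything is lost.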

Just as with the insert method, let's measure the remove speed with a function:

> var removeTime = function() {
    // Bulk-insert one million documents in 10 batches of 100,000.
    for (var i = 0; i < 10; i++) {
        var tasks = [];
        for (var j = 0; j < 100000; j++) {
            tasks[j] = {"user" : "Coldsky", "finished" : i*100000 + j, "unfinished" : 1000000 - i*100000 - j};
        }
        db.task2.insert(tasks);
    }
    var start = (new Date()).getTime();
    db.task2.remove({});
    db.task2.findOne();   // follow-up read, so the timer stops after the remove has finished
    var end = (new Date()).getTime();
    var diff = end - start;
    print("Remove 1M documents took " + diff + "ms");
  }
> removeTime()
Remove 1M documents took 9763ms

First we use bulk insert to add one million documents to the task2 collection, then we remove them. The remove rate is roughly 100,000 documents per second. Many of us will not be satisfied with that speed and will look for something faster. Fortunately, MongoDB also provides the drop method for removal; see part #2.
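
The rate quoted above follows directly from the measured time. As a small worked example (the function name `docsPerSecond` is made up for illustration), the arithmetic is:

```javascript
// Illustrative helper: convert a document count and an elapsed time in
// milliseconds into documents per second.
function docsPerSecond(count, elapsedMs) {
    return count / (elapsedMs / 1000);
}

// For the run above: 1,000,000 documents in 9763 ms,
// which works out to a little over 100,000 documents per second.
var rate = docsPerSecond(1000000, 9763);
```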

For part #2:

Let's also write a function to drop one million documents: 

> var dropTime = function() {
    for (var i = 0; i < 10; i++) {
        var tasks = [];
        for (var j = 0; j < 100000; j++) {
            tasks[j] = {"user" : "Coldsky", "finished" : i*100000 + j, "unfinished" : 1000000 - i*100000 - j};
        }
        db.task2.insert(tasks);
    }
    var start = (new Date()).getTime();
    db.task2.drop();
    var end = (new Date()).getTime();
    var diff = end - start;
    print("drop one million documents took " + diff + "ms");
  }
> dropTime()
drop one million documents took 1ms
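
Both functions repeat the same timing pattern around a single call. A minimal sketch of how that pattern could be factored out follows. `elapsedMs` is a hypothetical helper name, and it relies only on `Date`, so it should behave the same in the mongo shell and in other JavaScript runtimes.

```javascript
// Hypothetical helper: run `action` once and return how long it took, in ms.
function elapsedMs(action) {
    var start = (new Date()).getTime();
    action();
    var end = (new Date()).getTime();
    return end - start;
}

// In the shell, once task2 has been filled, it might be used like this:
//   print("drop took " + elapsedMs(function() { db.task2.drop(); }) + "ms");
//   print("remove took " + elapsedMs(function() { db.task2.remove({}); db.task2.findOne(); }) + "ms");
```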

We can see that drop is dramatically faster than remove, but it has a shortcoming: it cannot take any criteria. All of the collection's metadata is removed as well, which the command below confirms:

> show collections

We find that the task2 collection no longer belongs to the current database.
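
Putting the two parts together: when there are criteria, remove is the only option; when the whole collection is to be emptied, drop is much faster but also discards the collection's metadata. The sketch below encodes that choice. `clearCollection` is a hypothetical name, and whether losing the metadata is acceptable is an assumption that has to be checked for each collection.

```javascript
// Hypothetical helper: pick remove or drop based on whether criteria were given.
// Assumes `coll` is a collection handle with remove(criteria) and drop().
// Note: drop also deletes the collection's metadata, not just its documents.
function clearCollection(coll, criteria) {
    var hasCriteria = criteria !== undefined && criteria !== null &&
                      Object.keys(criteria).length > 0;
    if (hasCriteria) {
        coll.remove(criteria);   // only matching documents are removed
        return "remove";
    }
    coll.drop();                 // whole collection goes, metadata included
    return "drop";
}

// In the shell this might be called as:
//   clearCollection(db.task2, {"user" : "Coldsky"})   // uses remove
//   clearCollection(db.task2, {})                      // uses drop
```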

 

This tip has reached the end. Thanks for reading and feel free to contact me if you have any questions.

By the way, discussion is welcome.

 

 

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
