Introduction
This is a small article about an issue I recently had while trying to save some big documents, represented as .NET objects, in MongoDB using the MongoDB .NET driver.
While saving a “relatively” big document, I received the following exception:
System.IO.FileFormatException: Size 32325140 is larger than MaxDocumentSize 16777216.
at MongoDB.Bson.IO.BsonBinaryWriter.BackpatchSize() in
c:\projects\mongo-csharp-driver\MongoDB.Bson\IO\BsonBinaryWriter.cs:line 697
at MongoDB.Bson.IO.BsonBinaryWriter.WriteEndArray() in
c:\projects\mongo-csharp-driver\MongoDB.Bson\IO\BsonBinaryWriter.cs:line 294
at MongoDB.Bson.Serialization.Serializers.EnumerableSerializerBase`1.Serialize
(BsonWriter bsonWriter, Type nominalType, Object value, IBsonSerializationOptions options)
in c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\EnumerableSerializerBase.cs:line 408
at MongoDB.Bson.Serialization.BsonClassMapSerializer.SerializeMember
(BsonWriter bsonWriter, Object obj, BsonMemberMap memberMap) in
c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\BsonClassMapSerializer.cs:line 684
at MongoDB.Bson.Serialization.BsonClassMapSerializer.Serialize(BsonWriter bsonWriter,
Type nominalType, Object value, IBsonSerializationOptions options) in
c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\BsonClassMapSerializer.cs:line 432
at MongoDB.Driver.Internal.MongoInsertMessage.AddDocument(BsonBuffer buffer,
Type nominalType, Object document) in
c:\projects\mongo-csharp-driver\MongoDB.Driver\Communication\Messages\MongoInsertMessage.cs:line 53
at MongoDB.Driver.Operations.InsertOperation.Execute(MongoConnection connection)
in c:\projects\mongo-csharp-driver\MongoDB.Driver\Operations\InsertOperation.cs:line 97
at MongoDB.Driver.MongoCollection.InsertBatch(Type nominalType, IEnumerable documents,
MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1149
at MongoDB.Driver.MongoCollection.Insert(Type nominalType, Object document,
MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1004
at MongoDB.Driver.MongoCollection.Save(Type nominalType, Object document,
MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1426
Well, the message is clear: I’ve exceeded MongoDB’s maximum document size, which is 16MB. Fair enough, this is quite a sane design decision.
First, I’ll explain why I had this issue, then how I solved it.
Causes and Consequences
At first, I was quite surprised because the same set of objects represented as a CSV document was only a 6MB file.
But thinking about the data again, I remembered that this data set is mostly a sparse matrix, because a lot of properties are null.
With the CSV format, for each null property you only pay for a semicolon, which is quite cheap even if you have hundreds of thousands of them.
But with an object-oriented representation like .NET objects or BSON documents, this is another story: for each null property, the cost is far higher because you still store the name of the property and the “null” symbol!
And when you have dozens of properties (and yes, I have good reasons to have that many properties in a single object), the overhead can be huge and represent most of the total size.
So you end up with documents that look something like:
{
    a: "Some data",
    b: null,
    c: null,
    d: "Some other data",
    e: null,
    f: null,
    g: null,
    ...
    z: "Last data"
}
Much of the document is filled with useless markers that increase its size without adding any information.
And this is not really flattering for BSON: my BSON document (32MB) was more than 5 times bigger than the CSV document (6MB)!
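To make the overhead concrete, here is a minimal sketch that estimates BSON sizes by hand, following the element layout in the BSON specification: a null element is one type byte plus the NUL-terminated field name, while CSV pays a single separator. The class name BsonSizeSketch, its methods, and the ShippingAddress property name are hypothetical and only serve the illustration. It only handles flat documents with string or null fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class BsonSizeSketch
{
    // BSON null element: 1 type byte + field name + 1 NUL terminator, no value bytes.
    public static int NullElementSize(string name)
    {
        return 1 + Encoding.UTF8.GetByteCount(name) + 1;
    }

    // BSON string element: type byte + name + NUL, then int32 length + UTF-8 bytes + NUL.
    public static int StringElementSize(string name, string value)
    {
        return 1 + Encoding.UTF8.GetByteCount(name) + 1
             + 4 + Encoding.UTF8.GetByteCount(value) + 1;
    }

    // Flat document of string fields: int32 total size + elements + 1 terminator byte.
    public static int DocumentSize(IDictionary<string, string> fields, bool skipNulls)
    {
        int size = 4 + 1;
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (field.Value != null)
                size += StringElementSize(field.Key, field.Value);
            else if (!skipNulls)
                size += NullElementSize(field.Key);
        }
        return size;
    }

    static void Main()
    {
        var fields = new Dictionary<string, string>
        {
            { "a", "Some data" }, { "b", null }, { "c", null }, { "d", "Some other data" }
        };
        Console.WriteLine("With nulls:    " + DocumentSize(fields, false) + " bytes");
        Console.WriteLine("Without nulls: " + DocumentSize(fields, true) + " bytes");
        // Hypothetical, more realistic property name: far more than one CSV separator.
        Console.WriteLine("Null 'ShippingAddress': " + NullElementSize("ShippingAddress") + " bytes");
    }
}
```

With one-letter names, each null costs 3 bytes instead of the single byte CSV pays for a semicolon. With a 15-character name it costs 17 bytes, and that is repeated for every null property of every object.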
The Solution
Fortunately, the guys behind the MongoDB .NET driver are aware of this kind of issue, and they have taken it into account when designing the driver, allowing you to customize the way the BSON documents are generated.
You have at least 2 solutions:
- mark the properties that should be ignored if null
- register a global policy for the whole app-domain
If you want to mark properties individually, you can use the BsonIgnoreIfNull attribute:
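using MongoDB.Bson.Serialization.Attributes;

class Data
{
    // Each marked property is left out of the BSON document when its value is null.
    [BsonIgnoreIfNull]
    public string A { get; set; }
    [BsonIgnoreIfNull]
    public string B { get; set; }
    [BsonIgnoreIfNull]
    public string C { get; set; }
}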
The good thing is that this is quite explicit.
But it can add a lot of code if, like me, you have dozens of properties to mark.
Moreover, it is quite obtrusive, and I don’t like to pollute my business entities with technical attributes, though I do it if there is no simpler solution: again, pragmatism should always prevail over dogmatism, though some dogmatic geeks prefer duplicating code and adding mappings to clearly isolate business entities. (I’m a recovering dogmatic.)
For my current issue, I’ve chosen the other way by registering a global policy:
ConventionPack pack = new ConventionPack();
pack.Add(new IgnoreIfNullConvention(true));
ConventionRegistry.Register("Ignore null properties of data", pack, type => type == typeof(Data));
The last predicate ensures the policy only applies to my “Data” class.
I’ve put this code in the static constructor of the type that is the entry point to the MongoDB database.
So if I have no need for MongoDB, the type won’t be loaded by the CLR and this code won’t be executed.
You could also put this code in the Main of your application, but if you have more than one application that uses your MongoDB layer, you might need to duplicate code, so prefer a static constructor or any other “Init” method.
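As an illustration of the static constructor placement described above, here is a minimal sketch. The DataStore class and its EnsureInitialized method are hypothetical names, not part of the driver, and the Data class is repeated without attributes to keep the example self-contained. As far as I know, conventions are applied when the class map of a type is built, so the registration should run before the first Data instance is serialized.

```csharp
using System;
using MongoDB.Bson.Serialization.Conventions;

// The Data class from this article, without any technical attribute.
public class Data
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}

// Hypothetical entry point to the MongoDB layer; the name is an assumption.
public static class DataStore
{
    // Runs once, the first time any member of DataStore is used.
    static DataStore()
    {
        ConventionPack pack = new ConventionPack();
        pack.Add(new IgnoreIfNullConvention(true));
        ConventionRegistry.Register(
            "Ignore null properties of data",
            pack,
            type => type == typeof(Data));
    }

    // Hypothetical no-op: calling it forces the static constructor to run.
    public static void EnsureInitialized() { }
}
```

Any application that goes through DataStore gets the policy registered without duplicating code, and an application that never touches it never runs the registration.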
Conclusion
After applying this patch, I was able to save my documents. To get an idea of how much space was saved, I checked the size of the newly saved document in the Mongo shell using the Object.bsonsize() method:
> Object.bsonsize(db.data.find()[0])
7161729
Compared to the original BSON document that included all the properties, this is far better: 7MB instead of 32MB, more than 4 times smaller.
Of course, there is still an overhead compared to CSV because you need to store the field names when the values are not null, but it’s limited to “only” 15%.
It’s still a big document, but one that fits into the MongoDB database, and this is all that matters.
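The size can also be checked from C# before saving, without going through the shell. The sketch below assumes the ToBson() extension method from the MongoDB.Bson namespace, which serializes a value to a byte array in memory. The BsonSizeCheck class and its BsonSize helper are hypothetical names.

```csharp
using System;
using MongoDB.Bson;

static class BsonSizeCheck
{
    // Serializes the document to BSON in memory and returns the number of bytes produced.
    public static int BsonSize(BsonDocument document)
    {
        return document.ToBson().Length;
    }

    static void Main()
    {
        var withNull = new BsonDocument { { "a", "Some data" }, { "b", BsonNull.Value } };
        var withoutNull = new BsonDocument { { "a", "Some data" } };

        // The difference between the two sizes is the cost of the single null field.
        Console.WriteLine("With null:    " + BsonSize(withNull) + " bytes");
        Console.WriteLine("Without null: " + BsonSize(withoutNull) + " bytes");
    }
}
```

ToBson() should work the same way on a plain .NET object such as a Data instance, which makes it possible to compare the serialized size against the 16MB limit before calling Save.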
Hopefully, this article will help somebody with the same issue.
If you catch any typo or mistake or have additional questions, feel free to leave a comment.