Things they don’t tell you about MongoDB

MongoDB is by far the most popular NoSQL database in Brazil (at least judging by the number of blog posts and articles written about it here that I read). It’s really an amazing solution, but what bothers me is that very few people know about its limitations. So I see the same story repeating itself: people unhappy with it, treating its limitations as if they were bugs.

This post is about some of its limitations that really caught me by surprise, so that if you are thinking of adopting it, at least you’ll be warned about them and can avoid these headaches.

Hungry for bytes

This was my first surprise: MongoDB consumes a lot of disk space. This comes from the way it avoids disk fragmentation: it pre-allocates space in large files. Here is how it works: when you create a database, a file named [db name].0 is created with (by default) 64 MB. When more than half of this file is used, another one named [db name].1 is created with 128 MB. This happens again and again, so files of 256, 512, 1024 and finally 2048 MB are written to your disk. All subsequent files are 2048 MB each.
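
To get a feel for the numbers, here is a minimal Python sketch of the allocation rule described above. It is only a simplified model: the function names are my own, it assumes files fill up strictly in order, and it ignores everything else MongoDB writes to disk (journal, namespace files and so on).

```python
def preallocated_file_sizes(n_files, first_mb=64, max_mb=2048):
    """Sizes (in MB) of the first n_files data files: each one doubles
    the previous size until the cap (2048 MB by default) is reached."""
    sizes = []
    size = first_mb
    for _ in range(n_files):
        sizes.append(size)
        size = min(size * 2, max_mb)
    return sizes


def allocated_mb(data_mb, first_mb=64, max_mb=2048):
    """Rough disk space (in MB) allocated for data_mb of stored data,
    assuming a new file is created as soon as more than half of the
    newest file is in use (simplified model, not MongoDB's real code)."""
    size = first_mb          # size of the newest file
    allocated = size         # total space pre-allocated so far
    capacity_before = 0      # capacity of all files before the newest one
    while data_mb - capacity_before > size / 2:
        capacity_before += size
        size = min(size * 2, max_mb)
        allocated += size
    return allocated


print(preallocated_file_sizes(7))  # [64, 128, 256, 512, 1024, 2048, 2048]
print(allocated_mb(40))            # 192
```

Under these assumptions, just 40 MB of data already takes 192 MB on disk, because the second file (128 MB) is allocated as soon as the first one is more than half full.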

If storage space is a constraint on your project, you MUST take this into consideration. There’s a commercial solution for this problem: it’s named TokuMX. I haven’t tried it, but if it really works, storage consumption should decrease by 90%. The repairDatabase and compact commands can also help you in the long run.

Replica-set replication is amazing, but it has its limitations

The replica-set strategy for data replication in MongoDB is amazing: easy to configure, and it works really well. But if your cluster has more than 12 nodes, you have a problem: a replica set in MongoDB is limited to 12 nodes. There’s an open issue to eliminate this restriction, so you can track it to see if this problem will soon be only a sad memory.

Master-slave replication will not give you high availability

Despite being considered deprecated, MongoDB implements another replication strategy: master-slave. It solves the 12-node limitation but brings you another problem: if you need to change the master node of your cluster, you must do it manually. Doubt me? Here is the link.

Avoid the 32-bit version

The 32-bit version of MongoDB is also considered deprecated because it can only handle 2 GB of data. Remember the first limitation in this post? You’ll hit it very quickly with this version. Here is a blog post from MongoDB about this limitation.

Consulting prices are VERY expensive (at least for Brazilian developers and companies)

I don’t know how this is seen outside Brazil, but at least here MongoDB’s consulting prices are prohibitive. The “Lightning Consult” plan costs US$ 450.00 per hour, and you have to buy at least 2 hours, so it costs any company at least US$ 900.00. We are used to hiring companies like Red Hat and Oracle for much better prices.

Bad administration tools

This is still a huge problem for beginners. The administration tools are in terrible shape. The best one I know is RoboMongo, which is really handy for those taking their first steps with the tool. Unfortunately, it’s an exception.

Know the official limitations

It amazes me that so few people look into the limitations of the tools they want to adopt. Fortunately, the MongoDB staff have published a post listing all those limitations, so you can know them in advance and avoid many unfortunate surprises on your adoption path. I hope you are not caught by surprise by the limitations described in this post. :)

31 thoughts on “Things they don’t tell you about MongoDB”

  1. Why doesn’t the Brazilian government give up on its ridiculous regulations and its high import taxes? Companies are not going to relocate to Brazil! It’s only hurting Brazilians.

    admin Reply:

    Totally agree with you. And I must say that 99.9999% of Brazilians also agree with you.
    Our greatest problem is the incompetence and the thievery of our politicians.

    Maybe things will start to change now that many protests are happening here against corruption, bad tax policies and many other problems.

  2. At scale you generally use MongoDB’s sharding rather than replica sets, which is highly available.

    Also, there are the --noprealloc and --smallfiles options, which make the on-disk database size much, much smaller, but they shouldn’t be used in production due to their performance impact.

    Lastly, if storage space is a concern, your dataset isn’t large enough to be a good fit for MongoDB. It has great use cases, but obviously like any tool, needs to be applied correctly.

    admin Reply:

    Hi,

    yes, sharding is a great option when you can distribute all your data.
    But there are many cases in which what you really need is just replication: full replication on each node.
    For example: when you need high availability and can’t afford to lose access to part of your data when one of the nodes is unreachable.

    By the way: MongoDB is not just for large-scale scenarios, but for smaller ones too: when you need a document-oriented approach to your data rather than a relational one.

  3. MongoDB’s creators have a monitoring service called MMS, which isn’t bad at all. It supports replica sets and sharding as well; give it a try.

    admin Reply:

    I’ll take a look. Thanks!

  4. I work at Tokutek, on TokuMX. Thanks for the reference.

    I just wanted to point out that TokuMX is open source, and there is a free community edition one can use, with all the compression benefits. We have an enterprise version of the product for which we provide support. So, we are not only commercial :).

    admin Reply:

    Hi Zardosht,

    this is great to know! I found your website and read about your product, which really caught my attention, but I didn’t see that it was open source and that I could use it too.

    Thanks a lot for this information. :)

  5. “they don’t tell you”

    C’mon man, you are trying to sell it as some kind of conspiracy theory, while all of this is pretty well documented in the mongodb documentation. Actually this is cool: they have documentation :-D

    admin Reply:

    Looking at the title of this post after your comment, I must admit that you have a point, but that REALLY wasn’t the intention.
    Accidents that happen when English is not your primary language, I guess. :)

  6. Hello everyone,
    I was starting with MongoDB, and after reading this article I am thinking that maybe I made a mistake. Could anyone tell me if Apache Cassandra could be a better option?

  7. Caution on the repairDatabase. While it does recover unused space and returns that space back to the file system it does require 2X the size of the database available in order to run. Compact does not return any file space but must be run on a collection by collection basis. Just regularly schedule your replica sets to run a compact on all the collections. I don’t even take the secondary offline to do this.

    Performance is another important reason to compact regularly. Since the DB uses memory-mapped files, as documents are deleted the memory can become fragmented by reading in file ranges with no data. Collections with high update or delete rates (such as a collection with a TTL index), if not compacted regularly, will not only grow monotonically on disk but also tend to increase the amount of memory needed for your working set.

  8. I really got bitten 2 years ago when I used the 32-bit version on an older machine. From one second to the next the database was inaccessible and my data was gone. I should have read the manual.

    admin Reply:

    Well: at least they document their product limitations. :)

  9. Show me any other 32-bit database you use in production.

    Why should this be a limitation specific to MongoDB?

    admin Reply:

    Hi Tobias,

    this is not a limitation specific to MongoDB: just one that many people forget about when getting started with it (many of them by pure accident).

  10. That’s the thing about NoSQL persistence solutions… It’s your responsibility to understand the trade-offs between the different families of NoSQL databases and select one based on your requirements. A great book for this is Martin Fowler’s NoSQL Distilled.

    MongoDB is popular because it’s easy to use (initially), provides limited ad hoc query support, etc., but you must understand your data access patterns and select the most appropriate solution based on that understanding.

    admin Reply:

    Yep: you’re absolutely right.

  11. Hi,

    I am using a PHP framework called Symfony2, and I have MongoDB working on it using Doctrine ODM. Is it possible to replace MongoDB with TokuMX?

    Also, is RoboMongo compatible with TokuMX?

    admin Reply:

    Completely compatible

  12. Regarding the node limit of 12, the latest dev branch has increased the total limit to 50 nodes (targeted for 2.8, released in 2.7.8 in the dev branch) as part of SERVER-15060. Assuming no major issues with the testing, that limitation at least will be gone.

    admin Reply:

    That’s great news. Thanks for sharing!
