18 Oct 2013, 19:57

The genius and folly of MongoDB


MongoDB is easy to make fun of. The global write lock (now just a database-level write lock, woo). The non-durable, unverifiable writes. The posts about how to scale to Big Data, where Big Data is 100 GB.

It makes more sense when you look at how the underlying storage layer is implemented. Basically, MongoDB consists of a collection of mmap’d linked lists of BSON documents, with dead simple B-tree indexing, and basic journaling as the storage durability mechanism (issues with what the driver considers a “durable write”, before data necessarily hits the storage layer, are something others have dealt with in depth). Eventually dirty pages get written back to disk by the OS, and reads result in the page with the data being loaded into memory by the OS.
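To make that storage description concrete, here is a minimal sketch in Python of a linked list of documents living in a memory-mapped file. Everything in it is an assumption for illustration: the 12-byte header layout, the helper names `append`, `scan` and `demo`, the 4 KB file size, and JSON standing in for BSON. It is not MongoDB's actual on-disk format, just the general shape of the idea.

```python
import json
import mmap
import os
import struct
import tempfile

# Hypothetical record header: next-record offset (-1 = end of list), payload length.
HEADER = struct.Struct("<qI")
FILE_SIZE = 4096  # arbitrary size for the demo file


def append(mm, offset, prev_offset, doc):
    """Write doc at offset, link the previous record to it, return the next free offset."""
    payload = json.dumps(doc).encode("utf-8")  # JSON as a stand-in for BSON
    HEADER.pack_into(mm, offset, -1, len(payload))
    start = offset + HEADER.size
    mm[start:start + len(payload)] = payload
    if prev_offset is not None:
        # Patch the previous record's "next" pointer to point at this record.
        struct.pack_into("<q", mm, prev_offset, offset)
    return start + len(payload)


def scan(mm, head=0):
    """Walk the list from head, yielding each decoded document."""
    offset = head
    while offset != -1:
        next_offset, length = HEADER.unpack_from(mm, offset)
        start = offset + HEADER.size
        yield json.loads(mm[start:start + length].decode("utf-8"))
        offset = next_offset


def demo(docs):
    """Store docs in a temporary mmap'd file and read them back."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, FILE_SIZE)
        with mmap.mmap(fd, FILE_SIZE) as mm:
            prev, free = None, 0
            for doc in docs:
                here = free
                free = append(mm, here, prev, doc)
                prev = here
            mm.flush()  # ask the OS to write dirty pages back to the file
            return list(scan(mm))
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(demo([{"user": "a", "score": 1}, {"user": "b", "score": 2}]))
```

Note what is missing: there is no cache, no write scheduling, and no notion of which bytes are hot. All of that is left to the OS, which is the point of the design.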

All the speed that was initially touted as the killer advantage is a result of using the page cache. When you realize “it’s just mmap”, all the BS about optimizing your working set to fit within RAM, the cratering of performance once you hit disk, the fragmentation if you routinely delete or grow records, etc., make perfect sense. The OS doesn’t know you’re running a database; it just knows you want to mmap some stuff and gives it its best shot. Fortunately, the kernel’s paging code was written by some very smart people, so it tends to work pretty well up front (as long as you’re always hitting the page cache) - but when the OS schedules writes it doesn’t know about your storage layout, or even the difference between your indexes and your data. It certainly can’t infer what data to keep in cache, or preload, because it doesn’t know what or where your data is - it’s just some bytes somewhere you wanted for some reason.
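The working-set cliff is easy to see in a toy model. The sketch below assumes a plain LRU cache and a client that sweeps repeatedly over its working set; real kernels use more elaborate replacement policies, and the function name `hit_rate` and the page counts are made up for illustration.

```python
from collections import OrderedDict


def hit_rate(cache_pages, working_set_pages, passes=10):
    """Fraction of page accesses served from a toy LRU cache.

    The access pattern is a repeated sequential sweep over the working set.
    """
    cache = OrderedDict()
    hits = total = 0
    for _ in range(passes):
        for page in range(working_set_pages):
            total += 1
            if page in cache:
                hits += 1
                cache.move_to_end(page)  # mark as most recently used
            else:
                cache[page] = True  # simulate a page fault loading the page
                if len(cache) > cache_pages:
                    cache.popitem(last=False)  # evict the least recently used page
    return hits / total


if __name__ == "__main__":
    print(hit_rate(100, 80))   # working set fits in cache
    print(hit_rate(100, 101))  # working set is one page too big
```

With a 100-page cache, an 80-page working set misses only on the first sweep, so 9 of the 10 sweeps are pure hits. Grow the working set to 101 pages and, under this access pattern, every single access evicts the page that will be needed soonest, so nothing ever hits. Real workloads are less pathological, but the shape of the cliff is the same.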

But actually, that’s the Tao-like genius of MongoDB - having absolutely nothing new. Most databases are built with some killer idea: the consistency protocol for Cassandra, the crazy data structures of Redis, or the data-processing abilities of Hadoop. MongoDB has mmap, and by “has”, I mean “uses” (but hey, possession is nine tenths of the law). Not having to design your own caching algorithms or write strategies, and using the simplest possible implementations of everything else, lets you get to market quickly and focus on marketing your benchmarks, consulting to bring your customers up to Web Scale, responding to haters, or learning about concurrency. You can get a fair amount of traction before your customers realize there’s very little “there” there, and by that time you might have either cashed out or written an actual database (they’re starting down that road with their page-fault prediction algos, and good for them). In any event, your customers are more or less locked in, as they’ve jumped through so many hoops to accommodate your design decisions (“oh, long field names take up more space in every record? I guess we’ll just use the first N Unicode characters and map on the application layer”). It’s no coincidence that you’re being compared to Oracle and IBM.
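For anyone who hasn’t had the pleasure, the field-name hoop looks roughly like the sketch below. The field names, the `FIELD_MAP` table and the helpers `shorten`, `expand` and `encoded_size` are all hypothetical, and JSON length is used as a rough stand-in for the size of a stored document.

```python
import json

# Hypothetical mapping from readable field names to short stored names.
FIELD_MAP = {
    "username": "u",
    "last_login_timestamp": "l",
    "inventory_items": "i",
}
REVERSE_MAP = {short: full for full, short in FIELD_MAP.items()}


def shorten(doc):
    """Rename known fields to their short form before writing."""
    return {FIELD_MAP.get(key, key): value for key, value in doc.items()}


def expand(doc):
    """Restore readable field names after reading."""
    return {REVERSE_MAP.get(key, key): value for key, value in doc.items()}


def encoded_size(doc):
    """Rough proxy for stored size: length of the JSON encoding in bytes."""
    return len(json.dumps(doc).encode("utf-8"))


if __name__ == "__main__":
    doc = {"username": "ada", "last_login_timestamp": 1382126220, "inventory_items": []}
    print(encoded_size(doc), encoded_size(shorten(doc)))
```

The saving is repeated in every record because field names are stored with every document. The cost is that the mapping now lives in your application, unmapped fields pass through untouched, and a document that already has a field called `u` will collide with it.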

Like I said, MongoDB is easy to make fun of.

There’s exactly one situation where you should look at MongoDB. Focusing on the storage engine and ignoring all the issues with their broader durability strategy, the killer application is something like user data for an online game: a situation where you have a fairly consistent working set for a given chunk of time, it’s small relative to the whole database, reads and writes are in the same working set, you have lots of reads relative to writes, clients do a lot of the computation for you, you’d like schema flexibility… You could jam that into a relational model and use something like hstore or JSON columns to fill in the gaps, or store documents as blobs / text in something like HBase or Cassandra, but you probably won’t have that bad of a time with MongoDB (until you get a surge of traffic; it doesn’t exactly degrade gracefully).

But in that case, it also wouldn’t be crazy to pull a Viaweb and store it on the file system (ZFS has a heckuva caching strategy, and you know when it accepts a write). It’s obvious and uninformative to say “it depends” regardless of the question; of course some things are generally better than others.

Being able to walk with them doesn’t change the fact that as of right now, MongoDB is clown shoes.