25 Nov 2013, 14:45

Clownshoes: An ENTERPRISE GRADE etc. etc. DOCUMENT STORE


Everyone should write a database. It’s a core area of practical software engineering, it’s fun, and it turns out it’s not that hard.

(The real reason, of course, is to wind up rolling in dough. It’s basically the same thought process behind learning to play the guitar.)

So I threw together Clownshoes, the ADVANCED ENTERPRISE GRADE NOSQL BIG DATA HIGH PERFORMANCE SCALABLE DOCUMENT STORE FOR BIG DATA (capitalization is emphatically part of the description). It’s not quite as awesome as some other NoSQL databases, like IMS, but it’s fun in its own way. The core storage engine bears a strong resemblance to a certain other prominent document database, to wit: mmapped linked lists of “documents”, where “documents” are in this case arbitrary bytestreams, which you can easily use to represent anything from JSON to BSON to entire web pages.
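If you’re curious what that looks like in practice, here’s a minimal sketch. The record format below (next-offset, length, payload) is invented for illustration and is not Clownshoes’ actual on-disk layout, and a plain byte slice stands in for the mmapped region:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical record layout, invented for illustration (NOT Clownshoes'
// actual on-disk format):
//   [8 bytes: file offset of the next document, 0 = end of list]
//   [8 bytes: payload length]
//   [payload: the "document", an arbitrary bytestream]
// In the real thing `data` would be the mmapped file (e.g. from
// syscall.Mmap); a plain byte slice stands in for it here.

func readDoc(data []byte, off uint64) (payload []byte, next uint64) {
	next = binary.LittleEndian.Uint64(data[off : off+8])
	n := binary.LittleEndian.Uint64(data[off+8 : off+16])
	return data[off+16 : off+16+n], next
}

func main() {
	// Hand-assemble a two-document list: "hello" at offset 0, "world" at 21.
	data := make([]byte, 64)
	binary.LittleEndian.PutUint64(data[0:8], 21) // next -> offset 21
	binary.LittleEndian.PutUint64(data[8:16], 5) // payload length
	copy(data[16:21], "hello")
	binary.LittleEndian.PutUint64(data[21:29], 0) // next -> end of list
	binary.LittleEndian.PutUint64(data[29:37], 5)
	copy(data[37:42], "world")

	// Walk the list the way a cursor would.
	for off, more := uint64(0), true; more; {
		doc, next := readDoc(data, off)
		fmt.Printf("%s\n", doc)
		off, more = next, next != 0
	}
}
```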

It supports atomic inserts, deletes, and updates (“supports” might not be a strong enough word; all updates are atomic since there’s a per-database write lock). It scales to collections over 180TB in size, and working sets over 1TB. It’s wicked fast on the storage layer. How does it do all this?
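That per-database write lock is about as complicated as it sounds. Here’s a minimal sketch of the idea (names and structure hypothetical, not Clownshoes’ real API): every mutation takes the same mutex, so each write is atomic by virtue of being the only one in flight:

```go
package main

import (
	"fmt"
	"sync"
)

// A minimal sketch of atomicity-by-global-lock. Every mutation takes the
// same per-database mutex, so writers are serialized and each insert,
// update, or delete is trivially atomic.
type DB struct {
	mu   sync.Mutex
	docs [][]byte
}

func (db *DB) Insert(doc []byte) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.docs = append(db.docs, doc)
}

func (db *DB) Update(i int, doc []byte) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.docs[i] = doc
}

func main() {
	db := &DB{}
	db.Insert([]byte(`{"movie": "Primer"}`))
	db.Update(0, []byte(`{"movie": "Primer", "watched": true}`))
	fmt.Println(len(db.docs), "document(s)")
}
```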

Clownshoes, being HIGH PERFORMANCE, knows that in order to scale to BIG DATA you need to run on BARE METAL. No network overhead for us! Instead it is embedded in your running Go application process. You can easily scale horizontally by buying a couple of LSI cards and plugging in a disk enclosure to the left or right of your application server, or inserting more RAM to either side of your CPU socket. If you find yourself bound on CPU throughput, you can still scale horizontally, as long as you built with some creative geometry that gives you room to grow (see the diagram below):

CPU HERE…….EMPTY SOCKET

EMPTY SOCKET…….CPU HERE

The other main performance trick is journaling. Journaling essentially converts random writes into linear writes by recording a running log (e.g., “write N bytes at location X”) and ensuring that those linear writes actually hit disk. That way, if there’s a problem committing to your main database file, you can replay the journal against the last consistent snapshot and recover. Not a bad idea, but we need all the IOPS we can get, so our journaling strategy is to not journal, and to let the user decide when they want to snapshot their database (Clownshoes believes in moving fast, and also in breaking things). In general, complete configurability about how much to care about your data is the secret to our webscale sauce.
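For contrast, here’s roughly what a journaled write would look like if we did it, which we don’t. This is a sketch of the general technique, not anything Clownshoes ships: append the “N bytes at offset X” record to the log, force it to disk, and only then touch the main file:

```go
package main

import (
	"encoding/binary"
	"log"
	"os"
)

// A sketch of the journaling technique described above; Clownshoes does
// none of this. Each record is "write N bytes at offset X": append it to
// the log and fsync (a linear write), then perform the random write
// against the main database file. After a crash, replaying intact records
// against the last consistent snapshot recovers the lost writes.
func journalWrite(journal, db *os.File, off uint64, data []byte) error {
	var hdr [16]byte
	binary.LittleEndian.PutUint64(hdr[0:8], off)
	binary.LittleEndian.PutUint64(hdr[8:16], uint64(len(data)))
	if _, err := journal.Write(append(hdr[:], data...)); err != nil {
		return err
	}
	if err := journal.Sync(); err != nil { // durable before the real write
		return err
	}
	_, err := db.WriteAt(data, int64(off))
	return err
}

func main() {
	journal, err := os.Create("journal.log")
	if err != nil {
		log.Fatal(err)
	}
	db, err := os.Create("data.db")
	if err != nil {
		log.Fatal(err)
	}
	defer journal.Close()
	defer db.Close()
	if err := journalWrite(journal, db, 0, []byte("hello")); err != nil {
		log.Fatal(err)
	}
}
```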

Now, you might say to yourself,

This is almost exactly a linked list of strings, plus some hashmaps for indexing, plus gob-based serialization, plus the OS’s paging algorithm to support data bigger than physical memory capacity. The only difference is that you’re managing the memory a bit more manually via mmap, but because Go’s GC is currently non-compacting, if you’re using the built-in structures and memory pressure is high enough the OS will swap out a lot of untouched data to disk anyway. The wheel has been reinvented here.
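That imagined critic’s design does fit in a few lines of Go. Here’s a hypothetical sketch of exactly what the quote describes, purely for comparison; nothing below is Clownshoes code:

```go
package main

import (
	"bytes"
	"container/list"
	"encoding/gob"
	"fmt"
)

// The strawman from the quote, spelled out: a heap-allocated linked list
// of documents, a hashmap index, and gob for snapshots, with the OS pager
// handling anything past physical RAM.
func main() {
	docs := list.New()
	index := make(map[string]*list.Element)

	e := docs.PushBack(`{"movie": "Primer"}`)
	index["Primer"] = e

	// Flatten and gob-encode the list for a crude snapshot.
	var flat []string
	for e := docs.Front(); e != nil; e = e.Next() {
		flat = append(flat, e.Value.(string))
	}
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(flat); err != nil {
		panic(err)
	}
	fmt.Println(buf.Len(), "snapshot bytes,", len(index), "indexed document(s)")
}
```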

All of which is true. However:

  • There is an advantage to managing the mmap-ing, compaction, etc. yourself, rather than letting the OS manage the swapping. Go’s GC does have to traverse pointers on the heap when it collects, which can be avoided if you manage the pointers by embedding them all in a single array, as Clownshoes does (see the sketch after this list). Depending on how & where the “native” pointers are actually stored in memory, they might generate some swapping you don’t need as they drag in entire 4k pages, and the traversal will of course add time to your collections.
  • I said everyone should write a database, not use a database they wrote themselves. You probably shouldn’t use Clownshoes for important things; I am personally using it to manage some metadata about which movies I have watched. I have probably spent more time writing Clownshoes (a few hours cumulatively) than I have spent watching movies over the past month or so, so this is not mission-critical data. In any case, if you want an embedded data store, what’s wrong with SQLite? Just use SQLite already.
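Here’s the pointer-free layout that first bullet is getting at, sketched with hypothetical names: one big byte slice (which would be the mmapped file) plus an index of integer offsets, so a GC cycle has almost nothing to chase:

```go
package main

import "fmt"

// A sketch of the single-array layout from the first bullet; names are
// hypothetical. Documents live in one big byte slice (in real life, the
// mmapped region), and the index holds integer offsets instead of Go
// pointers, so the GC sees one allocation rather than a pointer per
// document.
type Store struct {
	data  []byte            // stand-in for the mmapped file
	index map[string]uint64 // key -> byte offset into data
}

func (s *Store) Get(key string, length uint64) []byte {
	off := s.index[key]
	return s.data[off : off+length]
}

func main() {
	s := &Store{
		data:  []byte("hello world"),
		index: map[string]uint64{"greeting": 0},
	}
	fmt.Printf("%s\n", s.Get("greeting", 5)) // prints "hello"
}
```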

I guess that last one isn’t really a “however” so much as a “yup”.