NoSQL Ecosystem

Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years.  Collectively, these alternatives have become known as “NoSQL databases.”

The fundamental problem is that relational databases cannot handle many modern workloads.  There are three specific problem areas: scaling out to data sets like Digg’s (3 TB for green badges) or Facebook’s (50 TB for inbox search) or eBay’s (2 PB overall), per-server performance, and rigid schema design.

Businesses, including The Rackspace Cloud, need to find new ways to store and scale large amounts of data. I recently wrote a post on Cassandra, a non-relational database we have committed resources to. There are other non-relational databases being worked on, and collectively we call this the “NoSQL movement.”

The “NoSQL” term was actually coined by a fellow Racker, Eric Evans, when Johan Oskarsson wanted to organize an event to discuss open source distributed databases. The name and concept both caught on.

Some people object to the NoSQL term because it sounds like we’re defining ourselves by what we aren’t doing rather than what we are. That’s true, to a degree, but the term is still valuable because when a relational database is the only tool you know, every problem looks like a thumb.  NoSQL is making people aware that there are other options out there. But we’re not anti-relational-database when that really is the best tool for the job; it’s “Not Only SQL,” rather than “No SQL at all.”

One real concern with the NoSQL name is that it’s such a big tent that there is room for very different designs.  If this is not made clear when discussing the various products, it results in confusion.  So I’d like to suggest three axes along which to think about the many database options: scalability, data and query model, and persistence design.

I have chosen 10 NoSQL databases as examples.  This is not an exhaustive list, but the concepts discussed are crucial for evaluating others as well.


Scalability

Scaling reads is easy with replication, so when we’re talking about scaling in this context, we mean scaling writes by automatically partitioning data across multiple machines.  We call systems that do this “distributed databases.”  These include Cassandra, HBase, Riak, Scalaris, Voldemort, and more.  If your write volume or data size is more than one machine can handle, then these are your only options if you don’t want to manage partitioning manually.  (You don’t.)
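Automatic partitioning usually starts from an idea this simple: hash each key and use the hash to pick an owner.  Here is a minimal sketch in Python (the node names are made up for illustration; real systems such as Cassandra use consistent hashing so that adding a node remaps only a fraction of the keys):

```python
import hashlib

# Hypothetical node names, purely for illustration.
NODES = ["node-a", "node-b", "node-c"]

def partition(key: str, nodes=NODES) -> str:
    """Deterministically map a key to the node that owns it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]
```

Note the weakness of naive modulo placement: changing the number of nodes remaps almost every key, which is exactly why consistent hashing matters for adding machines to a live cluster.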

There are two things to look for in a distributed database: 1) support for multiple datacenters and 2) the ability to add new machines to a live cluster transparently to your applications.

Non-distributed NoSQL databases include CouchDB, MongoDB, Neo4j, Redis, and Tokyo Cabinet.  These can serve as persistence layers for distributed systems; MongoDB provides limited support for sharding, as does a separate Lounge project for CouchDB, and Tokyo Cabinet can be used as a Voldemort storage engine.

Data and Query Model

There is a lot of variety in the data models and query APIs in NoSQL databases.

(The query interfaces, in the alphabetical order of the databases above: Cassandra and HBase, Thrift; CouchDB, map/reduce views; MongoDB, cursors; Neo4j, graph traversal; Redis, collections; Riak, nested hashes; Scalaris, Tokyo Cabinet, and Voldemort, simple get/put.)

Some highlights:

The columnfamily model shared by Cassandra and HBase is inspired by the one described by Google’s Bigtable paper, section 2.  (Cassandra drops historical versions, and adds supercolumns.) In both systems, you have rows and columns like you are used to seeing, but the rows are sparse: each row can have as many or as few columns as desired, and columns do not need to be defined ahead of time.
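A sparse row is easy to picture as a hash of hashes.  This sketch is only a mental model, not how either system stores data on disk:

```python
# Each row maps column names to values; rows need not share columns.
rows = {
    "alice": {"name": "Alice", "email": "alice@example.com"},
    "bob":   {"name": "Bob", "last_login": "2009-11-01"},  # different columns
}

def get_column(row_key: str, column: str):
    """A missing column is simply absent, not a schema violation."""
    return rows.get(row_key, {}).get(column)
```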

The key/value model is the simplest and easiest to implement, but it is inefficient when you are only interested in querying or updating part of a value.  It’s also difficult to implement more sophisticated structures on top of distributed key/value.
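The inefficiency is easy to see in a sketch where values are opaque blobs (JSON is just a stand-in serialization here): touching one field means reading and rewriting the entire value.

```python
import json

store = {}  # the database sees only opaque byte blobs

def put(key, value):
    store[key] = json.dumps(value).encode()

def get(key):
    return json.loads(store[key])

# To bump one counter we must fetch, decode, mutate, and rewrite everything:
put("user:1", {"name": "Ada", "visits": 0})
profile = get("user:1")
profile["visits"] += 1
put("user:1", profile)
```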

Document databases are essentially the next level of key/value, allowing nested values associated with each key.  Document databases support querying those nested values more efficiently than simply returning the entire blob each time.
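In other words, the database understands the structure of the value, so a query can address a nested field by path instead of shipping the whole document back.  A toy sketch (the dotted-path syntax is invented for illustration; each product has its own query language):

```python
docs = {
    "order:7": {"customer": {"name": "Ada"}, "items": [{"sku": "x1", "qty": 2}]},
}

def get_field(doc_id: str, path: str):
    """Walk a nested document by a dotted path, e.g. 'customer.name'."""
    node = docs[doc_id]
    for part in path.split("."):
        node = node[part]
    return node
```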

Neo4j has a truly unique data model, storing objects and relationships as nodes and edges in a graph.  For queries that fit this model (e.g., hierarchical data) it can be thousands of times faster than alternatives.

Scalaris is unique in offering distributed transactions across multiple keys.  (Discussing the trade-offs between consistency and availability is beyond the scope of this post, but that is another aspect to keep in mind when evaluating distributed systems.)

Persistence Design

By persistence design I mean, “how is data stored internally?”

The persistence model tells us a lot about what kind of workloads these databases will be good at.

In-memory databases are very, very fast (Redis achieves over 100,000 operations per second on a single machine), but cannot work with data sets that exceed available RAM.  Durability (retaining data even if a server crashes or loses power) can also be a problem; the amount of data you can expect to lose between flushes (copying the data to disk) is potentially large.  Scalaris, the other in-memory database on our list, tackles the durability problem with replication, but since it does not support multiple data centers your data will still be vulnerable to things like power failures.
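The size of that window is easy to estimate: with a flush every N seconds and W writes per second, a crash can lose up to N times W writes.  The numbers below are made up for illustration:

```python
def max_writes_lost(snapshot_interval_s: float, writes_per_s: float) -> float:
    """Upper bound on writes lost if the server dies just before a flush."""
    return snapshot_interval_s * writes_per_s

# E.g. flushing once a minute at 1,000 writes/s risks 60,000 writes.
```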

Memtables and SSTables buffer writes in memory (a “memtable”) after writing to an append-only commit log for durability.  When enough writes have been accepted, the memtable is sorted and written to disk all at once as an “SSTable.”  This provides close to in-memory performance since no seeks are involved, while avoiding the durability problems of purely in-memory approaches.  (This is described in more detail in sections 5.3 and 5.4 of the previously-referenced Bigtable paper, as well as in The log-structured merge-tree.)
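The write path can be sketched in a few lines of Python.  This is a deliberately tiny model (lists stand in for files on disk, and the flush threshold is a made-up constant); real implementations also compact SSTables and index them for reads:

```python
commit_log = []   # stands in for an append-only file on disk
memtable = {}     # in-memory buffer of recent writes
sstables = []     # each flush produces an immutable, sorted run

FLUSH_THRESHOLD = 3  # illustrative; real systems flush by memory size

def write(key, value):
    commit_log.append((key, value))  # sequential append: durable, no seeks
    memtable[key] = value
    if len(memtable) >= FLUSH_THRESHOLD:
        flush()

def flush():
    """Sort the memtable and write it out in one sequential pass."""
    global memtable
    sstables.append(sorted(memtable.items()))
    memtable = {}

def read(key):
    if key in memtable:               # newest data first
        return memtable[key]
    for table in reversed(sstables):  # then newest SSTable to oldest
        for k, v in table:
            if k == key:
                return v
    return None

# Three writes fill the memtable and trigger a flush.
write("a", 1)
write("b", 2)
write("c", 3)
```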

B-Trees have been used in databases since practically the beginning of time.  They provide robust indexing support, but performance is poor on rotational disks (which are still by far the most cost-effective) because of the multiple seeks involved in reading or writing anything.

An interesting variant is CouchDB’s append-only B-tree, which avoids the overhead of seeks at the cost of limiting CouchDB to one write at a time.


The NoSQL movement has exploded in 2009 as an increasing number of businesses wrestle with large data volumes.  The Rackspace Cloud is pleased to have played an early role in the NoSQL movement, and continues to commit resources to Cassandra and support events like NoSQL East.

NoSQL conference announcements and related discussion can be found on the Google discussion group.



  1. I thought Scalaris definitely handled the adding of machines live. There is certainly code written for it, and the papers written on Scalaris talk about nodes joining and automatic load balancing (I’ve never used Scalaris myself, so there may be things I’m unaware of).

    Also, I don’t think describing Scalaris as an in-memory database in the traditional sense hits the mark. Scalaris has a fail-stop model where availability is determined by the number of replicas; if a majority of the replicas are on-line, the data is accessible. This implies >2 replicas per item. Available memory is per-cluster, and can also include disk space (one storage option is disk-based Tokyo Cabinet), but in this case even the disk space is regarded as volatile. This of course makes the ‘very very fast’ claim a bit relative when applied to Scalaris. It offers performance through massive scalability – not necessarily per node, and one can imagine that latency can be substantial sometimes, esp. for writes.

    Lastly, I am not sure why Scalaris would be limited to one data center, given the above. Could you expand on that?

  2. Ulf: you’re right, Scalaris does support adding machines live. We’ll fix the table.

    By Multi-DC support I mean more than the naive “it works if you put all the machines on a vlan” approach: it needs to be able to route requests from clients to replicas in the same DC (or, preferably, same rack), and ideally it should allow waiting only for writes to nodes in the same DC as well. I couldn’t find anything indicating Scalaris does either, but I could have missed something.

    You’re also right about optionally using TC (I’m trying for “broad strokes” in this article :), although I would note that TC’s performance itself falls off a cliff when it grows out of RAM, so I’m skeptical as to how useful that is in practice.

  3. I know we often disagree about definitions (eg “distributed”), but I’m fairly certain CouchDB should have a checkbox for multi-datacenter support. Not in the Hadoop-style sense of rack awareness, but in the sense that lots of people are using CouchDB replication to provide multi-datacenter redundancy for their applications. And there are even more folks who are using CouchDB in a single datacenter, knowing that when it’s time to get more geographically distributed, their data-layer won’t need changing.

  4. I thought the very reason that B-trees are used in relational databases is that the number of seeks is reduced, as B-trees are essentially *flat* trees.

    • I agree: RDF triple stores are faster than SQL, more intuitive, and can exploit standard ontologies (vice private schemas).

      I am a happy user of Allegro Store which is ripping fast.

  5. Hi,

    what about backing up a NoSQL database?
    Please don’t tell me that backups are not required e.g., because the database is replicated.


  6. How ironic that when high volume transaction work is honestly explored, an old idea comes back to light. I wonder if anyone has looked at ACF, TPF or zTPF, particularly ACFDB/TPFDB. This is IBM’s high volume transaction processing operating system and database component used in airline reservation systems and more. The database component is most assuredly a NoSQL database.

  7. This might be as good a place to ask as any…

    I’ve heard some people complain that EC2 (Amazon) is a sub-optimal environment for Cassandra, because of the I/O characteristics of their virtual machines. Anyone here with experiences to share?

    • Yes, I/O sucks; even with EBS (your best choice), Cassandra was 2-4 times slower than running on dedicated hardware.
      Of course it all depends on HDD characteristics, so your mileage may vary.
      Although crucial, I/O is not the only thing you should be worried about: don’t let your nodes GC-pause or become unresponsive in any other way. It’s like a domino effect.

  8. Is anyone working on decoupling extent management from higher levels of abstraction? That is, treating “extent” as the basic unit of persistence instead of “sector,” “spindle,” or “block,” and re-engineering up from that instead of from SCSI?

  9. Hi,

    Working with MySQL Cluster, I would have loved to have it be part of the comparison. MySQL Cluster is a distributed hash table based on a relational model; we support both in-memory data (with optional checkpointing/logging) and, since quite recently, data stored on disk.
    We have a native API, which can give very good throughput and control, and there is also a storage engine for MySQL to provide SQL access. And we support multiple data centers using ordinary MySQL replication, and online adding of nodes (providing full transaction semantics while nodes are added).

    End of advertising

  10. Actually there is a long history of key/value or document store databases starting with Pick in the 70s. They are not opensource but they are current products supported and maintained with rich query languages and a range of APIs. Have a look at UniVerse, UniData or other multi-value databases.

    • I believe the point here is open source. Being able to fiddle with the code, and/or have other developers in the wild fiddle with and fix the code, is very important in today’s world. It might benefit some of these mature applications if they joined the movement.

    • We’ve been using the ZODB for all our web projects for the past 10 years now. It’s object-oriented, transactional, clusterable, etc.

      It’s strangely absent from this list, seeing as it is one of the oldest non-SQL databases in general use.


  11. […] Ellis wrote a nice article analyzing the NoSQL movement and the different strategies of non-relational databases […]


  13. […] Highcharts * learn how to use the non-relational database management sys… (NRDBMS) CouchDB in PHP […]


