A customer approached us recently with a new NoSQL-data requirement for an instrumentation system we had helped them with before.
For a little history: they had collected large amounts of critical sensor data and stored it in a dedicated Redis database. Redis’ key-value structure had worked well when they collected only one value for each key. It felt clumsy, however, once a new error-checking requirement meant adding a time-stamp to each datapoint, and therefore storing two values for each key. Doing this with plain string values in Redis requires either an artificial delimiter in the value field or two Redis databases that use the same key to retrieve the separate values associated with it.
Both are low-cost to no-cost to implement, and they opted for the delimiter approach, using ‘::’ as the delimiter. Of course, they knew they could not keep stretching the key-value paradigm like this as new data-logging requirements appeared, but it worked well. It required hardly any change to the existing production code, and used only enough extra RAM to store each additional data-point.
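As a minimal sketch of the delimiter approach described above, the two values can be packed into one string before writing and split again after reading. The function names and the choice to store the value first and the time-stamp second are assumptions for illustration, not the customer's actual code.

```python
from typing import Tuple

DELIMITER = "::"  # the delimiter named in the article


def pack_reading(value: float, timestamp: float) -> str:
    """Join a sensor value and its time-stamp into a single string value."""
    # repr() keeps enough digits for the floats to round-trip exactly.
    return f"{value!r}{DELIMITER}{timestamp!r}"


def unpack_reading(packed: str) -> Tuple[float, float]:
    """Split a packed string back into (value, timestamp)."""
    value_text, timestamp_text = packed.split(DELIMITER, 1)
    return float(value_text), float(timestamp_text)


# Hypothetical reading: 21.5 units at Unix time 1700000000.0
packed = pack_reading(21.5, 1700000000.0)
value, timestamp = unpack_reading(packed)
```

With a client such as redis-py, the packed string would simply be the argument to `set` and the result of `get` (decoded from bytes if the client is not configured to decode responses), which is why so little production code has to change.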
They also used Rackspace’s Redis-as-a-Service offering to test their new code while continuing to write real-time data to their existing data-store.
When they approached us for a risk assessment of their need to upgrade their sensor-count from 276 to almost 8,000 (a 29x data-volume increase), it felt as though the decision to migrate to a more complex NoSQL database had already been made.
They had concerns that Redis’ in-memory data model might at some time limit the volume of data they could quickly access. However, when we calculated the memory their Redis data store used and its rate of growth, we quickly showed them they had nothing to worry about for, oh, the next decade or two.
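The kind of back-of-envelope calculation involved can be sketched as follows. The article gives only the sensor counts (276 and almost 8,000), so the sampling rate, bytes per reading, and RAM size are hypothetical parameters you would replace with your own measurements.

```python
def scale_factor(old_sensors: int, new_sensors: int) -> float:
    """How many times larger the data volume becomes if volume tracks sensor count."""
    return new_sensors / old_sensors


def projected_bytes(sensors: int, readings_per_sensor_per_day: int,
                    bytes_per_reading: int, days: int) -> int:
    """Memory consumed by new readings over a period (all inputs are assumptions)."""
    return sensors * readings_per_sensor_per_day * bytes_per_reading * days


def days_until_full(ram_bytes: int, current_bytes: int, sensors: int,
                    readings_per_sensor_per_day: int, bytes_per_reading: int) -> float:
    """Days before the remaining RAM is used up at the projected growth rate."""
    daily = projected_bytes(sensors, readings_per_sensor_per_day, bytes_per_reading, 1)
    return (ram_bytes - current_bytes) / daily


# The sensor-count change quoted in the article: roughly a 29x increase.
growth = scale_factor(276, 8000)
```

The current memory footprint can be read from Redis itself (for example, the `used_memory` figure reported by the `INFO` command), which makes the remaining inputs to the estimate the sampling rate and the size of each stored entry.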
In conversation, however, we identified a pressing need for them to be more statistically sophisticated. Exploring that need actually powered our recommendation – and their eventual move – to MongoDB.
Here’s the background: the commonly recognized methods for dealing with data-sampling problems are quite easy to implement in Redis, the most common probably being to take three time-separated readings, or multiple independent sensor readings, for each data-point.
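To illustrate the idea of taking several readings per data-point, here is a small sketch that combines them into one value. Using the median to discard a single outlier is an assumption made for this example; the article does not say how the readings were combined.

```python
from statistics import median
from typing import Sequence


def combine_readings(readings: Sequence[float]) -> float:
    """Reduce several readings of the same data-point to one robust value."""
    if len(readings) < 3:
        raise ValueError("expected at least three readings per data-point")
    # The median ignores a single wild reading, unlike the mean.
    return median(readings)


# Hypothetical readings: two agree, one is a glitch.
robust_value = combine_readings([20.1, 20.3, 57.9])
```

In a key-value store this still fits comfortably: the three raw readings can be stored under one key (delimited, or in a Redis list) and the combined value computed when the data is read.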
That looks like a data-volume problem, but they also wanted to gather much more information about the environment around each sensor, along with sensor diagnostics and sensor-layout data, and to experiment with sensor calibration and record sensor-test results.
Their data would look more complex, and would change format more often. They could bend Redis to the task and document what data was where, hoping – in vain perhaps – that future developers would read it. Or they could exploit MongoDB’s NoSQL-style flexibility coupled with each MongoDB document’s free-form nature to create a flexible self-documenting datastore. To put it technically, they now had a requirement for the sophistication and on-the-fly flexibility that schema-less NoSQL databases are designed for.
We barely had to show them a MongoDB document to convince them that MongoDB’s JSON-style document layout enabled on-the-fly flexibility as they developed their product.
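A sketch of what such documents might look like, written as Python dictionaries, shows why the format is largely self-documenting. Every field name and value here is hypothetical; the point is only that a later document can carry extra nested fields without any change to the earlier ones.

```python
import json

# Hypothetical early document: a value and its time-stamp.
reading_v1 = {
    "sensor_id": "S-0042",
    "value": 21.5,
    "timestamp": "2014-01-15T10:30:00Z",
}

# Hypothetical later document: same core fields plus new nested meta-data.
reading_v2 = {
    "sensor_id": "S-0042",
    "value": 21.7,
    "timestamp": "2014-02-01T10:30:00Z",
    "environment": {"ambient_temp_c": 18.2, "humidity_pct": 41},
    "diagnostics": {"battery_v": 3.1, "self_test": "pass"},
}


def fields_added(old: dict, new: dict) -> set:
    """Top-level fields present in the newer document but not the older one."""
    return set(new) - set(old)


added = fields_added(reading_v1, reading_v2)
as_json = json.dumps(reading_v2, indent=2)  # the JSON-style layout a reader would see
```

Both shapes can live side by side in the same collection; with a driver such as PyMongo each dictionary would be passed to `insert_one` as-is, and the field names travel with the data rather than in a separate document that future developers may never read.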
A key enabler of this migration was that, with Rackspace offering both Redis and MongoDB as a service, we had no hardware to retire or re-configure as we offered the datastores for them to trial their code against. All they needed were IP addresses and ports, leaving them free to write code.
Although their sensor data remains the same in nature, the meta-data about each sensor’s datapoints (timestamps, location, environment factors) is much more complex, and fast-changing. MongoDB handles the speed of data-writes and data-reads, as well as the speed with which they change their data formats.
Have they dropped Redis? Far from it. They’re using it more than they were. But they’ve changed how they use it. They still use it for key-value data with needs that they know will remain simple into the future. They’re using it as a simple caching store for session data – a technique that, in the different world of web-applications, enables session-heavy web applications to scale by freeing them of the painful out-of-disk or slow disk-IO conditions they so often suffer. Probably the most common of these scenarios is speeding up applications like Magento, where Redis serves as the primary cache layer in a defense-in-depth caching approach to Magento’s scalability challenges.
For an excellent exposition of Redis’ flexibility, see Karl Seguin’s blog post on its role as the “Swiss army knife of data-stores” at: http://openmymind.net/Redis-Is-The-Most-Important-Tool-In-My-Toolbelt/
They’re also using it for its speed-of-implementation to create on-the-fly data storage for code-in-progress. This speeds up development because it creates a simple-to-interrogate queue that allows easy review of data regardless of the state of the code that uses it. The fact that it standardizes data transfer between different – sometimes conflicting – sets of code, as well as between languages and the developers who love them, is a real help.