It wasn’t too long ago that developers and database administrators answered with a simple “no, thank you” when asked about running any database in virtualized or cloud environments. Database-as-a-service solutions have come a long way in a short time, and the number of choices available to developers in the data services tier has exploded.
MongoDB, a frontrunner in the NoSQL world, has had a unique trajectory. In about four years it has become an important tool for cloud applications requiring a document-based data engine. As a dynamic-schema data engine, it is friendly to developers employing agile development processes. It allows applications to be built quickly and lets developers use the native data structures of the programming languages they’re already using. As MongoDB continues to evolve and strengthen, the selection of the underlying cloud platform represents a critical decision for architects and developers. You can run it yourself on the public cloud (the do-it-yourself approach, or DIY) or you can find a MongoDB cloud service.
3 PROBLEMS WITH DIY MONGODB ON THE CLOUD
For the purposes of this discussion, we will assume you have looked at your data tier needs closely and made an informed decision to use MongoDB in your application. The question then becomes: do you implement MongoDB yourself or use a cloud service provider?
The concerns with DIY MongoDB on the cloud can be summarized in three areas:
- Scalability: Traditionally, it has been difficult to scale the data tier to web-level traffic when using relational databases. NoSQL solutions like MongoDB address this concern through sharding (horizontal scaling). But sharding is easy to get wrong: the wrong shard key may be chosen, or sharding may be introduced too late, when the database is already running at full capacity. Any cloud provider for MongoDB should be able to provision infrastructure on demand, across a variety of regions and data centers, and across a variety of configurations and sizes. It might even make those choices for customers through automation and make sharding a natural part of the platform. In a DIY approach, these infrastructure choices and automation issues fall to either the application operations teams or the application developers themselves.
- Performance: MongoDB, like any database, is judged by the performance it delivers to the application. But performance is affected by every layer of the application and the infrastructure, including the network, storage systems (SSD vs. HDD), file systems, mounting options, node specifications, and even the logical and physical design of the application (think about index design, for example). High performance is often correlated with high price, so it’s important to have a set of choices available for different stages in the development process, for applications with differing performance needs, or simply for different budgets. When dealing with performance, two issues are important: performance consistency (how reliable is the performance profile of the system?) and the cost of performance (how much does it cost to achieve a given level of performance?). Unfortunately, inconsistent storage performance and an over-provisioning tax (paying for more capacity than needed just to reach a target performance level) are common among some cloud providers. In a DIY approach, the burden of delivering performance to the application also falls on your own team.
- Availability/Reliability: The database is typically thought of as a potential single point of failure. But that doesn’t have to be the case. Running MongoDB on the cloud can allow for more robust and resilient architectures. Basic implementations (for development and testing) can run on single unsharded nodes, but typical production deployments can leverage replica sets. Smart cloud implementations can even be aware of node locations, ensuring that copies of data are kept on physically distinct nodes to reduce the impact of a failure. A key aspect of delivering high availability is monitoring the platform for issues and having automation react to them before they cause downtime. Again, all of these are burdens that a DIY deployment must bear, and they take resources away from developing and evolving the application.
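To make the shard key point under Scalability a little more concrete, here is a minimal sketch in Python. It is a toy model, not MongoDB's actual chunk-splitting or balancing logic: the function names, the four-shard layout, and the key range are all assumptions made up for illustration. It compares where the most recent 1,000 inserts land when a steadily increasing key (such as a counter or timestamp) is range-partitioned versus hashed.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4        # assumed cluster size, for illustration only
MAX_KEY = 100_000     # assumed upper bound of the key space


def range_shard(key: int, max_key: int = MAX_KEY, num_shards: int = NUM_SHARDS) -> int:
    """Toy range partitioning: split [0, max_key) into equal contiguous chunks."""
    chunk = max_key // num_shards
    return min(key // chunk, num_shards - 1)


def hashed_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Toy hashed partitioning: hash the key, then take it modulo the shard count."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def write_distribution(keys, shard_fn) -> Counter:
    """Count how many writes each shard would receive."""
    return Counter(shard_fn(k) for k in keys)


# The newest 1,000 documents, keyed by a steadily increasing value.
recent_keys = range(MAX_KEY - 1_000, MAX_KEY)

range_counts = write_distribution(recent_keys, range_shard)
hash_counts = write_distribution(recent_keys, hashed_shard)

print("range-partitioned:", dict(range_counts))
print("hash-partitioned: ", dict(hash_counts))
```

In this toy model, every recent write under range partitioning goes to the last shard, while the hashed key spreads the same writes across all four. The trade-off is that hashing scatters neighboring keys, so range queries on that key have to touch more shards. That is the kind of decision a DIY team has to make early, before the database is running at full capacity.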
In a post tomorrow, I’ll consider how to compare service levels and options from a MongoDB hosting provider. In the meantime, here’s a recent interview with ObjectRocket co-founder Chris Lalonde on the challenges of running MongoDB on the cloud.