In my previous blog post on Object Storage, I provided an overview of what Object Storage is, and how it compares to conventional storage platforms. In this post, I will discuss what benefits Object Storage can provide for you today. As there are a variety of solutions to choose from, each offering different pros, cons and price-points, I will focus on OpenStack Swift, the open-source Object Storage component of OpenStack, as it is vendor-agnostic and freely available to everyone.
Three Key Advantages
Object Storage vs. Block/File Storage
Block and File Storage solutions may be cheaper than Object Storage at small data sizes. In the low tens of TB, the economics of Object Storage are not very compelling: the overhead of 3x replication and of dedicated management/network infrastructure is significant. However, most commodity storage solutions become challenging to scale once more than a single node's worth of storage is required. By the time you need 10 or more devices, you are generally either taking on a large amount of administrative overhead (managing volumes/LUNs) or starting to look at expensive proprietary solutions. This is the point at which Object Storage begins hitting its stride.
Object Storage prices today are typically less than $0.10/GB per month, depending on platform and quantity. In the private cloud space, these costs can be significantly lower: with a public cloud provider, you pay a premium that covers both the provider's profit margin and the convenience of renting resources on a utility basis.
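To make the scaling argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a simple model that is not taken from any real price list: every usable TB needs `replicas` TB of raw disk, and the management/network infrastructure is a single fixed cost. The function names and all figures are hypothetical and only illustrate how the fixed overhead is amortized as the cluster grows.

```python
def raw_capacity_tb(usable_tb, replicas=3):
    """Raw disk capacity needed to hold usable_tb with full replication."""
    return usable_tb * replicas


def effective_cost_per_usable_tb(usable_tb, cost_per_raw_tb, fixed_overhead, replicas=3):
    """(raw disk cost + fixed infrastructure cost) divided by usable capacity.

    All inputs are hypothetical illustration values, not real prices.
    """
    raw_cost = raw_capacity_tb(usable_tb, replicas) * cost_per_raw_tb
    return (raw_cost + fixed_overhead) / usable_tb


# Hypothetical figures: 50 cost units per raw TB, 20000 units of fixed overhead.
for usable in (20, 200, 2000):
    cost = effective_cost_per_usable_tb(usable, cost_per_raw_tb=50, fixed_overhead=20000)
    print(f"{usable} TB usable: {cost:.0f} per usable TB")
```

Under these assumed numbers, the replication cost per usable TB stays constant while the fixed overhead per TB shrinks, which is why the economics improve with size.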
Object Storage vs. Tape
Simply put, when you have valuable data, the only way to store it more cheaply than Object Storage is on tape. Tape is still by far the cheapest option for long-term cold storage, and rumors of its death have been greatly exaggerated.
On the other hand, tape is basically a black hole for data from the standpoint of day-to-day operations. Tape is not appropriate for data that needs to be accessed regularly, or for data that might need to be retrieved rapidly at some point in the future. In short, tape is not suitable for data that needs to be alive to any extent.
Object Storage comes closest to tape in durability and cost while still providing hot storage; it can, for example, act as a backend for Hadoop. It also offers a variety of useful features that tape cannot provide, such as multiple ways to span data across several geographical locations, access control, the ability to make content public, and CDN integration.
By combining 3N redundancy, intelligent data placement, automatic recovery of lost or corrupted objects and automated handling of drive failures (ensuring 3N redundancy even in the period prior to drive replacement), Object Storage provides extremely high levels of durability when compared to conventional storage options. There are also ways to automatically synchronize data between multiple clusters in separate geographical regions, providing durability characteristics that suit virtually any use case.
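The idea of zone-aware placement and automatic re-placement after a drive failure can be illustrated with a small sketch. This is not Swift's ring implementation (Swift precomputes a partition-to-device mapping); it swaps in rendezvous (highest-random-weight) hashing as a simpler stand-in, and the function name, device IDs and zone names are hypothetical.

```python
import hashlib


def place_replicas(object_path, devices, replicas=3):
    """Pick `replicas` devices in distinct zones for an object.

    devices: list of (device_id, zone) tuples.
    Simplified rendezvous hashing, NOT Swift's actual ring algorithm.
    """
    # Rank every device by a hash of (object path, device id).
    ranked = sorted(
        devices,
        key=lambda d: hashlib.sha256(f"{object_path}:{d[0]}".encode()).hexdigest(),
    )
    chosen, used_zones = [], set()
    for device_id, zone in ranked:
        if zone in used_zones:
            continue  # never put two replicas in the same zone
        chosen.append(device_id)
        used_zones.add(zone)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct zones for the requested replica count")


# Hypothetical cluster: 8 devices spread over 4 zones.
devices = [(f"d{i}", f"z{i % 4}") for i in range(8)]
before = place_replicas("AUTH_demo/photos/cat.jpg", devices)

# Simulate a drive failure: drop one replica's device and place again.
survivors = [d for d in devices if d[0] != before[0]]
after = place_replicas("AUTH_demo/photos/cat.jpg", survivors)
print(before, after)
```

In this sketch, the two replicas on healthy devices stay where they are and only the lost replica is assigned a new home, which mirrors the goal of restoring 3N redundancy before the failed drive is replaced.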
Without going into excruciating detail, the events that would cause permanent data loss in a large-scale Object Storage cluster are typically catastrophic in nature: the sort of thing that only a secondary site would protect against. Setting up such a secondary site is also quite easy to do with an Object Storage platform.
An interesting mathematical analysis of data loss characteristics indicates a worst-case mean time to data loss of over 150 years using consumer quality drives with 3N redundancy. Imagine the cost and complexity of achieving that type of durability number using conventional storage technologies!
While enterprise storage offerings typically provide a variety of compelling features (albeit with a corresponding price tag attached), the feature set of most free/open-source storage products is sorely lacking when it comes to multi-petabyte storage requirements. This is another place where Object Storage systems shine. Swift Object Storage features include:
- Self-Healing: Automatic identification of failed drives and replication of data to preserve 3N redundancy.
- Massively Scalable (with linear performance improvements with scale): No single points of failure and full horizontal scalability of all services means that environment-level performance increases steadily along with the size of the environment.
- Scalable Metadata: Decentralized object metadata means that billions of objects with megabytes of metadata per object can be managed without performance degradation or the need for an external database to manage data (something that becomes prohibitive at scale). Note: One downside to Object Storage is that metadata search is currently challenging to implement. We will discuss this point further in an upcoming post on the future of Object Storage.
- Multi-region: Swift supports two methods of implementing multi-region deployments. First, specific containers (analogous to S3 buckets) can be set to synchronize between two distinct Swift clusters (“container sync”). Second, a true multi-region cluster can be set up, in which the replicas of a single cluster are distributed across two or more regions.
- Secure Multi-tenancy: Swift Object Storage handles multiple accounts, and allows for total isolation of the data associated with an account (in other words, users cannot access one another’s data). There is also the ability to share data between accounts, or even publicly.
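Two of the features above, container sync and public sharing, are controlled through container metadata headers in the Swift API. The following sketch only builds the header dictionaries that would be sent in a POST to the container; it does not contact a cluster. The helper names, realm, cluster, account, container and key values are hypothetical, and the `//realm/cluster/account/container` form of `X-Container-Sync-To` assumes the operator has configured container sync realms.

```python
def container_sync_headers(realm, cluster, account, container, sync_key):
    """Headers asking Swift to sync a container to a remote container.

    Assumes realm-based container sync has been configured by the operator.
    The same sync key must be set on both containers.
    """
    return {
        "X-Container-Sync-To": f"//{realm}/{cluster}/{account}/{container}",
        "X-Container-Sync-Key": sync_key,
    }


def public_read_headers(allow_listing=False):
    """Headers setting a read ACL that makes a container's objects public."""
    acl = ".r:*"  # any referrer may read objects
    if allow_listing:
        acl += ",.rlistings"  # also allow listing the container
    return {"X-Container-Read": acl}


# Hypothetical values for illustration only.
print(container_sync_headers("realm1", "clusterb", "AUTH_demo", "backups", "secret"))
print(public_read_headers(allow_listing=True))
```

With a client library of your choice, these dictionaries would be passed as the headers of a container POST request by an account that has permission to modify the container.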
Object Storage provides a variety of compelling benefits, and should be considered as a “first-class” shared storage option when designing scalable infrastructures.