To call Object Storage an emerging technology would be inaccurate. There are already trillions of objects and hundreds (perhaps thousands) of petabytes of data in Object Storage public clouds, such as Rackspace Cloud Files and Amazon S3, in private clouds based on the OpenStack Object Storage platform Swift, and other platforms such as EMC’s Atmos.
While Object Storage is already an established technology, it is one that is not widely embraced or understood. In this blog post, I will discuss what Object Storage is, what it is good for, and what it is not good for. I hope that by the end of this post you will feel that it is a technology that warrants some serious investigation!
Before we talk about Object Storage, let’s talk about the two other main types of storage: Block Storage and File Storage.
The most common examples of Block Storage are SAN, iSCSI, and local disks (be they JBOD or RAID). A Block Storage volume is attached directly to an operating system, and interactions generally happen within the parameters of a filesystem, although it is also possible to access a block device directly as a raw device. Block Storage is the “lowest level” of all storage types, and allows for convenient manipulation of data at the byte level. This is useful for applications with heavy random I/O, and for applications that only need small portions of the data at a time.
The most common example of File Storage is a NAS (generally using CIFS or NFS). File Storage involves the use of a network file system that acts as an abstraction layer between the OS and the underlying filesystem on the NAS device. The OS still sees the storage as a local filesystem, but it is not actually interacting directly with the filesystem on which the storage resides. Instead, its commands are interpreted by the network filesystem, and translated to commands of the underlying filesystem. This is convenient, because it allows different operating systems that may or may not support the actual underlying filesystem to interact with it in a uniform manner, which is very valuable when multiple machines need to be able to access the same content on a remote server. In this same vein, features like file locking (to prevent inconsistent states when multiple servers are writing to the same file) and access control are almost universal in the File Storage world.
So what makes Object Storage different from Block Storage and File Storage?
First of all, Object Storage is not directly accessed by the operating system. It is not seen as a local or remote filesystem. Instead, interaction occurs at the application level via an API. Tersely: Block Storage and File Storage are designed to be consumed by your operating system; Object Storage is designed to be consumed by your application.
This has several implications:
Second, Object Storage uses a flat structure, storing objects in containers, rather than a nested tree structure. Once again, many implementations of Object Storage can emulate a directory structure and give the illusion of hierarchy, but in reality the underlying storage is flat. This is another feature of Object Storage that allows for massive scalability: eliminating the overhead of tracking large quantities of directory metadata removes a major performance bottleneck that typically appears once a filesystem holds tens of millions of files.
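To make the "illusion of hierarchy" concrete, here is a minimal sketch of how a directory-style listing can be emulated over a flat set of object names using a prefix and a delimiter. The function name `list_container` and the sample object names are made up for illustration; real platforms expose similar prefix/delimiter listing options through their own APIs, and this toy version is not any of them.

```python
from typing import Dict, List, Tuple


def list_container(
    objects: Dict[str, bytes], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Emulate a directory listing over a flat key space.

    Returns (object_names, pseudo_directories) found directly under `prefix`.
    """
    names: List[str] = []
    subdirs = set()
    for key in sorted(objects):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The delimiter appears later in the name: report a pseudo-directory.
            subdirs.add(prefix + head + delimiter)
        else:
            # No further delimiter: the object sits directly at this "level".
            names.append(key)
    return names, sorted(subdirs)


# The container is just a flat mapping of names to data (hypothetical contents).
container = {
    "photos/2013/cat.jpg": b"...",
    "photos/2013/dog.jpg": b"...",
    "photos/2014/bird.jpg": b"...",
    "readme.txt": b"...",
}

print(list_container(container))
print(list_container(container, prefix="photos/"))
print(list_container(container, prefix="photos/2013/"))
```

Note that the "directories" exist only in the listing logic. Nothing is stored for `photos/` or `photos/2013/` themselves, which is why there is no directory metadata to keep track of.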
Another difference between Object Storage and the other storage types is that object metadata lives directly in the object, rather than, for example, in a separate inode. This is useful because the amount of metadata that is desirable in a storage platform holding tens or hundreds of PB is typically orders of magnitude greater than what conventional storage engines are designed to handle at scale.
For example, imagine you wanted to store all of the books in the Library of Congress in a single storage platform. In addition to the contents of the books, you want to store metadata including the author(s), date of publication, publisher, subject, ISBN, OCR date and method, copyrights, etc. This data could range from a few KB to several MB per object. Traditionally, all of this data would have to be stored in a relational database, and an application built to relate this data to a specific object. Doing this for 35 million (and growing) objects represents a major challenge with traditional storage platforms. In an Object Storage system, there is no such scalability issue, as this data lives directly with the object, and can be retrieved with a single API call without the overhead associated with a relational database.
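As a rough sketch of what "metadata lives with the object" looks like from the application side, the snippet below builds (but does not send) the two HTTP requests involved, in the style of the OpenStack Swift API: a `PUT` that stores the object together with `X-Object-Meta-*` headers, and a `HEAD` that would return that metadata in a single call. The endpoint, account name, token, container, and object name are all placeholders invented for this example.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical values: substitute your own endpoint, account, and auth token.
ENDPOINT = "https://storage.example.com/v1/AUTH_demo"
TOKEN = "example-token"


def metadata_headers(metadata):
    """Turn a metadata dict into Swift-style X-Object-Meta-* headers."""
    return {f"X-Object-Meta-{key}": str(value) for key, value in metadata.items()}


def object_url(container, name):
    """Objects are addressed by URL: endpoint/container/object."""
    return f"{ENDPOINT}/{quote(container)}/{quote(name)}"


def build_put(container, name, body, metadata):
    """Request that stores the object and its metadata together."""
    headers = {"X-Auth-Token": TOKEN, **metadata_headers(metadata)}
    return Request(object_url(container, name), data=body, headers=headers, method="PUT")


def build_head(container, name):
    """Request that fetches only the object's headers, including its metadata."""
    return Request(object_url(container, name), headers={"X-Auth-Token": TOKEN}, method="HEAD")


put = build_put(
    "books",
    "moby-dick.txt",
    b"Call me Ishmael.",
    {"Author": "Herman Melville", "Year": 1851},
)
head = build_head("books", "moby-dick.txt")

print(put.get_method(), put.full_url)
print(head.get_method(), head.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it, given a real endpoint and a valid token. The point of the sketch is that no separate database lookup is involved: the metadata travels with the object on upload and comes back with it on retrieval.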
Many of the features of Object Storage seem inconvenient at very small scales, but as data scale reaches hundreds of TB and moves into the PB range and beyond, these features become invaluable, and allow continued horizontal scalability for virtually any quantity of data.
Due to the design of most Object Storage systems (3 file replicas is the most common paradigm), durability levels at scale are extremely high compared to conventional storage solutions (think 99.99999% to 99.999999999%, 7 to 11 nines). Object Storage systems have internal mechanisms to verify file consistency, and handle failed drives, bit-rot, server and cabinet failures, etc. These features allow the system to automatically replicate data as needed to retain the desired number of replicas, which results in extremely high durability and availability of data.
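The self-healing behavior described above can be illustrated with a toy audit loop. This is only a sketch under simplifying assumptions: replicas are held in an in-memory dict keyed by made-up node names, corruption is detected by comparing a SHA-256 checksum, and "repair" is just copying a healthy replica to a spare node. Real systems do this in the background across many servers and are considerably more involved.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Content checksum used to detect bit-rot."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(
    replicas: Dict[str, bytes], expected: str, spare_nodes: List[str], want: int = 3
) -> Dict[str, bytes]:
    """Drop corrupted replicas, then copy a healthy one until `want` exist."""
    # Keep only replicas whose content still matches the expected checksum.
    healthy = {node: data for node, data in replicas.items() if checksum(data) == expected}
    if not healthy:
        raise RuntimeError("all replicas lost or corrupted")
    source = next(iter(healthy.values()))
    # Re-replicate onto spare nodes until the desired replica count is restored.
    for node in spare_nodes:
        if len(healthy) >= want:
            break
        if node not in healthy:
            healthy[node] = source
    return healthy


payload = b"object contents"
expected = checksum(payload)

# node-b has suffered bit-rot; the third replica was on a drive that failed.
replicas = {"node-a": payload, "node-b": b"object c0ntents"}

repaired = audit_and_repair(replicas, expected, spare_nodes=["node-b", "node-d", "node-e"])
print(sorted(repaired))
```

After the audit, the object is back to three verified replicas without any operator involvement, which is the property that drives the durability figures quoted above.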
Because many Object Storage platforms are designed to run on commodity hardware, even with 3x overhead, the price point is typically very attractive when compared to block or file storage. At scale, costs of pennies per gig per month are typical. Think comparable or better durability than tape, nearly the same cost as tape, and the convenience and performance of hot storage, plus all the benefits of a “cloudy” storage platform.
Currently the datasets best-suited for Object Storage are the following:
Note that projects like ZeroVM are turning Object Storage into a computable platform, making it suitable for semi-structured data applications (e.g. Hadoop/MapReduce analytics). I will be discussing converged storage/computable storage in a future blog post.
For businesses with large data storage needs, Object Storage bears evaluation. It can almost certainly provide superior scalability, durability, and price compared to existing storage solutions at petabyte-scale.