Batch, Streaming and Serving: Hortonworks Data Platform version 2.2 is Here

Over the last two years Rackspace has released several enhancements to its Cloud and Managed Big Data Platforms powered by the Hortonworks Data Platform, including support for multiple versions, OnMetal, Spark and other improvements. Keeping our Hadoop™ users on a current and relevant big data environment that meets the needs of a fast-changing ecosystem is top of mind in our development efforts.

Point releases in technology are not always newsworthy. For many legacy applications, the only visible differences are minor bug fixes and enhancements made to keep up with competitive pressures. Apache Hadoop™ is an exception: the technology is evolving so fast, and is in such high demand, that each point release comes with notable improvements and several new tools that are actively changing the way users run their queries, architect their environments and develop their use cases.

Over 88 improvements were rolled into HDP version 2.2. At Rackspace, we believe the key to understanding all the changes needed to adopt a new application version is partnering with someone who has the expertise to not only get you to the next version but also ensure your success. That’s why we are happy to report today that Rackspace will be supporting the Hortonworks Data Platform 2.2 (also known as HDP 2.2) with trained experts ready to help users adopt and implement this new version.

Following fast on the heels of Hortonworks' release in December 2014, Rackspace will offer HDP 2.2 on both Cloud and Managed Big Data Platforms. While each offering has a unique set of supported tools and operations, the underlying distribution will be available in both cloud and dedicated models. For HDP 2.2, we will focus on our Managed Big Data Platform, which allows users to design custom, robust dedicated Hadoop environments with added security and network capabilities, all while being fully managed by the Rackspace team.

Users can expect:

New Versatile Hadoop Architecture: The new Lambda architecture for Hadoop is a layered approach that matches each query to the kind of processing best suited to the business objective behind it. Some queries run well as a batch job, while others need more resources and performance to handle streaming or serving. The Lambda architecture splits Hadoop operation into three layers: batch, streaming, and serving. The batch layer handles traditional Hadoop querying over very large data sets, while the streaming layer handles real-time event processing. The serving layer is reserved for ad-hoc querying. By allowing these operations to exist simultaneously, users can design multiple workflows in Hadoop to get the most out of the compute running on top of their file system.
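To make the three layers concrete, here is a minimal sketch in plain Python. It is not HDP code and does not use any Hadoop API; the class name `LambdaSketch` and its methods are hypothetical, chosen only to illustrate how a batch view, a streaming view and a serving query can fit together, assuming the workload is a simple count of events by type.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the batch, streaming and serving layers (illustrative only)."""

    def __init__(self, master_events):
        # Master dataset: the full, append-only history of events.
        self.master = list(master_events)
        self.batch_view = Counter()      # produced by the batch layer
        self.realtime_view = Counter()   # produced by the streaming layer

    def run_batch(self):
        # Batch layer: recompute the view from the entire master dataset.
        self.batch_view = Counter(self.master)
        # Events seen so far are now covered by the batch view.
        self.realtime_view.clear()

    def ingest(self, event):
        # Streaming layer: record the event and update the incremental view.
        self.master.append(event)
        self.realtime_view[event] += 1

    def query(self, key):
        # Serving layer: answer ad-hoc queries by merging both views.
        return self.batch_view[key] + self.realtime_view[key]


sketch = LambdaSketch(["login", "login", "purchase"])
sketch.run_batch()
sketch.ingest("login")          # arrives after the last batch run
print(sketch.query("login"))    # combines batch and streaming results
```

In a real cluster the batch view might be built by a MapReduce or Hive job and the incremental view by a stream processor; the point of the sketch is only that the serving query sees both.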

Apache Spark Integration: As part of these new layers (mainly streaming), the Apache Spark project has grown in focus as a go-to tool for memory-optimized processing beyond the constraints of physical disks. Rackspace launched support for Apache Spark last October and has seen adoption of the new technology ramp up significantly. As users begin to understand how Spark can optimize their operations, the Hadoop community is building tighter integration with Hadoop’s core services like HDFS and YARN. Deeper integration ensures Spark operates more like a Hadoop tool and less like an adjacent technology. Users are achieving once-unthinkable query performance with Apache Spark, and we are excited to see this new technology evolve.

SQL on Hadoop with Stinger and Phoenix: It is hard to ignore the prevalence of SQL semantics and the growing SQL expertise across the industry. As one of the most common enterprise application back-ends, the family of SQL variants is an essential part of the modern data architecture. As Hadoop adoption increases, it becomes more important that we put the power of big data into the hands of DBA practitioners. That is why you will see continued efforts to add common SQL operations to Hadoop. In HDP version 2.2, SQL on Hadoop is performed in two ways: the Stinger Initiative (Stinger is not a single tool but a set of tools and operations) and Apache Phoenix. Stinger is often referred to as the SQL interface for Hadoop; it not only improves performance and scalability, it also supports the most common SQL-based operations. The Stinger Initiative was delivered in three phases, and HDP 2.2 represents the final stage. Phoenix is a new tool that acts as the SQL skin for HBase, which is the NoSQL engine inside of Hadoop.

Full Text Search: Full text search tools like Apache Lucene, ElasticSearch, and Solr are common additions to NoSQL and big data environments. They allow for quick ad-hoc searching of full text without inhibiting cluster operation. Often, users will use full text search to gut-check or verify aspects of their dataset while normal Hadoop operations are running. Hortonworks has championed Apache Solr as the go-to resource for full text search on Hadoop. Rackspace users can use Solr in our Managed Big Data Platform or choose the ObjectRocket ElasticSearch offering, currently in limited beta, to perform full text searching and indexing. For advice on the best tool for your specific use case, please reach out to one of our data experts.
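For readers new to full text search, the core idea behind tools built on Lucene is an inverted index: a map from each term to the documents that contain it. The sketch below is plain Python written for illustration, not the Solr or ElasticSearch API; the function names and the sample log lines are hypothetical, and real engines add analysis, ranking and distributed storage on top of this idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return ids of documents that contain every term in the query.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data: three short log lines keyed by id.
docs = {
    1: "Hadoop batch job failed",
    2: "Spark streaming job started",
    3: "Hadoop streaming job started",
}
index = build_index(docs)
print(sorted(search(index, "streaming job")))
```

Because a lookup touches only the index, a quick check like this does not need to scan the underlying documents, which is why search can sit alongside normal cluster work.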

If the sheer number of changes and the fast cadence of releases are a point of hesitation for your organization, you need not fear. Rackspace Data Services is a highly focused team of developers, data engineers, DBAs and expert consultants that can help you implement the most popular data platforms. We have cross-functional knowledge in Hadoop, MongoDB, Redis, MS SQL, MySQL, Oracle and ElasticSearch, as well as the many connectors and services that can link these environments. We can also help you orchestrate these technologies on the best-fit platform. Whether you need the security and isolation of a dedicated Hadoop environment or the instant access and flexible pricing of a public cloud platform, Rackspace is ready to provide you the fanatical expertise you need to ensure your big data initiative is successful.

Rack Blogger is our catchall blog byline, subbed in when a Racker author moves on, or used when we publish a guest post.

