OnMetal: The Right Way To Scale

For big or rapidly growing Internet companies, one of the largest pains is scaling their application. We often hear about companies that must rewrite their entire architecture from scratch every time traffic grows tenfold, and then spend significant engineering resources continuously reworking major parts of it even after those rewrites.

Scale is hard. And it can be costly. And the real cost of running and scaling large Internet applications isn’t just in hosting, but in engineering and management.

Think of it this way: humans are more expensive than computers. That’s why Rackspace’s Managed Cloud continues to make economic sense even as the cloud hosting industry becomes commoditized.

But let’s look closer at the engineering expenses. Back when I ran Mailgun, an email infrastructure startup Rackspace acquired, we were one of a few rare startups that ran 100 percent on bare-metal, dedicated infrastructure. Nearly everyone else used public clouds. We’d talk at “scaling dinners” with other YCombinator founders and compare notes on scaling various technologies, obstacles, victories, scars and hosting expenses. I heard about scaling horrors in the cloud; something we’d never experienced at Mailgun, despite our traffic being high for a startup our size.

Why did cloud-based startups have such a hard time scaling “in the cloud?” The short answer: multi-tenancy. Multi-tenancy in clouds leads to inconsistent performance, a noticeable increase in application complexity and more engineering spend to manage that complexity.

It’s subtler than simply a “noisy neighbor” problem. Let me illustrate with a simple example. Suppose that to serve a web request you need to fetch pieces of data from nine compute nodes (caches, DBs, what have you). Note that some calls are synchronous (like B1 to B2 to B3), while others execute in parallel (A to B, C and D).

Now, let’s do the math: what is the total time required to service this request? The answer is simple: 21 milliseconds, the slowest of the three parallel “paths” (A->D1->D2->D3). As you grow by adding nodes, these calculations stay just as simple.
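With constant, known node timings, the math really is that simple. A minimal sketch (the per-hop figures below are invented for illustration, chosen so that the slowest path sums to 21 ms as in the example):

```python
# Hypothetical per-hop latencies in ms for the three parallel paths.
# Only the 21 ms total of the slowest path comes from the post; the
# individual hop timings here are made up.
paths = {
    "A->B1->B2->B3": [4, 5, 6],   # 15 ms total
    "A->C1->C2":     [9, 9],      # 18 ms total
    "A->D1->D2->D3": [7, 7, 7],   # 21 ms total
}

# With constant node timings, the total request time is plain algebra:
# the maximum over the per-path sums.
total = max(sum(hops) for hops in paths.values())
print(total)  # 21
```

The total is just a max-of-sums over known constants, so capacity planning stays a back-of-the-envelope exercise.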

Instead, here is what happens in public cloud environments:

  • The networks and physical machines are shared with other companies, and one can’t rely on consistency of performance in all three dimensions: network, storage and compute.
  • This means that the response time of each node is not a constant; it is a range of values, often measured experimentally. In other words: you cannot know it ahead of time.
  • Some of the nodes may become unavailable or, similarly, may not respond within the experimentally determined performance envelope.
  • The overall response time comes to depend on complex probabilistic equations instead of the simple algebra seen above.
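To make the contrast concrete, here is a toy Monte Carlo sketch (the latency model and every figure in it are invented): each hop usually lands near its mean, but a small fraction of calls hit a “noisy neighbor” and take several times longer, so the overall response time becomes a distribution with a long tail rather than a single number.

```python
import random

random.seed(42)

def sample_hop(mean_ms):
    # Toy model: most calls land near the mean, but 5% of them
    # hit a "noisy neighbor" and take 3-10x longer.
    if random.random() < 0.05:
        return mean_ms * random.uniform(3, 10)
    return random.gauss(mean_ms, mean_ms * 0.1)

# Hypothetical per-hop mean latencies (ms) for three parallel paths.
paths = [[4, 5, 6], [9, 9], [7, 7, 7]]

def request_time():
    # A request is done when its slowest path finishes.
    return max(sum(sample_hop(m) for m in path) for path in paths)

samples = sorted(request_time() for _ in range(10_000))
median = samples[len(samples) // 2]
p99 = samples[int(len(samples) * 0.99)]
print(f"median: {median:.1f} ms, p99: {p99:.1f} ms")
```

The median stays close to the deterministic answer, while the 99th percentile drifts far above it; that tail is what cloud-hosted teams end up engineering around.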

So how do Internet companies deal with these issues?

  • They over-provision, getting more nodes than they need to raise the probability of a response coming back in a reasonable time.
  • They increase the complexity of the application itself. Requesting, say, 10 nodes, benchmarking them to pick the most consistent one and cancelling the other nine is a very common technique.
  • They increase the complexity of the application to deal with nodes that may disappear.

In the end, over-engineering and driving up complexity seems to be the universal answer to pains caused by multi-tenancy.

Over-engineering means spending more money hiring smart engineers to – you guessed it – over-engineer your app.

This is why many companies migrate off the cloud and go back to good old colocation hosting. The scaling curves push them toward a hard choice: abandon elasticity and start managing their own gear.

Today, Rackspace wants to alleviate these pains. We’re waging war on complexity and want to offer a simpler way to scale. We do this with a new product line called OnMetal.

What Is OnMetal?

OnMetal servers are single-tenant, bare-metal servers provisioned via the same OpenStack API as our cloud. They can be spun up as quickly as VMs to offer the agility of multi-tenant environments with the performance of single-tenant hardware. OnMetal servers are our own design and are engineered in a highly opinionated way. We’ve made them 100 percent solid-state with external cooling, leading to increased mean time between failures (MTBF). They are also incredibly large, so you’ll need fewer of them.

Who Is OnMetal For?

OnMetal is for large or quickly-growing Internet businesses thinking about moving from colo to cloud, or vice versa.

Why Should You Care?

OnMetal combines the simplicity of consistent performance and the economy of colocation with the elasticity of the cloud. Running your high-traffic production environment on consistently performing bare-metal machines means less over-engineering, more simplicity and – ultimately – lower costs. Because OnMetal is a part of the Rackspace Managed Cloud portfolio, our customers won’t spend as much managing their servers. This is our answer to the creeping complexity of scaling.

Let’s Dig Deeper

As you may know, Rackspace is a member of the Open Compute Project. This allowed us to leverage hardware that borrows from the wisdom of Internet giants who’ve developed server designs optimized for high server count and economy. Open Compute also allowed us to make changes to the gear we’re offering, particularly around reliability and serviceability.

We’ve created opinionated chassis designs. The chassis is all solid-state. We’ve removed cooling fans from the boxes and do not use any spinning media. This reduces heat and vibration, and helps increase MTBF.

The configurations are opinionated as well. In order to deliver the economy of colocation that customers require, we had to figure out how to optimize the configuration based on specific workload requirements like “database transactions per second per dollar” or “total RAM per dollar per hour.” This led to the following configurations:

Note that we went with the fast 10-gigabit network for all instance types, because network performance is becoming increasingly important.

For detailed descriptions of OnMetal servers and exact pricing, we invite you to sign up for a limited availability program to test them out or come and talk to us about your scaling needs. We expect OnMetal servers to be generally available in the Rackspace Northern Virginia data center next month.

Why Bare-Metal?

We’re not offering these machines as single-tenant VMs for a few reasons:

First, several customers expressed concerns about the virtualization tax, which becomes more significant as server counts grow. While hypervisors continue to get better, we routinely meet customers who feel the impact of this tax.

Second, and even more exciting, we see a technology trend in software that renders virtualization less useful. In a single-tenant environment like OnMetal, one does not need to isolate tenants from one another, which removes much of the need for the isolation that virtualization brings to the table.

Third, the progress in operating systems has delivered a native capability to isolate apps using containers. Companies like Docker and CoreOS provide tools to run fully isolated applications without relying on virtualization, and we see this as an emerging trend to run at scale: containers on bare-metal.

And, ultimately, OnMetal is all about scale.


  1. The example of request timings seems wrong to me – the slowest path is A-D1-D2-D3 (21ms) – I can’t see how it’s ever “20ms – the slowest path” as that would mean that the slower the request paths, the faster the time taken to service the request. Am I misunderstanding?

    • Hi Sam,

      You’re right! Thanks for pointing that out. It was our mistake. We’ve updated the post to correct the error. Thank you so much for reading it so closely. You rock!


  2. This sounds useful. How do you stop the tenants patching the machine’s firmware to trojan later guests though? Can you reset it between tenants?

    • Thanks for the comment. When an instance is cancelled, it goes through a decontamination process. The BIOS on the machine, as well as the firmware on the peripheral devices, gets re-flashed to a brand-new, from-the-factory state.

      Hope that helps!

  3. “Containers on bare-metal” … I like that approach. It’s very difficult to get reliable performance with noisy neighbors. Looking forward to trying it out 🙂

