This is a guest post written by Stephen Pope, Partner at Project Ricochet, a Rackspace partner and a full service web development firm that specializes in Drupal development and responsive web design.
When we set out to build a new social community using Drupal during a recent project, the question of how to scale dominated the technical discussions. Scaling has become an amorphous topic, and people get lost trying to think through every possible pain point. Because funds in a startup might be limited, the trick is to find a balance between cost and scalability, and for us Drupal combined with Rackspace services strikes that balance.
We attacked the project by breaking it down into the fundamental pieces:
We think of things on a very practical level; we could nerd out on any of the subjects above, but we know our time and our developers’ time are valuable. With a focus on using off-the-shelf software, a bit of research and a hosting partner like Rackspace, we can realize huge cost savings and eliminate some of the unnecessary complications. In our project, we started with hosting as the place to make smart choices – to maximize our scalability and cost savings in the long term. In general, we find that this saves us from having to implement custom solutions before we need them (if in fact we ever do!).
Our scaling challenge in this project involved authenticated users. You can’t cache the majority of content like you can for a website with anonymous users. It’s very common for anonymous sites to use a front end cache proxy, such as Varnish. In our case, each page’s content is different because its content revolves around the user in question.
The memory requirements for PHP (and Drupal more specifically) can be quite high: with even just a few contributed modules, a single Drupal page request could hog 128 MB or more per process. If you have continuously heavy traffic, you could bottleneck and dogpile fairly quickly.
Load balancing isn’t always straightforward, but with Drupal it’s fairly transparent. You simply have to work out how many Apache processes each server can run: take your worst case PHP memory usage (a limit set in your php.ini file), then divide that into the memory you have available on your cloud server (after taking into account the overhead of services like Apache, MySQL, Varnish or Tomcat). As you grow, you’ll need X number of www cloud servers to handle the given load. At that point, it’s simple arithmetic.
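To make that arithmetic concrete, here is a minimal sketch in Python. The function names and the sample figures (a 4 GB server, 1 GB of overhead, a 128 MB PHP limit, 100 concurrent requests) are illustrative assumptions, not measurements from our project; plug in your own numbers.

```python
def max_apache_processes(server_mb, overhead_mb, php_memory_limit_mb):
    """Worst-case count of PHP-serving Apache processes that fit in memory.

    server_mb: total RAM on the cloud server
    overhead_mb: RAM reserved for the OS and other services (assumed figure)
    php_memory_limit_mb: memory_limit from php.ini, used as the worst case
    """
    usable_mb = server_mb - overhead_mb
    if usable_mb <= 0 or php_memory_limit_mb <= 0:
        return 0
    return usable_mb // php_memory_limit_mb


def www_servers_needed(peak_concurrent_requests, processes_per_server):
    """Number of www servers needed to cover the peak, rounded up."""
    if processes_per_server <= 0:
        raise ValueError("each server must be able to run at least one process")
    # Ceiling division without floating point.
    return -(-peak_concurrent_requests // processes_per_server)


if __name__ == "__main__":
    per_server = max_apache_processes(4096, 1024, 128)
    print(per_server)                           # 24
    print(www_servers_needed(100, per_server))  # 5
```

Because the worst case memory limit is used, the result is conservative; real requests often use less, so treat it as a safe starting point rather than a hard ceiling.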
If we’re going to stretch MySQL as far as we can, we need to reduce and focus the work that we ask MySQL to accomplish. Drupal stores its sessions and cache data by default in MySQL. It works, but this data is used often, the tables can be large and they have a simple index and only a column or two. Other services like MongoDB and Memcache are perfect replacements for this commonly used data because they have been uniquely specialized to handle the type of operations we need.
Using this module http://drupal.org/project/mongodb and a dedicated backend-cache MongoDB cloud server is a super simple way to achieve this setup.
Scaling your MySQL database might be one of the harder parts of this project. There isn’t going to be a one-size-fits-all solution. Typically a multiple read, single write setup will take you pretty far. The basic idea is that your www servers can request information from any number of read nodes, and writes are directed to a single server.
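The routing idea can be sketched in a few lines of Python. This is only an illustration of the multiple read, single write pattern, not how Drupal or any particular driver implements it; the class name and the server names are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Send SELECT queries to read nodes in turn; send everything else to the writer."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no read nodes configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._replicas)
        # Writes, and anything we are unsure about, go to the single write server.
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("db-primary", ["db-read-1", "db-read-2"])
    print(router.route("SELECT * FROM node"))       # db-read-1
    print(router.route("SELECT * FROM users"))      # db-read-2
    print(router.route("UPDATE users SET name=1"))  # db-primary
```

One design caveat: read nodes can lag slightly behind the write server, so a page that must show data it has just written may need to read from the primary.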
One of the typical drawbacks of cloud hosting is shared I/O devices, such as the hard drive. If you have a busy neighbor processing a lot of files, your application may suffer.
Using Rackspace’s Cloud Block Storage, you can get a dedicated hard drive for your MySQL server. You can even add an SSD drive to ensure blazing fast access to your data. It will also help extend the life of your MySQL server as you grow. SSD is recommended for high volume sites; however, you can start off with a traditional drive and move to SSD later if you’d like to keep costs down. The main point is to get a dedicated I/O device so you don’t need to share resources.
In our project, our specific web community had a unique challenge: users could upload extremely large image files that could be shared and sold. There were a few main challenges:
http://drupal.org/project/cloud_files is a contributed Drupal module that will help you solve all of these issues in a single stroke.
As images are uploaded into the Drupal system (where they would normally be stored in /sites/My_Site/files), they are instead transferred to the blazing fast Rackspace CDN. The module will seamlessly serve URLs from Akamai, a world leader (and partner of Rackspace) in distributed file hosting. Not only will you not require large amounts of hard drive space on your servers, but your assets will be spread across the globe and served up from servers closest to where they’re being requested.
You’ll reduce load, stress and bandwidth on the Apache servers as well, allowing each one to dedicate itself to processing the dynamic parts of your site.
Now that you’ve set up your basic architecture, most of the leg work as you start to grow will involve replicating additional servers into the mix. Rackspace lets you clone any of your servers by making a virtual image of that server, then creating new server clones from that image.
How should you start? I recommend a larger number of smaller nodes (as opposed to fewer higher power instances) – even as you continue to grow. The minimum setup would probably look like this:
You may wonder, “Why not start with all the services on a single cloud server if I don’t have the traffic yet to justify breaking services up into additional servers?”
Well, sure, you could stack all these services on a single “larger” cloud server until you get more traffic, but that’s not always as simple as you might think (as we all know, it’s *never* as simple in practice as you might think beforehand). The point is to set up something that can scale without a lot of rework, without a complicated and tedious migration, and maybe even without a developer’s or sysadmin’s help. By going with more servers from the get-go, you’ve separated the various channels of concern into distinct areas. Trouble spots or bottlenecks are much easier to spot, you become less dependent on a single cloud server’s performance, and you can add servers to the areas where your site actually needs more juice.
We use Rackspace for most of our client projects. We find that the versatility of the platform and the multitude of affordable services help us serve all manner of clients, large and small. We host our own internal projects and websites on Rackspace too. By following the rough guidelines I outlined above, you too can partner with Rackspace for highly scalable Drupal applications.
And if you have any questions, do what we do – hit Rackspace up on chat. They are always helpful and willing to walk you through complicated implementations or services, even for cloud products (their least expensive tier of services).
As the web becomes a larger and larger part of our lives and day to day services, scalability will continue to grow in importance. Just remember the fundamentals outlined above and you’ll be just fine.