Coding in the Cloud
By Adrian Otto
This continues my series: rules I’ve developed after watching applications encounter problems at scale when deployed on Cloud Sites.
Avoid Unnecessary External Dependencies
Time after time on Cloud Sites, a new site will come online that displays information from another web site, such as stock quotes. Let’s say the site sells dump trucks and wants to show stock quotes for CAT and the other equipment manufacturers whose products it sells. Every time there’s a page view, the site makes an outgoing HTTP connection to a stock web site, downloads the stock ticker data for those companies and then displays it as part of the HTML output of its own pages.
This works just fine, provided you’re not doing a whole lot of it. But if your site becomes exceedingly popular because of press mentions, links from very busy web sites or Twitter, all of a sudden two million people are trying to access your site (and consequently the stock site). Every one of those page views becomes a request to the stock site, which can crash it and take your site down with it.
The first thing the frustrated customer does is ask us why their site crashed. When we look, we see that it jammed up waiting for stocks.whatever.com to respond. The load did not crash the site running on Cloud Sites. It crashed the remote site, the stock site in this scenario, and the dependency on that site caused a train wreck that ended in customer frustration.
Lesson learned: be smart about external dependencies. Eliminate every external dependency you don’t need, and be careful with the ones you do need, whether they are sites that offer stock quotes, geolocation services, or anything else that requires you to call somebody else’s web service. You just can’t trust that their site is going to scale as well as your own. This can happen no matter what the size of the external site. We’ve seen it happen when the external site was big, like stocks.yahoo.com. In some cases we’ve clogged stocks.yahoo.com in this very way: they see all of our requests coming from a single place, and because of the way the request routing works, the site becomes completely unreachable from our network. Do not assume that a remote web site runs on infrastructure that will scale when your web app accesses it, just because the site is big or is hosted by a big company. That’s not necessarily the case.
An increasingly popular feature to add to sites is geolocation, where you get the location of the person browsing your site. You go to a site, and it might say, “Thanks for browsing from San Antonio. We have a special offer for you in our store at River Center Mall.” These services work by looking up the user’s IP address and using it to determine the user’s location. Some geolocation services are free and not very accurate; others are available for a price and tend to be more accurate. Regardless, this is just the kind of external dependency that can bring down your site. When the service starts responding slowly, every page that calls it takes longer to finish. Since we charge for the time that your application is running, that slowness translates directly into dollars. Now you’re paying a premium to have geolocation on your site. If you really must have geolocation, don’t do it with a remote web service. Do it with local logic, such as a lookup database that you consult directly and that’s under your own control.
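As a rough illustration of the "local lookup" idea, here is a minimal Python sketch of an in-process table that maps IPv4 ranges to a city. Everything in it is an assumption made for illustration: the class name LocalGeoTable, the sample rows (which use reserved documentation address ranges, not real geolocation data), and the requirement that ranges do not overlap. A real deployment would load the rows from whatever geolocation database you have licensed.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical sample rows: (first address, last address, city).
# These are reserved documentation ranges, not real geolocation data.
SAMPLE_ROWS = [
    ("192.0.2.0", "192.0.2.255", "San Antonio"),
    ("198.51.100.0", "198.51.100.255", "Austin"),
]


class LocalGeoTable:
    """Look up a city for an IPv4 address without any network call.

    Assumes the ranges do not overlap.
    """

    def __init__(self, rows: List[Tuple[str, str, str]]) -> None:
        # Convert addresses to integers and sort by range start
        # so that lookups can use binary search.
        self._rows = sorted(
            (int(ipaddress.IPv4Address(first)),
             int(ipaddress.IPv4Address(last)),
             city)
            for first, last, city in rows
        )
        self._starts = [row[0] for row in self._rows]

    def lookup(self, ip: str) -> Optional[str]:
        value = int(ipaddress.IPv4Address(ip))
        # Find the last range whose start is <= the address.
        index = bisect.bisect_right(self._starts, value) - 1
        if index < 0:
            return None
        _first, last, city = self._rows[index]
        # The address matches only if it also falls before the range end.
        return city if value <= last else None


geo = LocalGeoTable(SAMPLE_ROWS)
```

Calling geo.lookup("192.0.2.17") consults only the in-memory table, so a slow or unreachable third party cannot stall the page. An address that is not in the table yields None, and the page can simply skip the location-specific offer.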
Mashups are another popular use case for external dependencies, and they don’t scale well unless you have a way of caching the results from the dependent web site. If you include a mashup that passes all of your traffic through a remote site, you are trusting that site to scale as well as yours will. Unfortunately, unless that remote site is running on Cloud Sites, it’s probably not going to scale well, simply because it’s not backed by hundreds of servers.
If you must reference external data, be smart about it. I wrote a piece of PHP code that you can use as an example of how to mitigate this problem. This PHP code allows you to display information from another site, but, because it uses a caching approach, it can get fresh remote data in a generally non-blocking fashion and at reasonable time intervals. You can configure the refresh interval to suit your needs. It allows you to have a remote dependency on your site by limiting the frequency with which you interact with that site.
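The caching idea can be sketched in a few lines. The following is not the PHP code mentioned above; it is a simplified Python illustration, and the names CachedRemote and fetch_url, the two-second timeout and the 60-second interval are all hypothetical choices. It contacts the remote site at most once per refresh interval and keeps serving the last good value if the remote call fails. Unlike a truly non-blocking design, the one request that triggers a refresh does wait for the remote site, bounded by the timeout.

```python
import time
import urllib.request
from typing import Callable, Optional


def fetch_url(url: str, timeout_seconds: float = 2.0) -> str:
    """Fetch a URL with a short timeout so a slow remote site cannot hang us."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read().decode("utf-8")


class CachedRemote:
    """Serve remote data from a local cache, refreshing at most once per interval."""

    def __init__(self, fetch: Callable[[], str], refresh_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._next_attempt: Optional[float] = None

    def get(self, default: str = "") -> str:
        now = self._clock()
        if self._next_attempt is None or now >= self._next_attempt:
            # Schedule the next attempt before fetching, so that a failing
            # remote site is also retried only once per interval.
            self._next_attempt = now + self._refresh_seconds
            try:
                self._value = self._fetch()
            except Exception:
                # Remote site is down or slow: keep the stale value.
                pass
        return self._value if self._value is not None else default


# Example wiring (hypothetical URL, nothing is fetched until get() is called):
quotes = CachedRemote(lambda: fetch_url("http://stocks.example.com/quotes"),
                      refresh_seconds=60)
```

With this wiring, a page handler would call quotes.get(default="Quotes unavailable") on every view. However many visitors arrive, the stock site sees roughly one request per minute from each process, and if it crashes the page still renders with the last cached quotes or the default text.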
The bottom line with external dependencies is that they are evil when used blindly. Do everything you can to avoid them, or put a suitable buffer between your web app and any external dependency so that if the remote site does crash, your web app can still run.