In my previous blog post, I discussed the importance of load testing and monitoring your application in preparation for a big event. This could be something that was planned, such as a marketing campaign or an online sale, or something unexpected, such as your application going viral.
In this post, I’ll discuss additional architectural patterns you can adopt to enhance the performance, scalability and availability of your application ahead of a high-traffic occurrence.
For a typical web application, there will always be a theoretical limit to what any given component of the infrastructure (for example, a web server) can handle before it is overwhelmed and the user experience is affected.
Obviously, enhancements can be made to the application code, the configuration of the web server and the underlying operating system. However, once the resource limits of the component have been reached, the only way to maintain performance under increasing load is to either scale vertically (get a bigger server) or horizontally (add more servers).
Auto-scaling is a mechanism that allows you to automatically provision additional resources as the demand on your application grows, and terminate those resources once demand has died down. Before the advent of cloud computing, handling huge amounts of traffic required pre-provisioning servers for the projected maximum load on a system ahead of time, and thus incurring costs for that capacity even when there was no demand for it.
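To make the scale-out and scale-in decision concrete, here is a minimal sketch of a proportional scaling rule. It is not any cloud provider’s actual algorithm: the function name `desired_capacity`, the choice of average CPU as the metric, the 60% target and the fleet bounds are all assumptions chosen for illustration.

```python
import math


def desired_capacity(current_servers: int, avg_cpu_percent: float,
                     target_cpu_percent: float = 60.0,
                     min_servers: int = 2, max_servers: int = 20) -> int:
    """Return how many servers to run so average CPU moves toward the target.

    Hypothetical rule: resize the fleet in proportion to how far the observed
    metric is from its target, then clamp the result to the configured bounds.
    """
    if current_servers < 1:
        raise ValueError("current_servers must be at least 1")
    wanted = math.ceil(current_servers * avg_cpu_percent / target_cpu_percent)
    return max(min_servers, min(max_servers, wanted))
```

With these assumed defaults, `desired_capacity(4, 90.0)` returns 6 (scale out), while `desired_capacity(4, 15.0)` returns 2, because the fleet is never allowed to shrink below the minimum.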
Before diving into how to scale your application effectively, a number of architectural factors need to be considered, as scalability adds complexity.
User session management
When a user loads your application in their browser, assuming your application is more than just a static website, user sessions allow a web server to store persistent information about that particular user as they interact multiple times with your application.
The user session may contain authentication information (such as an authentication token), shopping cart contents or other user-specific configuration. If user sessions are stored on each individual web server, a user’s requests must always be routed to the same web server. If you have a farm of web servers (auto-scaled or static) behind a load balancer, either the user must be routed to the server that holds their session data every time, or every server in the farm must be able to retrieve that session. Otherwise the experience is inconsistent: for example, a user whose authentication data is stored in the session would be logged out every time they reached a different server.
One method is to enable sticky sessions on the load balancer. Sticky sessions allow the load balancer to generate a browser cookie that associates the client with an individual server. While this method works, it’s not ideal, particularly in the case of an auto-scale group. If one of the web servers in the auto-scale group is terminated, users associated with that server will lose all session state and be returned to an empty session. Logins, shopping cart data and other session information would be lost.
Sticky sessions can also result in uneven demand on the servers. As servers are added to the pool, only new sessions will be directed to those servers. The existing servers will continue to struggle under the load of previously established sessions rather than traffic being distributed evenly across the group. The result is “hot spots”: an uneven distribution of load across the fleet.
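The hot-spot effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any particular load balancer: the function names are hypothetical, each session is assumed to generate the same amount of load, and new sessions are assumed to go to whichever server currently has the fewest.

```python
from typing import Dict


def sticky_load(existing_sessions: Dict[str, int], new_server: str,
                new_sessions: int) -> Dict[str, int]:
    """Sessions per server when existing sessions stay pinned to their server.

    Only new sessions can land on the new server (assumption: each new
    session is placed on the least-loaded server at that moment).
    """
    load = dict(existing_sessions)
    load[new_server] = 0
    for _ in range(new_sessions):
        least_loaded = min(load, key=load.get)
        load[least_loaded] += 1
    return load


def shared_session_load(existing_sessions: Dict[str, int], new_server: str,
                        new_sessions: int) -> Dict[str, int]:
    """Per-server load when any server can serve any session.

    Because sessions are not pinned, the total is spread evenly across
    the whole fleet, including the newly added server.
    """
    servers = list(existing_sessions) + [new_server]
    total = sum(existing_sessions.values()) + new_sessions
    base, extra = divmod(total, len(servers))
    return {name: base + (1 if i < extra else 0)
            for i, name in enumerate(servers)}
```

With two existing servers holding 900 sessions each and 300 new sessions arriving after a third server joins, `sticky_load` leaves the original servers at 900 and gives the new server 300, whereas `shared_session_load` puts 700 on each.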
A better method of managing user sessions is to decouple them from the web servers. The most common and preferred method is to utilise a caching engine such as Redis or Memcached, running on one or more servers separate from the web servers.
Both of the above caching engines, while slightly different in their capabilities, are broadly categorised as in-memory key-value databases. Holding user sessions in a cache will allow each web server to refer to the cache for session data, giving users a consistent and stable experience regardless of which server they interact with, and enabling the load balancer to distribute all load evenly across all hosts. An added benefit of externalising user sessions is that if a web server were to fail or be removed from the auto-scale group, users won’t be affected.
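The idea of externalised sessions can be sketched as follows. To keep the example runnable without a cache server, a dictionary-backed `InMemorySessionStore` stands in for Redis or Memcached; that class, the `WebServer` class, its method names and the `session:` key prefix are all hypothetical and only illustrate the pattern.

```python
import json
import uuid
from typing import Dict, List, Optional


class InMemorySessionStore:
    """Stand-in for an external key-value cache (assumption: get/set only)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class WebServer:
    """A web server that keeps no session state of its own."""

    def __init__(self, name: str, store: InMemorySessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, username: str) -> str:
        # The session ID would normally be returned to the browser in a cookie.
        session_id = uuid.uuid4().hex
        self._save(session_id, {"user": username, "cart": []})
        return session_id

    def add_to_cart(self, session_id: str, item: str) -> None:
        session = self._load(session_id)
        session["cart"].append(item)
        self._save(session_id, session)

    def view_cart(self, session_id: str) -> List[str]:
        return self._load(session_id)["cart"]

    def _load(self, session_id: str) -> dict:
        raw = self.store.get(f"session:{session_id}")
        if raw is None:
            raise KeyError("unknown or expired session")
        return json.loads(raw)

    def _save(self, session_id: str, session: dict) -> None:
        # Sessions are serialised to JSON so any server can read them back.
        self.store.set(f"session:{session_id}", json.dumps(session))
```

Because every `WebServer` is constructed with the same store, a user can log in on one server, add an item to their cart on a second, and still see the cart on a third after the first has been terminated.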
Shared file storage
In a scalable infrastructure, all servers will typically need access to any files uploaded by users. Because they’re usually uploaded to a specific server, keeping these files in sync across the fleet can be a challenge. Fortunately, there are a number of different ways to address this issue, each of which can be used in isolation, or in combination with each other:
- Use a distributed file system such as NFS (Network File System), GlusterFS or Amazon EFS
- Host static assets on Amazon S3
- Automate the synchronisation of files across all web servers
Each solution has benefits and drawbacks. Using a distributed file system is the most transparent method, as little needs to be changed from the application perspective. As far as your application is concerned, files are accessible as if they were on a normal disk, and only the underlying operating system needs to be made aware of the change. The downside is a possible performance bottleneck, as all read and write operations to the distributed file system go across the network.
Amazon EFS has further implications that need to be factored in for disk IO. In Bursting Throughput mode, EFS throughput scales as the file system grows: the more data stored in an EFS file system, the more throughput is provided. Proper benchmarking of your application and careful observation of the associated CloudWatch metrics should be done when first implementing EFS to ensure it meets your application’s performance requirements.
S3 is especially useful for static assets. However, your application will need some modification to point to the location of each object. This may also require the use of a software development kit that can handle the reading and writing of objects to and from S3.
Additionally, S3 historically had an eventual consistency model for overwrites and deletes due to its transparent replication of files: a file updated in S3 and then immediately read back might not return the most recent version until the change had propagated within the region. Since December 2020, S3 provides strong read-after-write consistency for object writes and deletes, so this is no longer a concern for applications that need to read an object immediately after writing it.
Finally, automating the synchronisation of files across all web servers using a tool such as rsync gets around possible disk IO issues, as files are now local to each server. However, this method requires scheduling the synchronisation job using cron, the Windows Task Scheduler or a similar utility. It also means changes to files take time to reach every server, sometimes five minutes or more depending on the size and number of changes made and the interval of the synchronisation job.
Database read-replicas
Depending on the database technology you’re utilising, setting up read-replicas of your database may be an option. Read-replicas are beneficial when your application is read intensive, as they allow you to split read and write operations across separate servers, freeing up your master database for write operations while more resource-intensive reads are performed on a replica. Additionally, read-replicas allow backups, business intelligence, reporting and analytics to run during production hours without impacting the performance of the master database. Read-replicas can also be promoted to master if you experience an outage on the master database server, which increases the fault tolerance of your architecture.
You will need a method to split out reads and writes at an application level in order to fully take advantage of read-replicas. Most application frameworks provide such a mechanism.
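As a rough illustration of splitting reads and writes at the application level, the sketch below routes statements by their leading keyword. It is a simplification rather than how any particular framework does it: the `ReadWriteRouter` class and the host names are hypothetical, and a real router would also need to send reads that must see a just-committed write, or statements such as `SELECT ... FOR UPDATE`, to the master.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Pick a database host for each SQL statement.

    Assumption: anything starting with SELECT is a read that can tolerate
    replication lag; everything else is treated as a write.
    """

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # Rotate through the replicas so reads are spread across them.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no replica exists, go to the master.
        return self.master
```

For example, a router built with `ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])` alternates `SELECT` statements between the two replicas and sends an `UPDATE` to `db-master`.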
Caching database responses
I previously mentioned Redis and Memcached in relation to session state. These same tools can be used to store database responses in memory, providing extremely fast response times for common queries that may take longer when pulled from the database.
Additionally, because these tools cache database responses, the database no longer has to respond to every single request, potentially removing the need for a larger database server and allowing you to save costs without sacrificing performance.
Most application frameworks support Redis and Memcached, either built in or through libraries. Implementation could be as simple as a one-line change in your application’s configuration, or it may require a more involved refactor to take full advantage of the features of the engine.
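One common way to cache database responses is the cache-aside pattern: check the cache first, and only run the query on a miss. The sketch below uses a dictionary-backed `TTLCache` as a stand-in for Redis or Memcached so that it runs without a cache server; the class, the `cached_query` helper and the 60-second default expiry are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Dictionary-backed stand-in for a cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped so the next call hits the database.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def cached_query(cache: TTLCache, key: str, run_query: Callable[[], Any],
                 ttl_seconds: float = 60.0) -> Any:
    """Cache-aside: return the cached result, or run the query and store it.

    Note: a query result of None is never cached in this simplified version.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = run_query()
    cache.set(key, result, ttl_seconds)
    return result
```

The expiry time is the main design choice here: a longer value takes more load off the database but means users may see stale results for longer.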
Content Distribution Networks
A Content Distribution Network, or CDN, is a globally distributed network of servers that delivers either static or dynamic content to end-users based on their geographical location.
For example, let’s say your application is hosted on the east coast of the U.S., but the majority of your users are in Southeast Asia. The latency incurred when traffic travels halfway around the world will negatively impact the user experience. A CDN will improve the end-user experience by caching content in a cache node in Southeast Asia, providing your users in that region with vastly improved response times.
Additionally, because the CDN is caching regularly requested content, it has the added benefit of relieving your web servers of handling every single request. This can potentially allow you to scale down parts of your infrastructure such as the web servers, which can save costs while handling the same volume of traffic.
CDNs can also act as a buffer against some types of denial-of-service attacks that would otherwise cause you to scale the infrastructure to meet the added demand.
There are different types of CDNs available. They can be broadly categorised as either push or pull CDNs. A push CDN requires you to upload your content to the CDN and then reference the CDN’s distribution URL within your application. A pull CDN is more like a reverse proxy that dynamically caches content as it is requested by the end client. Beyond caching static content, some CDNs can also cache dynamic content based on headers or query strings.
While a CDN is transparent to the end-user, it is not completely transparent to you or your application. The configuration steps required will vary depending on the type of CDN you use. However, at a high level you will need to:
- Provision the CDN and configure your origin. The origin will typically be your web servers or a load balancer. However, you can also choose to host static assets in an object store such as S3. In the case of Amazon’s CDN, CloudFront, you can configure S3 as an origin for your static assets.
- Re-configure your application to serve static assets from the CDN’s URL rather than your own. For example, rather than an image’s URL being https://www.myapplication.com/image.png it may now be https://cdn.mycacheprovider.com/myapplicationscdn/image.png
- Configure DNS for your application. This will involve creating a DNS record for your application that points to the CDN, rather than your origin servers or load balancer. This will ensure that your site’s visitors only reach your application via the CDN, and don’t accidentally go directly to your servers.
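The URL change described in the second step above can be sketched with a small helper. The function name `to_cdn_url` is hypothetical, and the URLs are the illustrative ones used in this post; many frameworks expose an equivalent setting for a static asset base URL, so a helper like this may not be needed in practice.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn_url(asset_url: str, cdn_base_url: str) -> str:
    """Rewrite an asset URL so it is served from the CDN instead of the origin.

    Keeps the asset's path and query string, and swaps in the CDN's scheme,
    host and any path prefix from cdn_base_url.
    """
    asset = urlsplit(asset_url)
    cdn = urlsplit(cdn_base_url)
    # Join the CDN prefix and the asset path with exactly one slash.
    path = cdn.path.rstrip("/") + "/" + asset.path.lstrip("/")
    return urlunsplit((cdn.scheme, cdn.netloc, path, asset.query, ""))
```

Called with `https://www.myapplication.com/image.png` and a base of `https://cdn.mycacheprovider.com/myapplicationscdn`, it returns `https://cdn.mycacheprovider.com/myapplicationscdn/image.png`. Site-relative paths such as `/css/site.css` are handled the same way.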
Throughout the last two posts in this series, I’ve looked at load testing and monitoring your application, as well as all of the above changes you can make to your application’s infrastructure to help it scale. Stay tuned for my next post in this series, where I’ll cover right-sizing your infrastructure for a high-traffic event.