So far in this series on preparing for the big event, we’ve covered several topics related to scaling your infrastructure. In this final post, we’ll cover the last few actions you can take ahead of time to make sure your web application stays online and performs well during the event.
By default, AWS places service limits on the resources an AWS account can launch. Service limits help guarantee the availability of AWS resources and protect account holders from inadvertently launching more resources than they intend to, either manually or through automation such as a misconfigured Auto Scaling group. Limits also restrict how many resources a nefarious actor who has gained access to your account can provision for their own use.
However, when your application is about to face a large onslaught of traffic, these same limits could prevent you from provisioning further resources at the peak of the storm, exactly when you need them.
You can see the EC2 limits for your account by going to the EC2 console and clicking the ‘Limits’ link in the top left of the screen. Review any service limits that may affect your infrastructure. For example, you might want to request a limit increase if you make heavy use of m5.large instances and are at risk of reaching the set limit after scaling up your infrastructure. You can do this by clicking the ‘Request limit increase’ link to submit a ticket to AWS Support.
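The review described above can be sketched in a few lines of Python. This is a minimal illustration, not an AWS tool: the limit values and planned peak counts are hypothetical numbers you would copy from the ‘Limits’ page and from your own capacity plan, and the 80% warning threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LimitCheck:
    resource: str
    limit: int
    planned_peak: int

    @property
    def utilisation(self) -> float:
        # Fraction of the limit the planned peak would consume.
        return self.planned_peak / self.limit


def limits_at_risk(limits, planned_peak, threshold=0.8):
    """Return checks for resources whose planned peak reaches `threshold` of the limit."""
    at_risk = []
    for resource, peak in planned_peak.items():
        limit = limits.get(resource)
        if limit is None or limit <= 0:
            continue  # no known limit recorded; review this one by hand
        check = LimitCheck(resource, limit, peak)
        if check.utilisation >= threshold:
            at_risk.append(check)
    # Worst offenders first.
    return sorted(at_risk, key=lambda c: c.utilisation, reverse=True)


# Hypothetical values: limits copied from the console, peaks from your capacity plan.
limits = {"m5.large": 20, "c5.xlarge": 20, "Elastic IPs": 5}
planned_peak = {"m5.large": 32, "c5.xlarge": 8, "Elastic IPs": 4}

for check in limits_at_risk(limits, planned_peak):
    print(f"{check.resource}: planned {check.planned_peak} of {check.limit} "
          f"({check.utilisation:.0%}), consider requesting an increase")
```

Anything this flags is a candidate for a limit increase request well before the event, since support tickets take time to process.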
For extra points, you can receive automatic email notifications as you approach limits by using a quick-start stack provided by AWS [1]. This stack monitors one or more accounts via AWS Trusted Advisor and sends an email when a service limit is about to be reached. For this to work, you’ll need a Business or Enterprise-level AWS Support plan. All AWS accounts managed by Rackspace have this by default.
Pre-warm load balancers
Depending on the type of load balancer you’re using for your web application, you may need to ask AWS Support to pre-warm it. This is no longer an issue with Application and Network Load Balancers, but the original Classic Load Balancers still require it. When traffic to a web application increases gradually over time, the infrastructure underlying the load balancer scales automatically to meet demand. However, when a sudden onslaught of traffic peaks quickly, the load balancer does not have enough time to scale. This usually results in HTTP 5xx errors and timeouts. You’ll be able to see this in CloudWatch by observing the following metrics:
These metrics should normally be zero or close to zero. Anything above this for a sustained period of time (5-10 minutes or more) indicates the load balancer is failing to meet demand.
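The "sustained period" rule above can be sketched as a small check over per-minute datapoints. This is a minimal sketch under stated assumptions: `sustained_breach` is a hypothetical helper, the datapoints are made-up values, and the series would in practice be exported from CloudWatch for whichever metric you are watching (for a Classic Load Balancer, `SurgeQueueLength` or `SpilloverCount` would be plausible candidates, though that choice is an assumption here).

```python
def sustained_breach(values, min_minutes=5, threshold=0.0):
    """Return True if `values` (one datapoint per minute, oldest first)
    stays above `threshold` for at least `min_minutes` consecutive points."""
    run = 0
    for value in values:
        # Extend the current run while above threshold, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_minutes:
            return True
    return False


# Hypothetical per-minute samples.
brief_blips = [0, 0, 3, 1, 0, 0, 0, 2, 0, 0]   # short spikes only
sustained = [0, 4, 9, 15, 22, 30, 12, 0]        # six consecutive non-zero minutes

print(sustained_breach(brief_blips))
print(sustained_breach(sustained))
```

Short blips are ignored, while a run of five or more non-zero minutes is treated as a sign that the load balancer is struggling to keep up.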
To prevent this, ask AWS to pre-warm the load balancer ahead of time by submitting a support ticket. Typically, they’ll need the following information:
- The expected start and end date/time of the increased traffic
- The expected requests per second
- The average size of an HTTP request/response (this can be found by enabling ELB access logs) [2]
- The percentage of traffic using SSL
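Two of the figures in the list above, the average request/response size and the share of SSL traffic, can be estimated from ELB access logs. The sketch below assumes the space-delimited Classic Load Balancer log layout, in which `received_bytes` and `sent_bytes` are the 10th and 11th fields and the last field is the SSL protocol (`-` for plain HTTP). Check that layout against your own log files before relying on it. The function name and the sample lines are hypothetical.

```python
import shlex


def summarise_access_log(lines):
    """Estimate average request/response size and SSL share from log lines."""
    total = received = sent = ssl = 0
    for line in lines:
        try:
            fields = shlex.split(line)  # keeps quoted request/user-agent intact
            if len(fields) < 15:
                continue  # not the layout assumed here; skip the line
            req_bytes, resp_bytes = int(fields[9]), int(fields[10])
        except ValueError:
            continue  # malformed line or non-numeric byte count
        total += 1
        received += req_bytes
        sent += resp_bytes
        if fields[14] != "-":  # an SSL protocol is logged only for HTTPS/SSL
            ssl += 1
    if total == 0:
        return None
    return {
        "requests": total,
        "avg_request_bytes": received / total,
        "avg_response_bytes": sent / total,
        "ssl_percent": 100.0 * ssl / total,
    }


# Hypothetical sample lines in the assumed layout.
sample = [
    '2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 '
    '0.000073 0.001048 0.000057 200 200 120 2048 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -',
    '2015-05-13T23:39:44.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 '
    '0.000086 0.001048 0.001337 200 200 80 1024 '
    '"GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" '
    'DHE-RSA-AES128-SHA TLSv1.2',
]

summary = summarise_access_log(sample)
print(summary)
```

Run this over logs from a previous busy period to get realistic figures for the support ticket, rather than over a quiet day.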
Infrastructure event management
Finally, AWS offers a service called Infrastructure Event Management (IEM) for its Business and Enterprise Support customers. IEM is best described as hyperscale care directly from AWS leading up to and during your big event. AWS will work with you in the weeks leading up to your event to help plan and assess your environment’s ability to handle the expected influx of traffic to your site.
AWS recommends that you contact them three to four weeks ahead to allow sufficient preparation time. During the lead-up to your event, AWS will assess your application’s readiness, recommend improvements to your application’s architecture, and identify any risks so they can be mitigated. They will also help document your plan for the day, which may include operational runbooks, a list of key contacts and contingency plans.
Once the event is over, they will help you scale back your infrastructure and analyse the results so you can improve your application for next time, if required.
During this four-part series we’ve covered a lot of content that should help you in your preparations.
If you need further assistance, I encourage you to reach out, as Rackspace has a number of services that can assist, including our Fanatical Support for AWS and Professional Services. Finally, I wish you all the best for your big event!