AWS X-Ray is a fairly new service that helps developers analyze the performance of their applications. It’s especially well suited to microservice architectures, because it can trace requests across a distributed system and automatically builds a service graph as a side effect.
A few weeks ago, AWS released a preview of X-Ray supporting Lambda functions (they just announced general availability). At that time, I was investigating why certain API calls hitting one of our backend microservices running on a serverless stack took over one second to respond at the 95th percentile. This request tracing service sounded like the perfect tool for the job.
Because the feature was still in preview, not all Lambda runtimes were supported equally well, and unfortunately, Python was still missing from the list (the supported SDKs at the time were Node.js and Java).
X-Ray relies on a daemon running locally in every supported environment (e.g. on an EC2 instance or, in our case, inside the Lambda function’s runtime environment); the SDKs send data to it over UDP, which keeps overhead low.
The daemon forwards that data to the X-Ray API, which can later be queried for detailed performance information and visualizations. The daemon’s custom protocol is documented well enough that I was able to implement a small Python library to send data to it.
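To give a feel for how simple that protocol is, here is a minimal sketch of sending one segment to the daemon over UDP: each datagram is a small JSON header line, a newline, then the segment document. The helper names (`make_segment`, `send_segment`) are mine, not the library’s; this is an illustration of the wire format, not fleece’s actual code.

```python
import json
import os
import socket
import time

XRAY_DAEMON_ADDR = ("127.0.0.1", 2000)  # the daemon's default UDP endpoint
HEADER = json.dumps({"format": "json", "version": 1})


def make_trace_id():
    # X-Ray trace IDs look like "1-<8 hex digits of epoch seconds>-<24 hex digits>"
    return "1-{:08x}-{}".format(int(time.time()), os.urandom(12).hex())


def make_segment(name, start, end):
    return {
        "name": name,
        "id": os.urandom(8).hex(),  # 16-hex-digit segment id
        "trace_id": make_trace_id(),
        "start_time": start,
        "end_time": end,
    }


def send_segment(segment, addr=XRAY_DAEMON_ADDR):
    # One datagram per segment: header line, newline, segment JSON.
    payload = (HEADER + "\n" + json.dumps(segment)).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, addr)
    finally:
        sock.close()
```

Because the transport is fire-and-forget UDP, a failed send never slows down or breaks the traced request — which is exactly why the overhead stays low.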
Our team at Rackspace maintains a collection of helpful utilities for Lambda functions written in Python, called fleece. It now also includes several wrappers to help report performance metrics to X-Ray:
- Automatically wrap all boto3 (AWS SDK) calls and annotate them with metrics that feed the X-Ray console visualizations.
- Automatically wrap all HTTP requests made through the requests library.
- A decorator that can be applied to any function or method to report how long that particular block of code took to execute.
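The decorator is the interesting piece: it just measures how long the wrapped call takes and reports that as a subsegment. Below is a minimal sketch of that pattern — not fleece’s actual implementation — where a stand-in reporter collects the timings that fleece would send on to the X-Ray daemon.

```python
import functools
import time

REPORTED = []  # stand-in for fleece's UDP send to the X-Ray daemon


def report_subsegment(name, start, end):
    REPORTED.append({"name": name, "start_time": start, "end_time": end})


def trace_subsegment(name=None):
    """Illustrative tracing decorator: time the wrapped call and report
    the subsegment even if the call raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            label = name or func.__qualname__
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                report_subsegment(label, start, time.time())
        return wrapper
    return decorator


@trace_subsegment()
def handle_request():
    time.sleep(0.01)  # pretend to do some work
    return "ok"
```

The `try`/`finally` matters: a subsegment is reported whether the block succeeds or raises, so failed calls still show up in the trace.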
My investigation started out with applying the wrappers, so all external calls would be captured, and also adding the decorator (@trace_xray_subsegment()) to a few key functions and methods in our codebase, which resulted in the following X-Ray trace:
This made it immediately obvious that something was going on inside our lambda_handler function (the entry point of the Lambda function) that took a lot of time even before the actual handler code started running (that would be the _action method).
After identifying a few more methods along the execution path, and adding the decorator to them, the trace changed to this (the Lambda function initialization part skews the chart a bit, please ignore that for now):
The reason this endpoint was so slow became very obvious: inside the constructor (__init__) of our handler classes (collections of methods that handled the incoming requests), we fetched all the DynamoDB tables (just their metadata, using boto3.resources.DynamoDB.Table) that we were using during any of the requests.
To make this even worse, we instantiated every handler class at the beginning of our lambda_handler. First, this was completely unnecessary overhead; second, it ignored the best practice of instantiating resources at module scope so they can be reused across warm invocations. After a few small changes, this was the final trace:
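The shape of the fix can be sketched like this. A stub (`_fetch_table`) stands in for the real `boto3.resource("dynamodb").Table(name)` call, which does network I/O — the cost we only want to pay once; the class and function names here are hypothetical, not our production code.

```python
import functools

FETCH_COUNT = {"n": 0}


def _fetch_table(name):
    # Stub for boto3.resource("dynamodb").Table(name), which hits the network.
    FETCH_COUNT["n"] += 1
    return {"table_name": name}


# Before: every request built a new handler, and the handler eagerly
# fetched every table it might need -- paying the full cost every time.
class EagerHandler:
    TABLES = ("users", "orders", "events")

    def __init__(self):
        self.tables = {name: _fetch_table(name) for name in self.TABLES}


# After: resources live at module scope and are created lazily, so warm
# Lambda invocations reuse them for free.
@functools.lru_cache(maxsize=None)
def get_table(name):
    return _fetch_table(name)


class LazyHandler:
    def lookup(self, table_name):
        return get_table(table_name)  # cached across warm invocations
```

With the eager version, two requests cost six table fetches; with the lazy version, each table is fetched once per container, no matter how many requests it serves.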
This significantly reduced the overall response time for all API endpoints handled by this Lambda function.
This is just one example of how AWS X-Ray makes it easy to dig into performance problems. At Rackspace, we will now include support for AWS X-Ray with all of our new microservices.
Do you have other questions about AWS features or workloads? Visit Rackspace to find out more about our managed services for AWS and our growing team of AWS-certified engineers who are helping customers architect, optimize and manage their AWS deployments every day.