Hacking with AWS Lambda and Python

johnyi

Ever since AWS announced the addition of Lambda last year, it has captured the imagination of developers and operations folks alike.

Lambda paints a future where we can deploy serverless (or near-serverless) applications, focusing only on writing functions that run in response to events. It's event-driven architecture applied to the AWS cloud, and Jeff Barr describes AWS Lambda quite well here, so I'll dispense with all the introductory stuff. What I will do is document a simple AWS Lambda function written in Python that's easy to understand, but does something more than a "Hello World" function.

One of the reasons it's taken so long for me to explore AWS Lambda is that it was only offered with Java or Node.js. Once AWS announced Python support for Lambda at re:Invent, it became a lot easier for me to give it a try (there was a hack to use Python with AWS Lambda before that, but I was just too darn lazy to try it; by the way, here is a hack for Go if you're interested).

Having some time this week, I thought I'd explore AWS Lambda a bit, and I came up with a simple service. Everyone in the Unix world has heard of /dev/null, so why not a devnull S3 bucket? I found it useful for understanding how to work with Python and AWS Lambda and thought I might share that here.

So the devnull S3 bucket is exactly what you might expect: any object uploaded into the bucket will be deleted. Implementing this in a traditional manner actually involves a few complexities. First, there's the EC2 instance you would need to spin up, complete with an IAM role for access to the S3 service and resource. Then you'd have to figure out some method to detect when an object was uploaded to the S3 bucket.

Finally, you'd have to handle passing the correct parameters for deleting the object. None of this is particularly hard, mind you, but it is a bit tedious. What if 1,000 objects hit your bucket at once? Would you have to scale up the number of EC2 instances to handle the load? Maybe you'd use SQS to queue up the tasks and handle them with a worker queue pattern. Of course, that requires some logic to mark tasks completed, and by the time it was all said and done you'd have a little more work than you thought. What if I told you this could be done easily, in less than 30 lines of code?
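Here's a minimal sketch of the whole thing, pieced together from the snippets I walk through below (the principal_id lookup from the event's userIdentity field is one reasonable way to get the value printed near the end, not necessarily how the blueprint does it):

from __future__ import print_function

import urllib
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Pull the bucket name and URL-decoded object key out of the S3 event record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'])
    principal_id = event['Records'][0]['userIdentity']['principalId']

    # Block until the object is actually visible before touching it
    waiter = s3.get_waiter('object_exists')
    waiter.wait(Bucket=bucket, Key=key)

    # Grab some metadata about the object, log it, then delete the object
    response = s3.head_object(Bucket=bucket, Key=key)
    print("CONTENT TYPE: " + response['ContentType'])
    print("ETag: " + response['ETag'])
    print("Content-Length: ", response['ContentLength'])
    print("Keyname: " + key)
    print("Event-> principalId: " + principal_id)
    print("Deleting object..." + key)
    s3.delete_object(Bucket=bucket, Key=key)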

One of the coolest things about this snippet of code is that it runs at scale without much tweaking on my part. If 1,000 objects were uploaded to this bucket, I wouldn't have to worry about the infrastructure to handle those actions.

The code above was largely taken from the s3-get-object-python blueprint and modified. The AWS Lambda Python runtime is Python 2.7, and it comes with Boto3, the AWS Python SDK that makes interfacing with AWS services a snap. If needed, you can add other Python modules and zip them up into a deployment package. (Note that there are limits on deployment package size, including a 1.5 GB total across all of your functions; I've listed the AWS Lambda limitations at the end of this blog post.)

When we create the Lambda function for the first time, we need to assign it an IAM role. Just like an EC2 instance, we need to give it the permissions to access AWS services and resources to perform its function. The AWS Lambda wizard makes this fairly easy, but it might be useful to take a look at the security policy attached to my S3 Lambda IAM role.
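A representative version looks something like this (the resource ARNs are stand-ins for my own log group and bucket):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::rax-devnull-000/*"
    }
  ]
}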

The first part of the policy gives the Lambda function access to CloudWatch Logs. This will be vitally important for debugging any issues, generating log streams and otherwise getting a better understanding of what might be going on during execution. The next part of the policy governs the permissions we'll need to work with the S3 bucket. Here, the actions s3:GetObject and s3:DeleteObject are necessary to get metadata information about the object and then, of course, to delete it.

def lambda_handler(event, context):
    ...

The lambda_handler method is what gets called when your Lambda function is invoked. In order to invoke this method, I wired it to an event in S3; in this case, an object-creation event for a given S3 bucket. To illustrate this, let me show you what it looked like in the AWS Lambda console (i.e. Lambda in the AWS Console):

Here, I simply selected S3 as the event source. The event type allowed me to choose Object Created (All). I could have chosen a specific HTTP method like PUT or POST, which lets you be more granular about what can trigger the event. Of course, in our case any object placed in the bucket by any HTTP method should trigger this function.

You can also see that I've selected the bucket rax-devnull-000 where the event source will be triggered from, converting this otherwise normal S3 bucket into a bottomless pit. Also, note that events trigger AWS Lambda through either a push or a pull model. In the push model, AWS Lambda is invoked, or pushed, by an AWS service; services that do this include S3, SNS, CloudFormation, CloudWatch, Cognito and SES. In the pull model, AWS Lambda has to poll the AWS service to determine if something happened, as in the case of streams like Kinesis or DynamoDB Streams.
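The console handles this mapping for you, but if you'd rather script it, the same wiring can be expressed with Boto3 along these lines (the function ARN is a placeholder, and S3 also needs permission to invoke the function, which the console adds behind the scenes):

import boto3

s3 = boto3.client('s3')

# Tell S3 to invoke the Lambda function for every object-created event
s3.put_bucket_notification_configuration(
    Bucket='rax-devnull-000',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                # Placeholder ARN; substitute your own account and region
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:null_bucket',
                'Events': ['s3:ObjectCreated:*']
            }
        ]
    }
)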

bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'])

The event object is the event message that the event source creates, in this case S3. Every AWS service has a slightly different message structure, so it helps to know what the event message structure is; otherwise this might seem very arbitrary. If you refer to the event message structure for S3, this becomes easier to decipher.

To get the bucket name with:

bucket = event['Records'][0]['s3']['bucket']['name']

we are traversing this simplified structure:
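(Trimmed to just the fields we touch; the real message carries quite a bit more.)

{
    "Records": [
        {
            "s3": {
                "bucket": {
                    "name": "rax-devnull-000"
                },
                "object": {
                    "key": "foo+bar.txt"
                }
            },
            "userIdentity": {
                "principalId": "..."
            }
        }
    ]
}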

Here, the 0 indexes the first (and only) record in the Records list.

To grab the object key, I needed to make sure that I converted it from the URL-encoded version of the string. S3 object keys with spaces in them would otherwise appear as "foo+bar.txt" instead of "foo bar.txt", which of course caused issues when I tried to use those key names with s3.get_object or s3.delete_object.
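A quick check in the Python 2.7 interpreter shows what urllib.unquote_plus does with a key like that:

>>> import urllib
>>> urllib.unquote_plus('foo+bar.txt')
'foo bar.txt'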

key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'])

I didn't explore the context object much, but it certainly has a lot of valuable information about the execution runtime. An important one is context.get_remaining_time_in_millis(), which tells you how much time is left before your Lambda function is forced to return. Using this value, you can bring your function to a stable state and persist any data, which gives you a deterministic way to essentially save your work before you lose it forever. AWS Lambda functions are stateless in nature and have a maximum runtime of 5 minutes (the configurable setting is called the Timeout parameter; it defaults to 3 seconds and can be set as high as 300 seconds). Generally you'll need to read and write data from DynamoDB or some other persistent datastore. You can read a lot more about the Python context object here.
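As a rough sketch of how you might use it, you could check the remaining time inside a processing loop and checkpoint before the hard stop (work_items, handle and checkpoint_to_dynamo are hypothetical helpers, not part of any SDK):

def lambda_handler(event, context):
    processed = []
    for item in work_items(event):  # hypothetical: derive work from the event
        # Leave a safety margin before Lambda forcibly terminates the function
        if context.get_remaining_time_in_millis() < 10000:
            checkpoint_to_dynamo(processed)  # hypothetical: persist progress somewhere durable
            break
        processed.append(handle(item))  # hypothetical: do the actual work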

An interesting aspect of working with S3 is that objects you upload are generally available immediately, but you have to be aware that there are multiple copies that need to be distributed. Occasionally, I've run into issues where the object couldn't be accessed because it didn't exist yet. To guard against that, I used the Boto3 waiter object to block until it did exist. This isn't generally needed, just a precaution.

waiter = s3.get_waiter('object_exists')
waiter.wait(Bucket=bucket, Key=key)

I retrieved some metadata information about the file, printed it and then deleted the object.

response = s3.head_object(Bucket=bucket, Key=key)
print("CONTENT TYPE: " + response['ContentType'])
print("ETag: " + response['ETag'])
print("Content-Length: ", response['ContentLength'])
print("Keyname: " + key)
print("Event-> principalId: " + principal_id)
print("Deleting object..." + key)
s3.delete_object(Bucket=bucket, Key=key)

Starting out, you will undoubtedly encounter issues with your code. Fortunately, AWS Lambda creates a log stream that is available in CloudWatch. To find your log group, you simply combine the prefix /aws/lambda with your function name; mine is:

/aws/lambda/null_bucket

Note that the log groups for all Lambda functions have the /aws/lambda prefix.

Here is a sample view of the logs generated when I uploaded a file:

This page can be found in the CloudWatch console under "Logs" in the navigation pane.
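If you'd rather not click around the console, you can also pull the most recent events out of the log group with Boto3, roughly like this:

import boto3

logs = boto3.client('logs')

# Find the newest log stream in the function's log group
streams = logs.describe_log_streams(
    logGroupName='/aws/lambda/null_bucket',
    orderBy='LastEventTime',
    descending=True,
    limit=1
)
stream_name = streams['logStreams'][0]['logStreamName']

# Print the events from that stream
events = logs.get_log_events(
    logGroupName='/aws/lambda/null_bucket',
    logStreamName=stream_name
)
for entry in events['events']:
    print(entry['message'])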

I'd like to conclude with some limitations that AWS Lambda has. While I think AWS Lambda is quite awesome, it isn't going to work for every use case. The following limitations should help frame the use cases that AWS Lambda is suited for:

  • 100 concurrent Lambda executions per account
    • Average execution time (in seconds) * events or executions processed per second = number of concurrent executions
    • e.g. 3 sec average function time * 100 events per second from the S3 bucket = 300 concurrent executions
  • 300 second (5 minute) maximum execution duration
  • 512 MB of temporary storage in /tmp
  • The Invoke call that kicks off a Lambda function is limited to 6 MB for request and response
    • The Invoke call can't issue an HTTP POST with a payload of 6 MB or greater
    • The Invoke call can't receive or respond with a payload of 6 MB or greater
  • Processes and threads can't exceed 1,024 combined
  • File descriptors can't exceed 1,024
  • Deployment package sizes
      • A compressed deployment package (.zip or .jar) can't exceed 50 MB
      • An uncompressed deployment package can't exceed 250 MB
      • The entire set of deployment packages for all of your Lambda functions can't exceed 1.5 GB

In my next blog post, I'll explore some more advanced use cases using AWS API Gateway and third party services. Hopefully this has been an informative introduction to Python on AWS Lambda. I look forward to writing another blog post soon!

Visit http://www.rackspace.com/aws for more information about Fanatical Support for AWS and how it can help your business. And download our free white paper Best Practices for Fanatical Support for AWS, which offers further detail regarding implementation options for AWS, including identity management, auditing and billing.