Optimize Your AWS Lambda Code Based on My Mistake

Originally published at: Optimize Your AWS Lambda Code Based on My Mistake - Skycrafters

Serverless is cool. Running serverless code, even cooler. And if you are using AWS, you either are, or are about to start, using AWS Lambda. For those unaware, Amazon defines AWS Lambda as a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. With Lambda, you can run code for virtually any type of application or backend service – all with zero administration.

This is great for two reasons:

  1. It allows builders to focus on business logic instead of infrastructure.
  2. It’s also secure, since AWS Lambda invokes your function in an execution environment, which provides a secure and isolated runtime environment.

Paying only for the compute time that is consumed is also one of its benefits, which made it a perfect choice for a project that my team was working on, since this code would run a few hundred times over two days and that would be it. Even better, AWS handles the trouble of spinning up the execution environment when the function is triggered (an HTTP request in our case), executes the code, and terminates it right away. Or does it?

Let me show you how our code made me realize I was wrong about AWS Lambda, and how you can capitalize on my mistake to make your code run more efficiently in AWS Lambda and save you a few bucks in the process!

Our Case

Our AWS Lambda function code had a clear purpose. It would check a specific S3 bucket for the existence of two objects, each with a key (“name”) matching one of the two prefixes we were looking for. Below is a reduced and simplified version of our Python code:
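
What follows is a minimal sketch of a handler with that shape, reconstructed from the description in this post rather than copied from the original listing. To keep it runnable without AWS credentials, the S3 listing call is replaced by a hypothetical `list_keys` stub backed by an in-memory `FAKE_BUCKET` dict; the bucket name and the two prefixes are placeholders as well.

```python
# Sketch only: FAKE_BUCKET, list_keys, the bucket name and the prefixes are
# hypothetical stand-ins, not the original code or a real S3 call.
FAKE_BUCKET = {"my-bucket": ["report-2021.csv", "summary-2021.csv"]}

REQUIRED_PREFIXES = ("report-", "summary-")

files = []  # declared at module level, outside the handler


def list_keys(bucket):
    """Stand-in for listing the object keys of an S3 bucket."""
    return list(FAKE_BUCKET.get(bucket, []))


def lambda_handler(event, context):
    counter = 0

    # First loop: collect every object key found in the bucket.
    for key in list_keys("my-bucket"):
        files.append(key)

    # Second loop: count the keys that start with one of the prefixes.
    for key in files:
        if key.startswith(REQUIRED_PREFIXES):
            counter += 1

    return counter == 2
```

Called once in a fresh process, the sketch returns True when both objects are present and False when only one is.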

This code worked well in our tests and was approved in the code review process. It returns True when there are two files with the right prefixes, and it returns False when there aren’t. Simple enough.

That wasn’t what happened in real life, however. It would still work in the scenario where the right files were there, but it would, only sometimes, return True when just one of the files was there. And this was frustrating because it was happening in production and only sometimes. Nothing is more annoying than “sometimes.”

Let’s figure this out

Don’t do this at home, kids, but the first idea that came to mind to troubleshoot the problem was to edit the function code and add a “print(files)” between the two “for” loops. This way I would be able to check the contents of the files list before looping through it. Then I checked the bucket and saw that there was just one file there, with the right prefix. Right away I executed the code, and that line printed exactly what I was hoping for: the list held only the one expected item, and the function returned False, also as expected.

I was confused. I ran the code again, just because I was out of ideas. To my surprise, the code returned True this time. I was twice as confused as before. That print output now showed the right file there… twice! There were 2 objects with the same key in the same bucket, which we know is impossible. Right away I went to check if versioning was enabled on said bucket, maybe that was the issue! But it wasn’t. Versioning was off. I ran the code again, and once again, it returned False and printed the filename just once.

Feeling more lost than I already was in the beginning of all this, I stopped, took a deep breath, and tried all over again. First execution, False with printing the file name once. Second, returned True printing the file name twice. Third, returned False, printing the filename thrice. Clearly there was something wrong with this files variable.

What I got wrong

Any beginner developer knows the concept of scope, even if the name isn’t known. For those who don’t, scope means that a variable is only available inside the region in which it is declared. If we go back to our code, we can see that the variable files is declared in the global scope (at the root of the module, outside of any function/method). I moved this declaration inside the lambda_handler function scope (right before the counter declaration) and it worked! An execution would always return False and print the filename just once given my bucket content. So, what happened?
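
To see the difference in isolation, the sketch below calls two variants of such a handler three times each in the same Python process, which is roughly what a reused execution environment does. It is an illustration under assumptions, not our production code: `KEYS_IN_BUCKET`, the prefixes, and both handler names are hypothetical, and the pretend bucket holds a single matching object.

```python
# Hypothetical stand-in for the bucket content: one matching object only.
KEYS_IN_BUCKET = ["report-2021.csv"]
PREFIXES = ("report-", "summary-")

files = []  # module scope: survives between calls to buggy_handler


def buggy_handler(event, context):
    counter = 0
    for key in KEYS_IN_BUCKET:
        files.append(key)  # the shared list keeps growing on every call
    for key in files:
        if key.startswith(PREFIXES):
            counter += 1
    return counter == 2


def fixed_handler(event, context):
    local_files = []  # function scope: a fresh list on every call
    counter = 0
    for key in KEYS_IN_BUCKET:
        local_files.append(key)
    for key in local_files:
        if key.startswith(PREFIXES):
            counter += 1
    return counter == 2


buggy_results = [buggy_handler({}, None) for _ in range(3)]
fixed_results = [fixed_handler({}, None) for _ in range(3)]
print(buggy_results)  # [False, True, False]
print(fixed_results)  # [False, False, False]
```

The buggy variant reproduces the False, True, False sequence from my troubleshooting session, because the shared list holds one, then two, then three copies of the same key.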

We tend to oversimplify everything for our own sake. In Lambda’s case, why would we overthink how the process behind running our code works if, in the general case, “it just works”? This was one of the cases in which understanding the (fascinating) process was helpful, and it can be for you too.

How it actually works

The Lambda execution environment lifecycle has 3 distinct phases: Init, Invoke and Shutdown. Let’s read their definitions straight from AWS where I’ll highlight a few passages:

  • Init: In this phase, Lambda creates or unfreezes an execution environment with the configured resources, downloads the code for the function and all layers, initializes any extensions, initializes the runtime, and then runs the function’s initialization code (the code outside the main handler.) The Init phase happens either during the first invocation, or in advance of function invocations if you have enabled provisioned concurrency.
  • Invoke: In this phase, Lambda invokes the function handler. After the function runs to completion, Lambda prepares to handle another function invocation.
  • Shutdown: This phase is triggered if the Lambda function does not receive any invocations for a period of time. In the Shutdown phase, Lambda shuts down the runtime, alerts the extensions to let them stop cleanly, and then removes the environment. Lambda sends a Shutdown event to each extension, which tells the extension that the environment is about to be shut down.

Not surprisingly, the documentation shoves in my face what we were getting wrong all along: code in the global scope is executed once, in the Init phase. An execution environment is only shut down if it doesn’t receive invocations for a period of time. While there are executions to be processed, the same initialized execution environment will be reused across executions. They even add an illustration to make it extra clear!
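
A small sketch of that split, run locally rather than in Lambda: everything at module level stands in for the Init phase, and each call to the handler stands in for one Invoke. The names `INIT_TOKEN` and `invocation_count` are made up for this illustration.

```python
import uuid

# Module level: in Lambda this runs during Init, once per execution environment.
INIT_TOKEN = uuid.uuid4().hex
invocation_count = 0


def lambda_handler(event, context):
    # Handler body: runs on every Invoke that lands on this environment.
    global invocation_count
    invocation_count += 1
    return {"init_token": INIT_TOKEN, "invocation": invocation_count}


first = lambda_handler({}, None)
second = lambda_handler({}, None)
# Both calls report the same init token; only the invocation counter moves.
```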

How you can capitalize on my mistake

On top of now knowing you don’t want to declare variables that are intended to be unique to each execution in the global scope, there is an even bigger way to capitalize on this story. As bad as it is to declare such variables globally, it is a great practice to declare constants, or variables that are reused across executions, in the global scope to save you both execution time and money.

For instance, let’s say you have code that connects to a database to query customer data. You could write it as seen below:

Which is great, it works really well. But it isn’t Lambda optimized. We now know we can do better. See below:
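
Here is a sketch of the Lambda-friendly layout, under the same kind of assumptions: sqlite3 in memory stands in for the real database, `connect_to_database` and the `customers` table are hypothetical, and `connection_count` is only there to show how often a connection is opened.

```python
import sqlite3  # library loaded once, at module level

connection_count = 0  # instrumentation for this sketch only


def connect_to_database():
    global connection_count
    connection_count += 1
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    return conn


# Module scope: one connection per execution environment, reused by every
# invocation that lands on it.
conn = connect_to_database()


def lambda_handler(event, context):
    # Function scope: the query result is not kept once the handler returns.
    secretCustomerData = conn.execute(
        "SELECT id, name FROM customers WHERE id = ?", (event["customer_id"],)
    ).fetchall()
    return len(secretCustomerData)
```

One caveat worth keeping in mind: a real database connection can be dropped while an environment sits idle, so production code often checks the connection and reconnects when needed.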

These small changes bring a lot of benefits for your code execution.

  1. We are moving the initialization of the library to the global scope, doing this only once across executions.
  2. We are moving the potentially sensitive secretCustomerData inside the function handler. This means it won’t be shared between executions, reducing its exposure.
  3. We are making sure that the act of connecting to the database is made just once across executions, potentially drastically reducing the amount of time to execute this code.

I’m glad I could share my experience so that others could learn from what I’ve been through. But I would like to know, did you know all of this about AWS Lambda? Do you see potential to optimize your own Functions? Tell us more by joining the discussion below!

Hey @raphabot,
That’s an awesome post and an awesome explanation!
I can easily see how these kinds of optimizations could improve the efficiency of an application at scale…
Do you or anyone else have any more tips/best practices/resources to optimize code specifically for lambda functions?

😂 so true!

I don’t do this at home but I do it at work all the time. Unexpected behaviour? Throw in a log to understand the state when it happens. You can’t put a breakpoint in prod, so this is the next best thing. Maybe don’t edit the function in the console in prod 😱 but adding logs is a superpower.

Great post and a great warning about how you need to be careful with scoping! This can happen in any language, so it’s a great reminder not just for Pythonistas.

It is my understanding that this is not strictly true. The execution environment may be recycled even if the function is being called frequently. AWS does not currently define a maximum lifetime for Lambda execution environments, but common lore suggests that a couple of hours is a good guess. This lets the Lambda machinery re-pack execution environments, move them around, and update them without breaking the contract.

Sometimes it would be nice to specify the max lifetime so that you can force re-initialization without having to manage a refresh cycle in your function. So far this feature hasn’t materialized, but one can dream!

Really interesting. I guess I never had a function that was called frequently enough that I could see its runtime environment being recycled despite being busy. Thanks for the info, @glb!

Great writeup @raphabot, sorry you had to learn that the hard way 😅

I’ve hit this problem many times in the past. Although it has some very annoying side effects when state from one invocation leaks into others, there are interesting uses for it, such as caching, connection persistence, and obviously loading libraries, although the latter can also be done with Lambda layers.

Very true @glb, especially when Lambdas are invoked in parallel to cater to a higher frequency of events to be treated.

Yes, that would be an awesome feature! Did you take a look at the issue tracker (Issues · firecracker-mi) of Firecracker (the open-source software behind Lambda orchestration)? Maybe we could create an issue asking for this feature to be built, or even better, build it ourselves if somebody knows Rust.

I don’t have specifics about how the code itself can be optimized, since the runtime executing it is the same Node version as elsewhere. However, there are a few things to keep in mind when designing a Serverless system.

  • The size of the Lambdas should be kept minimal: this reduces cold-start times, and removing dead code also benefits security.
  • One Lambda should deal with one concern: divide and conquer. The purpose of a function should be minimal, and Serverless systems should be designed to use multiple Lambda functions and other Serverless services that each do one job and do it right.
  • Assign a dedicated IAM role/policy to each function, following the principle of least privilege; i.e., you don’t want to give your function write permission when it only reads.
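
As an illustration of the last point, a read-only policy for a function that only lists the objects of one bucket could look like the sketch below, written as a Python dict so it can be serialized with `json.dumps`. The bucket name `example-bucket` is a placeholder; `s3:ListBucket` is the IAM action that covers listing a bucket’s objects.

```python
import json

# Hypothetical bucket name; replace it with the bucket the function reads.
BUCKET_ARN = "arn:aws:s3:::example-bucket"

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],  # list keys only, no write actions
            "Resource": BUCKET_ARN,
        }
    ],
}

# JSON text that can be attached to the function's role as a policy document.
policy_document = json.dumps(read_only_policy, indent=2)
```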

I did not. I’ll check that out as a possible option, thanks!

I actually just saw that one of my peers shared another great tip on the same topic. I didn’t know this, but the code outside of the handler isn’t billed for the first 10 seconds, and it has access to 3072 MB of RAM!
