Hello everyone, I know it has been a long time since I wrote a post. I hope everything is going well with you. I have always wanted to write here consistently. Let's see if I can make that happen sometime soon :)
What is this post about?
Today, as usual, I started work in the morning. I'm at the point of building one part of the project, and once again we need yet another micro-service. So I started writing a script (Python) that needs to be deployed in AWS Lambda. This should usually take only a couple of hours, max. But after I wrote it, I realized that half the time had been spent debugging an issue whose how/why I did not even understand. After finding the why/how, I thought it would be good to share it here, in case someone finds it useful along the way.
About the script
For anyone unfamiliar with 'micro-services': it's just a piece of code sitting outside your app that does some part of your app's work, such as collecting data from an API, populating a database and more. This piece of code could be running anywhere, not only in AWS Lambda. I mean, we use AWS because we're too lazy to Dockerize the scripts, maintain/update them and put Kubernetes on top of that to make it even worse. This doesn't mean I (or anyone) would discourage you from using them :)
Define the problem!
Again, the reason half the time was spent on debugging is that, in Python, I had a list/array defined in global space, and some functions were updating it.
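myresult = []  # module-level (global) list shared by the functions below

def foo() -> None:
    # collect some info from an API, append to 'myresult'
    newdata = {"source": "api"}  # placeholder for the real API response
    myresult.append(newdata)

def bar() -> None:
    # do something with myresult
    pass

def main() -> None:
    foo()
    bar()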
Generally speaking, this code is perfectly valid and not an issue whatsoever. But in Lambda it causes a problem that raises no error and tells you nothing about itself; it just quietly affects the overall result.
Why is this a problem in AWS Lambda
As you saw above, foo() is writing something to myresult and bar() is using the result to do something.
The issue you'll see here is that, even after the Lambda exits, when you run it again a couple of seconds later, the data in myresult is still there (unless you re-deploy). Lambda can reuse the same execution environment for later invocations, so the module-level list is not recreated each time. This means foo() appends the new data to a myresult that still holds the old data. bar(), without knowing it, uses myresult presuming that it contains only the new data.
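The behavior can be reproduced locally without AWS. The following is only a minimal sketch: it assumes that two calls to the handler in the same Python process behave like two invocations served by the same reused Lambda environment, and the event key "item" is made up for illustration.

```python
# Stand-in for a Lambda module: module-level code runs once per process,
# not once per handler call.
myresult = []


def foo(newdata: str) -> None:
    # Hypothetical stand-in for the API call described in the post.
    myresult.append(newdata)


def lambda_handler(event: dict, context: object) -> list:
    foo(event["item"])  # "item" is an assumed event key, for illustration only
    return list(myresult)  # copy, so later appends don't alter what we return


# Two calls in the same process share the module-level list.
first = lambda_handler({"item": "a"}, None)
second = lambda_handler({"item": "b"}, None)

print(first)   # ['a']
print(second)  # ['a', 'b']  <- data from the first call leaked into the second
```

The second call sees the item from the first one, which is the same kind of stale data bar() ends up working with.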
In fact, one dirty solution is to clear myresult after use.
myresult = []

def foo() -> None:
    # collect some info from an API, append to 'myresult'
    newdata = {"source": "api"}  # placeholder for the real API response
    myresult.append(newdata)

def bar() -> None:
    # do something with myresult
    pass

def main() -> list:
    foo()
    bar()
    return list(myresult)  # <-- return a copy, not the global list itself

def lambda_handler(event, context) -> list:  # <-- Entrypoint of Lambda
    final_result = main()  # <-- new list
    myresult.clear()  # <-- Clears the global list and leaves nothing inside
    return final_result

Just FYI: assigning a list to a new name does not copy it, so final_result = main() simply points to whatever object main() returns. That is why main() returns list(myresult), a shallow copy: final_result is then a distinct list object from myresult, and clearing myresult leaves it intact. If main() returned myresult itself, final_result would be the very same object and the handler would return an empty list after clear().
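Whether two names refer to the same list object or to distinct ones can be checked quickly with `is`. This is just an illustrative sketch; the names shared, alias and snapshot are made up and not part of the Lambda script.

```python
shared = ["x", "y"]

alias = shared           # same object, just another name for it
snapshot = list(shared)  # new list holding the same items (shallow copy)

shared.clear()           # empties the one object that 'shared' and 'alias' name

print(alias is shared)     # True
print(alias)               # []
print(snapshot is shared)  # False
print(snapshot)            # ['x', 'y']
```

Only the explicit copy survives the clear(), so anything returned from the handler needs to be a copy if the global list is cleared afterwards.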
More explanation and references
This is not how we expect an ephemeral container to work. A container (ephemeral by default unless you set up a volume/storage) runs your code and exits. After exit, there should be no trace whatsoever that your code just ran inside it. Which means, when you use the container again, it should start new and fresh, like the beginning.
AFAIK, this issue is to be expected in AWS Lambda. A good suggestion is to stop using global variables that get updated by other functions. In other words, define only constants in global space, at least in Lambda.
I could have easily passed myresult as a param to bar() instead of keeping it global, and the issue would not have happened. But since it happened, I/we learnt about it here. I guess there is not much we can do to change this behavior.
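A version without the global list could look roughly like the sketch below. It is only one possible shape: the return value of foo(), the processing done in bar() and the placeholder data are assumptions made for illustration.

```python
def foo() -> list:
    # Hypothetical stand-in for collecting data from an API.
    return [{"source": "api"}]


def bar(result: list) -> int:
    # Hypothetical processing step: here it only counts the items.
    return len(result)


def main() -> list:
    result = foo()  # a fresh list is created on every call
    bar(result)     # the list is passed explicitly instead of being global
    return result


def lambda_handler(event, context) -> list:
    return main()


first = lambda_handler({}, None)
second = lambda_handler({}, None)

print(len(first), len(second))  # 1 1
print(first is second)          # False
```

Because the list lives only inside main(), each call starts empty no matter how many times the same process is reused, and nothing needs to be cleared afterwards.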
I know that only some of us are aware of this. It's the 2nd time I've hit this very same issue, a year after the last one. By then I had totally forgotten about it, which is why I spent the same amount of time on it again. Not any more! I took the time to write this article, and it's now set hard as rock in my mind: "I won't use global vars in AWS Lambda and waste my time again".
I must also share an amazing and more technical YouTube video that explains this whole issue, below. Have a good week then :)
AWS Lambda Global Variables - The Good the Bad & the Ugly - AWS Service Deep Dive