How to add Python Pandas Layer to AWS Lambda

Abhik Chakraborty
Published in The Startup
5 min read · Apr 12, 2020

There is no question about the brilliance of the Pandas library. One of the most used libraries in data science, Pandas is a household name for data scientists, analysts and general data enthusiasts.

However, as someone who is fascinated by serverless architectures, I was aghast to see that AWS Lambda, one of the best-known serverless services, doesn’t support Pandas out of the box.

Error shows how AWS Lambda is incapable of importing ‘pandas’

It is quite frustrating to see how such an awesome library is left out by AWS, leaving data engineers like me looking for hacks and fixes around it.

However, this had to end somewhere, and it ends here. After following numerous tutorials, forums, articles and knowledge manuals, I have narrowed down the steps for loading the Pandas library into AWS Lambda, and after following this article, I hope you will be able to do that as well.

So let’s get started:

Getting required files

The Pandas library is nothing but a set of files that we ‘import’ into our Python code to perform the required operations. For the same operations to work on a serverless architecture like Lambda, those files have to be available to the code whenever we import the library. To ensure that, we first gather all the files in one place with the following steps.

  • AWS Lambda runs on Linux. Therefore, only files that are compiled to run in a Linux environment can be used here. To get the Linux-compiled files, go to https://pypi.org/project/pandas/#files and download the relevant wheel (.whl) file.
  • The wheel file that works for Python 3.7 on AWS Lambda is: pandas-1.0.3-cp37-cp37m-manylinux1_x86_64.whl
  • For Pandas to run on Lambda, the Pytz library is also required. To get it, go to https://pypi.org/project/pytz/#files and download the wheel file: pytz-2019.3-py2.py3-none-any.whl
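Before downloading, it can help to double-check that a wheel's filename matches the Lambda environment. The sketch below is only a simplified heuristic, not pip's full tag-matching logic: it assumes a Python 3.7 Lambda on x86_64 and reads the tags straight from the standard wheel filename layout. The function names are made up for illustration.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def runs_on_lambda_python37(filename):
    """Rough check (assumption: Python 3.7 Lambda on x86_64 Linux)."""
    tags = parse_wheel_filename(filename)
    # Pure-Python wheels such as pytz run on any platform.
    pure_python = (
        tags["platform_tag"] == "any"
        and "py3" in tags["python_tag"].split(".")
    )
    # Compiled wheels such as pandas must target CPython 3.7 on 64-bit Linux.
    linux_cp37 = (
        tags["python_tag"] == "cp37"
        and "linux" in tags["platform_tag"]
        and tags["platform_tag"].endswith("x86_64")
    )
    return pure_python or linux_cp37
```

With this check, both wheels named above pass, while a Windows build such as pandas-1.0.3-cp37-cp37m-win_amd64.whl is rejected.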

Preparing the package

Once the files are downloaded, we need to prepare a package that can be uploaded to AWS Lambda and is self-sufficient to power any Python program on that serverless architecture. The steps are as follows:

  • Create a folder named python. This name is very important, because Lambda looks for Python layer code in a folder with exactly this name. Please ensure that the folder name is exactly that
  • Once the folder is created, unzip the .whl files for Pytz and Pandas into that folder. Once the unzip is done, the folder will look like this:
Screen capture of ‘python’ folder after .whl files are unzipped
  • Once the unzip is completed, the final step is to zip the python folder into python.zip. Here again, the structure is very important: the archive must contain the python folder itself at its top level
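The packaging steps above can also be scripted. The following is a minimal sketch using only the standard library; it relies on the fact that a .whl file is an ordinary zip archive. The function name build_layer_zip and the work_dir argument are assumptions for illustration, not part of any AWS tooling.

```python
import shutil
import zipfile
from pathlib import Path


def build_layer_zip(wheel_paths, work_dir):
    """Unpack wheels into work_dir/python and zip that folder as python.zip."""
    work_dir = Path(work_dir)
    python_dir = work_dir / "python"
    python_dir.mkdir(parents=True, exist_ok=True)

    # A wheel is a plain zip archive, so zipfile can extract it directly.
    for wheel in wheel_paths:
        with zipfile.ZipFile(wheel) as archive:
            archive.extractall(python_dir)

    # base_dir="python" keeps the python folder at the top level of the zip.
    archive_path = shutil.make_archive(
        str(work_dir / "python"), "zip",
        root_dir=str(work_dir), base_dir="python",
    )
    return Path(archive_path)
```

Calling build_layer_zip with the two downloaded wheel paths and an empty working folder should leave a python.zip in that folder, ready for upload.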

Adding AWS Layer

Once the zip file is created, we come to the last and most important step. To make the files reusable by any Python code on AWS Lambda, we will add the zip file as a Layer on AWS. Now, you may ask, what is a layer? A layer is nothing but a way to add and reuse additional code throughout Lambda. For more information, you can go through the following link: AWS Lambda Layers

Now, getting back to the solution, we will add our zip file as a layer to Lambda. These are the steps to follow:

  • Log in to your AWS account and go to the Lambda service
  • In Lambda, click on the Layers link on the left
AWS Lambda layers can be found on the left
  • After clicking on Layers, a new interface will open. Click on Create Layer
Create Layer can be found on the top right
  • After clicking on that, a new interface will open with some options.
Layers Interface
  • Enter the Name as pandas and add an optional description. You can choose to upload the zip file through S3 or do a direct upload. I went with the latter
  • For compatible runtimes, which refers to the runtime versions the layer is compatible with, I chose Python 3.7, as the .whl file was built for Python 3.7
  • License information is optional and can be left blank
  • DONE! Do a victory dance now!
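For those who prefer scripting over the console, the same layer can probably be published with boto3's publish_layer_version call. This is a sketch under assumptions: boto3 is installed, AWS credentials and a default region are already configured, and the helper names layer_request and publish_layer are invented for this example.

```python
from pathlib import Path


def layer_request(zip_path, name="pandas", runtime="python3.7", description=""):
    """Build the keyword arguments for a direct-upload layer request."""
    return {
        "LayerName": name,
        "Description": description,
        # Direct upload: the zip bytes are sent in the request itself.
        "Content": {"ZipFile": Path(zip_path).read_bytes()},
        "CompatibleRuntimes": [runtime],
    }


def publish_layer(zip_path):
    """Publish the layer and return its version ARN (requires AWS access)."""
    import boto3  # third-party; imported here so the helper above works without it

    client = boto3.client("lambda")
    response = client.publish_layer_version(**layer_request(zip_path))
    return response["LayerVersionArn"]
```

Running publish_layer("python.zip") would mirror the console steps: name pandas, direct upload, and Python 3.7 as the compatible runtime.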

Code Setup

Of course, after going through all the steps, we want to import the pandas library and enjoy the magic. However, before doing that, every Lambda function has to be set up with the following two steps:

  • Once the Lambda function is created, add the pandas Layer that was just created
  • In addition, since pandas depends on numpy, we add a numpy layer as well. Fortunately, AWS provides numpy support out of the box. To add it, attach the AWS-provided AWSLambda-Python37-SciPy1x layer to the function
The two layers required for pandas to run
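The two layers can also be attached from code rather than the console. The sketch below assumes boto3 with configured credentials; the function name and both ARNs in the usage note are placeholders, so copy the real layer version ARNs from the Layers page of your own account and region.

```python
def attach_layers_request(function_name, layer_arns):
    """Build the arguments for attaching layers to an existing function."""
    # Note: the Layers list replaces whatever layers the function already has.
    return {"FunctionName": function_name, "Layers": list(layer_arns)}


def attach_layers(function_name, layer_arns):
    """Apply the layer list to the function (requires AWS access)."""
    import boto3  # third-party; imported lazily so the helper above stays testable

    client = boto3.client("lambda")
    return client.update_function_configuration(
        **attach_layers_request(function_name, layer_arns)
    )
```

A call might look like attach_layers("my-function", ["<pandas-layer-version-arn>", "<scipy-layer-version-arn>"]), where every value is hypothetical.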

Testing the library

Now is the time to see the magic. After the layers are created and added to the code, simply import the pandas library, add your required code to the Lambda function, save it, and Test.

Code snippet with pandas imported
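For readers who cannot see the screenshot, a test function could look roughly like the sketch below. It is not the exact code from the image: the sample data and the returned fields are made up for illustration, and it assumes both layers are attached so that the import succeeds.

```python
import pandas as pd


def lambda_handler(event, context):
    # Build a tiny DataFrame to prove that pandas imported and works.
    df = pd.DataFrame({"value": [1, 2, 3]})
    return {
        "statusCode": 200,
        "pandas_version": pd.__version__,
        "total": int(df["value"].sum()),  # cast to a plain int so it serializes to JSON
    }
```

If the Test run returns a status code of 200 instead of an import error, the layers are working.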
VOILA! 😎

This concludes my tutorial. I hope it will help data engineers and scientists explore the endless possibilities of serverless architecture and enjoy the great features of the Pandas library along with it.
