Minimizing AWS Lambda deployment package size in TypeScript

AWS Lambda package size matters for at least two reasons. The first is the size limits of the platform. At the time of writing, the deployment package size limits are 50 MB for a zipped function and 250 MB for an unzipped one, including layers.
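As a rough illustration of these limits, the sketch below checks a package against them. The function and constant names are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption; check the current AWS documentation for the exact figures.

```typescript
// Limits quoted in the article; they may change over time.
// Assumption: 1 MB is counted as 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;
const MAX_ZIPPED_BYTES = 50 * BYTES_PER_MB;
const MAX_UNZIPPED_BYTES = 250 * BYTES_PER_MB;

interface PackageSize {
  zippedBytes: number;   // size of the uploaded .zip file
  unzippedBytes: number; // size after extraction, layers included
}

// Returns a list of problems; an empty list means the package fits.
function checkPackageSize(size: PackageSize): string[] {
  const problems: string[] = [];
  if (size.zippedBytes > MAX_ZIPPED_BYTES) {
    problems.push(`zipped size ${size.zippedBytes} B exceeds ${MAX_ZIPPED_BYTES} B`);
  }
  if (size.unzippedBytes > MAX_UNZIPPED_BYTES) {
    problems.push(`unzipped size ${size.unzippedBytes} B exceeds ${MAX_UNZIPPED_BYTES} B`);
  }
  return problems;
}
```

A check like this could run in CI after packaging, so that a growing dependency tree is noticed before a deployment fails.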

The second reason is cold start time. AWS Lambda is a proprietary platform, so we cannot check exactly how a function start is implemented, but experiments show that functions with many dependencies can be 5-10 times slower to start. Although these numbers may change in the future as AWS optimizes its internals, they still give us food for thought and encourage us to keep Lambda functions as small as possible.

Speaking of the future, Amazon has announced Provisioned Concurrency, a feature that ensures that a Lambda function begins executing the developer’s code within double-digit milliseconds of being invoked.

In this article, we will reduce, step by step, the size of a simple GraphQL + DynamoDB Lambda function written in TypeScript. The other tools we are going to use are Serverless Framework and webpack. You can find the initial project on GitHub in the ‘master’ branch. All optimizations are stored in the ‘step-*’ branches. Most of the concepts also apply to AWS Lambda functions written in JavaScript.

Initial Project

The simplest way to start developing Lambda functions in TypeScript is to use Serverless Framework with serverless-plugin-typescript. This is the content of the serverless.yml file of our initial project:

From the serverless.yml file, we can see that we have two Lambda functions: authorizer and handler. The authorizer function is there only as an example of multiple functions inside one project. In practice, it allows any request that contains a non-empty ‘Authorization’ header.
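For readers who have not written a Lambda authorizer before, here is a minimal sketch of what such a function might look like. It is not the project’s actual code: the local types stand in for the `aws-lambda` type definitions, the `principalId` value is a placeholder, and the “Deny” branch for an empty header is an assumption.

```typescript
// Minimal local types so the sketch stands alone; a real project would
// usually take these from the `aws-lambda` type definitions.
interface TokenAuthorizerEvent {
  authorizationToken?: string; // value of the Authorization header
  methodArn: string;           // ARN of the API method being invoked
}

type AuthEffect = "Allow" | "Deny";

interface AuthorizerResult {
  principalId: string;
  policyDocument: {
    Version: "2012-10-17";
    Statement: { Action: "execute-api:Invoke"; Effect: AuthEffect; Resource: string }[];
  };
}

// Allows the request whenever the header is present and not blank.
function buildAuthResponse(request: TokenAuthorizerEvent): AuthorizerResult {
  const token = (request.authorizationToken ?? "").trim();
  const effect: AuthEffect = token.length > 0 ? "Allow" : "Deny";
  return {
    principalId: "user", // hypothetical placeholder principal
    policyDocument: {
      Version: "2012-10-17",
      Statement: [
        { Action: "execute-api:Invoke", Effect: effect, Resource: request.methodArn },
      ],
    },
  };
}

// The Lambda entry point simply wraps the pure function above.
const authorizer = async (request: TokenAuthorizerEvent): Promise<AuthorizerResult> =>
  buildAuthResponse(request);
```

Keeping the policy construction in a pure function makes the authorizer easy to unit test without invoking Lambda.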

And here is the content of the package.json file:

We have only a few dependencies. But even for such a simple project with so few dependencies, the size of the AWS Lambda package is 5.3 MB.

Also note that by default, Serverless Framework creates one package and deploys it to all our Lambda functions. So this is what we have for the initial project after packaging our functions with the `sls package` command. You can also deploy the packages with `sls deploy` (if you are unfamiliar with Serverless Framework and AWS account configuration, read more here):

– handler package size: 5.3 MB
– authorizer package size: 5.3 MB

These deployment packages contain all npm dependencies (devDependencies are excluded) and JavaScript files transpiled from our TypeScript sources.

Step 1 – Introducing webpack

Webpack is a well-known tool for creating bundles of assets (code and files). Serverless Framework has a webpack plugin that integrates into the Serverless workflow and bundles the Lambda functions.

We can now delete `serverless-plugin-typescript` and install `webpack`, `serverless-webpack` and `ts-loader` – a loader that will transpile our TypeScript code into JavaScript:

`npm remove serverless-plugin-typescript && npm install --save-dev webpack serverless-webpack ts-loader`

Usually “webpack for backend” tutorials recommend installing and using a `webpack-node-externals` plugin. Let’s follow this advice and then analyze the results:

`npm install --save-dev webpack-node-externals`

Let’s replace `serverless-plugin-typescript` with `serverless-webpack` in the serverless.yml file.

Now we can add the webpack configuration. By default the plugin will look for a webpack.config.js file in the project root directory.

Here is our webpack.config.js:

There are three important things to mention here. The first is that the webpack plugin creates a chunk for each function defined in the serverless.yml file. This is achieved with the help of the `slsw.lib.entries` object. The second is the webpack rule that applies ts-loader to our ‘*.ts’ files.

The third is that our npm dependencies are declared as externals, which means they end up in the package without being processed by webpack. From the `serverless-webpack` docs:

“All modules stated in externals will be excluded from bundled files. If an excluded module is stated as dependencies in package.json and it is used by the webpack chunk, it will be packed into the Serverless artifact under the node_modules directory.”
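To make the notion of externals more concrete, here is a toy model of the decision that a helper such as `webpack-node-externals` makes for each import. It is a simplified sketch, not the plugin’s code: the `installed` list stands in for the directory names that a scan of node_modules would produce, and the function name is hypothetical.

```typescript
// Toy model: decides whether an import request should be left out of the
// bundle (external) or bundled. `installed` stands in for the package names
// found by scanning node_modules.
function makeExternalsCheck(installed: string[]): (request: string) => boolean {
  const names = new Set(installed);
  return (request) => {
    // Relative and absolute imports are our own source files: always bundle them.
    if (request.startsWith(".") || request.startsWith("/")) return false;
    // Scoped packages ("@scope/name") use two path segments as the package
    // name, so "graphql/language" and "@scope/name/lib" map to their package.
    const segments = request.startsWith("@") ? 2 : 1;
    const packageName = request.split("/").slice(0, segments).join("/");
    return names.has(packageName);
  };
}

const isExternal = makeExternalsCheck(["graphql", "@scope/example"]);
```

With this model, `isExternal("graphql/language")` is true, so the import stays a plain `require` at runtime, while `isExternal("./handler")` is false and the file is bundled.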

`webpack-node-externals` scans the node_modules folder to create an array of modules and sub-modules that shouldn’t be bundled. So we only need to add this parameter to the serverless.yml file to make our new solution work:

Okay, we are now ready to check the size of the package again. Let’s run `sls package` and… it’s the same 5.3 MB. Strictly speaking, it is 3 KB bigger.

Let’s analyze the size of our bundle. We can do it using an excellent webpack-bundle-analyzer plugin.

Following the instructions in the plugin’s README, we can generate this image:

Figure: webpack-bundle-analyzer result without bundled packages

It shows that the size of the bundled files is only 6.11 KB for the handler function and 1.16 KB for the authorizer. This means that most of our final package is taken up by node modules that we copied there without any processing. Interestingly, even though our own code is now minified, the package is still 3 KB bigger than the initial one. The reason is that it now contains package-lock.json.

It’s worth mentioning that if our project contained more of our own code, we would already see smaller package sizes after this step than at our starting point. But so far we have the same numbers:

– handler package size: still 5.3 MB
– authorizer package size: still 5.3 MB

Step 2 – Bundle node_modules (be extra careful!)

Okay, we now understand that node modules account for most of the space in our package, and that we put them there intentionally by using `webpack-node-externals`. But do we really need it? As its documentation says:

“When bundling with Webpack for the backend, – you usually don’t want to bundle its node_modules dependencies”

and it refers to the article Backend Apps with Webpack, which provides a detailed explanation:

“Webpack will load modules from the node_modules folder and bundle them in. This is fine for frontend code, but backend modules typically aren’t prepared for this (i.e. using require in weird ways) or even worse are binary dependencies. We simply don’t want to bundle in anything from node_modules.”

As an example, the author mentions the Express.js framework, which has some binary dependencies that can cause errors when bundled.

But in our case we most probably don’t have any binary dependencies. So let’s try to bundle our project without `webpack-node-externals`.

After removing ‘externals’ from webpack.config.js and running the `sls package` command, the size of the resulting zip file is 1.2 MB. Here is the image produced by the webpack-bundle-analyzer plugin:

Figure: webpack-bundle-analyzer result with bundled packages

And we can see something interesting. Yes, all our npm dependencies are bundled, but among them is `aws-sdk`, which is provided by the AWS Lambda environment and for that reason was deliberately moved to devDependencies. With the current configuration, webpack doesn’t know that it should ignore devDependencies. Let’s add `aws-sdk` to the array of externals in webpack.config.js and package our functions one more time (again, ‘externals’ prevents certain imported packages from being bundled and resolves them at runtime instead). Now `aws-sdk` has disappeared from the bundle:

Figure: webpack-bundle-analyzer result without aws-sdk
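The configuration for this step might look roughly like the sketch below. It is written as a typed object literal so that it stands alone. In the real webpack.config.js the object would be assigned to `module.exports` and `entry` would be `slsw.lib.entries`. The entry paths, the interface name and the other option values are assumptions, not the project’s actual file.

```typescript
// Local stand-in for the subset of webpack options discussed in this article.
interface BundleConfigSketch {
  entry: Record<string, string>;
  target: "node";
  mode: "production" | "development";
  externals: string[];
  resolve: { extensions: string[] };
  module: { rules: { test: RegExp; loader: string; exclude?: RegExp }[] };
}

const bundleConfig: BundleConfigSketch = {
  // Hypothetical entries; serverless-webpack derives the real ones from
  // serverless.yml and exposes them as `slsw.lib.entries`.
  entry: { handler: "./src/handler.ts", authorizer: "./src/authorizer.ts" },
  target: "node",
  mode: "production",
  // Everything is bundled except aws-sdk, which the Lambda runtime provides.
  externals: ["aws-sdk"],
  resolve: { extensions: [".ts", ".js"] },
  module: {
    // Transpile our own TypeScript sources with ts-loader.
    rules: [{ test: /\.ts$/, loader: "ts-loader", exclude: /node_modules/ }],
  },
};
```

A package known to ship binaries could be added to the same `externals` array, so that it is excluded from the bundle while everything else is still bundled.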

And the size of our functions is:

– handler package size: 445 KB
– authorizer package size: 445 KB

This step is marked “be extra careful” for a reason: you should double-check that your bundled dependencies don’t rely on any binaries, otherwise you will have trouble in production. One way to check that you’re safe is to write good end-to-end tests for your deployed Lambda functions. Note that unit tests won’t help here, because they run against the plain node_modules, without any webpack processing. If you know that a specific npm package has binary dependencies, you can add it to the ‘externals’ block in the webpack config and still bundle all the other packages.

* Bundling the dependencies in our sample project produces two warnings: “Module not found: Error: Can’t resolve ‘bufferutil’” and “Module not found: Error: Can’t resolve ‘utf-8-validate’”. This is neither a fault of our solution nor a flaw in webpack. The reason is that one of our dependencies tries to import these modules, but they are not listed in any of the package.json files. If you want to understand the cause and find ways to get rid of the warnings, you can read this discussion on GitHub.

Step 3 – Package: individually

You have probably noticed that we always show the package size for two functions: handler and authorizer. So far, one package has been deployed for both of them, so the numbers were the same. That doesn’t make sense, especially because the authorizer function is hundreds of times smaller than the handler. You can see this in the last bundle analyzer picture, where the small violet rectangle shows the relative size of the authorizer bundle. To produce separate packages for the separate Lambda functions, we can simply add the following option to our serverless.yml file:

And here are our final numbers, which are more than 10 times smaller than the initial ones for the handler package and ~7000 times smaller for the authorizer package:

– handler package size: 445 KB
– authorizer package size: 744 B
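The ratios can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 KB = 1024 bytes). With decimal units the results shift slightly but stay in the same range.

```typescript
// Assumption: sizes are reported in binary units.
const KIB = 1024;
const MIB = 1024 * KIB;

const initialPackageBytes = 5.3 * MIB;  // package size before any optimization
const handlerPackageBytes = 445 * KIB;  // handler after steps 2 and 3
const authorizerPackageBytes = 744;     // authorizer after step 3

// How many times smaller each final package is than the initial one.
const handlerRatio = initialPackageBytes / handlerPackageBytes;       // roughly 12
const authorizerRatio = initialPackageBytes / authorizerPackageBytes; // roughly 7,470
```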

This step was trivial and could arguably have come first, but `serverless-plugin-typescript` ignores the `individually: true` option, so we postponed it until the webpack config was in place.

Conclusion

To summarize, when you write AWS Lambda functions in TypeScript, you can start with the convenient `serverless-plugin-typescript`. But once you need to optimize the size of the deployment packages, you will most probably need to tune your packaging process with webpack. You can start with individual packaging and then go on to bundle not only your own source code but also your npm dependencies. Just make sure that these dependencies don’t use any binaries that webpack would drop during bundling, because this can lead to errors in production.

This article provided the basic configurations that served only one goal – showing how to minimize the size of AWS Lambda functions in TypeScript.

Webpack is a very powerful tool with many different configuration options that can help you to tune the bundle according to your needs, for instance, to add source maps or improve build process speed using the caching mechanism.

Blog

Minimizing AWS Lambda deployment package size in TypeScript

Our Senior Developer Vitalii explains how to significantly reduce the deployment package size of AWS Lambda functions written in TypeScript...

Blog

Problems with DynamoDB Single Table Design

Single Table Design is a database design pattern for DynamoDB based applications. In this article we take a look at...

Blog

Modular GraphQL server

Read about Kari's experiences with GraphQL modules!

Get in Touch

Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.








Modular GraphQL server

I have been working on a customer project to build a new eCommerce platform with a GraphQL API. We ended up using GraphQL instead of REST because multiple backend services had to be combined to compose the actual API for the frontend. We also needed to reduce loading times, and with GraphQL the frontend sends one request instead of several requests to different REST endpoints. We could have built custom endpoints for the frontend, but that would have been a maintenance burden: every time the frontend needed new data, or something else changed, the endpoints would have required updates.

We started to build the API using Apollo GraphQL server and TypeScript. We had the first versions running in no time to support frontend development, and then added features and data structures one at a time.

After a while, as we kept adding features and data structures, it became obvious that we needed to keep the code base modular and split the main types into their own components. It was difficult to find a good way to split the code and schemas without introducing a custom solution and extra complexity on top of the Apollo server.

I tried splitting the code so that each main type had its resolvers and schemas in its own folder, merged at a higher level, but some modules were too tightly coupled. I also tried schema stitching, which introduced its own problems, such as resolvers not being executed when another schema called this one.

Another issue was the request context, which was defined in one place, in the Apollo server configuration, when the server was created. We had several things to pass to modules via the context object, such as user data and the language selection. All of these came in via HTTP headers, so we needed to parse them and pass them along as context. Some were needed by only one module, so defining them at the main application level didn’t seem to be the right solution.
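To illustrate the kind of context building described above, here is a minimal sketch in plain TypeScript. The header names, the `Bearer` token format, the `"en"` fallback and all identifiers are assumptions for illustration, not the project’s actual code.

```typescript
type HeaderMap = Record<string, string | undefined>;

interface ModuleContext {
  userToken: string | null; // raw token, to be verified elsewhere
  language: string;         // primary language tag, e.g. "fi"
}

// Builds the per-request context from incoming HTTP headers.
function buildContext(headers: HeaderMap): ModuleContext {
  // HTTP header names are case-insensitive, so normalize before lookup.
  const lower: HeaderMap = {};
  for (const key of Object.keys(headers)) {
    lower[key.toLowerCase()] = headers[key];
  }

  const auth = lower["authorization"] ?? "";
  const userToken = auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : null;

  // Take the first run of letters, e.g. "fi" from "fi-FI,fi;q=0.9,en;q=0.8".
  const match = /[A-Za-z]+/.exec(lower["accept-language"] ?? "");
  const language = match ? match[0].toLowerCase() : "en";

  return { userToken, language };
}
```

When this logic lives in one global context function, every module sees every field, even the ones only a single module needs, which is the coupling problem described above.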

Testing was also a little difficult with both approaches. You could write unit tests for resolvers, but you couldn’t really test the schema itself unless you loaded the whole application. So a module couldn’t really test its own schema.

GraphQL modules

One day I came across a post about a new library called GraphQL modules. The name already suggested that this was what I had been looking for, so I went ahead and started to learn more about it.

GraphQL modules turned out to be just the thing we needed. Its purpose is to make it possible to split your GraphQL server into modules and to have clear dependencies between those modules.

A basic application with GraphQL modules can be defined like this:

This is not much different from creating a schema for an Apollo server.

Module dependencies

The power of GraphQL modules comes from splitting the application into modules and defining dependencies between the modules and providers (more about providers in the next section). A module is imported as a dependency with the `imports` property:

A module can import other modules, and the imported schemas are merged automatically. A common approach is to define one application module that only imports all the higher-level modules the application needs:
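As a toy model of what “imported and automatically merged” means, the sketch below flattens a tree of modules into one list of type definitions and one resolver map. It is plain TypeScript and deliberately not the GraphQL modules API. `ToyModule`, `flattenModule` and the sample modules are hypothetical names.

```typescript
type ResolverMap = Record<string, Record<string, (...args: unknown[]) => unknown>>;

interface ToyModule {
  id: string;
  typeDefs: string;
  resolvers: ResolverMap;
  imports?: ToyModule[];
}

// Flattens a module and everything it imports into one schema and resolver map.
function flattenModule(root: ToyModule): { typeDefs: string[]; resolvers: ResolverMap } {
  const seen = new Set<string>();
  const typeDefs: string[] = [];
  const resolvers: ResolverMap = {};

  const visit = (mod: ToyModule): void => {
    if (seen.has(mod.id)) return; // a module imported twice is merged once
    seen.add(mod.id);
    (mod.imports ?? []).forEach(visit); // dependencies first
    if (mod.typeDefs) typeDefs.push(mod.typeDefs);
    for (const typeName of Object.keys(mod.resolvers)) {
      resolvers[typeName] = { ...resolvers[typeName], ...mod.resolvers[typeName] };
    }
  };

  visit(root);
  return { typeDefs, resolvers };
}

const userModule: ToyModule = {
  id: "user",
  typeDefs: "type Query { user: String }",
  resolvers: { Query: { user: () => "Ada" } },
};
const productModule: ToyModule = {
  id: "product",
  typeDefs: "extend type Query { product: String }",
  resolvers: { Query: { product: () => "Book" } },
  imports: [userModule],
};
// The application module has no schema of its own; it only imports the others.
const appModule: ToyModule = {
  id: "app",
  typeDefs: "",
  resolvers: {},
  imports: [userModule, productModule],
};
const mergedApp = flattenModule(appModule);
```

Here `userModule` is reachable twice, directly and through `productModule`, but contributes its schema and resolvers only once.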

Providers

GraphQL modules introduce dependency injection. Providers are classes, functions, configuration or other values that can be injected into resolvers or other providers. Providers are defined in a module, and a module can also use providers from the modules it imports, so you can create a module of shared services, like a library to be imported.

It is possible to manage provider lifetime with the decorator configuration option `scope`, and there are three options available:

  • Application – Singleton, created once for the application.
  • Session – Created for each GraphQL request.
  • Request – Created for each injector request.

This helps with things like managing database connection pools or creating IDs that group log entries by client request.
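The difference between the scopes can be sketched with a toy injector. This is a hand-rolled illustration, not the library’s implementation or API: it models only the application and session scopes, and every name in it is hypothetical.

```typescript
type ToyScope = "application" | "session";

interface ToyProvider {
  scope: ToyScope;
  create: () => object;
}

interface ToySession {
  get(token: string): object;
}

function createToyInjector(providers: Record<string, ToyProvider>) {
  // Application-scoped instances are shared by every session.
  const appInstances = new Map<string, object>();

  function resolve(token: string, sessionInstances: Map<string, object>): object {
    const provider = providers[token];
    if (!provider) throw new Error(`No provider registered for "${token}"`);
    const cache = provider.scope === "application" ? appInstances : sessionInstances;
    let instance = cache.get(token);
    if (instance === undefined) {
      instance = provider.create(); // created lazily, once per cache
      cache.set(token, instance);
    }
    return instance;
  }

  return {
    // Call once per incoming GraphQL request.
    createSession(): ToySession {
      const sessionInstances = new Map<string, object>();
      return { get: (token) => resolve(token, sessionInstances) };
    },
  };
}

const toyInjector = createToyInjector({
  dbPool: { scope: "application", create: () => ({ kind: "pool" }) },
  requestId: { scope: "session", create: () => ({ kind: "request-id" }) },
});
const sessionA = toyInjector.createSession();
const sessionB = toyInjector.createSession();
```

Both sessions receive the same `dbPool` object, while each gets its own `requestId`, which is the behaviour that makes connection pools and per-request log IDs convenient.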

Testing

With the plain Apollo server, testing mostly meant unit tests that called single resolver functions. If you wanted to test resolvers through the GraphQL schema, you usually ended up loading the whole application.

With GraphQL modules it is easier to test at the module level and not only at the function level. Each module is a complete application, so tests can load it easily and run queries against it. Dependency injection also makes testing easier, as tests can mock the services that the injector returns. Below is an example of how to load a module, mock its services, and execute a query in a test case, for a module like the one in the Providers section.

With the ability to test at the module level, you can have a better view of what data is coming in, as well as its format. GraphQL schema validation is done before resolvers are executed, and by running the queries in test cases, you can make sure that the resolvers are executed as expected and with the “real” inputs from the query.
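The mocking idea can be shown without the library. In the sketch below a resolver receives its service instead of constructing it, so a test can pass in a hand-written mock. `UserService`, `makeUserResolver` and the sample data are hypothetical, and a real GraphQL modules test would go through the module’s injector and schema rather than call the resolver directly.

```typescript
interface UserService {
  findName(id: string): string | undefined;
}

// The resolver receives its dependency instead of constructing it.
function makeUserResolver(users: UserService) {
  return (args: { id: string }): { id: string; name: string } => {
    const found = users.findName(args.id);
    if (found === undefined) throw new Error(`Unknown user ${args.id}`);
    return { id: args.id, name: found };
  };
}

// In a test, a mock replaces the real service and records how it was called.
const recordedIds: string[] = [];
const mockUsers: UserService = {
  findName: (id) => {
    recordedIds.push(id);
    return id === "1" ? "Ada" : undefined;
  },
};
const resolveUser = makeUserResolver(mockUsers);
```

A test can then assert both on the resolver’s result and on the calls the mock recorded.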

Conclusions

Since finding GraphQL modules, it has been my number one choice when starting to build a new GraphQL server. Coding is much more straightforward when the code can be split into modules.

Code reusability is also higher when you can create modules that are just libraries that can be imported. There are many common things that different applications have, like getting user tokens from headers or getting user language settings etc. You can have common modules for those things that different applications share.

One caveat when starting with GraphQL modules and TypeScript is the need to set up the ‘reflect-metadata’ library. It must be imported first in your application to ensure that type definitions work with GraphQL modules. It also needs to be imported in your test files. Still, it is just one line that needs to be added as the first thing in the file.

To make type definitions easier to split, I have also used a library called `graphql-import`. It makes it possible to import `.graphql` files and use type imports in the schema definitions.

The GraphQL modules documentation is really good, with clear examples, so you should definitely read more there.

I also created an example project, based on my colleague’s demo application built on AWS AppSync, where you can see these in action: https://github.com/KariHe/demo-graphql-modules-in-gcp
