AWS Serverless Common Mistakes - Performance, Scalability, Optimization, and Monitoring (3/7)

Even with serverless you can have problems with performance and scalability. Learn how to avoid them and how to optimize and monitor your solution. This is part of a multipart blog post about serverless common mistakes.

Friday, February 15, 2019

This is a continuation of a blog series about common mistakes when building serverless systems on AWS. The series is split into seven parts; this is part three.

Cold Start

When a container that runs a function starts, you get a slow first execution called a cold start. It occurs once for each concurrent execution of your function, so do not assume it will hit just the first user while the following 1,000 users go unaffected: if they send their requests at the same time, each concurrent execution pays the cold-start price. API Gateway might queue up requests for a short time in the hope that one of the concurrent executions will finish and can be reused.

So, cold-start latency matters for every user-facing function and for other scenarios where you need a fast response. If you take into account the following recommendations, you will be safe and your users will not notice a cold start:

  • Use Node.js, Python, or Go, which offer a fast cold start. Java and .NET do not!
  • You should avoid VPC, which adds another 10 seconds to a cold start. The problem will soon be significantly reduced.
  • Do not bloat your code and do not perform time-consuming operations on start-up. Limit the number of libraries you use.
  • Use a higher memory setting for critical functions.

C# (.NET) and Java need to bootstrap a heavy virtual machine and language runtime. Node.js (JavaScript) and Python are interpreted languages and have a lightweight runtime.

A partial solution for cold starts is running a cron job (CloudWatch Rules) that invokes the function every 5 minutes, or every 15 minutes if you use VPC. Why every 15 minutes for Lambdas in a VPC? Because they are kept warm longer, precisely because of their huge cold start. Periodically triggering Lambda is already implemented in the excellent library Lambda Warmer, which also lets you set how many concurrent instances you want to keep warm. See a detailed explanation here. This will not completely remove the problem, but it will significantly reduce it.

Use Lambda Warmer to mitigate the impact of cold start
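
A minimal sketch of a warming handler using the lambda-warmer npm package, based on its documented usage (the schedule and concurrency values are illustrative):

const warmer = require('lambda-warmer');

exports.handler = async (event) => {
    // If this is a warming event, return immediately without running business logic
    if (await warmer(event)) return 'warmed';
    // ... normal handler logic ...
    return 'Hello from Lambda';
};

The function is then triggered by a CloudWatch schedule, for example with the Serverless Framework:

functions:
  myFunction:
    handler: handler.handler
    events:
      - schedule:
          rate: rate(5 minutes)
          input:
            warmer: true
            concurrency: 3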

Read more about cold starts in the article I'm afraid you're thinking about AWS Lambda cold starts all wrong.

When dealing with cold starts, also think about the time you need to initialize your service, such as loading data into memory, and the huge libraries you use. If your service depends on a lot of preloaded data in memory, your cold start will be proportionally longer.

When you invoke a warm Lambda, there are no significant differences between languages. Most functions are small and do a lot of IO operations, and no amount of optimization can make that much faster. Only one request is processed by a function instance at a time, so even the most efficient concurrency system cannot show its strengths here. If you are doing heavy processing in Lambda, C# and Java could be a better choice. But Lambda is limited to 15 minutes of execution, so there may not be a lot of heavy processing anyway.

VPC

The following will soon be obsolete:

Amazon Virtual Private Cloud (VPC) provides users with a virtual private cloud by provisioning a logically isolated section of the AWS Cloud. With VPC, you can restrict access to resources (EC2 instances, databases…) and also connect your on-premises infrastructure to be part of the same network.

With VPC, Lambda functions can access services such as RDS, ElastiCache, API Gateways, and EC2 instances. Lambda needs to create Elastic Network Interfaces (ENIs) to securely connect to resources within your VPC. A new ENI needs to be set up on most cold starts, which adds an additional 10 seconds to a cold start.

Do not put Lambda in VPC only for security reasons. Lambdas are secure enough. You should put Lambda in VPC when you absolutely need access to resources that can't be exposed to the outside world.

Do not put Lambda in VPC only for security reasons. Lambdas are secure enough.

For Lambda in a VPC, it is a good recommendation to limit the number of concurrent executions so that you do not run out of ENIs or IPs. The number of ENIs is a soft limit and can be increased upon request. Lambda currently does not log errors to CloudWatch Logs that are caused by insufficient ENIs or IP addresses.

Another common problem that can exhaust ENIs involves IAM permissions: if a Lambda execution role cannot detach ENIs, it will leave them running even after the Lambda has executed. Make sure the role executing your Lambdas has the following permissions (a configuration sketch follows the list):

  • ec2:CreateNetworkInterface
  • ec2:DescribeNetworkInterfaces
  • ec2:DeleteNetworkInterface
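
If you use the Serverless Framework, a minimal sketch of granting these permissions in serverless.yml might look like this (the wide-open Resource is just for illustration):

provider:
  name: aws
  iamRoleStatements:
    - Effect: Allow
      Action:
        - ec2:CreateNetworkInterface
        - ec2:DescribeNetworkInterfaces
        - ec2:DeleteNetworkInterface
      Resource: '*'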

Higher memory functions need more ENIs and that means longer cold starts. A 1GB function would need to create a new ENI during one-third of cold starts, whereas a 1.5GB function would need to create a new ENI during one-half of cold starts (source).

When you add a VPC configuration to a Lambda function (AWS docs), it can only access resources in that VPC. If your function needs to access the internet, then you need to create a managed NAT Gateway. You pay for data transferred through it.

Other recommendations on VPC and resilience:

  • Always configure a minimum of two Availability Zones.
  • Give Lambda functions their own subnets.
  • Give Lambda subnets a large IP range to handle potential scale.

More about Lambda and VPC here.

Reusing Data

Containers that run functions are reused. You can store data you regularly use as a variable in memory or in a file on disk, but do not exaggerate: loading data before the handler is invoked increases the cold start, and Lambdas are limited in memory and disk space. Global variables are especially suitable for secret data such as a connection string, so you do not have to fetch it on every invocation.
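
A minimal sketch of this pattern, assuming the connection string lives in SSM Parameter Store (the parameter name is hypothetical):

const AWS = require('aws-sdk');
const ssm = new AWS.SSM();

// Variables declared outside the handler survive between invocations
// of the same warm container, so the secret is fetched only once
let connectionString;

exports.handler = async (event) => {
    if (!connectionString) {
        // First invocation in this container: load and cache the secret
        const result = await ssm.getParameter({
            Name: '/my-app/db-connection-string', // hypothetical parameter
            WithDecryption: true
        }).promise();
        connectionString = result.Parameter.Value;
    }
    // ... use connectionString ...
};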

Enable HTTP keep-alive

The AWS SDK does not enable HTTP keep-alive by default. For every request, a new TCP connection has to be established with a three-way handshake. You can clearly see the benefits of reusing connections when communicating with DynamoDB.

(Credit Matt Lavin)

const AWS = require('aws-sdk');
const https = require('https');

// Reuse TCP connections instead of opening a new one per request
const sslAgent = new https.Agent({
    keepAlive: true,
    maxSockets: 50,
    rejectUnauthorized: true
});
sslAgent.setMaxListeners(0);

// Tell the AWS SDK to use the keep-alive agent for all requests
AWS.config.update({
    httpOptions: {
        agent: sslAgent
    }
});
const DynamoDB = new AWS.DynamoDB.DocumentClient();

Improve #DynamoDb execution with the HTTP keep-alive

Scalability

Lambda scales according to incoming traffic until you hit the concurrent execution limit. One Lambda instance processes only one request at a time. The number of concurrent Lambdas that can run at the same time depends on the specified concurrent execution limit, which comes in two types:

  • Account level

    This amount is shared between all Lambdas that do not have a configured reserved limit. You can quite easily hit the account-level concurrency limit if you do some vast processing of events. It can be raised by sending a request to AWS.

  • Reserved for each Lambda

    Setting this means:

    • Lambda can reach this limit and it will not have to compete with other functions.
    • Lambda cannot exceed this limit. This is useful if you want to have only a limited number of Lambdas running at the same time.
    • The specified amount is subtracted from the account-level concurrency that is shared between the other functions. You cannot reserve the last 100 concurrent executions; they are left to functions that have not set reserved concurrency.

You should set a reserved concurrent execution limit, and thereby limit scalability, when:

  • Connecting to SQL databases or other systems that have a limited number of connections.
  • Communicating with systems that are not infinitely scalable.
  • Running Lambda in a VPC, so that you do not run out of ENIs or IPs.
  • You want to limit scaling for security, architecture, or other reasons.

You should set a Lambda concurrent execution limit
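
With the Serverless Framework, a minimal sketch of reserving concurrency for a single function might look like this (the function name and value are illustrative):

functions:
  processOrders:
    handler: handler.processOrders
    reservedConcurrency: 10 # at most 10 concurrent instances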

Calculating Scalability

If you have functions requiring a high throughput, use this calculator to help identify where your limits lie and get customized recommendations for your account.

Testing for Scalability

Infinite scaling is the promise of a serverless platform. But does that mean that everything will work out of the box? You have to test your solution for the expected load. If you do not, you will probably hit one of the account limits that protect you from vastly overspending. Testing is also needed to validate your architecture. There are many tools for load testing; Nordstrom created a special one called Serverless Artillery.

Infinite scaling is the promise of a serverless platform - if you do it right and also test it

Optimization

Serverless solutions are cheap because resources are used optimally. But if you do not use services appropriately or you make a stupid mistake, it can cost you. You must plan to optimize your solution: first when designing it, and second after production, if you want to pay only the minimum price. If you have fixed leased resources, as in a traditional system, you can be much more reckless with them; it all comes down to calculation. But do not prematurely optimize your serverless solution. If you do not make a stupid mistake, it will probably be quite cheap, and you can make it cheaper afterward. Just be aware that you must monitor your solution, especially at the beginning, and you have to plan to optimize your services when you find out that your spending is more than you expected.

Optimize function memory with AWS Lambda Power Tuning

Optimize Function Memory

Lambda is charged according to the memory allocation you set, in 100 ms intervals of execution time. The cheapest option per interval is 128 MB. But with more memory you also get a better processor and network, so with less memory, processing takes more time and, in the end, can cost more, while your service also runs slower. This depends very much on your scenario. The sweet spot in common scenarios is 1024 MB. To estimate the best power configuration to minimize costs, you can use AWS Lambda Power Tuning. Here is a good video about Lambda optimization.
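
A rough worked example, assuming the 2019 price of about $0.00001667 per GB-second: a CPU-bound function at 128 MB that runs for 800 ms consumes 0.125 GB × 0.8 s = 0.1 GB-seconds. At 1024 MB it gets roughly eight times the CPU share, so it might finish in about 100 ms: 1 GB × 0.1 s = 0.1 GB-seconds. The cost is the same, but the response is eight times faster (billing is rounded up to 100 ms intervals, so the real numbers differ slightly).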

Optimize Use of Services

Here are some examples:

  • Kinesis Data Stream

    Kinesis Data Stream does not auto scale. You must ensure that you have an optimal number of shards.

  • CloudFront

    Is caching optimal? Are you caching static files and returning optimal caching headers?

  • DynamoDB

    Are capacities configured optimally? Set appropriate minimum and maximum read and write capacities. If you cannot predict your load, you can use the new on-demand capacity mode that adapts to your needs.

  • SQL database

    Is your SQL database at optimum capacity? Do not pay too much or too little. You could use Aurora Serverless, which scales automatically, but it is still in its early stages.

Optimize the Amount of Data You Handle

When optimizing, do not forget to read as little data as possible and to properly index the database. In SNS and S3, you can apply filtering that limits the events sent to the Lambda. With SNS message filtering, you specify which messages are delivered to a function based on their attributes. With S3 event prefix, suffix, and event type filters, you can control which files are processed.
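
For example, a minimal Serverless Framework sketch that triggers a function only for JPG uploads under a given prefix (the bucket and function names are hypothetical):

functions:
  processImage:
    handler: handler.processImage
    events:
      - s3:
          bucket: my-upload-bucket
          event: s3:ObjectCreated:*
          rules:
            - prefix: uploads/
            - suffix: .jpg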

Optimize Communication Layers

You can invoke Lambda functions directly with the AWS SDK instead of routing calls through an API Gateway. This is especially good practice when you call a Lambda from other Lambda functions.

You can invoke Lambda functions directly with AWS SDK instead of routing it through an API Gateway
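
A minimal sketch of a direct invocation with the AWS SDK (the function name and payload are hypothetical):

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

async function callOtherFunction() {
    const response = await lambda.invoke({
        FunctionName: 'my-other-function',  // hypothetical function name
        InvocationType: 'RequestResponse',  // use 'Event' for fire-and-forget
        Payload: JSON.stringify({ orderId: 123 })
    }).promise();
    return JSON.parse(response.Payload);
}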

On the other hand, you can also write to DynamoDB, SQS, SNS, Kinesis, and so on, directly through API Gateway without a Lambda in between.

Monitoring, Logging, and Tracing

One challenge in a microservice architecture is monitoring and logging. In a monolithic solution, we mostly had only one log where we logged everything we could possibly need, including a stack trace that usually pointed us straight to the problematic line. This is much more complicated in a microservices solution, where requests flow from microservice to microservice. It is very important to correlate the logs of the same request as it goes through different services. The foundation for linking those logs together is a common correlation ID that identifies each request in the logs of all the services that request was involved in.
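
A minimal sketch of the idea in Node.js (the header name is an assumption, not a standard):

const crypto = require('crypto');

exports.handler = async (event) => {
    // Reuse the caller's correlation ID, or create one at the edge of the system
    const correlationId = (event.headers && event.headers['x-correlation-id'])
        || crypto.randomBytes(8).toString('hex');

    // Include the ID in every log line so logs can be joined across services
    console.log(JSON.stringify({ correlationId, msg: 'processing request' }));

    // Pass correlationId along in the headers or payloads of downstream calls
};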

AWS has basic services for monitoring, logging, and tracing: CloudWatch for logging and monitoring, and X-Ray for tracing. CloudWatch is on by default. It logs basic info about function execution. You can write additional data with console.log(data) if you are using Node.js. CloudWatch became even more useful with the release of CloudWatch Logs Insights, which enables you to search and analyze your logs.
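
For example, a simple Logs Insights query that finds the most recent error lines:

fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20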

Another great service provided by AWS is X-Ray. It allows you to visualize connected services and the calls that flow through different systems, so you can see the architecture, execution times, error rates, and so on, of your system.

If you use the Serverless Framework, you can enable it in serverless.yml:

provider:
  tracing: true

For more data, you need to make small code changes. You can trace requests from service to service and add additional data to each request, which you can even search by. It becomes an incredibly powerful tool that gives you deep insight into what the system is doing, with each request visualized.
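
A minimal sketch using the aws-xray-sdk-core package: wrapping the AWS SDK traces all downstream AWS calls, and annotations make traces searchable (the names and values are illustrative):

const AWSXRay = require('aws-xray-sdk-core');
// Wrap the AWS SDK so every AWS call appears as a subsegment in the trace
const AWS = AWSXRay.captureAWS(require('aws-sdk'));

exports.handler = async (event) => {
    AWSXRay.captureFunc('processOrder', (subsegment) => {
        subsegment.addAnnotation('orderId', '123');       // indexed and searchable
        subsegment.addMetadata('details', { items: 3 });  // stored but not indexed
        // ... work ...
    });
};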

You should also monitor your service. You can use CloudWatch Metrics. In addition to existing metrics, you can add your own. Based on metrics, you can create alarms to notify you of uncommon activities. Custom metrics can be especially useful because you can also monitor the business side of the system — for example, how many orders, registrations, or user actions were made in a certain period.
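
A minimal sketch of publishing a custom business metric (the namespace and metric name are hypothetical):

const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

async function recordOrderPlaced() {
    // One data point on the hypothetical MyApp/OrdersPlaced metric
    await cloudwatch.putMetricData({
        Namespace: 'MyApp',
        MetricData: [{
            MetricName: 'OrdersPlaced',
            Value: 1,
            Unit: 'Count'
        }]
    }).promise();
}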

In production, you should log only what you need, or you should sample debug logs for a small percentage of invocations. The default retention policy never expires logs! A common practice is to ship your logs to another log aggregation service and to set the retention policy to X days. If you are using the Serverless Framework, you can set the retention time in serverless.yml with the property logRetentionInDays.
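
For example (the value is illustrative):

provider:
  logRetentionInDays: 14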

AWS tools are pretty good now that we have CloudWatch Logs Insights and X-Ray. But for serious monitoring and logging, you should think about third-party solutions such as IOpipe, Logz.io, DataDog, Epsagon, Thundra, Sumo Logic, Splunk, Loggly, or AWS-hosted Elasticsearch. For some of these, you can get a Lambda Layer, a new feature that simplifies adding additional libraries.

When you are sending logs to an external system, you must watch out for how much execution time it uses. If you do asynchronous batch processing, that may not be a problem, but if a user is waiting for a response, logging could take more time than the original processing. If you use CloudWatch or X-Ray, the time spent is negligible, because data is shipped asynchronously in the background. If you are calling other services, you have the following options:

  • Wait for response

    This is the simplest solution but, of course, not optimal; it is suitable for offline processing. You can do some work while waiting for the response to optimize this solution.

  • Fire and forget

    There is a chance your logs will not be written, but you will not spend unnecessary time waiting for confirmation.

  • Log to CloudWatch and then ship to another system

    This is the most complicated solution, carefully described here. CloudWatch triggers another Lambda that transfers data to another log system.