Improving CloudWatch Alarms for Lambda Errors

A simple solution to improve the CloudWatch alarm so that you receive Lambda error details by email. Available as a CDK construct or as CloudFormation.

Tuesday, October 3, 2023

This article is part of a series that looks at solutions for improving CloudWatch Alarms for Lambda errors:

  1. Improving CloudWatch Alarms for Lambda Errors (here)
  2. Tips and Tricks for Developing CDK Construct using Projen
  3. CDK Escape Hatches + How to Export CDK to CloudFormation

You can find a full solution here.

AWS CloudWatch is not the best service for monitoring your system. It has a ton of features, but many of them are awkward to use. At the same time, it is baked into AWS, so you do not have to look for something else, and for a large number of solutions, it is good enough.


One of the not-so-nice things is the notification of Lambda errors. Typically, you would attach an Alarm to the Lambda Errors metric. The Alarm sends a message to an SNS topic, which forwards it to your email or phone (SMS). That is a relatively common approach, but not the only one. One benefit is that you do not get a ton of messages: you get just one when the Alarm enters the ALARM state.
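For reference, the typical setup described above might look like this in CDK (TypeScript). This is a minimal sketch, not the author's code: the stack, the inline placeholder function, the 5-minute period, the threshold of 1, and the email address are all assumptions.

```typescript
import { App, Duration, Stack } from "aws-cdk-lib";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as actions from "aws-cdk-lib/aws-cloudwatch-actions";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as sns from "aws-cdk-lib/aws-sns";
import * as subs from "aws-cdk-lib/aws-sns-subscriptions";

// Hypothetical stack: one Lambda, one alarm on its Errors metric, one SNS topic with an email subscription.
export function buildAlarmStack(app: App, email: string): Stack {
  const stack = new Stack(app, "AlarmStack");

  // Placeholder function that always fails; any existing function works the same way.
  const fn = new lambda.Function(stack, "Fn", {
    runtime: lambda.Runtime.NODEJS_18_X,
    handler: "index.handler",
    code: lambda.Code.fromInline("exports.handler = async () => { throw new Error('boom'); };"),
  });

  // Topic that forwards alarm notifications to an email address.
  const snsTopicAlarm = new sns.Topic(stack, "AlarmTopic");
  snsTopicAlarm.addSubscription(new subs.EmailSubscription(email));

  // Alarm on the default AWS/Lambda Errors metric: at least 1 error in a 5-minute period.
  const alarm = new cloudwatch.Alarm(stack, "FnErrorsAlarm", {
    metric: fn.metricErrors({ period: Duration.minutes(5), statistic: "Sum" }),
    threshold: 1,
    evaluationPeriods: 1,
    comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  });

  // Publish to the topic when the alarm enters the ALARM state.
  alarm.addAlarmAction(new actions.SnsAction(snsTopicAlarm));
  return stack;
}
```

`fn.metricErrors()` refers to the `Errors` metric in the `AWS/Lambda` namespace with the function's `FunctionName` dimension, so the notification you get says that the alarm fired, but nothing about which error caused it.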

But this notification does not contain any details about the error. You have to open the AWS console, find the appropriate log group, open CloudWatch Logs Insights, and write a query. After a few minutes, you have your error message, which quite possibly is an unimportant one-time occurrence of some timeout. Of course, you can simplify this process, but wouldn't it be great to get a sample of errors by email? Most monitoring and logging systems, like Sentry, provide that out of the box.
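The manual step usually boils down to a Logs Insights query like the one built below. This is a sketch under assumptions: the `ERROR|Task timed out` pattern assumes Node.js-style `console.error` output plus the runtime's timeout message, and `buildErrorQuery` and `logGroupFor` are hypothetical helpers, not part of the construct.

```typescript
// Hypothetical helper: the CloudWatch Logs Insights query you would otherwise type by hand.
// Assumption: error lines contain "ERROR" (Node.js console.error output does)
// or the runtime's "Task timed out" message.
export function buildErrorQuery(limit: number): string {
  return [
    "fields @timestamp, @message",
    "filter @message like /ERROR|Task timed out/",
    "sort @timestamp desc",
    `limit ${limit}`,
  ].join(" | ");
}

// By default, a Lambda function writes to the log group /aws/lambda/<function name>.
export function logGroupFor(functionName: string): string {
  return `/aws/lambda/${functionName}`;
}
```

The same string can be passed as `queryString` to the CloudWatch Logs `StartQuery` API, together with the log group name and a start and end time in epoch seconds. That is the part worth automating.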

I will present a simple solution for getting error messages by email. It is great for smaller projects where you do not want to configure a separate error-logging solution, and for older systems that you do not want to modify just for this.

The solution is in two forms:

  • CDK construct
    For systems built with CDK (or SST). Available for TypeScript, Java, C#, Python, and Go.
  • CloudFormation
    For existing solutions, so you do not have to modify them. You deploy the template and point it to the existing SNS topic used for CloudWatch alarms.

How does it work?

  1. A Lambda function is subscribed to the SNS topic where you receive your alarms. A message body subscription filter lets through only alarms based on the Lambda Errors metric. If you defined your metric in some other way than the default one, you must change the filter.
  2. The Lambda analyzes the message, finds the log group of the failed Lambda, and queries it for the period the metric was configured for, plus some additional safety time, so that no error message is missed. It extracts only as many errors as can fit in one SNS message.
  3. The Lambda sends the errors to the same SNS topic that you use for alerts. So, apart from the Alarm message about the change to the error state, you receive an additional one with detailed error messages.
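Steps 1 and 2 can be sketched in TypeScript as follows. This is not the construct's source code: the filter policy is an assumed example of a message-body filter for the default `AWS/Lambda` `Errors` metric, the interface lists only the alarm-notification fields the sketch uses, and `queryWindowFrom` with its `safetySeconds` margin is hypothetical.

```typescript
// A message-body filter policy along these lines (subscription FilterPolicyScope = MessageBody)
// lets through only alarms built on the default AWS/Lambda Errors metric.
export const errorAlarmFilterPolicy = {
  Trigger: { MetricName: ["Errors"], Namespace: ["AWS/Lambda"] },
};

// Only the fields of the CloudWatch alarm notification (the SNS message body) used below.
interface AlarmNotification {
  StateChangeTime: string; // e.g. "2023-10-03T10:15:30.123+0000"
  Trigger: {
    Period: number; // seconds
    EvaluationPeriods: number;
    Dimensions: { name: string; value: string }[];
  };
}

export interface QueryWindow {
  logGroupName: string;
  startTime: number; // epoch seconds
  endTime: number; // epoch seconds
}

// Hypothetical helper: derive the log group and time range to query from the alarm message.
export function queryWindowFrom(message: string, safetySeconds = 60): QueryWindow {
  const alarm = JSON.parse(message) as AlarmNotification;
  const fn = alarm.Trigger.Dimensions.find((d) => d.name === "FunctionName");
  if (!fn) throw new Error("Alarm has no FunctionName dimension");

  // "+0000" is not a strict ISO 8601 offset, so rewrite it as "+00:00" before parsing.
  const changedMs = Date.parse(alarm.StateChangeTime.replace(/([+-]\d{2})(\d{2})$/, "$1:$2"));
  if (Number.isNaN(changedMs)) throw new Error(`Bad StateChangeTime: ${alarm.StateChangeTime}`);
  const changed = Math.floor(changedMs / 1000);

  // Look back over every evaluated period, padded on both sides by the safety margin.
  const evaluated = alarm.Trigger.Period * alarm.Trigger.EvaluationPeriods;
  return {
    logGroupName: `/aws/lambda/${fn.value}`,
    startTime: changed - evaluated - safetySeconds,
    endTime: changed + safetySeconds,
  };
}
```

The dimension entries in alarm notifications use lowercase `name` and `value` keys. Alarms defined in other ways, for example with metric math, describe their metrics differently inside `Trigger`, which is why a non-default metric needs a different filter.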

lambda-error-sns-sender

The solution is simple and effective.
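Returning to steps 2 and 3 above, the "only as many errors as fit in one SNS message" part and the final notification could look roughly like this in Node.js. It is a sketch: SNS caps a message at 256 KB (262,144 bytes), while `fitErrorsIntoMessage`, its header text, and the subject format are hypothetical; `maxNumberOfLogs` only mirrors the construct's property of the same name. The returned object has the `TopicArn`, `Subject`, and `Message` fields that the SNS `Publish` API expects.

```typescript
// SNS rejects messages larger than 256 KB (262,144 bytes of UTF-8).
const SNS_MAX_BYTES = 262_144;

export interface ErrorNotification {
  TopicArn: string;
  Subject: string;
  Message: string;
}

// Hypothetical helper: pack as many error log lines as fit, up to maxNumberOfLogs.
export function fitErrorsIntoMessage(
  topicArn: string,
  functionName: string,
  errors: string[],
  maxNumberOfLogs: number,
  maxBytes = SNS_MAX_BYTES,
): ErrorNotification {
  const separator = "\n\n";
  const parts = [`Errors logged by ${functionName}:`];
  let bytes = Buffer.byteLength(parts[0], "utf8");

  for (const entry of errors.slice(0, maxNumberOfLogs)) {
    // Count UTF-8 bytes, not characters, because the SNS limit is in bytes.
    const cost = Buffer.byteLength(separator + entry, "utf8");
    if (bytes + cost > maxBytes) break;
    parts.push(entry);
    bytes += cost;
  }

  return {
    TopicArn: topicArn,
    // SNS subjects must be shorter than 100 characters.
    Subject: `Lambda errors: ${functionName}`.slice(0, 99),
    Message: parts.join(separator),
  };
}
```

Stopping at the first entry that does not fit keeps the order of the query results (newest first with `sort @timestamp desc`) instead of skipping ahead to smaller entries further down.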

Using CDK Construct

The construct is published to the package repositories of all supported languages, so apart from TypeScript, you can use it in Java, C#, Python, and Go.

Here are the instructions for TypeScript:

Install the construct:

npm install lambda-error-sns-sender -D

Import the module:

import { LambdaErrorSnsSender } from "lambda-error-sns-sender";

Use the construct:

new LambdaErrorSnsSender(stack, "lambda-error-sns-sender", {
  // all SNS topics that you are using for sending Alarm notifications
  snsTopics: [snsTopicAlarm],
  // you probably do not need more than 15 error messages in the email
  maxNumberOfLogs: 15, 
});

You can find the code for the project here.

Using CloudFormation

  1. Create a CloudFormation stack from the template at: https://lambda-error-sns-sender.s3.eu-west-1.amazonaws.com/lambda-error-sns-sender.yaml
  2. Specify at least one SNS topic by its ARN.

Summary

There are a few drawbacks to this solution. The first is that you only get a sample of errors, which means you could miss some critical ones. The second is that the email with the error log is unformatted and not easy to read.

But there are quite a few benefits. It is straightforward to set up, and you do not depend on any external system for error logging. For me, the most crucial advantage is that when errors start pouring in, you get only one email with error details, not one for each error.

Although this solution is simple, it is an excellent base for additional articles. Stay tuned for the continuation, where I will present how I developed the CDK construct with Projen, some tips and tricks with CDK, and how to export CDK to CloudFormation.