CodeDeploy Canary Deployment

Without a good rollout and rollback strategy, there is greater risk of releasing breaking changes or broken software that impacts all users for an extended period of time. This can erode confidence in your releases and customers’ confidence in your products.

Canary deployments can help minimise this risk by first routing a small percentage of traffic to the new version for a configured amount of time, before routing the remaining traffic to the new version. If any errors are detected during the initial routing then all traffic is routed back to the previous version.

AWS CodeDeploy provides native support for canary deployments of Lambdas. The AWS Serverless Application Model (SAM) provides abstractions to more easily configure CodeDeploy canary deployments of Lambdas using CloudFormation.

This blog post describes how to implement Lambda canary deployments using CodeDeploy and SAM, with the added bonus of a pre-traffic automated test Lambda for smoke testing the new version. CloudWatch Alarms trigger automatic rollback on increased error detection during the initial traffic shifting phase of the deployment.

CodeDeploy requires Lambda Versions and a Lambda Alias to provide support for canary deployments. A Lambda Version is an immutable snapshot of a Lambda at a point in time, and a Lambda Alias routes traffic to a specific Lambda Version. With SAM’s AutoPublishAlias property, each deployment that changes the function publishes a new Lambda Version, and CodeDeploy then routes a configured percentage of the alias’s traffic to that version. If no errors are detected during the configurable deployment timeframe, CodeDeploy points the Lambda Alias at the new Lambda Version, thereby shifting 100% of traffic to it.
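
To make the alias mechanics concrete, here is a minimal sketch of weighted alias routing. The object shape borrows the Lambda alias field names FunctionVersion and RoutingConfig.AdditionalVersionWeights; the pickVersion function, the version numbers and the way the random number is used are illustrative assumptions, not how Lambda selects a version internally.

```javascript
// Simplified model of weighted alias routing (illustrative only).
// `rand` is a number in [0, 1); it is injectable so the behaviour is testable.
function pickVersion(alias, rand = Math.random()) {
  const weights =
    (alias.RoutingConfig && alias.RoutingConfig.AdditionalVersionWeights) || {};
  let cumulative = 0;
  for (const [version, weight] of Object.entries(weights)) {
    cumulative += weight;
    if (rand < cumulative) return version; // request goes to a weighted version
  }
  return alias.FunctionVersion; // everything else goes to the primary version
}

// Hypothetical alias states for a Canary10Percent5Minutes deployment from
// version 1 to version 2.
const duringCanary = {
  FunctionVersion: "1",
  RoutingConfig: { AdditionalVersionWeights: { "2": 0.1 } },
};
const afterSuccess = { FunctionVersion: "2", RoutingConfig: { AdditionalVersionWeights: {} } };
const afterRollback = { FunctionVersion: "1", RoutingConfig: { AdditionalVersionWeights: {} } };
```

In this model a rollback is simply the alias returning to its original state, which is why it can happen quickly: nothing is redeployed, only the routing changes.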

Example Serverless Application

Here’s a CloudFormation stack with a simple example of this in action. An API Gateway REST API backed by two Lambdas uses CodeDeploy for canary deployments. Test Lambdas are defined to run pre- and post-traffic tests against the new versions of ExampleAFunction and ExampleBFunction. CloudWatch Alarms for the alias and the latest version of ExampleAFunction and ExampleBFunction are also defined. The code for this example can be found here.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Runtime: nodejs12.x
    MemorySize: 128
    Timeout: 30

Resources:

  ExampleApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: live

  ExampleAFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      InlineCode: |
        exports.handler = (event, context, callback) => {
        	callback(
        		null,
        		{
        			statusCode: 200,
        			body: JSON.stringify({
        				message: 'Hello World A'
        			})
        		});
        };
      AutoPublishAlias: live
      Events:
        ExampleApiEvent:
          Type: Api
          Properties:
            RestApiId: !Ref ExampleApi
            Path: /example/a
            Method: get
      DeploymentPreference:
        Type: Canary10Percent5Minutes
        Alarms:
          - !Ref ExampleAAliasErrorMetricGreaterThanZeroAlarm
          - !Ref ExampleALatestVersionErrorMetricGreaterThanZeroAlarm
        Hooks:
          PreTraffic: !Ref PreTrafficLambdaFunction
          PostTraffic: !Ref PostTrafficLambdaFunction

  ExampleAAliasErrorMetricGreaterThanZeroAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Lambda Function Error > 0
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: Resource
          Value: !Sub ${ExampleAFunction}:live
        - Name: FunctionName
          Value: !Ref ExampleAFunction
      EvaluationPeriods: 2
      MetricName: Errors
      Namespace: AWS/Lambda
      Period: 60
      Statistic: Sum
      Threshold: 0

  ExampleALatestVersionErrorMetricGreaterThanZeroAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Lambda Function Error > 0
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: Resource
          Value: !Sub ${ExampleAFunction}:live
        - Name: FunctionName
          Value: !Ref ExampleAFunction
        - Name: ExecutedVersion
          Value:
            Fn::GetAtt:
              - ExampleAFunction.Version
              - Version
      EvaluationPeriods: 2
      MetricName: Errors
      Namespace: AWS/Lambda
      Period: 60
      Statistic: Sum
      Threshold: 0

  ExampleBFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      InlineCode: |
        exports.handler = (event, context, callback) => {
        	callback(
        		null,
        		{
        			statusCode: 200,
        			body: JSON.stringify({
        				message: 'Hello World B'
        			})
        		});
        };
      AutoPublishAlias: live
      Events:
        ExampleApiEvent:
          Type: Api
          Properties:
            RestApiId: !Ref ExampleApi
            Path: /example/b
            Method: get
      DeploymentPreference:
        Type: Canary10Percent5Minutes
        Alarms:
          - !Ref ExampleBAliasErrorMetricGreaterThanZeroAlarm
          - !Ref ExampleBLatestVersionErrorMetricGreaterThanZeroAlarm
        Hooks:
          PreTraffic: !Ref PreTrafficLambdaFunction
          PostTraffic: !Ref PostTrafficLambdaFunction

  ExampleBAliasErrorMetricGreaterThanZeroAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Lambda Function Error > 0
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: Resource
          Value: !Sub ${ExampleBFunction}:live
        - Name: FunctionName
          Value: !Ref ExampleBFunction
      EvaluationPeriods: 2
      MetricName: Errors
      Namespace: AWS/Lambda
      Period: 60
      Statistic: Sum
      Threshold: 0

  ExampleBLatestVersionErrorMetricGreaterThanZeroAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Lambda Function Error > 0
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: Resource
          Value: !Sub ${ExampleBFunction}:live
        - Name: FunctionName
          Value: !Ref ExampleBFunction
        - Name: ExecutedVersion
          Value:
            Fn::GetAtt:
              - ExampleBFunction.Version
              - Version
      EvaluationPeriods: 2
      MetricName: Errors
      Namespace: AWS/Lambda
      Period: 60
      Statistic: Sum
      Threshold: 0

  PreTrafficLambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      InlineCode: |
        "use strict";

        const AWS = require("aws-sdk");
        const codedeploy = new AWS.CodeDeploy();

        exports.handler = (event, context, callback) => {

          console.log("Entering PreTraffic hook.");

          // Read the DeploymentId and LifecycleEventHookExecutionId from the event payload
          const deploymentId = event.DeploymentId;
          const lifecycleEventHookExecutionId = event.LifecycleEventHookExecutionId;
          var validationTestResult = "Failed";

          // Perform PreTraffic validation tests here. Set the test result
          // to "Succeeded" for this tutorial.
          console.log("This is where PreTraffic validation tests happen.")
          validationTestResult = "Succeeded";

          // Complete the PreTraffic hook by sending CodeDeploy the validation status
          const params = {
            deploymentId: deploymentId,
            lifecycleEventHookExecutionId: lifecycleEventHookExecutionId,
            status: validationTestResult // status can be 'Succeeded' or 'Failed'
          };

          // Pass AWS CodeDeploy the prepared validation test results.
          codedeploy.putLifecycleEventHookExecutionStatus(params, (err, data) => {
            if (err) {
              // The status update itself failed, so CodeDeploy never received the result.
              console.log("Failed to report PreTraffic hook status to CodeDeploy");
              console.log(err, err.stack);
              callback("CodeDeploy Status update failed");
            } else {
              // CodeDeploy accepted the reported status.
              console.log("PreTraffic hook status reported: " + validationTestResult);
              callback(null, "PreTraffic hook status reported: " + validationTestResult);
            }
          });
        }
      Policies:
        - Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - codedeploy:PutLifecycleEventHookExecutionStatus
              Resource: !Sub arn:${AWS::Partition}:codedeploy:${AWS::Region}:${AWS::AccountId}:deploymentgroup:${ServerlessDeploymentApplication}/*
        - Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - lambda:InvokeFunction
              Resource:
                - !GetAtt ExampleAFunction.Arn
                - !GetAtt ExampleBFunction.Arn
      FunctionName: CodeDeployHook_preTrafficHook
      Environment:
        Variables:
          ExampleAFunctionCurrentVersion: !Ref ExampleAFunction.Version
          ExampleBFunctionCurrentVersion: !Ref ExampleBFunction.Version

  PostTrafficLambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      InlineCode: |
        "use strict";

        const AWS = require("aws-sdk");
        const codedeploy = new AWS.CodeDeploy();

        exports.handler = (event, context, callback) => {

          console.log("Entering PostTraffic hook.");

          // Read the DeploymentId and LifecycleEventHookExecutionId from the event payload
          const deploymentId = event.DeploymentId;
          const lifecycleEventHookExecutionId = event.LifecycleEventHookExecutionId;
          var validationTestResult = "Failed";

          // Perform PostTraffic validation tests here. Set the test result
          // to "Succeeded" for this tutorial.
          console.log("This is where PostTraffic validation tests happen.")
          validationTestResult = "Succeeded";

          // Complete the PostTraffic hook by sending CodeDeploy the validation status
          const params = {
            deploymentId: deploymentId,
            lifecycleEventHookExecutionId: lifecycleEventHookExecutionId,
            status: validationTestResult // status can be 'Succeeded' or 'Failed'
          };

          // Pass AWS CodeDeploy the prepared validation test results.
          codedeploy.putLifecycleEventHookExecutionStatus(params, (err, data) => {
            if (err) {
              // The status update itself failed, so CodeDeploy never received the result.
              console.log("Failed to report PostTraffic hook status to CodeDeploy");
              console.log(err, err.stack);
              callback("CodeDeploy Status update failed");
            } else {
              // CodeDeploy accepted the reported status.
              console.log("PostTraffic hook status reported: " + validationTestResult);
              callback(null, "PostTraffic hook status reported: " + validationTestResult);
            }
          });
        }
      Policies:
        - Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - codedeploy:PutLifecycleEventHookExecutionStatus
              Resource: !Sub arn:${AWS::Partition}:codedeploy:${AWS::Region}:${AWS::AccountId}:deploymentgroup:${ServerlessDeploymentApplication}/*
        - Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - lambda:InvokeFunction
              Resource:
                - !GetAtt ExampleAFunction.Arn
                - !GetAtt ExampleBFunction.Arn
      FunctionName: CodeDeployHook_postTrafficHook
      Environment:
        Variables:
          ExampleAFunctionCurrentVersion: !Ref ExampleAFunction.Version
          ExampleBFunctionCurrentVersion: !Ref ExampleBFunction.Version

SAM Reduces Boilerplate

Just a few lines in the CloudFormation template apply the CodeDeploy canary configuration, courtesy of the SAM transformation:

DeploymentPreference:
  Type: Canary10Percent5Minutes
  Alarms:
    - !Ref ExampleAAliasErrorMetricGreaterThanZeroAlarm
    - !Ref ExampleALatestVersionErrorMetricGreaterThanZeroAlarm
  Hooks:
    PreTraffic: !Ref PreTrafficLambdaFunction
    PostTraffic: !Ref PostTrafficLambdaFunction

SAM transforms the CloudFormation to create the following resources:

  • CodeDeploy Application
  • CodeDeploy DeploymentGroup per Lambda
  • CodeDeployServiceRole
  • Lambda Alias with an UpdatePolicy applying CodeDeploy Application, Deployment Group, and pre/post-traffic hooks

Pre/Post-Traffic Tests

Pre-traffic tests against the new version, validating service contracts and using known test scenarios, provide greater confidence in a deployment being successful. Additional tests can be added over time as the system evolves and more weaknesses are revealed. If these tests fail then no traffic is shifted to the new version and no customers are affected.
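
As a rough illustration of what a real pre-traffic test could look like, the sketch below invokes the new version and checks its response before reporting to CodeDeploy. It assumes clients shaped like the AWS SDK for JavaScript v2 (methods returning an object with .promise()), which are passed in so the logic can be exercised with fakes. The names validateResponse, makePreTrafficHandler, functionArn and expectedMessage are hypothetical, and the sketch assumes the hook’s role is permitted to invoke the specific version being tested.

```javascript
// Pure check on the API-style response returned by the example functions.
function validateResponse(response, expectedMessage) {
  if (!response || response.statusCode !== 200) return false;
  try {
    return JSON.parse(response.body).message === expectedMessage;
  } catch (err) {
    return false; // body was not valid JSON
  }
}

// Builds a hook handler from injected clients (assumed AWS SDK v2 style).
function makePreTrafficHandler({ lambda, codedeploy, functionArn, expectedMessage }) {
  return async (event) => {
    let status = "Failed"; // fail closed unless the test clearly passes
    try {
      const result = await lambda
        .invoke({ FunctionName: functionArn, Payload: JSON.stringify({}) })
        .promise();
      const passed =
        !result.FunctionError &&
        validateResponse(JSON.parse(result.Payload.toString()), expectedMessage);
      status = passed ? "Succeeded" : "Failed";
    } catch (err) {
      console.log("Pre-traffic test could not run", err);
    }
    // Always report a status so the deployment does not wait for a timeout.
    await codedeploy
      .putLifecycleEventHookExecutionStatus({
        deploymentId: event.DeploymentId,
        lifecycleEventHookExecutionId: event.LifecycleEventHookExecutionId,
        status,
      })
      .promise();
    return status;
  };
}
```

In the example template, functionArn could come from process.env.ExampleAFunctionCurrentVersion, with new AWS.Lambda() and new AWS.CodeDeploy() supplied as the clients.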

The example also includes a post-traffic test Lambda for smoke testing after traffic has shifted, should the need arise.

Ideally these tests can be run in parallel to minimise execution time and provide rapid feedback. Tests in non-production environments can be more thorough and numerous, employing test data and mocked boundaries to exercise known scenarios that may not be possible to test in production.

CloudWatch Alarms Trigger Rollback

During initial traffic shifting, CloudWatch Alarms monitor the error metrics of the new version and, if triggered, fail the deployment and automatically roll back to the previous version. Only a small number of requests are negatively impacted before traffic is shifted back to the previous version. If all goes well, the remainder of the traffic is shifted to the new version.
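
The alarm settings in the example (Statistic Sum, Period 60, EvaluationPeriods 2, Threshold 0, GreaterThanThreshold) can be read as "errors in two consecutive one-minute periods". The sketch below models that reading. It is a deliberate simplification: alarmState is a hypothetical helper, and it ignores how CloudWatch treats missing data points, which matters for functions with little traffic.

```javascript
// Simplified model of the example alarms. Each array entry is the Sum of the
// Errors metric for one 60-second period, oldest first.
function alarmState(errorSumsPerPeriod, { threshold = 0, evaluationPeriods = 2 } = {}) {
  if (errorSumsPerPeriod.length < evaluationPeriods) return "INSUFFICIENT_DATA";
  const recent = errorSumsPerPeriod.slice(-evaluationPeriods);
  // Every one of the most recent periods must breach the threshold.
  return recent.every((sum) => sum > threshold) ? "ALARM" : "OK";
}
```

Under this model a single bad minute followed by a clean one does not trigger a rollback, so it is worth choosing the period and evaluation count with the canary window (five minutes in the example) in mind.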

Phased Rollouts

This strategy encourages you to adopt a phased rollout that introduces no breaking changes, and therefore requires no downtime for your applications. Thought has to go into architecting and coding your applications so that multiple versions can co-exist. For example, how will database, contract, or configuration changes be handled and rolled back?
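
One common way to let two versions co-exist is for readers to tolerate both the old and the new shape of the data until the rollout is complete. The sketch below is purely hypothetical: the record fields (name, firstName, lastName) and the displayName function are invented for illustration and are not part of the example stack.

```javascript
// Hypothetical contract change: version 1 writes `name`, version 2 writes
// `firstName` and `lastName`. While both versions receive traffic (and so
// that a rollback stays safe), readers accept either shape.
function displayName(record) {
  if (record.firstName !== undefined || record.lastName !== undefined) {
    return [record.firstName, record.lastName].filter(Boolean).join(" ");
  }
  return record.name || "";
}
```

Once the old version is retired and no old-shape records remain, the fallback branch can be removed in a later release.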

Business Case

There needs to be a business case behind the architectural design decisions that you make; not all solutions require 100% uptime or a phased rollout of changes, and in certain circumstances it may not even be possible. However, when services like AWS CodeDeploy make it so easy to apply these strategies, it’s almost more work not to adopt them.

AWS Well-Architected Best Practice

This strategy is described as a best practice by the Serverless Application Lens of the AWS Well-Architected Framework, falling within the Operational Excellence pillar. The ability to build applications that run with zero downtime is a valuable, marketable skill. Operational excellence is a bar that continues to rise, and with expectations on developers only increasing, it makes sense to keep your skills sharp.

With Lambdas, you pay for what you use, so previous versions, now unused, will not incur costs and can be retired as necessary. Adopting this strategy helps you optimise costs.

Caveats

CodeDeploy canary deployments use request-based traffic shifting, not user-based traffic shifting, so some or all of your users could be impacted, albeit on a much smaller number of requests.
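
A quick back-of-the-envelope calculation shows why request-based shifting can touch most users. Assuming each request is routed independently (an assumption made for illustration), the chance that a user hits the canary at least once grows quickly with the number of requests they make. The function name below is hypothetical.

```javascript
// Probability that at least one of `requestCount` independent requests is
// routed to the canary, given the share of traffic it receives.
function chanceOfHittingCanary(canaryWeight, requestCount) {
  return 1 - Math.pow(1 - canaryWeight, requestCount);
}
```

With a 10% canary, a user who makes ten requests during the canary window has roughly a 65% chance of reaching the new version at least once.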

A CloudWatch Alarm on the alias gathers metrics across every version the alias routes to, not just the new version, so it may produce false positives against the new version should the previous version begin to fail.

The automatic cleanup of old versions is encouraged so as not to run out of Lambda storage space, which typically occurs at exactly the wrong moment, and can prevent you from deploying. An EventBridge Scheduled Event can be configured to periodically run a Lambda that removes older versions.
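
The selection logic for such a cleanup Lambda can be sketched as a pure function. The name versionsToDelete and the keep parameter are hypothetical; the inputs are assumed to come from the Lambda ListVersionsByFunction and ListAliases operations, and the deletions themselves would be DeleteFunction calls with a Qualifier.

```javascript
// Decide which published versions are safe to delete.
// versions: version identifiers for one function, e.g. ["$LATEST", "1", "2"]
// aliasedVersions: versions any alias currently points to, including weighted ones
// keep: how many of the newest numbered versions to retain
function versionsToDelete(versions, aliasedVersions, keep) {
  const inUse = new Set(aliasedVersions);
  const newestFirst = versions
    .filter((v) => v !== "$LATEST") // $LATEST is not a published version
    .sort((a, b) => Number(b) - Number(a));
  return newestFirst
    .slice(keep) // skip the versions being retained
    .filter((v) => !inUse.has(v)); // never delete a version an alias still uses
}
```

Keeping a few recent versions, rather than only the live one, leaves room for a manual rollback after the deployment has completed.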

When using SAM to apply a CodeDeploy deployment configuration to a Lambda, pre/post traffic hook Lambda names must start with: CodeDeployHook_

Charges Apply!

Charges apply so be sure to remove all resources created after experimentation.

References

AWS Well-Architected Framework: Serverless lens

Serverless Application Model (SAM): AWS::Serverless::Function (DeploymentPreference)

AWS Serverless Application Model Developer Guide: Deploying serverless applications gradually

Burning Monk (guest post for Lumigo): AWS Lambda Canary Deployment

About the author

Hi, I'm Karl Kyck, a cloud architect specialising in building sustainable serverless architectures on AWS.