Deployments and rollbacks using ECS and GitHub Actions

This post will walk you through setting up automated deployments, as well as automatic rollbacks, for an ECS setup using GitHub Actions.

Amazon ECS offers native support for monitoring and automatically managing updates using Amazon CloudWatch metric alarms. However, in this article, we’ll explore how to accomplish this with GitHub Actions, providing more flexibility and integration with existing workflows.

We will set up a workflow that deploys a release whenever changes are pushed to the main branch. For rollbacks, we'll configure CloudWatch to monitor HTTP 5xx errors, memory utilization, and CPU utilization; if any of these alarms fire, a rollback to the previous deployment is triggered.

Prerequisites

To follow this tutorial, you need:

  • Basic understanding of ECS
  • An application already running on ECS
  • The following GitHub repository secrets:
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_REGION
    • ECS_CLUSTER
    • ECS_SERVICE
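Before storing these as repository secrets, it can help to sanity-check the values locally. A minimal sketch, where the values are placeholders rather than real credentials:

```shell
# Placeholder values; in CI these come from GitHub repository secrets.
AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
AWS_SECRET_ACCESS_KEY="example-secret"
AWS_REGION="us-east-1"
ECS_CLUSTER="ECSproject"
ECS_SERVICE="ecsproject-service"

# Fail fast if any required value is empty.
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION ECS_CLUSTER ECS_SERVICE; do
  if [ -z "$(eval echo "\$$var")" ]; then
    echo "missing: $var"
    exit 1
  fi
done
echo "all five values set"
```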

Workflow for Releases

Assuming you have a project on GitHub, create a workflow file for releases (.github/workflows/releases.yaml) with the following code. It builds a Docker image, pushes it to Docker Hub, registers a new task definition revision, and updates the service via the AWS CLI so that ECS deploys the latest version of your project.

Note: Several values are hard-coded in the workflow below for illustration (the account ID, IAM role ARNs, MongoDB password, and EFS file system ID). In practice, store anything sensitive as a GitHub secret rather than committing it.

name: Deploy to ECS
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      
      - name: Log in to Docker Hub
        run: echo "${{ secrets.DOCKER_HUB_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_HUB_USERNAME }}" --password-stdin
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Build and push Docker image
        run: |
          docker build -t khabdrick/ecsproject:${{ github.sha }} .
          docker push khabdrick/ecsproject:${{ github.sha }}
          echo "IMAGE_TAG=khabdrick/ecsproject:${{ github.sha }}" >> $GITHUB_ENV
      - name: Install AWS CLI
        run: sudo apt-get update && sudo apt-get install -y awscli
      - name: Configure AWS CLI
        run: |
          aws configure set aws_access_key_id ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws configure set aws_secret_access_key ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws configure set region ${{ secrets.AWS_REGION }}
      - name: Register new task definition revision
        run: |
          aws ecs register-task-definition \
            --family ecsproject_task \
            --execution-role-arn arn:aws:iam::925248302005:role/ecstaskrole \
            --task-role-arn arn:aws:iam::925248302005:role/ecstaskrole \
            --network-mode awsvpc \
            --requires-compatibilities FARGATE \
            --cpu "1024" \
            --memory "3072" \
            --container-definitions '[
                {
                    "name": "mongo",
                    "image": "mongo:latest",
                    "cpu": 0,
                    "memory": 2048,
                    "portMappings": [
                        {
                            "appProtocol": "http",
                            "containerPort": 27017,
                            "hostPort": 27017,
                            "name": "mongo-27017-tcp",
                            "protocol": "tcp"
                        }
                    ],
                    "essential": true,
                    "environment": [
                        {
                            "name": "MONGO_INITDB_ROOT_USERNAME",
                            "value": "mongo"
                        },
                        {
                            "name": "MONGO_INITDB_ROOT_PASSWORD",
                            "value": "password"
                        }
                    ],
                    "mountPoints": [
                        {
                            "sourceVolume": "mongo-mount",
                            "containerPath": "/data/db",
                            "readOnly": false
                        }
                    ],
                    "logConfiguration": {
                        "logDriver": "awslogs",
                        "options": {
                            "awslogs-group": "/ecs/ecsproject_task",
                            "awslogs-create-group": "true",
                            "awslogs-region": "us-east-1",
                            "awslogs-stream-prefix": "ecs"
                        }
                    }
                },
                {
                    "name": "project_container",
                    "image": "${{ env.IMAGE_TAG }}",
                    "cpu": 0,
                    "memory": 1024,
                    "portMappings": [
                        {
                            "containerPort": 3000,
                            "hostPort": 3000,
                            "name": "project_container-3000-tcp",
                            "protocol": "tcp"
                        }
                    ],
                    "essential": false,
                    "environment": [
                        {
                            "name": "MONGO_USER",
                            "value": "mongo"
                        },
                        {
                            "name": "MONGO_IP",
                            "value": "localhost"
                        },
                        {
                            "name": "MONGO_PORT",
                            "value": "27017"
                        },
                        {
                            "name": "MONGO_PASSWORD",
                            "value": "password"
                        }
                    ]
                }
            ]' \
            --volumes '[
                {
                    "name": "mongo-mount",
                    "efsVolumeConfiguration": {
                        "fileSystemId": "fs-0ae93a5984f5ff5c0",
                        "rootDirectory": "/"
                    }
                }
            ]' \
            --runtime-platform '{"cpuArchitecture": "X86_64", "operatingSystemFamily": "LINUX"}' \
            --output json > new-task-def.json
      - name: Update ECS service to use new task definition
        run: |
          NEW_TASK_DEF_ARN=$(jq -r '.taskDefinition.taskDefinitionArn' new-task-def.json)
          aws ecs update-service \
            --cluster ${{ secrets.ECS_CLUSTER }} \
            --service ${{ secrets.ECS_SERVICE }} \
            --task-definition $NEW_TASK_DEF_ARN

This workflow automates deploying a Docker-based application to Amazon ECS whenever changes are pushed to the main branch. It sets up QEMU for multi-platform builds and Docker Buildx for building and pushing Docker images.

Once the Docker image is built and pushed, the workflow installs and configures the AWS CLI using credentials stored in GitHub Secrets. It then registers a new task definition revision in ECS. This task definition includes two containers: one for a MongoDB database and another for the application itself. You can modify this portion to fit the specific requirements of your application running on ECS.

Finally, the script updates the ECS service to use the newly registered task definition. It extracts the ARN (Amazon Resource Name) of the new task definition from the output JSON file and updates the ECS service using this ARN, ensuring that the ECS service runs the latest version of the application.
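You can exercise that extraction step locally against a canned response before relying on it in CI. The ARN below is a made-up example:

```shell
# Simulate the output of `aws ecs register-task-definition --output json`
# with a canned response (the ARN is a made-up example).
cat > new-task-def.json <<'EOF'
{"taskDefinition": {"taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/ecsproject_task:7"}}
EOF

# Same jq query the workflow uses to pull out the new revision's ARN.
NEW_TASK_DEF_ARN=$(jq -r '.taskDefinition.taskDefinitionArn' new-task-def.json)
echo "$NEW_TASK_DEF_ARN"
```

If you want the workflow to block until the rollout settles, `aws ecs wait services-stable --cluster <cluster> --services <service>` can be added as a final step.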

Monitor and Rollback

Create another workflow (.github/workflows/rollback.yaml) that runs after each deployment and checks the CloudWatch alarms five times, at roughly ten-minute intervals. If any alarm is firing, a rollback to the previous task definition is triggered.

First, create an SNS topic for alarm actions:

  1. Open the AWS Management Console and navigate to Amazon SNS.
  2. Create a new topic.
  3. Note the ARN of the created topic (e.g., arn:aws:sns:us-east-1:123456789012:MyTopic).

Next, create CloudWatch alarms for HighHTTP5xxErrors, HighMemoryUtilization, and HighCPUUtilization using the AWS CLI. Replace <arn:aws:sns:us-east-1:123456789012:MyTopic> with your SNS topic ARN, ECSproject with your cluster name, and note-api-lb with your load balancer dimension (for an Application Load Balancer this takes the form app/<name>/<id>).

aws cloudwatch put-metric-alarm \
    --alarm-name HighHTTP5xxErrors \
    --metric-name HTTPCode_Target_5XX_Count \
    --namespace AWS/ApplicationELB \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 10 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=LoadBalancer,Value=note-api-lb \
    --alarm-actions <arn:aws:sns:us-east-1:123456789012:MyTopic> \
    --unit Count
aws cloudwatch put-metric-alarm \
    --alarm-name HighMemoryUtilization \
    --metric-name MemoryUtilization \
    --namespace AWS/ECS \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=ClusterName,Value=ECSproject \
    --alarm-actions <arn:aws:sns:us-east-1:123456789012:MyTopic> \
    --unit Percent
aws cloudwatch put-metric-alarm \
    --alarm-name HighCPUUtilization \
    --metric-name CPUUtilization \
    --namespace AWS/ECS \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=ClusterName,Value=ECSproject \
    --alarm-actions <arn:aws:sns:us-east-1:123456789012:MyTopic> \
    --unit Percent


And paste in the workflow for rolling back:

name: Rollback to Previous Deployment
on:
  workflow_run:
    workflows: ["Deploy to ECS"]
    types:
      - completed
jobs:
  rollback:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    strategy:
      matrix:
        attempt: [1, 2, 3, 4, 5]
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Wait before rollback attempt ${{ matrix.attempt }}
        run: sleep $(( ${{ matrix.attempt }} * 600 ))
      
      - name: Install AWS CLI
        run: |
          sudo apt-get update
          sudo apt-get install -y awscli
      - name: Configure AWS CLI
        run: |
          aws configure set aws_access_key_id ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws configure set aws_secret_access_key ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws configure set region ${{ secrets.AWS_REGION }}

      - name: Check for CloudWatch Alarms
        id: check_alarm_state
        run: |
          CPU_ALARM_STATE=$(aws cloudwatch describe-alarms --alarm-names "HighCPUUtilization" --state-value ALARM --query 'MetricAlarms[0].StateValue' --output text --region ${{ secrets.AWS_REGION }})
          MEMORY_ALARM_STATE=$(aws cloudwatch describe-alarms --alarm-names "HighMemoryUtilization" --state-value ALARM --query 'MetricAlarms[0].StateValue' --output text --region ${{ secrets.AWS_REGION }})
          HTTP_ALARM_STATE=$(aws cloudwatch describe-alarms --alarm-names "HighHTTP5xxErrors" --state-value ALARM --query 'MetricAlarms[0].StateValue' --output text --region ${{ secrets.AWS_REGION }})
          if [ "$CPU_ALARM_STATE" == "ALARM" ] || [ "$MEMORY_ALARM_STATE" == "ALARM" ] || [ "$HTTP_ALARM_STATE" == "ALARM" ]; then
            echo "ALARM"
            echo "alarm_state=ALARM" >> $GITHUB_OUTPUT
          else
            echo "OK"
            echo "alarm_state=OK" >> $GITHUB_OUTPUT
          fi

      - name: Get the second-to-last task definition revision
        id: get_previous_task_definition
        run: |
          if [ "${{ steps.check_alarm_state.outputs.alarm_state }}" == "ALARM" ]; then
            TASK_DEFINITION=$(aws ecs describe-services --cluster ${{ secrets.ECS_CLUSTER }} --services ${{ secrets.ECS_SERVICE }} --query 'services[0].deployments[1].taskDefinition' --output text)
            echo "task_definition=${TASK_DEFINITION}" >> $GITHUB_OUTPUT
          else
            echo "No alarm, no rollback needed."
            exit 0
          fi
      
      - name: Rollback to previous task definition
        if: steps.check_alarm_state.outputs.alarm_state == 'ALARM'
        run: |
          aws ecs update-service \
            --cluster ${{ secrets.ECS_CLUSTER }} \
            --service ${{ secrets.ECS_SERVICE }} \
            --task-definition ${{ steps.get_previous_task_definition.outputs.task_definition }}

This workflow rolls back a deployment on Amazon ECS if any of the alarms fire after a successful deployment. It is triggered when the “Deploy to ECS” workflow completes and only proceeds if that run succeeded. The rollback job uses a matrix strategy to run five checks, each waiting a multiple of ten minutes (10, 20, up to 50) before inspecting the alarms.

The core function of the workflow is to monitor specific CloudWatch alarms for CPU utilization, memory utilization, and HTTP 5xx errors. The script checks the state of these alarms and sets an output variable, alarm_state, to “ALARM” if any are triggered. This condition determines whether the rollback should proceed.
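The decision itself is plain shell. Here is the same branch with the alarm states stubbed out; in the real workflow they come from `aws cloudwatch describe-alarms`:

```shell
# Stubbed alarm states; the workflow fetches these from CloudWatch.
CPU_ALARM_STATE="OK"
MEMORY_ALARM_STATE="ALARM"
HTTP_ALARM_STATE="OK"

# A single firing alarm is enough to request a rollback.
if [ "$CPU_ALARM_STATE" = "ALARM" ] || [ "$MEMORY_ALARM_STATE" = "ALARM" ] || [ "$HTTP_ALARM_STATE" = "ALARM" ]; then
  ALARM_STATE="ALARM"
else
  ALARM_STATE="OK"
fi
echo "alarm_state=$ALARM_STATE"
```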

If any alarms are in the “ALARM” state, the workflow retrieves the second-to-last task definition revision for the ECS service, representing the previous stable deployment. This task definition is then used to update the ECS service, effectively rolling back to the prior version. This ensures that if the latest deployment causes issues, the system can quickly revert to a stable state.
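The `deployments[1]` lookup assumes the service still lists the previous deployment alongside the new one. You can check the shape of that query against a canned `describe-services` response; the ARNs below are made up:

```shell
# Canned `aws ecs describe-services` response (ARNs are made-up examples).
cat > services.json <<'EOF'
{"services": [{"deployments": [
  {"taskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/ecsproject_task:8"},
  {"taskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/ecsproject_task:7"}
]}]}
EOF

# deployments[0] is the in-progress (latest) deployment; deployments[1]
# is the previous one that the workflow rolls back to.
PREVIOUS_TASK_DEF=$(jq -r '.services[0].deployments[1].taskDefinition' services.json)
echo "$PREVIOUS_TASK_DEF"
```

Note that once ECS finishes replacing the old tasks, the service reports only one deployment and `deployments[1]` is null; a hardened version would fall back to listing the task definition revisions for the family instead.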

Conclusion

We covered how to use GitHub Actions to automate the deployment and rollback processes for an application running on Amazon ECS. This includes setting up a workflow for releasing updates when changes are pushed to the main branch and configuring CloudWatch alarms to monitor key metrics. If any alarms are triggered, the workflow initiates a rollback to a previous task definition to maintain application reliability.

To further improve your deployment strategy, consider advanced techniques like blue-green deployments, canary deployments, using AWS CodeDeploy, or integrating monitoring tools like Prometheus and Grafana for better insights.

Aviator.co | Blog