Deployments and Rollbacks Using ECS and GitHub Actions
Amazon ECS offers native support for monitoring and automatically managing updates using Amazon CloudWatch metric alarms. However, in this article, we’ll explore how to accomplish this with GitHub Actions, providing more flexibility and integration with existing workflows.
We will set up a workflow that deploys a release whenever changes are pushed to the main branch. For rollbacks, we’ll configure CloudWatch alarms that monitor HTTP 5xx errors, high memory utilization, and high CPU utilization. If any of these alarms fires after a deployment, a rollback is triggered.
Prerequisites
To follow this tutorial, you need:
- Basic understanding of ECS
- An application already running on ECS
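- The following GitHub repository secrets (the two Docker Hub secrets are used by the release workflow to push the image):
  - AWS_ACCESS_KEY_ID
  - AWS_SECRET_ACCESS_KEY
  - AWS_REGION
  - ECS_CLUSTER
  - ECS_SERVICE
  - DOCKER_HUB_USERNAME
  - DOCKER_HUB_PASSWORD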
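One way to create these secrets is with the GitHub CLI. The sketch below assumes `gh` is installed and authenticated, that it is run from inside a clone of the repository, and that each value is already exported in your shell under the same name. The function name `set_repo_secrets` is only illustrative.

```shell
# Hypothetical helper: upload each named environment variable as a
# repository secret using the GitHub CLI (`gh secret set NAME`).
set_repo_secrets() {
  for name in "$@"; do
    # printenv fails if the variable is not exported; stop early in that case.
    if ! value=$(printenv "$name"); then
      echo "Missing environment variable: $name" >&2
      return 1
    fi
    # Pass the value on stdin so it does not show up in the process list.
    printf '%s' "$value" | gh secret set "$name"
  done
}
```

For example: `set_repo_secrets AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION ECS_CLUSTER ECS_SERVICE`.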
Workflow for Releases
Assuming you have a project on GitHub, create a workflow file for releases (.github/workflows/releases.yaml) and use the following code. This will build the image, push it to Docker Hub, and trigger a service update via the AWS CLI, allowing ECS to deploy the latest version of your project.
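Note: For brevity, the workflow below hardcodes several sensitive values (the IAM role ARNs, the MongoDB username and password, and the EFS file system ID). In a real project, store values like these as GitHub repository secrets instead of committing them.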
name: Deploy to ECS
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Log in to Docker Hub
        run: echo "${{ secrets.DOCKER_HUB_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_HUB_USERNAME }}" --password-stdin
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build and push Docker image
        run: |
          # Tag the image with the commit SHA so each release can be traced to a commit.
          docker build -t khabdrick/ecsproject:${{ github.sha }} .
          docker push khabdrick/ecsproject:${{ github.sha }}
          echo "IMAGE_TAG=khabdrick/ecsproject:${{ github.sha }}" >> "$GITHUB_ENV"
      - name: Install AWS CLI
        # GitHub-hosted Ubuntu runners normally ship with the AWS CLI, so install it only if it is missing.
        run: command -v aws >/dev/null || (sudo apt-get update && sudo apt-get install -y awscli)
      - name: Configure AWS CLI
        run: |
          aws configure set aws_access_key_id "${{ secrets.AWS_ACCESS_KEY_ID }}"
          aws configure set aws_secret_access_key "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
          aws configure set region "${{ secrets.AWS_REGION }}"
      - name: Register new task definition revision
        run: |
          aws ecs register-task-definition \
            --family ecsproject_task \
            --execution-role-arn arn:aws:iam::925248302005:role/ecstaskrole \
            --task-role-arn arn:aws:iam::925248302005:role/ecstaskrole \
            --network-mode awsvpc \
            --requires-compatibilities FARGATE \
            --cpu "1024" \
            --memory "3072" \
            --container-definitions '[
              {
                "name": "mongo",
                "image": "mongo:latest",
                "cpu": 0,
                "memory": 2048,
                "portMappings": [
                  {
                    "appProtocol": "http",
                    "containerPort": 27017,
                    "hostPort": 27017,
                    "name": "mongo-27017-tcp",
                    "protocol": "tcp"
                  }
                ],
                "essential": true,
                "environment": [
                  {
                    "name": "MONGO_INITDB_ROOT_USERNAME",
                    "value": "mongo"
                  },
                  {
                    "name": "MONGO_INITDB_ROOT_PASSWORD",
                    "value": "password"
                  }
                ],
                "mountPoints": [
                  {
                    "sourceVolume": "mongo-mount",
                    "containerPath": "/data/db",
                    "readOnly": false
                  }
                ],
                "logConfiguration": {
                  "logDriver": "awslogs",
                  "options": {
                    "awslogs-group": "/ecs/ecsproject_task",
                    "awslogs-create-group": "true",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                  }
                }
              },
              {
                "name": "project_container",
                "image": "${{ env.IMAGE_TAG }}",
                "cpu": 0,
                "memory": 1024,
                "portMappings": [
                  {
                    "containerPort": 3000,
                    "hostPort": 3000,
                    "name": "project_container-3000-tcp",
                    "protocol": "tcp"
                  }
                ],
                "essential": false,
                "environment": [
                  {
                    "name": "MONGO_USER",
                    "value": "mongo"
                  },
                  {
                    "name": "MONGO_IP",
                    "value": "localhost"
                  },
                  {
                    "name": "MONGO_PORT",
                    "value": "27017"
                  },
                  {
                    "name": "MONGO_PASSWORD",
                    "value": "password"
                  }
                ]
              }
            ]' \
            --volumes '[
              {
                "name": "mongo-mount",
                "efsVolumeConfiguration": {
                  "fileSystemId": "fs-0ae93a5984f5ff5c0",
                  "rootDirectory": "/"
                }
              }
            ]' \
            --runtime-platform '{"cpuArchitecture": "X86_64", "operatingSystemFamily": "LINUX"}' \
            --output json > new-task-def.json
      - name: Update ECS service to use new task definition
        run: |
          # Read the ARN of the revision that was just registered.
          NEW_TASK_DEF_ARN=$(jq -r '.taskDefinition.taskDefinitionArn' new-task-def.json)
          aws ecs update-service \
            --cluster "${{ secrets.ECS_CLUSTER }}" \
            --service "${{ secrets.ECS_SERVICE }}" \
            --task-definition "$NEW_TASK_DEF_ARN"
This workflow deploys a Docker-based application to Amazon ECS whenever changes are pushed to the main branch. It checks out the code, logs in to Docker Hub, and sets up QEMU and Docker Buildx (both useful for multi-platform builds). It then builds the image, tags it with the commit SHA, and pushes it to Docker Hub.
Once the Docker image is built and pushed, the workflow installs and configures the AWS CLI using credentials stored in GitHub Secrets. It then registers a new task definition revision in ECS. This task definition includes two containers: one for a MongoDB database and another for the application itself. You can modify this portion to fit the specific requirements of your application running on ECS.
Finally, the script updates the ECS service to use the newly registered task definition. It extracts the ARN (Amazon Resource Name) of the new task definition from the output JSON file and updates the ECS service using this ARN, ensuring that the ECS service runs the latest version of the application.
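A possible refinement, not part of the workflow above: `aws ecs update-service` returns as soon as the deployment has been requested, so a follow-up step can block until the service has settled. The sketch below wraps both calls in a hypothetical `deploy_and_wait` function. `aws ecs wait services-stable` polls the service and exits with a non-zero status if it does not become stable before the waiter gives up.

```shell
# Hypothetical helper: point the service at a task definition, then block
# until ECS reports the service as stable.
# Arguments: cluster name, service name, task definition ARN.
deploy_and_wait() {
  cluster=$1
  service=$2
  task_def_arn=$3

  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --task-definition "$task_def_arn" >/dev/null || return 1

  # A non-zero exit here fails the workflow step, which makes a stuck
  # deployment visible in the Actions run.
  aws ecs wait services-stable \
    --cluster "$cluster" \
    --services "$service"
}
```

In the workflow, the last step could call it as `deploy_and_wait "${{ secrets.ECS_CLUSTER }}" "${{ secrets.ECS_SERVICE }}" "$NEW_TASK_DEF_ARN"`.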
Monitor and Rollback
Create another workflow (.github/workflows/rollback.yaml) that checks the CloudWatch alarms five times after each deployment, at ten-minute intervals. If any alarm is firing, the workflow rolls the service back to the previous task definition.
First, create an SNS topic for alarm actions:
- Open the AWS Management Console and navigate to Amazon SNS.
- Create a new topic.
- Note the ARN of the created topic (e.g., arn:aws:sns:us-east-1:123456789012:MyTopic).
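If you prefer the command line to the console, the same topic can be created with the AWS CLI. This is a minimal sketch, assuming the CLI is configured with permission to create SNS topics. The helper name `create_alarm_topic` and the topic name `MyTopic` are placeholders.

```shell
# Hypothetical helper: create (or look up) an SNS topic and print its ARN.
# `aws sns create-topic` is idempotent: if you already own a topic with this
# name, it returns that topic's ARN instead of creating a second one.
create_alarm_topic() {
  topic_name=$1
  aws sns create-topic \
    --name "$topic_name" \
    --query 'TopicArn' \
    --output text
}
```

For example, `TOPIC_ARN=$(create_alarm_topic MyTopic)` captures the ARN for use in the alarm commands.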
Next, create CloudWatch alarms named HighHTTP5xxErrors, HighMemoryUtilization, and HighCPUUtilization using the AWS CLI. Replace <arn:aws:sns:us-east-1:123456789012:MyTopic> with your SNS topic ARN, ECSproject with your cluster name, and the LoadBalancer dimension value with the identifier of your own load balancer.
# For an Application Load Balancer, the target 5xx metric is HTTPCode_Target_5XX_Count,
# and the LoadBalancer dimension takes the form app/<name>/<id> (the last part of the load balancer ARN).
aws cloudwatch put-metric-alarm \
  --alarm-name HighHTTP5xxErrors \
  --metric-name HTTPCode_Target_5XX_Count \
  --namespace AWS/ApplicationELB \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=LoadBalancer,Value=app/note-api-lb/<load-balancer-id> \
  --alarm-actions <arn:aws:sns:us-east-1:123456789012:MyTopic> \
  --unit Count

aws cloudwatch put-metric-alarm \
  --alarm-name HighMemoryUtilization \
  --metric-name MemoryUtilization \
  --namespace AWS/ECS \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=ClusterName,Value=ECSproject \
  --alarm-actions <arn:aws:sns:us-east-1:123456789012:MyTopic> \
  --unit Percent

aws cloudwatch put-metric-alarm \
  --alarm-name HighCPUUtilization \
  --metric-name CPUUtilization \
  --namespace AWS/ECS \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=ClusterName,Value=ECSproject \
  --alarm-actions <arn:aws:sns:us-east-1:123456789012:MyTopic> \
  --unit Percent
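To confirm that the alarms exist and to see their current state, something like the following can be used. It is a sketch that assumes the alarm names above, and the function name `show_alarm_states` is illustrative. A newly created alarm typically reports INSUFFICIENT_DATA until enough data points have arrived.

```shell
# Hypothetical helper: print one "<alarm name><TAB><state>" row for each
# alarm name passed as an argument.
show_alarm_states() {
  aws cloudwatch describe-alarms \
    --alarm-names "$@" \
    --query 'MetricAlarms[].[AlarmName,StateValue]' \
    --output text
}
```

For example: `show_alarm_states HighHTTP5xxErrors HighMemoryUtilization HighCPUUtilization`.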
Then add the following to the rollback workflow file:
name: Rollback to Previous Deployment
on:
  workflow_run:
    workflows: ["Deploy to ECS"]
    types:
      - completed
jobs:
  rollback:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    strategy:
      matrix:
        attempt: [1, 2, 3, 4, 5]
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Wait before rollback attempt ${{ matrix.attempt }}
        # Matrix jobs start together, so attempt N checks the alarms N x 10 minutes after the deploy.
        run: sleep $(( ${{ matrix.attempt }} * 600 ))
      - name: Install AWS CLI
        # GitHub-hosted Ubuntu runners normally ship with the AWS CLI, so install it only if it is missing.
        run: command -v aws >/dev/null || (sudo apt-get update && sudo apt-get install -y awscli)
      - name: Configure AWS CLI
        run: |
          aws configure set aws_access_key_id "${{ secrets.AWS_ACCESS_KEY_ID }}"
          aws configure set aws_secret_access_key "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
          aws configure set region "${{ secrets.AWS_REGION }}"
      - name: Check for CloudWatch Alarms
        id: check_alarm_state
        run: |
          # --output text prints the bare value (ALARM, or None when the alarm is not firing).
          # The default JSON output would include quotes and never match the comparison below.
          CPU_ALARM_STATE=$(aws cloudwatch describe-alarms --alarm-names "HighCPUUtilization" --state-value ALARM --query 'MetricAlarms[0].StateValue' --output text --region ${{ secrets.AWS_REGION }})
          MEMORY_ALARM_STATE=$(aws cloudwatch describe-alarms --alarm-names "HighMemoryUtilization" --state-value ALARM --query 'MetricAlarms[0].StateValue' --output text --region ${{ secrets.AWS_REGION }})
          HTTP_ALARM_STATE=$(aws cloudwatch describe-alarms --alarm-names "HighHTTP5xxErrors" --state-value ALARM --query 'MetricAlarms[0].StateValue' --output text --region ${{ secrets.AWS_REGION }})
          if [ "$CPU_ALARM_STATE" == "ALARM" ] || [ "$MEMORY_ALARM_STATE" == "ALARM" ] || [ "$HTTP_ALARM_STATE" == "ALARM" ]; then
            echo "ALARM"
            echo "alarm_state=ALARM" >> "$GITHUB_OUTPUT"
          else
            echo "OK"
            echo "alarm_state=OK" >> "$GITHUB_OUTPUT"
          fi
      - name: Get the second-to-last task definition revision
        id: get_previous_task_definition
        run: |
          if [ "${{ steps.check_alarm_state.outputs.alarm_state }}" == "ALARM" ]; then
            # Take the second-newest active revision of the family registered by the deploy workflow.
            # (The service's deployments[1] entry is empty once the new deployment has finished rolling out.)
            TASK_DEFINITION=$(aws ecs list-task-definitions --family-prefix ecsproject_task --status ACTIVE --sort DESC --query 'taskDefinitionArns[1]' --output text)
            echo "task_definition=${TASK_DEFINITION}" >> "$GITHUB_OUTPUT"
          else
            echo "No alarm, no rollback needed."
            exit 0
          fi
      - name: Rollback to previous task definition
        if: steps.check_alarm_state.outputs.alarm_state == 'ALARM'
        run: |
          aws ecs update-service \
            --cluster "${{ secrets.ECS_CLUSTER }}" \
            --service "${{ secrets.ECS_SERVICE }}" \
            --task-definition "${{ steps.get_previous_task_definition.outputs.task_definition }}"
This workflow rolls back a deployment on Amazon ECS if specific alarms are firing after a successful deployment. It is triggered when the “Deploy to ECS” workflow completes and only proceeds if that run succeeded. The rollback job uses a matrix strategy to run five parallel jobs, each of which sleeps for a multiple of 10 minutes before checking, so the alarms are checked 10, 20, 30, 40, and 50 minutes after the deployment.
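The timing can be worked out by hand. The sketch below reproduces the `sleep` arithmetic from the workflow and compares it with the alarm settings used earlier (a 300-second period and 3 evaluation periods). The function names are illustrative only.

```shell
# Delay before a given matrix attempt checks the alarms, in seconds.
# Mirrors `sleep $(( attempt * 600 ))` from the rollback workflow.
check_delay_seconds() {
  echo $(( $1 * 600 ))
}

# Time an alarm needs to observe consecutive breaching periods, in seconds:
# period x evaluation periods.
alarm_window_seconds() {
  echo $(( $1 * $2 ))
}

window=$(alarm_window_seconds 300 3)
for attempt in 1 2 3 4 5; do
  delay=$(check_delay_seconds "$attempt")
  if [ "$delay" -ge "$window" ]; then
    note="a full alarm window has elapsed"
  else
    note="shorter than the alarm window"
  fi
  echo "attempt $attempt: checks after $(( delay / 60 )) min ($note)"
done
```

With these settings, an alarm needs roughly 15 minutes of breaching data before it fires, so a regression introduced by the deployment is more likely to be caught from the second check onward than by the first.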
The core function of the workflow is to monitor specific CloudWatch alarms for CPU utilization, memory utilization, and HTTP 5xx errors. The script checks the state of these alarms and sets an output variable, alarm_state, to “ALARM” if any are triggered. This condition determines whether the rollback should proceed.
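The decision logic on its own is small enough to try locally. This is a sketch, assuming each state string was fetched with `--output text`, in which case the AWS CLI prints `None` when the query matches no alarm. `overall_alarm_state` is a hypothetical name.

```shell
# Print "ALARM" if any of the given state strings equals "ALARM",
# otherwise print "OK". Values such as "None" or "INSUFFICIENT_DATA"
# count as not alarming.
overall_alarm_state() {
  for state in "$@"; do
    if [ "$state" = "ALARM" ]; then
      echo "ALARM"
      return 0
    fi
  done
  echo "OK"
}

overall_alarm_state None ALARM None   # prints ALARM
overall_alarm_state None None None    # prints OK
```

Note that a JSON-quoted value such as `"ALARM"` (quotes included) does not match, which is why the output format of the CLI call matters.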
If any alarms are in the “ALARM” state, the workflow retrieves the second-to-last task definition revision for the ECS service, representing the previous stable deployment. This task definition is then used to update the ECS service, effectively rolling back to the prior version. This ensures that if the latest deployment causes issues, the system can quickly revert to a stable state.
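Task definition ARNs end in `family:revision`, so the previous revision can also be derived from a current ARN with plain string handling. This is a sketch rather than the workflow's own method: the function name is hypothetical, and it assumes the revision immediately below the current one still exists and has not been deregistered.

```shell
# Given a task definition ARN ending in ":<revision>", print the ARN of
# the revision just before it. Fails if there is no earlier revision.
previous_task_definition() {
  arn=$1
  revision=${arn##*:}   # text after the last colon, e.g. "7"
  prefix=${arn%:*}      # everything before the last colon
  case $revision in
    ''|*[!0-9]*)
      echo "Not a revisioned ARN: $arn" >&2
      return 1
      ;;
  esac
  if [ "$revision" -le 1 ]; then
    echo "No earlier revision for $arn" >&2
    return 1
  fi
  echo "$prefix:$(( revision - 1 ))"
}

previous_task_definition \
  "arn:aws:ecs:us-east-1:123456789012:task-definition/ecsproject_task:7"
# prints arn:aws:ecs:us-east-1:123456789012:task-definition/ecsproject_task:6
```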
Conclusion
We covered how to use GitHub Actions to automate the deployment and rollback processes for an application running on Amazon ECS. This includes setting up a workflow for releasing updates when changes are pushed to the main branch and configuring CloudWatch alarms to monitor key metrics. If any alarms are triggered, the workflow initiates a rollback to a previous task definition to maintain application reliability.
To further improve your deployment strategy, consider advanced techniques like blue-green deployments, canary deployments, using AWS CodeDeploy, or integrating monitoring tools like Prometheus and Grafana for better insights.