Automated Failover and Git Rollback Strategies with GitOps and Argo Rollouts

When Kubernetes deployments fail due to misconfigurations, manual rollbacks are slow and disruptive. GitOps, with ArgoCD and Argo Rollouts, automates failover by detecting issues and reverting to the last stable version. ArgoCD enforces Git as the source of truth, while Argo Rollouts manages progressive deployments, preventing faulty updates from reaching users and ensuring rapid recovery.
CEO @ Aviator

Infrastructure failures can bring down applications, disrupt services, and frustrate users. When a Kubernetes deployment fails due to misconfigured manifests, incorrect image tags, or failing health checks, manual recovery takes time. Slow rollbacks can also impact dependent services.

GitOps ensures infrastructure stays aligned with Git, eliminating ad-hoc fixes. Tools like ArgoCD and Argo Rollouts detect failures, revert incorrect updates, and restore the last working version. ArgoCD enforces committed configurations, resetting changes made outside Git. Argo Rollouts enables canary and blue-green deployments, halting faulty updates before they impact users.

For example, if an update omits a required environment variable and the container crashes or fails its readiness probe, Kubernetes reports the pod as unhealthy. Argo Rollouts stops the deployment, keeping the last working version active. If an engineer modifies a resource outside Git, ArgoCD reverts it to maintain consistency.

How GitOps Manages Deployments Through Git

GitOps keeps the production Kubernetes environment aligned with the configuration stored in Git, covering infrastructure manifests and configurations as well as security policies such as role-based access control (RBAC), network policies, and pod security policies. The configuration in Git acts as the source of truth, meaning every deployment, service, or policy change must be versioned and committed before it is applied.

If any deviation occurs, such as an untracked update to a deployment or a manual configuration change, GitOps tools detect it and restore the environment to match the committed state, preventing unintended drift.

  • Version-Controlled Deployments with Git: Every infrastructure or application update must be committed to Git before being deployed in Kubernetes. This includes deployment configurations, service configurations, and policy rules. Git maintains a version history, making rollbacks easier when needed.
  • Automated State Enforcement: Tools like ArgoCD continuously compare the live state reported by the Kubernetes API with the definitions stored in Git. If they diverge, ArgoCD marks the application as OutOfSync and, when automated sync with self-heal is enabled, reapplies the Git definition to bring the cluster back to its desired state.

For example, if someone manually scales down a deployment’s replica count in Kubernetes using the command kubectl scale deployment app --replicas=1, ArgoCD detects the difference and restores the replica count to match what is defined in Git.

Similarly, if an application update includes an invalid configuration that causes pods to fail, the last working version can be restored from Git history, and with Argo Rollouts the faulty update can be aborted without manual intervention.
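Automatic correction of drift like the replica-count change above is opt-in in Argo CD. A minimal sketch of an Application manifest that turns it on is shown here, reusing the app name, repository, and path from the walkthrough later in this post. The argocd namespace, the default project, the main branch, and the gitops-app.yaml file name are assumptions about the setup, not details confirmed by the original.

```shell
# Write an Argo CD Application manifest with automated sync enabled.
# Assumptions: Argo CD runs in the "argocd" namespace, the "default"
# project is used, and the manifests live on the "main" branch.
cat > gitops-app.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitops-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/byteBardShivansh/Aviator-test.git
    path: k8-manifests
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from Git
      selfHeal: true  # undo changes made directly against the cluster
EOF
echo "wrote gitops-app.yaml"
```

With selfHeal left out, Argo CD would still report the application as OutOfSync after a manual change, but it would wait for someone to trigger the sync.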

How Git Enables GitOps: Tracking Changes and Rolling Back Easily

Git provides the version history for tracking and rolling back changes, a structured change-management workflow that catches misconfigurations before they merge, a fixed reference for continuous reconciliation, and fast feedback so issues are detected and resolved before they impact production.

Ensuring Controlled and Verified Changes

All changes in GitOps go through a structured workflow to ensure they are reviewed and validated before being deployed. Each update is submitted as a pull request (PR) in Git. The PR undergoes automated validation, including unit tests, integration tests, and security scans that check for misconfigurations or vulnerabilities. Additionally, policy checks are enforced using policy frameworks like Open Policy Agent (OPA) to verify compliance with security and operational guidelines.

Once the automated checks pass, team members review the PR, providing feedback and approvals before merging. After approval, ArgoCD detects the changes and enforces them in the Kubernetes cluster, ensuring that the environment remains aligned with the latest approved configuration in Git.

Failure Detection

GitOps tools like ArgoCD continuously monitor deployments and detect configuration drift. If a deployment fails or deviates from the desired state, ArgoCD flags the issue and can notify teams through webhooks or integrations with monitoring tools like Prometheus and Alertmanager. ArgoCD does not roll back a failed release on its own, but it can restore the last stable state when the application is synced to the last stable commit. For automated rollback, Argo Rollouts can be used with progressive strategies such as canary or blue-green, which shortens recovery and limits disruption.
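As a rough illustration of how those signals could feed an alerting script, the sketch below maps Argo CD's sync and health status values to a suggested action. The status values (Synced, OutOfSync, Healthy, Degraded, Missing) are Argo CD's own. The function names and action strings are hypothetical, and reading the Application resource from the argocd namespace is an assumption about the install.

```shell
# Map Argo CD's sync and health status to a suggested action.
# The action strings are illustrative, not part of Argo CD.
classify_app_state() {
  sync_status="$1"
  health_status="$2"
  if [ "$health_status" = "Degraded" ] || [ "$health_status" = "Missing" ]; then
    echo "alert: application unhealthy (${health_status})"
  elif [ "$sync_status" = "OutOfSync" ]; then
    echo "drift: cluster differs from Git"
  elif [ "$sync_status" = "Synced" ] && [ "$health_status" = "Healthy" ]; then
    echo "ok"
  else
    echo "wait: ${sync_status}/${health_status}"
  fi
}

# Read both fields from the Application resource (needs cluster access).
# Assumes Argo CD is installed in the "argocd" namespace.
read_app_state() {
  kubectl get application "$1" -n argocd \
    -o jsonpath='{.status.sync.status} {.status.health.status}'
}
```

Against a live cluster the two could be combined as classify_app_state $(read_app_state gitops-app), with the result forwarded to whatever alerting channel the team already uses.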

Key GitOps Mechanisms for Failover and Rollback

Failures in Kubernetes can disrupt applications, but GitOps tools help detect issues and restore stability automatically.

  • ArgoCD: Keeps the Kubernetes cluster matched to what is stored in Git. If someone makes a manual change using kubectl edit, or an external system modifies a resource, ArgoCD identifies the difference and, with automated sync and self-heal enabled, restores the cluster to the committed state.
  • Argo Rollouts: Controls how application updates are introduced to users. Instead of making an update available everywhere at once, it gradually shifts traffic to the new version. If problems arise, such as increased error rates or slow response times, Argo Rollouts stops the update and directs traffic back to the last stable version. This keeps faulty updates away from most users and makes it easier to revert to a working state.
  • Kubernetes Health Probes: Built-in liveness and readiness checks report whether an application is working properly. If an update causes new pods to fail their probes, the rollout does not progress, and the update can be aborted before the issue reaches users.
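To make the last two mechanisms concrete, here is a minimal sketch of a canary Rollout that also defines a readiness probe. It is not the rollout.yaml used later in this post. Only the gitops-app-rollout name comes from the walkthrough, while the labels, the nginx:1.19 image, the weights, the pause durations, and the probe settings are placeholder assumptions.

```shell
# Write a canary Rollout manifest. Nothing is applied to a cluster here;
# in a GitOps flow the file would be committed and synced by Argo CD.
# Assumptions: image, labels, weights, pauses, and probe values are placeholders.
cat > rollout.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: gitops-app-rollout
spec:
  replicas: 3
  progressDeadlineSeconds: 120
  progressDeadlineAbort: true   # abort the update if it makes no progress in time
  selector:
    matchLabels:
      app: gitops-app
  template:
    metadata:
      labels:
        app: gitops-app
    spec:
      containers:
        - name: gitops-app
          image: nginx:1.19
          ports:
            - containerPort: 80
          readinessProbe:       # a pod counts as ready only once this passes
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
  strategy:
    canary:
      steps:
        - setWeight: 20         # move roughly 20% to the new version
        - pause: {duration: 60s}
        - setWeight: 50
        - pause: {duration: 60s}
EOF
echo "wrote rollout.yaml"
```

Without a traffic router configured, Argo Rollouts approximates each weight by adjusting the ratio of new to old pods, so with three replicas the 20% step is coarse.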

While Kubernetes offers native rollback options, GitOps provides a more automated and controlled approach, especially when managing numerous microservices at scale. In large environments with many interconnected services, a simple kubectl rollout undo may not be enough, as it only targets individual deployments without considering dependencies, configuration drift, or traffic management. 

GitOps ensures that rollbacks happen consistently across all affected services, restoring the entire system to a known good state while preventing cascading failures. Let’s dive into how a kubectl rollback differs from an ArgoCD rollback and which approach is better.

Kubectl Rollback vs. ArgoCD Rollback: A Comparison

Rolling back to a stable version is essential when an update causes failures or unintended behavior. Kubernetes provides rollback capabilities through kubectl, while GitOps-based workflows like ArgoCD handle rollbacks differently by relying on Git history. Understanding these two approaches is important for choosing the right method based on automation needs, consistency, and tracking. Below, we compare kubectl rollback, which operates at the cluster level, with ArgoCD rollback, which restores the desired state from Git.

comparison table

Now that we understand the rollback differences, let’s examine how to set up automated failure recovery and rollback using GitOps.

Rolling Updates and Rollbacks in Kubernetes with kubectl

Here, we will use kubectl to manage a Kubernetes deployment. We start by deploying an application with kubectl create. By default, Kubernetes applies changes to a deployment as a rolling update, replacing pods gradually, which minimizes downtime and allows quick recovery if an update fails.

First, we create a deployment named test-app using the official Nginx image version 1.19, setting up three replicas:

kubectl create deployment test-app --image=nginx:1.19 --replicas=3

This creates a deployment and ensures that three pods are running with the specified Nginx image.

To verify, we list the deployments using kubectl get deployments

To track changes made to the deployment, we check its rollout history:

kubectl rollout history deployment test-app

So far, there is only one version (revision 1) of the deployment.

Now, we simulate an issue by updating the nginx container image to an incorrect version (nginx:broken):

kubectl set image deployment/test-app nginx=nginx:broken

Since nginx:broken is an invalid image, Kubernetes will try to pull the image but will fail. This results in an ImagePullBackOff state for the new pods.

To verify this, check the pods:

kubectl get pods

One of the new pods is stuck in ImagePullBackOff, indicating a failure.

Since our deployment update introduced an issue, we need to roll back to the previous stable version. First, we check the deployment history again:

kubectl rollout history deployment test-app

Now, we roll back to revision 1:

kubectl rollout undo deployment test-app --to-revision=1

This reverts the deployment to its last known stable version (nginx:1.19).

We can confirm the rollback by listing the deployments again:

kubectl get deployments

Now, checking the pods:

kubectl get pods

The deployment is now back to its previous stable state.
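The manual sequence above can be scripted so that a stalled update is undone without anyone watching it. The helper below is a minimal sketch. The name rollout_or_undo and the 60s default timeout are illustrative choices, and it relies on kubectl rollout status exiting with a non-zero code when the rollout fails or the timeout elapses.

```shell
# Wait for a rollout to finish; undo it if it does not complete in time.
# "rollout_or_undo" and the 60s default are illustrative choices.
rollout_or_undo() {
  deployment="$1"
  timeout="${2:-60s}"
  # "kubectl rollout status" exits non-zero if the rollout fails or the
  # timeout elapses before all replicas are updated and available.
  if kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"; then
    return 0
  fi
  echo "rollout of ${deployment} did not complete, rolling back" >&2
  # Without --to-revision, undo returns to the previous revision.
  kubectl rollout undo "deployment/${deployment}"
  return 1
}
```

In the scenario above it would be called right after the image change, for example rollout_or_undo test-app 90s. The non-zero return value lets a CI job mark the release as failed even though the cluster has been restored.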

Automated Failure and Rollback Using ArgoCD

First, let’s check if Argo CD is installed:

argocd version

If you see the “server address unspecified” error, it means the Argo CD server is not configured. We need to log in.

To log in, we first need to retrieve the initial admin password.

kubectl get secret argocd-initial-admin-secret -n argocd -o jsonpath="{.data.password}" | base64 --decode

This returns the password we will use to log in to ArgoCD.

Now that we have the password, we can log into Argo CD.

argocd login localhost:8080 --username admin --password b6eLbSiCqKscC7Rx --insecure

We are now successfully authenticated and ready to add a Git repository.

Next, we need to connect our Git repository, where our Kubernetes manifests are stored.

argocd repo add https://github.com/byteBardShivansh/Aviator-test.git --username byteBardShivansh --password <your-github-PAT>

Argo CD is now tracking this repository for changes.

Now, we need to deploy our application using Argo CD.

argocd app create gitops-app \
  --repo https://github.com/byteBardShivansh/Aviator-test.git \
  --path k8-manifests \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace default

The application has been created but has not yet been deployed. To deploy, we have to sync it:

argocd app sync gitops-app

You can also view the deployment in the Argo CD web UI at https://localhost:8080/

Now, let’s check if the deployment was successful.

kubectl get deployments -n default

This shows that the deployment exists and is ready. Check pod status using:

kubectl get pods -n default

This shows that the pods are running correctly and the application is up.

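We now install Argo Rollouts for canary deployments. The official install manifest expects a dedicated argo-rollouts namespace, so we create it first. The kubectl argo rollouts commands used below also require the Argo Rollouts kubectl plugin.

kubectl create namespace argo-rollouts

kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml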

Simulate Rollback

Now, we simulate a failure by pushing a broken image reference in rollout.yaml:

image: gcr.io/micro-environs-442313-i6/gitops-app:v2-broken

Now push this change:

git add .

git commit -m "Simulated broken deployment"

git push origin main

Resync the application using argocd app sync gitops-app

Now, check the state of the application:

kubectl argo rollouts get rollout gitops-app-rollout -n default

The failing pods show the ImagePullBackOff status when we list them:

kubectl get pods -n default

To restore the application, we roll back to the last successful version. We undo the deployment:

kubectl argo rollouts undo gitops-app-rollout -n default

This reverts the rollout to the previous stable version.

After the rollback, check the rollout status using:

kubectl argo rollouts get rollout gitops-app-rollout -n default

Now, the rollout is fully healthy! The broken canary has been removed.
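One caveat is that the undo command changes the cluster but not the repository, so Git still holds the broken image reference and a later sync could reapply it. Reverting the broken commit keeps Git and the cluster in agreement. The sketch below replays that step in a throwaway repository so it can be run safely. The nginx:1.19 and nginx:broken tags and the commit identity are placeholders, not values from the real repository.

```shell
# Demonstrate a Git-side rollback in a throwaway repository.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" config user.email "dev@example.com"   # placeholder identity
git -C "$repo" config user.name "Example Dev"

# Commit 1: a known-good image reference (placeholder tag).
echo "image: nginx:1.19" > "$repo/rollout.yaml"
git -C "$repo" add rollout.yaml
git -C "$repo" commit -q -m "Deploy stable image"

# Commit 2: the broken change.
echo "image: nginx:broken" > "$repo/rollout.yaml"
git -C "$repo" commit -q -am "Simulated broken deployment"

# Revert the broken commit. This adds a new commit instead of rewriting
# history, so the rollback itself is recorded and reviewable.
git -C "$repo" revert --no-edit HEAD >/dev/null

cat "$repo/rollout.yaml"
```

In the real repository the equivalent would be git revert --no-edit HEAD followed by git push origin main and argocd app sync gitops-app, assuming the broken change is the most recent commit.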

Best Practices for Automated Failure and Rollback in GitOps

Implement Progressive Delivery

Progressive delivery ensures that deployments roll out gradually, reducing the risk of failure. Techniques like canary deployments, blue-green deployments, and feature flags allow teams to expose changes to a small subset of users before a full rollout. This helps detect issues early and prevents widespread disruptions.

Automate Rollback Triggers

Automating rollback triggers is critical for reducing downtime. Argo Rollouts can automatically revert a deployment if health checks fail, while Kubernetes health probes assess application readiness. Monitoring tools like Prometheus and Grafana provide real-time alerts, enabling quick failure detection and rollback activation.
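One way to express such a trigger is an Argo Rollouts AnalysisTemplate that queries Prometheus during a canary. The sketch below is written under several assumptions: the Prometheus address, the http_requests_total metric with its service and status labels, the 95% threshold, and the file name are all hypothetical and would need to match the actual monitoring setup.

```shell
# Write an AnalysisTemplate that fails when the success rate drops.
# Assumptions: the Prometheus address, metric name, labels, and
# thresholds below are placeholders for illustration only.
cat > analysis-template.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m                        # take one measurement per minute
      count: 5                            # stop after five measurements
      successCondition: result[0] >= 0.95
      failureLimit: 2                     # more than two failed measurements fail the analysis
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[2m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))
EOF
echo "wrote analysis-template.yaml"
```

A Rollout would reference the template from an analysis step in its canary strategy and pass service-name as an argument. When the analysis fails, Argo Rollouts aborts the update and returns traffic to the stable version.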

Version Control Everything

Version control should encompass all infrastructure components. Storing manifests, Helm charts, and environment variables in Git ensures consistency across environments. Pull request workflows enforce validation, reducing human errors. Preventing manual cluster modifications outside GitOps workflows strengthens reliability.

Regular Chaos Testing

Proactive failure simulation ensures that GitOps-driven rollback mechanisms work effectively when real outages occur. Chaos engineering tools like Chaos Mesh and Litmus Chaos introduce controlled failures into production-like environments by randomly terminating pods, disrupting network traffic, or increasing CPU load to test system resilience. 

Simulating deployment failures by intentionally pushing broken configurations helps verify whether ArgoCD and Argo Rollouts can accurately detect and revert faulty changes. Additionally, validating auto-recovery capabilities ensures that Kubernetes self-healing, health probes, and automated rollback processes function correctly under simulated failure scenarios, strengthening overall system reliability.

FAQs

What Problems Does GitOps Solve?

GitOps eliminates configuration drift, automates deployments, and ensures consistency across environments.

What Is the Source of Truth in GitOps?

Git is the single source of truth, storing the desired state of applications and infrastructure.

What Is the Difference Between Flux and ArgoCD?

Flux is a lightweight set of controllers driven mainly from the command line, while ArgoCD provides a richer web UI with application-level views of sync and health status and pairs with Argo Rollouts for advanced deployment strategies.

What Is the Difference Between IaC and GitOps?

IaC (Infrastructure as Code) defines infrastructure in code but does not enforce its application. GitOps extends IaC by continuously enforcing the desired state using Git as the control mechanism.
