The 2 AM Problem
You ran terraform apply. It succeeded. But the app is down. Now what?
Most teams handle infrastructure rollbacks one of three ways:
- Hand-write the reverse commands. Scroll through the plan output, figure out what changed, and construct terraform destroy -target= and terraform apply -target= commands manually. In a production incident, under pressure, at 2 AM. This is where mistakes happen: a wrong target, a missed resource, and you're deeper in the hole.
- Revert the Git commit and re-apply. This works for configuration changes, but it doesn't reliably undo creates. If the bad deploy created 3 new EC2 instances and a misplaced security group, reverting the code and re-running terraform apply re-plans the entire configuration instead of targeting those resources, and anything no longer tracked in state is left behind as unmanaged drift.
- Run terraform destroy on the whole stack. Nuclear option. It works, but it takes down everything, including the resources that were fine before the bad deploy. Now you're rebuilding from scratch instead of reverting one change.
None of these are fast enough or precise enough for incident response. What you need is targeted rollback commands generated from the actual plan that caused the problem — before you apply it, so they're ready if you need them.
How DeployDiff Rollback Works
DeployDiff's rollback command reads your plan file and generates the exact reverse operations:
# Generate rollback commands before you deploy
deploydiff rollback --tf plan.json
The output is a list of provider-specific commands you can run immediately if the deploy goes wrong:
# Terraform Rollback Commands
# Run these in reverse order to undo the deployment
# Undo creates → destroy the newly created resources
terraform destroy -target=aws_instance.web_server -auto-approve
terraform destroy -target=aws_security_group.app_sg -auto-approve
terraform destroy -target=aws_db_instance.replica -auto-approve
# Undo destructive changes → re-apply deleted/replaced resources
terraform apply -target=aws_instance.old_backend -auto-approve
# Undo updates → restore previous config and re-apply
# To revert aws_lb.frontend, restore previous config and run:
terraform apply -target=aws_lb.frontend -auto-approve
# Or rollback the entire stack:
terraform apply -auto-approve # with previous .tf files
# OR destroy everything and re-apply from a known good state:
terraform destroy -auto-approve && terraform apply -auto-approve
The Three Rollback Patterns
DeployDiff generates different commands for each type of change in the plan:
| Plan Action | Rollback Command | Why This Works |
|---|---|---|
| Create | terraform destroy -target=RESOURCE -auto-approve | Created resources don't exist in the previous state. Destroy them to get back to where you were. |
| Delete | terraform apply -target=RESOURCE -auto-approve | Deleted resources still exist in the previous state file. Re-apply them to recreate. |
| Replace | terraform apply -target=RESOURCE -auto-approve | Replacements are destroy+create. Re-applying from the old config restores the original resource. |
| Update | # Restore previous config, then terraform apply -target= | Updates change in-place. You need the previous config version (from Git) + targeted apply. |
Key insight: Rollback commands are generated before the deploy, not after. When you run deploydiff rollback alongside deploydiff preview, you have both the forward plan and the undo path ready. If the deploy goes wrong, you copy-paste the rollback commands — no thinking required.
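To make the mapping in the table concrete, here is a minimal Python sketch of how plan actions could be turned into rollback lines. It is not DeployDiff's actual implementation. It assumes the plan file is the output of terraform show -json, where each entry in resource_changes carries an address and a change.actions list; the function names classify and rollback_commands are hypothetical.

```python
import shlex


def classify(actions):
    """Collapse a Terraform change.actions list into a single plan action."""
    if "create" in actions and "delete" in actions:
        return "replace"  # reported as ["delete", "create"] or ["create", "delete"]
    if actions in (["create"], ["delete"], ["update"]):
        return actions[0]
    return None  # "no-op" and "read" need no rollback


def rollback_commands(plan):
    """Return one rollback line per changed resource in a parsed plan JSON."""
    lines = []
    for change in plan.get("resource_changes", []):
        # Quote the address so indexed resources like aws_instance.web[0] survive the shell
        target = shlex.quote(change["address"])
        action = classify(change["change"]["actions"])
        if action == "create":
            lines.append(f"terraform destroy -target={target} -auto-approve")
        elif action in ("delete", "replace"):
            lines.append(f"terraform apply -target={target} -auto-approve")
        elif action == "update":
            # Updates need the previous config first, so emit a comment, not a command
            lines.append(
                f"# Restore previous config, then: terraform apply -target={target} -auto-approve"
            )
    return lines


# Hypothetical plan fragment, shaped like `terraform show -json` output
sample_plan = {
    "resource_changes": [
        {"address": "aws_instance.web_server", "change": {"actions": ["create"]}},
        {"address": "aws_lb.frontend", "change": {"actions": ["update"]}},
    ]
}

for line in rollback_commands(sample_plan):
    print(line)
```

Emitting the update case as a comment mirrors the table: a targeted apply only reverts an in-place change once the previous config has been restored from Git.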
Multi-Provider Rollback: Same CLI, Different Syntax
The rollback logic is provider-specific because each IaC tool has different syntax for targeted operations:
Terraform Rollback
deploydiff rollback --tf plan.json
# Output:
# - terraform destroy -target= for creates
# - terraform apply -target= for deletes and replaces
# - Commented apply -target= for updates (needs previous config)
# - Full stack rollback option
CloudFormation Rollback
deploydiff rollback --cfn changeset.json
# Output:
# - aws cloudformation rollback-stack for native rollback
# - aws cloudformation delete-stack --retain-resources for partial cleanup
# - Resource-specific delete commands for created resources
# - Update commands with previous property values
Pulumi Rollback
deploydiff rollback --pulumi preview.json
# Output:
# - pulumi destroy -t for targeted resource destruction
# - pulumi up -t for targeted re-application
# - pulumi stack import for state-level rollback
One CLI, three providers. If your org uses Terraform for AWS and CloudFormation for legacy stacks, you don't need two different rollback tools. DeployDiff handles both from the same binary, with provider-correct syntax for each.
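If you wrap DeployDiff in your own tooling, the provider choice boils down to picking the right flag. The sketch below builds the argument list for each provider using only the flags shown above (--tf, --cfn, --pulumi). The helper name rollback_argv and the provider keys are hypothetical.

```python
# Flags taken from the commands shown above; the dictionary keys are illustrative names
PROVIDER_FLAGS = {
    "terraform": "--tf",
    "cloudformation": "--cfn",
    "pulumi": "--pulumi",
}


def rollback_argv(provider, plan_path):
    """Build the deploydiff rollback argument list for one provider."""
    try:
        flag = PROVIDER_FLAGS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None
    return ["deploydiff", "rollback", flag, plan_path]


print(rollback_argv("cloudformation", "changeset.json"))
```

Returning a list rather than a string keeps the result safe to pass straight to subprocess.run without shell quoting.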
Pre-Deploy Safety: Preview + Rollback + Cost Gate
The most effective workflow runs all three DeployDiff commands before every production deploy:
#!/bin/bash
# pre-deploy-check.sh — run before every production terraform apply
set -e
PLAN_FILE="plan.json"
# Step 1: Generate the plan
terraform plan -out=tfplan
terraform show -json tfplan > "$PLAN_FILE"
# Step 2: Preview changes (human-readable diff)
echo "=== Infrastructure Changes ==="
deploydiff preview --tf "$PLAN_FILE"
# Step 3: Cost impact check — exit 1 if monthly increase exceeds $200
echo ""
echo "=== Cost Impact ==="
deploydiff cost --tf "$PLAN_FILE" --threshold 200
# Step 4: Generate rollback commands and save them
echo ""
echo "=== Rollback Plan ==="
# Keep the exact filename so the hint below never matches an older rollback script
ROLLBACK_FILE="rollback-$(date +%Y%m%d-%H%M%S).sh"
deploydiff rollback --tf "$PLAN_FILE" > "$ROLLBACK_FILE"
echo ""
echo "✓ Pre-deploy checks passed."
echo "✓ Rollback commands saved to $ROLLBACK_FILE"
echo ""
echo "Run: terraform apply tfplan"
echo "If things go wrong: bash $ROLLBACK_FILE"
With --threshold, deploydiff cost exits with code 1 when the cost impact exceeds your limit. Because the script runs under set -e, it stops before the deploy. This prevents surprise bills: if a change you expected to be a $50/month EC2 upgrade turns out to be a $500/month RDS instance, DeployDiff catches it before you apply. In CI/CD, this exit code naturally gates the pipeline.
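The same exit-code gating works outside bash. As a rough sketch, the Python helper below runs a list of checks in order and stops at the first non-zero exit, which is what set -e does in the script above. The names run_checks and PRE_DEPLOY_STEPS are hypothetical; the deploydiff commands are the ones already used in this post.

```python
import subprocess


def run_checks(steps):
    """Run (name, argv) steps in order.

    Returns the name of the first step that exits non-zero, or None if all pass.
    Later steps are skipped once one fails.
    """
    for name, argv in steps:
        if subprocess.run(argv, check=False).returncode != 0:
            return name
    return None


# Same commands as the pre-deploy script; nothing runs until run_checks is called
PRE_DEPLOY_STEPS = [
    ("preview", ["deploydiff", "preview", "--tf", "plan.json"]),
    ("cost", ["deploydiff", "cost", "--tf", "plan.json", "--threshold", "200"]),
]
```

Calling run_checks(PRE_DEPLOY_STEPS) returns "cost" when the threshold is exceeded, so your wrapper knows which gate blocked the deploy.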
CI/CD Integration: Automatic Rollback on Failure
Here's a GitHub Actions workflow that previews changes, gates on cost, saves rollback commands, and automatically rolls back if the deploy fails:
name: Terraform Deploy with Auto-Rollback
on:
  push:
    branches: [main]
    paths: ['infra/**']
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install DeployDiff
        run: pip install deploydiff
      - name: Terraform Plan
        working-directory: infra
        run: |
          terraform init
          terraform plan -out=tfplan
          terraform show -json tfplan > plan.json
      - name: Preview Changes
        working-directory: infra
        run: deploydiff preview --tf plan.json
      - name: Cost Gate (fail if increase > $200/month)
        working-directory: infra
        run: deploydiff cost --tf plan.json --threshold 200
      - name: Generate Rollback Commands
        working-directory: infra
        run: |
          deploydiff rollback --tf plan.json > rollback.sh
          chmod +x rollback.sh
          # Multi-line value: everything up to the EOF delimiter is stored in ROLLBACK_COMMANDS
          echo "ROLLBACK_COMMANDS<<EOF" >> "$GITHUB_ENV"
          cat rollback.sh >> "$GITHUB_ENV"
          echo "EOF" >> "$GITHUB_ENV"
      - name: Terraform Apply
        id: apply
        working-directory: infra
        run: terraform apply -auto-approve tfplan
      - name: Smoke Test
        id: smoke
        run: |
          HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://api.example.com/health)
          if [ "$HTTP_STATUS" != "200" ]; then
            echo "::error::Smoke test failed (HTTP $HTTP_STATUS)"
            exit 1
          fi
      - name: Automatic Rollback on Failure
        if: failure() && steps.apply.outcome == 'success'
        working-directory: infra
        run: |
          echo "⚠️ Deploy succeeded but smoke test failed. Rolling back..."
          bash rollback.sh
          echo "✓ Rollback complete. Investigate the failed deploy before retrying."
      - name: Alert on Rollback
        if: failure() && steps.apply.outcome == 'success'
        env:
          # Assumes a repository secret named SLACK_WEBHOOK
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          curl -X POST "$SLACK_WEBHOOK" \
            -H "Content-Type: application/json" \
            -d "{\"text\":\"⚠️ Terraform deploy to production was automatically rolled back. Check: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}\"}"
Rollback only if apply succeeded. If terraform apply itself fails, there's nothing to roll back — Terraform's partial state handling applies. The rollback commands are for when the apply succeeds but the result is wrong (broken health check, wrong config, missing dependency).
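The decision logic is small enough to sketch in a few lines of Python. This is an illustration of the rule above, not part of DeployDiff: apply, smoke_test, and rollback are hypothetical callables you would supply, each returning True on success where relevant.

```python
def deploy_with_rollback(apply, smoke_test, rollback):
    """Run a deploy and roll back only when apply succeeded but the result is wrong."""
    if not apply():
        # Apply itself failed: leave it to Terraform's partial state handling
        return "apply_failed"
    if smoke_test():
        return "deployed"
    # Apply succeeded but the health check did not: this is the rollback case
    rollback()
    return "rolled_back"


# Example with stubbed steps: apply succeeds, smoke test fails
status = deploy_with_rollback(
    apply=lambda: True,
    smoke_test=lambda: False,
    rollback=lambda: print("running rollback.sh"),
)
print(status)
```

This matches the workflow condition failure() && steps.apply.outcome == 'success': the rollback step never fires for a failed apply.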
Three Real Rollback Scenarios
Scenario 1: Accidental Resource Creation
A developer adds 3 new EC2 instances to the Terraform config that should only exist in staging. The plan gets applied to production.
Without rollback: You discover the extra instances in the AWS console. You look up their IDs. You run terraform destroy -target= for each one, but you're not sure which state file tracks them. You might orphan them in state.
With DeployDiff: The rollback script already has the exact terraform destroy -target=aws_instance.extra_1 -auto-approve commands for each created resource. You run the script. Done.
Scenario 2: Destructive Replace Goes Wrong
A Terraform plan replaces an RDS instance (changing the engine version, which forces recreation). The new instance comes up but the application can't connect because the endpoint changed and the DNS hasn't propagated.
Without rollback: You either wait for DNS (10-30 minutes of downtime), or manually construct a rollback. If the old instance was already terminated, you're waiting for a new one to be created from a snapshot.
With DeployDiff: The rollback script has terraform apply -target=aws_db_instance.main -auto-approve. If you revert the code change (Git revert) and apply the target, Terraform recreates the instance with the previous configuration; restoring its data depends on a snapshot being available (for example through the snapshot_identifier argument). Back online in minutes, not hours.
Scenario 3: CloudFormation Stack Update Breaks Outputs
A CloudFormation stack update changes an output value that downstream stacks depend on. The update succeeds but 3 other stacks are now referencing a wrong export.
Without rollback: You use the AWS console to find the previous stack template, manually construct an aws cloudformation update-stack call with the old template, and hope you got the parameters right.
With DeployDiff: deploydiff rollback --cfn changeset.json generates the exact aws cloudformation rollback-stack command or the update-stack call with the correct previous template body. One command, no manual template reconstruction.
Rollback Commands Are Not Enough: The State Problem
Honest caveat: generated rollback commands are necessary but not sufficient. Here's what they don't handle:
- State drift between plan and apply. If the real infrastructure changed between when you generated the plan and when you applied it, the rollback commands might target the wrong resources. Always re-plan after a rollback.
- Data loss on destructive replaces. If a replaced RDS instance was terminated, rolling back creates a new instance from a snapshot — you lose data written between the apply and the rollback. DeployDiff can't recover that data.
- Dependent resources created outside Terraform. If a manual process created resources that depend on the deployed infrastructure, the rollback might break those dependencies. deploydiff preview shows you what's changing; check for out-of-band dependencies before applying.
The rollback commands are a starting point, not a guarantee. They give you the correct Terraform/CloudFormation/Pulumi syntax for undoing each change. But you still need to verify the rollback worked — run terraform plan after rolling back to check for drift, and smoke-test your application before declaring the incident resolved.
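One way to automate the post-rollback check is terraform plan -detailed-exitcode, which exits 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. The Python sketch below wraps that; the function names and status labels are hypothetical, and it assumes terraform is on your PATH and the working directory is already initialized.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`
PLAN_STATUS = {0: "clean", 1: "error", 2: "drift"}


def interpret_plan_exit(code):
    """Map a terraform plan exit code to a status label; unknown codes count as errors."""
    return PLAN_STATUS.get(code, "error")


def check_drift(workdir="."):
    """Run terraform plan in workdir and report whether config and infrastructure match."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        check=False,
    )
    return interpret_plan_exit(result.returncode)
```

A result of "drift" after a rollback means the infrastructure still differs from the config you checked out, so the incident is not resolved yet.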
Setting Up Rollback Generation in Your Workflow
Option 1: Pre-Deploy Script (Simplest)
Add two steps to your existing deploy script:
# Before: just plan and apply
terraform plan -out=tfplan && terraform apply tfplan
# After: plan, generate rollback, then apply
terraform plan -out=tfplan && \
terraform show -json tfplan > plan.json && \
deploydiff rollback --tf plan.json > rollback.sh && \
terraform apply tfplan
# If things go wrong:
# bash rollback.sh
Option 2: CI/CD Pipeline (Recommended)
Use the GitHub Actions workflow above. If you also upload rollback.sh as a workflow artifact (for example with actions/upload-artifact), the commands outlive the CI runner: you can download and run them locally.
Option 3: MCP Server for AI Agent Rollback
If you use AI coding agents like Claude Code or Cursor, DeployDiff can run as an MCP server:
# Start DeployDiff as an MCP server
deploydiff mcp
Your AI agent can then call deploydiff rollback directly during incident response — no copy-pasting commands, no context-switching to a terminal. The agent reads the plan, generates the rollback, and executes it.
Install DeployDiff
# pip
pip install deploydiff
# Homebrew (macOS / Linux)
brew tap Coding-Dev-Tools/tap
brew install deploydiff
# Scoop (Windows)
scoop bucket add Coding-Dev-Tools https://github.com/Coding-Dev-Tools/scoop-bucket
scoop install deploydiff
Related Reading
- Preview Infrastructure Cost Before Deploy — DeployDiff getting-started guide
- Block Deployments on Config Drift — ConfigDrift CI/CD gating
- Before You Deploy: Check Config Drift AND Infrastructure Cost — cross-tool workflow
- Envault + APIAuth: Rotate API Keys Across Environments — cross-tool secret rotation