# How to Catch Config Drift Before It Breaks Production
You know the scenario: staging passes all tests, but when you deploy to production, 500s everywhere. You spend two hours debugging, only to find that `DATABASE_URL` points to a read replica in staging but a completely different cluster in production. Or that `LOG_LEVEL` was set to `debug` in dev and the production logs are flooding the disk. Or worse: a key was renamed in one environment and not the others, silently breaking a critical integration.
This is config drift: the silent divergence of configuration between environments. It's one of the most common causes of production incidents, and it's almost never caught by test suites because the code works fine — the configuration doesn't.
In this tutorial, you'll learn how to detect config drift with ConfigDrift, a CLI tool that compares configurations across environments, flags missing keys, and surfaces deprecated values — before they cause incidents.
## What You'll Set Up
By the end of this tutorial, you'll have:
- ConfigDrift installed and ready on your machine
- Environment config files for dev, staging, and prod
- Drift detection running with colored table output
- JSON output for CI/CD pipeline integration
- A GitHub Actions workflow that blocks deploys on breaking drift
## 1. Install ConfigDrift
ConfigDrift is a Python CLI, so installation is a one-liner once the package is published to PyPI (see "Stay in the Loop" below):
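```bash
pip install configdrift
```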
Or install directly from GitHub:
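```bash
# <org> is a placeholder; substitute the actual repository owner.
pip install "git+https://github.com/<org>/configdrift.git"
```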
Verify it's working:
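```bash
configdrift --help
```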
You should see the available commands: `check`, `scan`, and `init`.
## 2. Create Sample Config Files
Let's simulate a real project with three environments. Create a directory for your configs:
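```bash
mkdir -p config/dev config/staging config/prod
```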
Now create these three files to simulate a drifted environment:
**config/dev/app.yaml**:

```yaml
database:
  host: localhost
  port: 5432
  name: myapp_dev
  pool_size: 5

api:
  timeout: 30
  retries: 3
  endpoint: http://localhost:8080

logging:
  level: debug
  format: pretty

feature_flags:
  new_checkout: true
  dark_mode: false
  beta_search: true
```
Save this as `config/staging/app.yaml`. It's the same structure, but a few keys have drifted:
```yaml
database:
  host: staging-db.internal
  port: 5432
  name: myapp_staging
  pool_size: 10        # ← different from dev

api:
  timeout: 30
  retries: 3
  endpoint: https://staging-api.internal

logging:
  level: info          # ← different from dev
  format: json

feature_flags:
  new_checkout: true
  beta_search: true    # ← missing dark_mode!
```
And `config/prod/app.yaml` with an even bigger drift:
```yaml
database:
  host: prod-db.internal
  port: 5432
  name: myapp_prod
  pool_size: 20

api:
  timeout: 60          # ← different
  # retries missing!   # ← key removed
  endpoint: https://prod-api.internal

logging:
  level: warn
  format: json

feature_flags:
  new_checkout: true
```
`retries` is missing from prod entirely. `dark_mode` was added in dev but never made it to staging or prod. Pool sizes differ across all three environments. These are the kinds of silent divergences that cause "works on my machine" bugs in production.
## 3. Check for Drift
Compare dev vs prod directly:
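```bash
# `check` compares two config files; the baseline-then-target argument
# order here is assumed from the JSON report's fields shown in step 5.
configdrift check config/dev/app.yaml config/prod/app.yaml
```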
ConfigDrift will output a colored table showing every key that differs or is missing, grouped by severity level.
You'll see something like:
```text
╔══════════════════════════════════════════════════════════╗
║                   Config Drift Report                    ║
╠══════════════════════════════════════════════════════════╣
║ BREAKING                                                 ║
║   api.retries              prod → MISSING (was in dev)   ║
║   feature_flags.dark_mode  prod → MISSING                ║
╠══════════════════════════════════════════════════════════╣
║ WARNING                                                  ║
║   database.pool_size       dev=5      prod=20            ║
║   api.timeout              dev=30     prod=60            ║
║   logging.level            dev=debug  prod=warn          ║
╚══════════════════════════════════════════════════════════╝
```
ConfigDrift classifies each difference:
- 🔴 **Breaking** — critical keys (`database*`, `auth*`, `api_key*`, `secret*`, `endpoint*`, etc.) that differ or are missing. These should block deployment.
- 🟡 **Warning** — non-critical key changes or additions/removals that warrant attention.
- 🔵 **Info** — value changes recorded for informational purposes.
## 4. Scan Entire Environment Directories
When your project uses multiple config files per environment, use the `scan` command:
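```bash
# Compare staging and prod against the dev baseline.
configdrift scan config/dev config/staging config/prod --baseline dev
```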
This scans all `.yaml`, `.yml`, `.json`, `.toml`, and `.env` files in each directory, merges them into a single config tree per environment, and compares everything against your baseline (dev).
The output adds one more column: which environment file the key came from — so you can trace each drift to its source file.
## 5. JSON Output for CI
For pipeline integration, use JSON output:
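```bash
# --output json is the same flag the CI workflow below uses with scan;
# it's assumed here that check accepts it as well.
configdrift check config/dev/app.yaml config/prod/app.yaml --output json
```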
This produces structured JSON:
```json
{
  "baseline": "config/dev/app.yaml",
  "target": "config/prod/app.yaml",
  "breaking": [
    {
      "key": "api.retries",
      "state": "missing",
      "message": "api.retries is present in baseline but missing in target"
    },
    {
      "key": "feature_flags.dark_mode",
      "state": "missing",
      "message": "feature_flags.dark_mode is present in baseline but missing in target"
    }
  ],
  "warnings": [
    { "key": "database.pool_size", "baseline_value": 5, "target_value": 20 },
    { "key": "api.timeout", "baseline_value": 30, "target_value": 60 },
    { "key": "logging.level", "baseline_value": "debug", "target_value": "warn" }
  ],
  "infos": [],
  "exit_code": 1
}
```
The `exit_code` field in the report tells you the severity:
- 0 → No drift found
- 1 → Breaking drift detected (should block deploy)
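Because the process exit status mirrors this field (the CI workflow below relies on that), a deploy script can gate on the command directly. A minimal sketch, assuming `check` accepts the same `--output silent` flag as `scan`:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Abort the deploy if ConfigDrift exits non-zero (breaking drift found).
if ! configdrift check config/dev/app.yaml config/prod/app.yaml --output silent; then
  echo "Breaking config drift detected; aborting deploy" >&2
  exit 1
fi

./deploy.sh   # hypothetical deploy entry point
```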
## 6. CI/CD Integration (GitHub Actions)
Here's the real payoff — gating your pipeline so that config drift never ships. Add this GitHub Actions workflow to your repo:
```yaml
# .github/workflows/config-drift-check.yml
name: Config Drift Check

on:
  pull_request:
    paths:
      - 'config/**'
      - '.configdrift.yaml'
  push:
    branches: [main]

jobs:
  check-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install ConfigDrift
        run: pip install configdrift  # assumes the package is on PyPI

      - name: Check for config drift
        run: |
          # A non-zero exit fails the job and blocks the deploy.
          configdrift scan config/dev config/staging config/prod \
            --baseline dev --output silent
```
Note the `--output silent` flag: in CI, the exit code is all you need. A non-zero exit means drift was found and fails the job; you can still capture the full report as an artifact for debugging.
For a more detailed pipeline that surfaces the actual drift in the PR, add these steps:
```yaml
- name: Generate drift report
  id: drift
  run: |
    # `|| true` keeps this step green so the report can be posted first;
    # the final step below fails the job if breaking drift was found.
    configdrift scan config/dev config/staging config/prod \
      --baseline dev --output json > drift-report.json || true
    echo "drift_found=$(jq -r '.exit_code' drift-report.json)" >> "$GITHUB_OUTPUT"
    cat drift-report.json

- name: Comment PR with drift summary
  if: steps.drift.outputs.drift_found == '1' && github.event_name == 'pull_request'
  uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      const report = JSON.parse(fs.readFileSync('drift-report.json', 'utf8'));
      const breaking = report.breaking.map(b => `- 🔴 \`${b.key}\`: ${b.message}`).join('\n');
      await github.rest.issues.createComment({
        ...context.repo,
        issue_number: context.issue.number,
        body: `## ⚠️ Config Drift Detected\n\n**Breaking changes:**\n${breaking}\n\nFix these before merging.`
      });

- name: Fail on breaking drift
  if: steps.drift.outputs.drift_found == '1'
  run: exit 1
```
Now every PR that changes config files gets automatically checked against your baseline environment. If drift is found, the PR gets a comment and the check fails — no more "it worked in staging" surprises.
## 7. Initialize a ConfigDrift Project
For projects with many environments, use a config file to keep things organized:
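```bash
configdrift init
```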
This creates a `.configdrift.yaml` file where you can define environment paths, severity rules, and exclusions:

```yaml
baseline: dev

environments:
  dev: ./config/dev
  staging: ./config/staging
  prod: ./config/prod

severity:
  rules:
    - pattern: "api_key*"
      level: breaking
    - pattern: "email_*"
      level: info
```
Then run with just:
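```bash
# Assumes scan reads the environments and baseline from .configdrift.yaml
# when no paths are passed.
configdrift scan
```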
## Supported File Formats
| Format | Extensions | Notes |
|---|---|---|
| YAML | `.yaml`, `.yml` | Full nested structure support |
| JSON | `.json` | Nested objects flattened to dotted keys |
| TOML | `.toml` | Python 3.11+ native support (`tomllib`) |
| .env | `.env` | Standard `KEY=VALUE` format |
## Real-World Drift Horror Stories
Still not convinced? Here are real incidents caused by config drift:
- **The $350K timeout**: A fintech company had `API_TIMEOUT=5` in dev and `120` in production. A payment call that failed fast in dev hung for up to two minutes in prod, and the retry logic multiplied the damage. ConfigDrift would have flagged this as a warning in the PR.
- **The silent database switch**: A staging service's `DB_HOST` was accidentally pointed at a prod read replica. All tests passed because the schema was identical. A week later, the prod database was receiving write traffic that should have gone to staging, corrupting customer data.
- **The compliance audit fail**: An SOC 2 auditor found `LOG_LEVEL=debug` running in production, writing PII to plaintext logs. It had diverged months earlier in a hotfix and was never reconciled.
## Next Steps
ConfigDrift is one of 10 tools in the Revenue Holdings suite, all designed to catch problems before they reach production:
- API Contract Guardian — Catch breaking API schema changes in PRs
- json2sql — Convert JSON data to INSERT statements with smart type inference
- DeployDiff — See the full cost and blast radius of every infra change
- DeadCode — Detect and remove unused exports, dead routes, and orphaned CSS
- SchemaForge — Bidirectional ORM schema converter (Drizzle, Prisma, SQL DDL)
## Stay in the Loop
PyPI publishing is coming soon. Get notified when we ship.