Skip to content
SP StackPractices
intermediate

Database Migration Runbook Template

A database migration runbook template for executing schema changes safely with rollback procedures, verification steps, and communication plans.

Topics: databases

Database Migration Runbook Template

Use this template to execute database schema changes without downtime or data loss.

Template

# Database Migration Runbook: [Migration Name]

## Overview
| Field | Value |
|-------|-------|
| **Migration ID** | [timestamp or sequential number] |
| **Author** | [name] |
| **Reviewed by** | [name] |
| **Databases affected** | [list] |
| **Estimated duration** | [minutes / hours] |
| **Risk level** | [Low / Medium / High] |

## Pre-Migration Checklist

- [ ] Schema change reviewed by senior engineer
- [ ] Migration script tested on copy of production data
- [ ] Rollback script tested and timed
- [ ] Backups verified (last successful backup < 24 hours)
- [ ] Maintenance window scheduled (if needed)
- [ ] On-call notified
- [ ] Monitoring dashboards bookmarked

## Migration Steps

### Step 1: [Action]
```sql
-- Example: add nullable column
ALTER TABLE orders ADD COLUMN tracking_url VARCHAR(500) NULL;

Step 2: [Action]

-- Example: create index concurrently
CREATE INDEX CONCURRENTLY idx_orders_tracking ON orders(tracking_url);

Step 3: [Action]

-- Example: backfill data
UPDATE orders SET tracking_url = 'https://...' WHERE shipped_at IS NOT NULL;

Verification

CheckQueryExpected Result
Schema applied\d ordersColumn tracking_url exists
Index created\di idx_orders_trackingIndex is valid
No locks heldpg_locksNo long-running locks
App healthDashboardError rate < baseline

Rollback Procedure

-- Step 1: drop index
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_tracking;

-- Step 2: drop column
ALTER TABLE orders DROP COLUMN IF EXISTS tracking_url;
Rollback StepTime to CompleteVerification
Drop index< 1 minuteQuery plan reverts
Drop column< 1 minuteSchema matches pre-migration

Post-Migration

  • Application error rate normal
  • Latency within baseline
  • Replication lag acceptable
  • On-call handoff notes updated
  • Runbook archived with actual duration

Communication

AudienceTimingMessage
EngineeringBeforeMaintenance window announced
On-callDuringReal-time status updates
StakeholdersAfterAll-clear + any issues encountered

## Migration Safety Rules

| Rule | Why | Exception |
|------|-----|-----------|
| **Add columns as nullable** | Existing rows need a value | Supply default in same transaction |
| **Create indexes concurrently** | Avoids table locks | Not available on all databases |
| **Backfill in batches** | Prevents lock escalation | Small tables (< 1M rows) |
| **Test rollback first** | Rollback you have never practiced is a guess | None |
| **Run during low traffic** | Reduces blast radius | Emergency fixes |

## Best Practices

- **Use expand-contract for breaking changes** — add new schema, deploy code, remove old schema in separate migrations
- **Batch large updates** — `UPDATE ... WHERE id BETWEEN 1 AND 10000` in a loop, with sleeps
- **Monitor replication lag** — large DDL can block replication; pause if lag exceeds thresholds
- **Keep migrations idempotent** — `IF NOT EXISTS` and `IF EXISTS` let you re-run safely
- **Document actual duration** — future estimates improve when you track reality

## Common Mistakes

- Running untested migrations in production — test on a copy with realistic data size
- Forgetting to use `CONCURRENTLY` — locks the table for writes, causing outages
- Large transactions without batching — a single `UPDATE` on 100M rows will lock and rollback slowly
- No rollback plan — "we will figure it out" is not a plan
- Migrating during peak traffic — even safe migrations add load; schedule off-peak

## Frequently Asked Questions

### Should I use a migration tool or raw SQL?

Use a tool (Flyway, Liquibase, Django migrations, Rails migrations). Tools track applied migrations, enforce ordering, and provide rollback hooks. Raw SQL scripts require manual tracking and are error-prone.

### How do I handle a failed migration in production?

Stop immediately. Do not apply subsequent migrations. Assess whether to rollback or fix forward. Rollback if data integrity is at risk. Fix forward if the fix is a small, well-understood script. Always have the rollback script ready before you start.

### Can I run migrations in a transaction?

Yes, for DDL-safe databases (PostgreSQL). For MySQL, DDL is implicitly committed, so transactions do not protect you. Know your database's behavior before you plan the migration strategy.