Skip to content
SP StackPractices
intermediate By StackPractices

Database Replication — Master-Slave, Multi-Master, and Beyond

A practical guide to database replication strategies: master-slave, multi-master, synchronous vs asynchronous, and how to handle failover and conflict resolution.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

Database replication is the process of copying and maintaining data across multiple database nodes. It provides high availability, read scaling, and disaster recovery. But replication introduces complexity: lag, conflicts, split-brain scenarios, and consistency trade-offs. This guide covers the replication strategies used in production, from simple master-slave setups to multi-master clusters.

When to Use

  • You need read scaling beyond what a single node can handle
  • High availability requires automatic failover
  • Disaster recovery needs off-site data copies
  • You want to run analytics without impacting transactional workloads

Master-Slave Replication

One primary node handles writes; replicas handle reads. The simplest and most common topology.

┌─────────┐     write     ┌─────────┐
│ Master  │───────────────▶│  Slave  │
│  (R+W)  │                │  (R)    │
└─────────┘                └─────────┘
      │                          │
      │         read             │
      └──────────────────────────┘

Asynchronous Replication

The master commits locally, then sends changes to slaves. Low latency but potential data loss if master fails before slaves catch up.

-- MySQL
CHANGE MASTER TO
  MASTER_HOST='master_host',
  MASTER_USER='replica',
  MASTER_PASSWORD='password';
START SLAVE;

-- PostgreSQL (streaming replication)
-- Primary: wal_level = replica, max_wal_senders = 3
-- Standby: primary_conninfo = 'host=primary_host port=5432'

Semi-Synchronous Replication

The master waits for at least one slave to acknowledge receipt before committing. Balances safety and performance.

-- MySQL
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 10000;  -- 10 seconds

Synchronous Replication

Master waits for all slaves to confirm the write. No data loss but higher latency.

-- PostgreSQL synchronous_commit
SET synchronous_commit = 'remote_apply';
SET synchronous_standby_names = 'replica1, replica2';

Replication Lag

Lag is the delay between a write on the master and its appearance on replicas. Causes and mitigations:

CauseMitigation
Network latencyUse nearby regions, compression
High write volumeShard writes, add replicas
Large transactionsBreak into smaller batches
Slow replica hardwareMatch replica specs to master
Replica reads competingDedicated read replicas

Detecting Lag

-- PostgreSQL
SELECT
  now() - pg_last_xact_replay_timestamp() AS lag;

-- MySQL
SHOW SLAVE STATUS\G
-- Seconds_Behind_Master

Multi-Master Replication

Multiple nodes accept writes. Complex but enables writes scaling and geographic distribution.

┌─────────┐◀──────────▶┌─────────┐
│ Master A│            │ Master B│
└─────────┘            └─────────┘
      │                      │
      ▼                      ▼
┌─────────┐            ┌─────────┐
│  Slave  │            │  Slave  │
└─────────┘            └─────────┘

Conflict Scenarios

ScenarioConflict
Same key insertedPrimary key violation
Same row updatedLast-write-wins or merge
Row deleted on A, updated on BUpdate wins or conflict flag
Auto-increment IDsDuplicate IDs across masters

Conflict Resolution Strategies

  1. Last-write-wins — timestamp or vector clock decides
  2. Merge — application-specific logic combines changes
  3. Manual resolution — flag conflicts for human review
  4. Avoidance — partition data so each row has one master

Failover

Switching to a replica when the master fails. Manual vs automatic:

Manual Failover

# PostgreSQL: promote standby
pg_ctl promote -D /var/lib/postgresql/data

# MySQL: stop slave, reset, start
STOP SLAVE;
RESET SLAVE ALL;

Automatic Failover (Patroni)

# patroni.yml
scope: mycluster
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008
etcd:
  hosts: 10.0.0.10:2379,10.0.0.11:2379,10.0.0.12:2379
postgresql:
  data_dir: /var/lib/postgresql/data
  pg_hba:
    - host replication replicator 10.0.0.0/24 md5

Read Replicas for Scaling

Route reads to replicas, writes to master. Application must handle replication lag.

class DatabaseRouter {
  private master: Pool;
  private replicas: Pool[];
  private currentReplica = 0;

  getWritePool(): Pool {
    return this.master;
  }

  getReadPool(): Pool {
    // Round-robin across replicas
    const pool = this.replicas[this.currentReplica];
    this.currentReplica = (this.currentReplica + 1) % this.replicas.length;
    return pool;
  }
}

// Usage
const db = new DatabaseRouter();
await db.getWritePool().query('INSERT INTO ...');
const result = await db.getReadPool().query('SELECT ...');

Common Mistakes

  • Ignoring replication lag — reading immediately after writing sees stale data
  • Writing to replicas — causes split-brain and data inconsistency
  • No failover automation — minutes of manual work become hours of downtime
  • Monitoring only Seconds_Behind_Master — lag can be zero while slave is still processing
  • Under-provisioned replicas — replicas that cannot keep up with master throughput

FAQ

Does replication replace backups? No. Replication handles node failure, not data corruption, ransomware, or accidental deletion. Maintain separate backups.

How do I handle schema changes? Use online schema change tools (pt-online-schema-change, gh-ost, or native online DDL) to avoid locking replicas.

Can I replicate across cloud providers? Yes, but latency and egress costs increase. Consider logical replication for selective table sync.