Skip to content
SP StackPractices
intermediate By StackPractices

Read Replicas — Scale Reads Without Changing Application Logic

A practical guide to read replicas: setting up replication, routing read queries, handling replication lag, and scaling read-heavy workloads with PostgreSQL, MySQL, and cloud-managed replicas.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

Read replicas are copies of your primary database that handle read-only queries. They are the simplest and most cost-effective way to scale read-heavy database workloads. By offloading SELECT queries to replicas, you reduce load on the primary, improve response times, and increase availability.

This guide covers replication setup, query routing, replication lag management, and operational best practices.

When to Use

  • Read queries exceed 80% of your database workload
  • Analytical queries (reports, aggregations) slow down transactional writes
  • You need read scaling beyond what a single instance can provide
  • You want geographic read distribution (replicas in multiple regions)
  • You need a hot standby for failover without dedicated standby hardware
  • Your working set fits in memory but query volume exceeds CPU capacity

When NOT to Use

  • Your workload is write-heavy (>50% writes) — replicas do not help write scaling
  • You require strongly consistent reads immediately after writes — replication lag may violate this
  • Your queries are already CPU-bound on the replica — adding more replicas is better than bigger ones
  • You have not optimized queries and indexes on the primary — fix those first

Core Concepts

ConceptDescription
Primary (Master)The database instance that accepts writes
Replica (Slave)A database instance that replicates data from the primary
Replication LagDelay between a write on the primary and its appearance on the replica
Streaming ReplicationContinuous transfer of WAL (PostgreSQL) or binlog (MySQL)
Logical ReplicationRow-level replication with filtering (PostgreSQL 10+)
PromotionConverting a replica into the new primary during failover

Read Replica Architecture

         Writes + Critical Reads

         ┌────▼────┐
         │ Primary │
         │  (R/W)  │
         └────┬────┘
              │ WAL / Binlog
      ┌───────┼───────┐
      │       │       │
   ┌──▼──┐ ┌─▼───┐ ┌─▼───┐
   │Repl │ │Repl │ │Repl │
   │  1  │ │  2  │ │  3  │
   └──┬──┘ └──┬──┘ └──┬──┘
      │       │       │
      └───────┼───────┘

         Read Queries

Step-by-Step Read Replica Setup

1. Configure Streaming Replication

Set up physical streaming replication for near-real-time copies:

# postgresql.conf (Primary)
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/archive/%f'
hot_standby = on
# postgresql.conf (Replica)
hot_standby = on
hot_standby_feedback = on
max_standby_streaming_delay = 30s
# pg_hba.conf (Primary): Allow replication connections
host replication replicator 192.168.1.0/24 scram-sha-256
# Initialize replica with base backup
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P -v -R
# my.cnf (MySQL Primary)
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 500M
# my.cnf (MySQL Replica)
[mysqld]
server-id = 2
relay_log = /var/log/mysql/mysql-relay-bin
log_bin = /var/log/mysql/mysql-bin
read_only = 1
-- MySQL: Configure replication on replica
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST='primary-host',
    SOURCE_USER='replicator',
    SOURCE_PASSWORD='password',
    SOURCE_LOG_FILE='mysql-bin.000001',
    SOURCE_LOG_POS=0;

START REPLICA;

2. Route Reads to Replicas

Direct read queries to replicas while keeping writes on the primary:

# Example: Python with read/write splitting
import psycopg2
from contextlib import contextmanager

# Connection pools
primary_pool = psycopg2.pool.ThreadedConnectionPool(1, 10, dsn="dbname=app primary")
replica_pool = psycopg2.pool.ThreadedConnectionPool(1, 10, dsn="dbname=app replica")

@contextmanager
def get_db_connection(read_only=False):
    """Get connection: primary for writes, replica for reads."""
    pool = replica_pool if read_only else primary_pool
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)

# Usage
def get_user(user_id):
    with get_db_connection(read_only=True) as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        return cursor.fetchone()

def update_user(user_id, data):
    with get_db_connection(read_only=False) as conn:
        cursor = conn.cursor()
        cursor.execute("UPDATE users SET name = %s WHERE id = %s", (data['name'], user_id))
        conn.commit()
// Example: Spring Boot with read/write routing
@Configuration
public class DataSourceConfig {
    
    @Bean
    public DataSource routingDataSource(
            @Qualifier("primaryDataSource") DataSource primary,
            @Qualifier("replicaDataSource") DataSource replica) {
        
        ReplicationRoutingDataSource routing = new ReplicationRoutingDataSource();
        Map<Object, Object> targets = new HashMap<>();
        targets.put("primary", primary);
        targets.put("replica", replica);
        routing.setTargetDataSources(targets);
        routing.setDefaultTargetDataSource(primary);
        return routing;
    }
}

// Transactional annotation determines routing
@Service
public class UserService {
    
    @Transactional(readOnly = true)
    public User getUser(String id) {
        return userRepository.findById(id).orElseThrow();
    }
    
    @Transactional
    public User updateUser(User user) {
        return userRepository.save(user);
    }
}

Routing strategies:

StrategyImplementationBest For
Connection pool splittingSeparate pools for primary and replicaSimple applications
Proxy-basedPgBouncer, ProxySQL, HAProxyZero application changes
ORM integrationDjango database routers, Spring AbstractRoutingDataSourceFramework-based apps
DNS-basedSeparate endpoints (primary.db, replica.db)Microservices

3. Handle Replication Lag

Replication lag is the primary challenge with read replicas:

# Example: Lag-aware routing
import time

class LagAwareRouter:
    def __init__(self, primary, replica, max_lag_seconds=5):
        self.primary = primary
        self.replica = replica
        self.max_lag = max_lag_seconds
    
    def get_replica_lag(self):
        """Check current replication lag in seconds."""
        cursor = self.replica.cursor()
        cursor.execute("""
            SELECT 
                EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
            AS lag_seconds
        """)
        return cursor.fetchone()[0] or 0
    
    def route_query(self, query, requires_freshness=False):
        """Route to replica if lag is acceptable and freshness not required."""
        if requires_freshness:
            return self.primary
        
        lag = self.get_replica_lag()
        if lag > self.max_lag:
            # Fallback to primary if replica is too far behind
            return self.primary
        
        return self.replica

# Usage: Force primary for user-modified data
user = router.route_query("SELECT * FROM users WHERE id = %s", 
                          requires_freshness=True)
-- PostgreSQL: Monitor replication lag
SELECT 
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as lag
FROM pg_stat_replication;

-- MySQL: Monitor replication lag
SHOW REPLICA STATUS\G
-- Look for: Seconds_Behind_Source

Strategies for handling lag:

ApproachHow It WorksTrade-off
Session stickinessRead from primary after a write in the same sessionSlightly more primary load
Lag thresholdRoute to primary if lag exceeds X secondsSimple, but can spike primary load
Eventual consistencyAccept stale reads, document itBest performance, but user-facing inconsistency
Read-after-write redirectTrack recently modified keys, route those to primaryComplex, requires application logic
Causal consistencyTrack LSN per session, wait for replica to catch upPostgreSQL 14+ logical replication

4. Set Up Monitoring and Alerting

Track replica health and lag:

# Example: Prometheus rules for replication monitoring
groups:
  - name: replication_alerts
    rules:
      - alert: HighReplicationLag
        expr: |
          pg_stat_replication_pg_wal_lsn_diff / 1024 / 1024 > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag exceeds 100MB"

      - alert: ReplicationStopped
        expr: |
          pg_stat_replication_state == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Replication has stopped"

      - alert: ReplicaLagSeconds
        expr: |
          mysql_slave_lag_seconds > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL replica lag > 10 seconds"

Key metrics to monitor:

  • Lag in bytes/seconds: How far behind is the replica?
  • Replication state: Is replication running, paused, or stopped?
  • Lag trend: Is lag increasing (indicates replica cannot keep up)?
  • Query latency on replica: Are reads still fast?
  • Connection count: Is the replica at connection capacity?

5. Plan for Failover

When the primary fails, promote a replica:

# PostgreSQL: Manual promotion
pg_ctl promote -D /var/lib/postgresql/data

# Or using repmgr (automated failover tool)
repmgr standby promote
# MySQL: Promote replica to primary
STOP REPLICA;
RESET REPLICA ALL;
SET GLOBAL read_only = OFF;

Failover approaches:

ApproachRTOComplexityBest For
Manual promotion5-30 minLowSmall teams, non-critical systems
Automated tools30-120sMediumrepmgr, Patroni, orchestrator
Managed service0-60sNoneRDS Multi-AZ, Cloud SQL HA
Synchronous replication0s (no data loss)HighFinancial systems (trade latency for safety)

Best Practices

  • Start with one replica. One well-configured replica solves 80% of read scaling needs.
  • Use replicas for reporting and analytics. Isolate expensive queries from the primary.
  • Monitor lag relentlessly. Lag that grows unchecked indicates the replica cannot keep up.
  • Test failover before you need it. Promote a replica in staging quarterly.
  • Keep replica hardware equal to primary. A slower replica creates lag during peak load.
  • Use connection pooling on replicas too. Replicas have the same connection limits as primaries.

Common Mistakes

  • Routing all reads to replicas. Session state, recently modified data, and critical reads should stay on the primary.
  • Ignoring replication lag. Users see stale data and report “bugs” that are actually lag.
  • Single replica with no failover plan. If the replica fails, your read capacity drops to zero.
  • Running writes on replicas. Accidental writes break replication and require re-initialization.
  • No lag monitoring. You only discover lag problems when users complain.
  • Using replicas for write scaling. Replicas only scale reads. For write scaling, consider sharding or partitioning.

Variants

  • Cascade replication: Replica → Replica → Replica for geographic distribution
  • Multi-primary (master-master): Writes accepted on multiple nodes — complex, use with caution
  • Logical replication: Selective table/column replication (PostgreSQL 10+, MySQL binlog filtering)
  • Read replicas in different regions: Cloud provider managed replicas for global latency reduction
  • Delayed replica: Replica intentionally lagging by hours for point-in-time recovery

FAQ

Q: How much lag is acceptable? For most applications, <1 second is ideal, <5 seconds is acceptable. Analytics workloads can tolerate minutes. Financial systems may require synchronous replication (zero lag).

Q: Can I write to a read replica? No — replicas are read-only by design. Some systems (MySQL Group Replication, PostgreSQL multi-master extensions) allow multi-master writes, but they add significant complexity.

Q: How many replicas can I have? PostgreSQL supports up to ~10 streaming replicas before WAL sender overhead becomes significant. For more, use cascading replicas (replica of a replica) or logical replication.

Q: Do I need replicas if I use caching? Yes — caching and replicas complement each other. Cache handles hot data; replicas handle cache misses and analytical queries.

Conclusion

Read replicas are the simplest way to scale database reads. By configuring streaming replication, routing queries intelligently, and monitoring lag, you can handle 10x read growth without changing your data model or application architecture significantly.