
PostgreSQL High Availability with Read Replicas: Field-Proven Production Guide

TL;DR

Read replicas alone do NOT provide high availability in PostgreSQL. This guide clarifies what read replicas actually provide (read scaling and disaster recovery) versus what they do not (automatic failover). PostgreSQL has no built-in automatic failover mechanism. Production HA requires external tooling (Patroni, repmgr) or managed services (RDS, Aurora, Cloud SQL). Every section includes production-tested configurations, real failure scenarios, and operational warnings based on ScaleWeaver's experience managing PostgreSQL in production.

  • Read replicas provide read scaling and disaster recovery, not automatic HA
  • PostgreSQL requires external tooling for automatic failover (Patroni, repmgr, or managed services)
  • Replication lag means replicas serve stale data (in most self-managed environments, 1-5 seconds with async replication under steady-state conditions)
  • Production HA requires monitoring, automated failover, and split-brain prevention
  • Managed services handle HA automatically but reduce control and increase cost
  • Self-managed HA requires significant operational expertise and ongoing maintenance

What High Availability Actually Means in PostgreSQL

High availability means your database remains accessible and writable even when individual components fail. This requires automatic failover, not just data replication. See PostgreSQL's official high availability documentation for the foundational concepts.

The Critical Distinction

Read replicas alone do not provide high availability in PostgreSQL. They provide:

  • Read Scaling: Distribute read queries across multiple servers
  • Disaster Recovery: A copy of data that can be promoted manually
  • Backup Source: A live backup for point-in-time recovery

What they do not provide:

  • Automatic Failover: PostgreSQL has no built-in leader election
  • Zero Downtime: Manual promotion takes minutes, not seconds
  • Split-Brain Prevention: Nothing prevents two primaries from accepting writes
  • Health Monitoring: No automatic detection of primary failure

RPO vs RTO

Understanding these metrics is essential for designing your HA strategy:

  • RPO (Recovery Point Objective): Maximum acceptable data loss. With async replication, RPO equals replication lag (in most self-managed environments, 1-5 seconds under steady-state conditions, though network partitions or high write loads can increase this). With sync replication and assuming quorum is intact, RPO approaches zero. See synchronous_commit documentation for configuration details.
  • RTO (Recovery Time Objective): Maximum acceptable downtime. Manual failover takes 5-15 minutes in most scenarios (depending on operator response time and verification steps). Automated failover with Patroni takes 30-60 seconds under normal conditions (assuming quorum is maintained). Managed services (Aurora) can achieve sub-30 second failover when quorum is intact and network partitions are not present.

Failover vs Switchover

These terms are often confused but represent different scenarios:

  • Failover: Unplanned promotion when the primary fails. Requires automated tooling to minimize downtime.
  • Switchover: Planned promotion for maintenance or upgrades. Can be coordinated manually with zero data loss.

Leader Election

PostgreSQL does not implement leader election. When the primary fails, replicas do not automatically detect this or promote themselves. External coordination is required:

  • Distributed Consensus: Tools like Patroni use etcd, Consul, or ZooKeeper to coordinate
  • Quorum Requirements: Multiple nodes must agree before promotion to prevent split-brain
  • Fencing: The old primary must be stopped or isolated to prevent dual-write scenarios

Split-Brain Prevention

Split-brain occurs when both the old primary and a promoted replica accept writes, causing data divergence. Prevention requires:

  • Fencing Mechanisms: STONITH (Shoot The Other Node In The Head) to stop the old primary
  • Quorum-Based Promotion: Require majority consensus before allowing writes
  • Network Partition Handling: Designate which side of a partition can continue serving
  • Application-Level Fencing: Use distributed locks or external coordination
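As a minimal sketch of that last point, an application or connection proxy can refuse to treat a node as the writable primary unless it proves it is out of recovery; only the built-in functions below are standard PostgreSQL, the routing logic around them is up to you:

-- Run against a connection before treating the node as the writable primary
SELECT pg_is_in_recovery() AS in_recovery,
       current_setting('transaction_read_only') AS read_only;
-- Expect in_recovery = false and read_only = off on the true primary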

Why Replication Does Not Equal HA

Replication moves data. HA requires:

  • Automatic failure detection
  • Leader election and promotion
  • Split-brain prevention
  • Application routing updates
  • Health monitoring and alerting

PostgreSQL provides replication. Everything else requires external tooling or managed services.

Key Takeaways:
  • Read replicas provide read scaling and disaster recovery, not automatic HA
  • PostgreSQL has no built-in automatic failover mechanism
  • HA requires external tooling (Patroni, repmgr) or managed services
  • RPO depends on replication mode; RTO depends on failover automation
  • Split-brain prevention is critical and requires fencing mechanisms

Replication Fundamentals

Understanding how PostgreSQL replication works is essential for making informed decisions about HA architecture. See PostgreSQL's WAL documentation for the technical foundation.

Write-Ahead Logging (WAL)

PostgreSQL uses Write-Ahead Logging as its replication mechanism. Every database change is first written to WAL before being applied to data files. This ensures durability and enables replication.

Transaction commit flow:

  1. Write transaction to WAL
  2. Wait for WAL fsync to complete
  3. Return success to client
  4. Apply changes to data files asynchronously

This design means all committed transactions exist in WAL and can be replayed after crashes or replicated to other servers.
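The wait in step 2 is controlled by synchronous_commit. A minimal sketch of relaxing it for a single low-value transaction (the audit_log table is hypothetical); a crash may lose the last few such commits but never corrupts data:

BEGIN;
SET LOCAL synchronous_commit = off;  -- this transaction's commit returns before its WAL is flushed
INSERT INTO audit_log (message) VALUES ('low-value event');  -- hypothetical table
COMMIT;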

Streaming Replication

Streaming replication connects replicas directly to the primary to receive WAL data in real-time. This is the standard production method.

[Diagram] PostgreSQL Streaming Replication Flow: the application writes to the primary and reads from the replicas. The primary writes changes to its WAL and streams them to Replica 1 and Replica 2, which are read-only, apply the incoming WAL stream, and serve read queries.

Synchronous vs Asynchronous Replication

This decision directly impacts RPO and write performance.

Asynchronous Replication (Default)

The primary commits transactions without waiting for replica confirmation:

  • Pros: Best write performance, no latency impact
  • Cons: Risk of data loss if primary fails before WAL reaches replicas (in most self-managed environments, 1-5 seconds of potential loss under steady-state conditions)
  • RPO: Equal to replication lag (in most self-managed environments, 1-5 seconds under steady-state conditions)

Synchronous Replication

The primary waits for at least one replica to confirm WAL receipt before committing:

  • Pros: Zero data loss guarantee (if configured correctly)
  • Cons: Higher write latency (network round-trip), risk of blocking if replica is slow or down
  • RPO: Zero (no data loss)
Warning: If your synchronous replica goes down, all writes will block until it comes back or you change the configuration. Always have a backup synchronous replica or use synchronous_standby_names with multiple candidates.
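A minimal sketch of a quorum-based synchronous setup, assuming three standbys whose primary_conninfo sets application_name to replica1, replica2, and replica3 (the names are placeholders):

-- Commits wait for acknowledgement from any 2 of the 3 named standbys
ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (replica1, replica2, replica3)';

-- synchronous_standby_names is reloadable; no restart required
SELECT pg_reload_conf();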
Counter-Example: Synchronous Replication Trade-offs

While synchronous replication provides zero RPO, it introduces significant operational complexity:

  • Write Latency: Every write waits for network round-trip to at least one replica. In multi-region setups, this can add 50-200ms latency per transaction.
  • Quorum Loss: If the majority of synchronous replicas fail or become partitioned, all writes block until quorum is restored. This creates a trade-off between data durability and availability.
  • Performance Impact: High-write workloads see throughput reduction of 20-40% compared to async replication, depending on network latency and replica performance.
  • Operational Overhead: Requires careful monitoring of replica health, network partitions, and quorum status. Failures require immediate intervention to prevent write blocking.

Most production systems use async replication with acceptable RPO (1-5 seconds) rather than accepting the write latency and availability risks of synchronous replication.

Physical vs Logical Replication

PostgreSQL offers two replication types:

  • Physical Replication: Replicates the entire database cluster at the disk block level. The replica is an exact byte-for-byte copy. This is the standard method for HA and read scaling.
  • Logical Replication: Replicates specific tables or databases using a publish/subscribe model. Useful for selective replication, cross-version upgrades, or replicating to different database systems.
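For contrast, a minimal logical replication sketch publishing a single hypothetical orders table (the publisher must run with wal_level = logical):

-- On the publishing database
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscribing database
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary-db.example.com dbname=mydb user=replicator password=your-password'
    PUBLICATION orders_pub;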
Key Takeaways:
  • WAL is the foundation of PostgreSQL replication
  • Streaming replication is the standard production method
  • Async replication provides better performance but non-zero RPO
  • Sync replication provides zero RPO but can block writes
  • Physical replication is used for HA; logical replication for selective use cases

High Availability Tooling Comparison

PostgreSQL requires external tooling for automatic failover. Here is when and why to use each option.

  • Patroni: Best for self-managed deployments requiring automation (Kubernetes, bare metal, VMs). Failover time: 30-60 seconds (assuming quorum is maintained). Complexity: high; requires etcd, Consul, or ZooKeeper. Limitations: depends on a distributed consensus store; complex setup and maintenance.
  • repmgr: Best for simpler deployments and smaller teams. Failover time: 60-120 seconds. Complexity: medium; simpler than Patroni. Limitations: less sophisticated than Patroni; requires manual configuration.
  • AWS RDS: Best for AWS-native applications and teams wanting managed infrastructure. Failover time: 60-120 seconds. Complexity: low; fully managed. Limitations: limited PostgreSQL version control; some extensions restricted.
  • AWS Aurora: Best for high availability requirements needing sub-30 second failover. Failover time: 10-30 seconds (when quorum is intact and no network partition is present). Complexity: low; fully managed. Limitations: PostgreSQL-compatible, not pure PostgreSQL; vendor lock-in.
  • GCP Cloud SQL: Best for GCP-native applications and Google ecosystem integration. Failover time: 60-120 seconds. Complexity: low; fully managed. Limitations: GCP-specific; limited customization.
  • Azure Database: Best for Azure-native applications and the Microsoft ecosystem. Failover time: 60-120 seconds. Complexity: low; fully managed. Limitations: Azure-specific; limited PostgreSQL version control.

When to Use Each

  • Patroni: You need full control, specific PostgreSQL versions, custom extensions, or Kubernetes deployments. You have SRE expertise to maintain it.
  • repmgr: You want simpler automation than Patroni but still need self-management. Good for teams with moderate PostgreSQL expertise.
  • Managed Services: You want to focus on application code. You can accept vendor limitations and higher costs. You need rapid deployment.
Key Takeaways:
  • Patroni provides the most control but requires significant operational expertise
  • repmgr is simpler but less sophisticated than Patroni
  • Managed services handle HA automatically but reduce control and increase cost
  • Aurora provides the fastest failover but with vendor lock-in
  • Choose based on team expertise, control requirements, and RTO needs

Production Failure Scenarios

Understanding what actually happens during failures is critical for designing resilient systems.

Primary Node Crashes

When the primary crashes:

  1. Active connections are terminated immediately
  2. In-flight transactions are lost (not yet committed to WAL)
  3. Replicas continue streaming until the connection times out (in most configurations, 30-60 seconds, though this depends on wal_sender_timeout settings). See replication configuration parameters for timeout behavior.
  4. Replicas detect the connection loss and stop accepting new WAL
  5. Without automated failover, applications remain down until manual intervention
  6. With Patroni or similar tools, a replica is promoted automatically (30-60 seconds)
  7. Applications must reconnect to the new primary (may require DNS/load balancer updates)

Impact: Downtime equals failover time plus application reconnection time. Without automation, this is 5-15 minutes. With automation, 1-2 minutes.

Network Partition Occurs

When the network partitions the primary from replicas:

  1. The primary continues accepting writes (it cannot see replicas are disconnected)
  2. Replicas stop receiving WAL and fall behind
  3. With async replication, no immediate impact on primary writes
  4. With sync replication, writes block if the sync replica is on the wrong side of the partition
  5. If a replica is promoted on the other side, split-brain occurs
  6. Both sides accept writes, causing data divergence

Prevention: Use quorum-based systems (Patroni) that require majority consensus. Only the side of the partition that holds quorum with the consensus store may continue accepting writes; a primary isolated on the minority side must demote itself. Implement fencing to stop the old primary.

Replica Lag Spikes

When replica lag increases significantly:

  1. Replicas serve increasingly stale data
  2. Applications reading from replicas see outdated information
  3. If lag exceeds wal_keep_size, the primary may delete WAL files the replica needs
  4. The replica may need to be re-seeded if it falls too far behind
  5. With replication slots, WAL is retained but disk usage grows

Causes: Network issues, replica performance problems, primary generating WAL faster than replica can apply, long-running queries on replica blocking replay.

Monitoring: Track lag in both bytes and time. Alert when lag exceeds acceptable thresholds (in most self-managed environments, 1-5 seconds for async under steady-state conditions, zero for sync when quorum is intact).
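A quick way to see both dimensions at once from the primary, using the lag columns that pg_stat_replication has carried since PostgreSQL 10:

-- On the primary: per-replica lag in bytes and wall-clock time
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
       replay_lag AS lag_time
FROM pg_stat_replication;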

Replication Slot Fills Disk

When a replica with a replication slot goes down permanently:

  1. The slot prevents the primary from deleting WAL files
  2. WAL files accumulate on the primary's disk
  3. Disk usage grows until the disk fills
  4. When the disk fills, the primary cannot write new transactions
  5. The database becomes read-only or crashes

Prevention: Monitor replication slot retention. Alert when retention exceeds thresholds (e.g., 10GB). Automatically drop inactive slots after a timeout. Use max_slot_wal_keep_size (PostgreSQL 13+) to limit retention.

Recovery: Drop the problematic slot if the replica is permanently gone. If the replica will return, temporarily increase wal_keep_size or re-seed the replica.
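A sketch of the checks and actions above (the slot name and size cap are examples, and the wal_status column requires PostgreSQL 13+):

-- Identify slots and how much WAL each one is pinning
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;

-- Drop a slot whose replica is permanently gone
SELECT pg_drop_replication_slot('replica1_slot');

-- PostgreSQL 13+: cap how much WAL any slot may retain
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();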

Synchronous Standby Becomes Unavailable

When a synchronous replica goes down:

  1. With a single sync replica, all writes block immediately
  2. The primary cannot commit transactions until the replica returns or configuration changes
  3. Applications experience write timeouts
  4. Read queries may continue if routed to other replicas

Prevention: Configure multiple synchronous replicas using synchronous_standby_names with a quorum (e.g., ANY 2 (replica1, replica2, replica3)). This allows writes to continue if one sync replica fails, assuming quorum is intact.

Recovery: Either wait for the replica to return, or temporarily change synchronous_standby_names to remove the failed replica. This requires a configuration reload.
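If you must unblock writes immediately and can accept asynchronous durability until the standby returns, the reload-only change looks like this (an emergency measure to revert once the standby is healthy):

-- Temporarily stop waiting for any synchronous standby
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();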

Key Takeaways:
  • Primary crashes cause immediate connection loss and require failover
  • Network partitions can cause split-brain without proper quorum mechanisms
  • Replica lag spikes indicate performance or network issues
  • Replication slots can cause disk exhaustion if not monitored
  • Synchronous replication can block writes if the sync replica fails

Operational Warnings

These warnings are based on real production incidents. Heed them.

Warning: Replication Slots Can Cause Disk Exhaustion

If a replica with a replication slot goes down permanently, the slot prevents WAL cleanup. WAL files accumulate until the disk fills, causing the primary to stop accepting writes. Monitor pg_replication_slots retention and drop unused slots. Use max_slot_wal_keep_size (PostgreSQL 13+) to limit retention.

Warning: Synchronous Replication Can Block Writes

If your synchronous replica becomes unavailable, all writes block until it returns or you change the configuration. Configure multiple synchronous replicas with quorum (e.g., ANY 2 (replica1, replica2, replica3)) so writes can continue if one fails, assuming quorum is intact.

Warning: Async Replicas Can Serve Stale Data

Asynchronous replicas lag behind the primary, in most self-managed environments by 1-5 seconds under steady-state conditions, but potentially more during high load or network issues. Applications reading from replicas must tolerate stale data. Route critical reads that require fresh data to the primary.

Warning: Manual Failover Takes Minutes

Without automated failover tooling, promoting a replica takes 5-15 minutes: detecting failure, verifying, promoting, reconfiguring other replicas, updating DNS/load balancers, and application reconnection. This is unacceptable for most production systems. Use Patroni, repmgr, or managed services.

Warning: Split-Brain Causes Data Corruption

If both the old primary and a promoted replica accept writes, data diverges and cannot be automatically reconciled. Use fencing mechanisms (STONITH) or quorum-based systems to prevent split-brain. Do not promote a replica without confirming the old primary is stopped or fenced.

Warning: Replication Does Not Equal Backup

Replication protects against hardware failure but not logical errors (accidental deletes, corruptions, bugs). You still need regular backups with point-in-time recovery. Replicas are not a substitute for proper backup strategies.

Need Help Implementing PostgreSQL High Availability?

If you'd like guidance on setting up read replicas, configuring automated failover with Patroni, or designing a production-ready HA architecture, we can help review your setup and suggest improvements. Our database reliability engineers have implemented PostgreSQL HA solutions for multiple production environments.

Monitoring and SLOs

Comprehensive monitoring is essential for maintaining a healthy PostgreSQL HA setup. You need to track replication lag, replica health, and potential issues before they become critical.

Key Metrics

Replication Lag (Seconds and Bytes)

Monitor lag in both dimensions:

  • Bytes: How much WAL data the replica has not received or applied
  • Time: How far behind the replica is in wall-clock time
-- Primary: Check lag for all replicas
SELECT 
 application_name,
 pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
 pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / 1024 / 1024 AS lag_mb,
 state,
 sync_state
FROM pg_stat_replication;

-- Replica: Check lag from replica perspective
SELECT 
 pg_last_wal_receive_lsn() AS received_lsn,
 pg_last_wal_replay_lsn() AS replayed_lsn,
 pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes;

WAL Generation Rate

Track how fast the primary generates WAL to predict replication requirements:

-- Calculate WAL generation rate
SELECT 
 pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') / 1024 / 1024 AS total_wal_mb,
 (pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') / 1024 / 1024) / 
 EXTRACT(EPOCH FROM (now() - pg_postmaster_start_time())) * 3600 AS wal_mb_per_hour;

Replica Replay Delay

Monitor how long it takes replicas to apply received WAL:

-- On replica: check replay delay in seconds
SELECT CASE
         WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
         ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
       END AS replay_delay_seconds;

Replication Slot Retention

Track how much WAL each slot is retaining:

-- Check replication slot retention
SELECT 
 slot_name,
 slot_type,
 active,
 pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes,
 pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) / 1024 / 1024 AS retained_mb
FROM pg_replication_slots;

Suggested Alert Thresholds

  • Replication Lag: Alert if lag exceeds 10MB (bytes) or 5 seconds (time) for async replicas. Alert if lag exceeds 1MB or 1 second for sync replicas.
  • Replication Down: Alert if no replicas are streaming for more than 2 minutes.
  • Slot Retention: Alert if any slot retains more than 10GB of WAL.
  • Replica Not in Recovery: Alert if a replica is not in recovery mode (may have been accidentally promoted).
  • Primary Unreachable: Alert if the primary cannot be reached from monitoring systems.

Metrics That Indicate Imminent Failure

  • Rapidly Increasing Lag: Lag growing faster than it can be applied indicates the replica cannot keep up. May require re-seeding.
  • Disk Space Decreasing: On primary, indicates WAL accumulation (check slot retention). On replica, indicates data growth or WAL accumulation.
  • Connection Errors: Frequent connection failures between primary and replicas indicate network issues.
  • Replay Delay Increasing: Replica receiving WAL but not applying it indicates performance issues.
  • Sync Replica Unavailable: With single sync replica, indicates writes will block.

Prometheus Exporter Setup

Use postgres_exporter to expose PostgreSQL metrics to Prometheus:

# Install postgres_exporter
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.15.0/postgres_exporter-0.15.0.linux-amd64.tar.gz
tar xzf postgres_exporter-0.15.0.linux-amd64.tar.gz

# Create systemd service
# /etc/systemd/system/postgres_exporter.service
[Unit]
Description=PostgreSQL Exporter
After=network.target

[Service]
User=postgres
Environment="DATA_SOURCE_NAME=postgresql://postgres:password@localhost:5432/postgres?sslmode=disable"
ExecStart=/usr/local/bin/postgres_exporter
Restart=always

[Install]
WantedBy=multi-user.target

Critical Alerts

# Prometheus alerting rules
groups:
- name: postgresql_replication
  rules:
  - alert: PostgreSQLReplicationLag
    expr: pg_replication_lag_bytes > 10485760  # 10MB
    for: 5m
    annotations:
      summary: "PostgreSQL replication lag is high"
      description: "Replica {{ $labels.instance }} is {{ $value }} bytes behind"

  - alert: PostgreSQLReplicationDown
    expr: count(pg_stat_replication{state="streaming"}) < 1
    for: 2m
    annotations:
      summary: "No active PostgreSQL replicas"
      description: "All replicas are down or not streaming"

  - alert: PostgreSQLReplicationSlotRetention
    expr: pg_replication_slot_retained_bytes > 10737418240  # 10GB
    for: 10m
    annotations:
      summary: "Replication slot retaining too much WAL"
      description: "Slot {{ $labels.slot_name }} is retaining {{ $value }} bytes"

  - alert: PostgreSQLReplicaNotInRecovery
    expr: pg_is_in_recovery{instance="replica:9187"} == 0
    for: 1m
    annotations:
      summary: "Replica is not in recovery mode"
      description: "Replica may have been accidentally promoted"
Key Takeaways:
  • Monitor replication lag in both bytes and time
  • Track WAL generation rate to predict replication requirements
  • Monitor replication slot retention to prevent disk exhaustion
  • Set alert thresholds based on your RPO requirements
  • Watch for metrics that indicate imminent failure

Read Scaling with Replicas

Read replicas excel at distributing read load across multiple servers. This section covers how to use them effectively for read scaling.

Use Cases for Read Replicas

  • Read Scaling: Distribute read queries across multiple servers, reducing load on the primary. This is especially valuable after optimizing individual queries, as shown in our query optimization guide.
  • Analytics & Reporting: Run heavy analytical queries without impacting production traffic
  • Geographic Distribution: Place replicas closer to users for lower latency
  • Disaster Recovery: Replicas can be promoted to primary in case of failure, as detailed in our zero-downtime case study.
  • Zero-Downtime Upgrades: Upgrade a replica, promote it, then upgrade the old primary

Read/Write Routing

Applications must route reads and writes correctly. Writes must go to the primary. Reads can go to replicas, but applications must tolerate stale data.

Application-Level Routing

Most application frameworks support read/write splitting:

Django (Python)

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'HOST': 'primary-db.example.com',
        'PORT': '5432',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'HOST': 'replica-db.example.com',
        'PORT': '5432',
    }
}

DATABASE_ROUTERS = ['myapp.dbrouter.DatabaseRouter']

# dbrouter.py
class DatabaseRouter:
    def db_for_read(self, model, **hints):
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

Rails (Ruby)

# database.yml
production:
  primary:
    <<: *default
    host: primary-db.example.com
  replica:
    <<: *default
    host: replica-db.example.com
    replica: true

# Application code
class ApplicationRecord < ActiveRecord::Base
  connects_to database: { writing: :primary, reading: :replica }
end

# Usage
User.connected_to(role: :reading) do
  User.all  # Reads from replica
end

Checking Replica Freshness

Monitor replication lag before routing critical reads to replicas:

-- Function (run on the replica) to check if it is acceptably fresh
CREATE OR REPLACE FUNCTION is_replica_fresh(max_lag_bytes bigint DEFAULT 1048576)
RETURNS boolean AS $$
BEGIN
  -- pg_current_wal_lsn() is not available on a standby, so compare
  -- the WAL the replica has received with what it has replayed
  RETURN (
    SELECT pg_wal_lsn_diff(
      pg_last_wal_receive_lsn(),
      pg_last_wal_replay_lsn()
    ) < max_lag_bytes
  );
END;
$$ LANGUAGE plpgsql;

-- Use in application (on the replica)
SELECT is_replica_fresh(1048576);  -- 1MB max apply lag
Tip: For most applications, 1-5 seconds of lag is acceptable for read replicas. For financial or critical systems, use synchronous replication or route reads to primary only.
Key Takeaways:
  • Read replicas excel at read scaling and reducing primary load
  • Applications must route writes to primary and reads to replicas
  • Applications must tolerate stale data when reading from replicas
  • Monitor replication lag before routing critical reads to replicas
  • Use application-level routing or load balancers for read/write splitting. For production load balancing strategies, see our HAProxy production guide.

Setup Guide: Self-Managed PostgreSQL

This section covers setting up PostgreSQL replication on self-managed infrastructure. We use PostgreSQL 15, but concepts apply to versions 12+.

Step 1: Configure the Primary Server

postgresql.conf Configuration

Edit /etc/postgresql/15/main/postgresql.conf (or your config location):

# WAL Settings - Critical for replication
wal_level = replica # Minimum for streaming replication
max_wal_senders = 10 # Number of concurrent replication connections
wal_keep_size = 1GB # How much WAL to keep (prevents replica lag issues)
max_replication_slots = 10 # Number of replication slots

# Archiving (optional but recommended)
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/archive/%f && cp %p /var/lib/postgresql/archive/%f'
archive_timeout = 300 # Force archive every 5 minutes

# Performance Settings
shared_buffers = 4GB # 25% of RAM for dedicated server
effective_cache_size = 12GB # 50-75% of RAM
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9 # Spread checkpoints over time
wal_buffers = 16MB

# Connection Settings
max_connections = 200
listen_addresses = '*' # Or specific IPs for security
Warning: After changing wal_level, you must restart PostgreSQL. This setting cannot be changed on a running server.

pg_hba.conf Configuration

Edit /etc/postgresql/15/main/pg_hba.conf to allow replication connections:

# TYPE DATABASE USER ADDRESS METHOD

# Local connections
local all postgres peer

# IPv4 local connections
host all all 127.0.0.1/32 scram-sha-256

# Replication connections from replicas
host replication replicator 10.0.0.0/8 scram-sha-256
host replication replicator 192.168.1.0/24 scram-sha-256

# Application connections
host all app_user 10.0.0.0/8 scram-sha-256

The replication keyword allows the connection to be used for streaming replication. The replicator user will be created next.

Step 2: Create Replication User

Create a dedicated user for replication (never use the superuser):

-- Connect to primary as postgres superuser
psql -U postgres

-- Create replication user
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'your-strong-password-here';

-- Grant necessary permissions (if using logical replication later)
ALTER USER replicator WITH REPLICATION;

-- Verify
\du replicator
Tip: Store the replication password in a secrets manager (Vault, AWS Secrets Manager, etc.) and never commit it to version control.

Step 3: Create Replication Slot (Recommended)

Replication slots prevent the primary from deleting WAL files that replicas have not received yet. This is crucial if a replica goes down temporarily.

-- On primary, create a replication slot
SELECT pg_create_physical_replication_slot('replica1_slot');

-- Verify
SELECT * FROM pg_replication_slots;

Step 4: Bootstrap the Replica

Use pg_basebackup to create an initial copy of the primary database:

# On the replica server, stop PostgreSQL if running
sudo systemctl stop postgresql

# Remove existing data directory (if any)
sudo rm -rf /var/lib/postgresql/15/main/*

# Run pg_basebackup
sudo -u postgres pg_basebackup \
 -h primary-server-ip \
 -D /var/lib/postgresql/15/main \
 -U replicator \
 -v \
 -P \
 -W \
 -R \
 -S replica1_slot \
 --wal-method=stream

# Explanation of flags:
# -h: Primary server hostname/IP
# -D: Data directory on replica
# -U: Replication user
# -v: Verbose output
# -P: Show progress
# -W: Prompt for password
# -R: Write replication settings to postgresql.auto.conf and create standby.signal (PostgreSQL 12+)
# -S: Replication slot name
# --wal-method=stream: Use streaming replication during backup

The -R flag automatically creates the necessary configuration for the replica to connect to the primary.

Step 5: Configure the Replica

After pg_basebackup, check the generated postgresql.auto.conf:

# This file is automatically generated by pg_basebackup
primary_conninfo = 'user=replicator password=your-password host=primary-server-ip port=5432 sslmode=prefer'
primary_slot_name = 'replica1_slot'

For PostgreSQL 12+, the replica also needs a standby.signal file in its data directory. pg_basebackup -R creates it automatically; if you bootstrapped without -R, create it manually:

# On replica
sudo -u postgres touch /var/lib/postgresql/15/main/standby.signal

Add replica-specific settings to postgresql.conf:

# Replica is read-only
hot_standby = on
max_standby_streaming_delay = 30s
hot_standby_feedback = on # Helps prevent query conflicts

# Performance tuning for read workloads
shared_buffers = 4GB
effective_cache_size = 12GB
random_page_cost = 1.1 # Lower for SSD storage

Step 6: Start and Verify Replication

Start PostgreSQL on the replica:

sudo systemctl start postgresql
sudo systemctl status postgresql

Check replication status on the primary:

-- On primary, check replication status
SELECT 
 application_name,
 client_addr,
 state,
 sync_state,
 pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS sending_lag_bytes,
 pg_wal_lsn_diff(sent_lsn, write_lsn) AS write_lag_bytes,
 pg_wal_lsn_diff(write_lsn, flush_lsn) AS flush_lag_bytes,
 pg_wal_lsn_diff(flush_lsn, replay_lsn) AS replay_lag_bytes,
 pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS total_lag_bytes
FROM pg_stat_replication;

You should see output like:

 application_name |  client_addr  |   state   | sync_state | total_lag_bytes
------------------+---------------+-----------+------------+-----------------
 replica1         | 192.168.1.101 | streaming | async      | 1024

Check on the replica:

-- On replica
SELECT pg_is_in_recovery(); -- Should return 't' (true)

-- Check lag
SELECT 
 pg_last_wal_receive_lsn() AS received_lsn,
 pg_last_wal_replay_lsn() AS replayed_lsn,
 pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes;

Manual Promotion (Emergency)

If you need to promote a replica immediately:

# On the replica to be promoted (leave PostgreSQL running;
# pg_ctl promote acts on a live standby)
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/15/main

# Or promote via SQL (PostgreSQL 12+)
sudo -u postgres psql -c "SELECT pg_promote();"

# Verify
psql -U postgres -c "SELECT pg_is_in_recovery();"  # Should return 'f' (false)

Reconfigure Other Replicas

After promotion, other replicas need to point to the new primary:

# On each remaining replica, update postgresql.auto.conf
primary_conninfo = 'user=replicator password=your-password host=new-primary-ip port=5432'

# Restart replica
sudo systemctl restart postgresql
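On PostgreSQL 13 and later, primary_conninfo is reloadable, so an alternative sketch is to change it via SQL and reload rather than restart (the host value is a placeholder):

-- Repoint the replica at the new primary without a full restart (PostgreSQL 13+)
ALTER SYSTEM SET primary_conninfo = 'user=replicator password=your-password host=new-primary-ip port=5432';
SELECT pg_reload_conf();  -- the walreceiver reconnects using the new settings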
Key Takeaways:
  • Configure WAL settings on primary before starting replication
  • Use replication slots to prevent WAL deletion
  • Use pg_basebackup to bootstrap replicas
  • Monitor replication status continuously
  • Manual promotion uses pg_ctl promote or pg_promote() on a running replica

Security and Hardening

Securing your PostgreSQL replication setup is critical. Here are the essential security measures:

TLS Between Primary and Replicas

Encrypt replication traffic using SSL/TLS:

Generate Certificates

# Create CA
openssl req -new -x509 -days 3650 -nodes -out ca.crt -keyout ca.key -subj "/CN=postgres-ca"

# Create server certificate (primary)
openssl req -new -nodes -out server.csr -keyout server.key -subj "/CN=primary-db"
openssl x509 -req -in server.csr -days 365 -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt

# Create client certificate (replica)
openssl req -new -nodes -out client.csr -keyout client.key -subj "/CN=replica-db"
openssl x509 -req -in client.csr -days 365 -CA ca.crt -CAkey ca.key -CAcreateserial -out client.crt

Configure PostgreSQL for SSL

# postgresql.conf
ssl = on
ssl_cert_file = '/var/lib/postgresql/ssl/server.crt'
ssl_key_file = '/var/lib/postgresql/ssl/server.key'
ssl_ca_file = '/var/lib/postgresql/ssl/ca.crt'

# pg_hba.conf - require SSL client certificates for replication
# (the cert method both requires and verifies a client certificate)
hostssl replication replicator 10.0.0.0/8 cert

Secure pg_hba.conf Configuration

Follow the principle of least privilege:

# Restrictive pg_hba.conf example
# TYPE DATABASE USER ADDRESS METHOD

# Local connections (Unix socket)
local all postgres peer

# Replication - specific IPs only, require SSL
hostssl replication replicator 10.0.1.11/32 scram-sha-256
hostssl replication replicator 10.0.1.12/32 scram-sha-256

# Application connections - specific users and IPs
hostssl all app_user 10.0.2.0/24 scram-sha-256
hostssl all readonly_user 10.0.2.0/24 scram-sha-256

# Deny all other connections
host all all 0.0.0.0/0 reject

Replication Role Permissions

The replication role should have minimal permissions:

-- Create replication user with minimal privileges
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'strong-password';

-- Verify permissions
SELECT rolname, rolreplication, rolsuper 
FROM pg_roles 
WHERE rolname = 'replicator';

-- Should show:
-- rolname: replicator
-- rolreplication: true
-- rolsuper: false
Warning: The REPLICATION privilege allows bypassing row-level security policies. Only grant this to trusted replication users from trusted networks.

Network Security

Additional network-level security measures:

  • Firewall Rules: Only allow PostgreSQL port (5432) from replica IPs and application servers
  • VPN/Tunnels: Use VPN or encrypted tunnels for replication across untrusted networks
  • Private Networks: Deploy primary and replicas on private subnets with no public internet access
  • Security Groups (Cloud): Configure AWS Security Groups, GCP Firewall Rules, or Azure NSGs appropriately

Performance Tuning

Proper performance tuning ensures replication does not become a bottleneck and replicas can keep up with the primary's write load.

WAL Tuning

Optimize WAL settings for your workload:

# postgresql.conf - WAL Performance
wal_level = replica
wal_buffers = 16MB # Increase for high write loads
min_wal_size = 1GB # Prevent frequent WAL recycling
max_wal_size = 4GB # Allow WAL to grow before checkpoint
wal_compression = on # Compress WAL (PostgreSQL 9.5+)
wal_log_hints = on # Required for pg_rewind (failover recovery)

# Checkpoint tuning
checkpoint_timeout = 15min # Time between checkpoints
checkpoint_completion_target = 0.9 # Spread checkpoint I/O
max_wal_size = 4GB # Trigger checkpoint if WAL exceeds this

Replica-Side Optimizations

Replicas can be tuned differently since they're read-only:

# Replica postgresql.conf optimizations
hot_standby = on
hot_standby_feedback = on # Prevents query conflicts
max_standby_streaming_delay = 30s # Max delay before canceling queries
max_standby_archive_delay = 300s # For WAL shipping scenarios

# Read performance
random_page_cost = 1.1 # Lower for SSD (default 4.0 for HDD)
effective_io_concurrency = 200 # For SSD storage
shared_buffers = 4GB # Same as primary
work_mem = 64MB # Can be higher for read queries

Network Considerations

Replication performance depends heavily on network:

  • Bandwidth: Ensure sufficient bandwidth between primary and replicas (at least 2x your peak write rate)
  • Latency: Lower latency reduces replication lag. For synchronous replication, latency directly impacts write performance
  • Dedicated Networks: Use dedicated network links or VLANs for replication traffic
  • Compression: Enable wal_compression to reduce network traffic (trades CPU for bandwidth)
Tip: Calculate required bandwidth: (peak_write_rate_MBps × 2) for safety margin. For example, if you write 100 MB/s, plan for 200 MB/s replication bandwidth.
Key Takeaways:
  • Optimize WAL settings for your write workload
  • Tune replicas differently since they are read-only
  • Ensure sufficient network bandwidth between primary and replicas
  • Use dedicated network links for replication traffic when possible
  • Enable WAL compression to reduce network traffic

Decision Framework

Choose your PostgreSQL HA strategy based on your organization's stage, expertise, and requirements.

If You Are a Startup

Recommended Approach: Use managed cloud services (RDS, Cloud SQL, or Azure Database)

  • Focus engineering time on product development, not database operations
  • Managed services provide automatic failover, backups, and monitoring
  • Higher cost is acceptable for reduced operational burden
  • Vendor lock-in is acceptable at this stage
  • Start with async read replicas for read scaling
  • Upgrade to sync replication or Aurora if zero RPO becomes critical

When to Reconsider: When costs exceed $2000/month, you need specific PostgreSQL extensions, or you require sub-30 second RTO.

If You Are a Scale-Up

Recommended Approach: Evaluate managed services vs self-managed based on team expertise

  • If you have SRE expertise: Consider Patroni on Kubernetes or VMs
  • If database expertise is limited: Continue with managed services but plan migration
  • Implement comprehensive monitoring regardless of approach
  • Use async replication with multiple replicas for read scaling
  • Consider sync replication for critical data if RPO requirements demand it
  • Establish regular failover testing procedures
  • Document runbooks for common failure scenarios

Decision Factors: Team size, on-call capacity, cost sensitivity, control requirements, compliance needs.

If You Are Enterprise

Recommended Approach: Self-managed with Patroni or repmgr, or managed services with custom SLAs

  • Self-managed if you need specific PostgreSQL versions, extensions, or compliance
  • Managed services if you want to reduce operational overhead
  • Implement multi-region replication for disaster recovery
  • Use sync replication for zero RPO requirements
  • Deploy automated failover with Patroni or similar tools
  • Establish SLOs for RPO and RTO based on business requirements
  • Implement comprehensive monitoring with alerting and runbooks
  • Conduct regular disaster recovery drills
  • Maintain multiple replicas across availability zones or regions

Additional Considerations: Compliance requirements, data residency, cross-region replication, custom backup retention policies, integration with existing infrastructure.

Decision Matrix

  • Operational Overhead: managed service low; Patroni high; repmgr medium
  • Cost: managed service high; Patroni low; repmgr low
  • Control: managed service limited; Patroni full; repmgr full
  • Failover Time: managed service 60-120s (RDS) or 10-30s (Aurora); Patroni 30-60s; repmgr 60-120s
  • Expertise Required: managed service low; Patroni high; repmgr medium
  • Custom Extensions: managed service limited; Patroni full support; repmgr full support
Key Takeaways:
  • Startups should use managed services to focus on product development
  • Scale-ups should evaluate based on team expertise and cost sensitivity
  • Enterprises should choose based on compliance, control, and operational capacity
  • Managed services reduce operational overhead but increase cost and reduce control
  • Self-managed solutions require significant expertise but provide full control

References and Official Documentation

This guide is based on official PostgreSQL documentation and production experience. For authoritative technical details, consult the PostgreSQL manual's chapters on high availability, write-ahead logging, and replication configuration, which are referenced throughout this guide.

Frequently Asked Questions

Everything you need to know about PostgreSQL high availability and read replicas

Does synchronous replication give me high availability on its own?

No. Synchronous replication protects against data loss (zero RPO) but does not provide automatic failover. If the primary fails, you still need external tooling (Patroni, repmgr) or manual intervention to promote a replica. Synchronous replication also introduces write latency and can block writes if replicas fail, creating availability risks.

What happens to read replicas when the primary crashes?

Read replicas depend on the primary for WAL streaming. When the primary crashes, replicas lose their WAL source and cannot apply new transactions. They continue serving stale data until either the primary recovers or a replica is promoted. Without automated failover, applications reading from replicas experience degraded service (stale data) or complete unavailability (if applications cannot tolerate stale reads).

Should I use replication slots for read replicas?

Yes, but with trade-offs. Replication slots prevent the primary from removing WAL segments until replicas consume them, protecting against WAL loss. However, if a replica goes down and doesn't reconnect, WAL accumulates on the primary, potentially filling disk space. Monitor slot lag and implement alerts for disconnected slots.

Conclusion

PostgreSQL read replicas provide read scaling and disaster recovery, but they do not provide high availability by themselves. PostgreSQL has no built-in automatic failover mechanism. Achieving true HA requires external tooling like Patroni or repmgr, or managed cloud services.

The key distinctions:

  • Read replicas enable read scaling and reduce primary load
  • Replication provides disaster recovery, not automatic HA
  • HA requires automated failover, leader election, and split-brain prevention
  • RPO depends on replication mode (async vs sync)
  • RTO depends on failover automation (manual vs automated)

Choose your approach based on your organization's stage, expertise, and requirements. Startups should use managed services. Scale-ups should evaluate based on team capacity. Enterprises should choose based on compliance and control needs.

Regardless of approach, implement comprehensive monitoring, establish SLOs for RPO and RTO, and conduct regular failover testing. The investment in understanding these patterns pays dividends in system reliability.

Need Help with PostgreSQL High Availability?

Our database reliability engineers can help you design, implement, and maintain PostgreSQL HA setups. Get expert guidance on replication, failover, monitoring, and performance tuning.

View Case Studies