RPO vs RTO

RPO and RTO turn backup talk into measurable targets. RPO is how much data you can lose. RTO is how long you can be down. On a WordPress VPS, these metrics determine backup frequency, offsite strategy, and how much automation you need.

Quick Summary
  • RPO (Recovery Point Objective): maximum acceptable data loss measured in time.
  • RTO (Recovery Time Objective): maximum acceptable downtime measured in time.
  • RPO drives backup frequency and replication.
  • RTO drives restore speed, local copies, and automation.

Definitions (simple and strict)

| Metric | Question it answers | Measured in | Controlled by |
| --- | --- | --- | --- |
| RPO | "How much data can I lose?" | minutes/hours | backup interval, replication lag, offsite delay |
| RTO | "How long can I be offline?" | minutes/hours | restore speed, automation, availability design |

Examples:

  • RPO = 1 hour means you accept losing up to 1 hour of content/orders.
  • RTO = 2 hours means you must be back online within 2 hours.

WordPress components and typical targets

Different parts of WordPress have different "pain" when lost.

| Component | Why it matters | Typical RPO | Typical RTO |
| --- | --- | --- | --- |
| Database | posts, users, orders, settings | tight | medium-to-tight |
| Uploads | user media and assets | medium | medium |
| Plugins/themes | code drift from updates | loose-to-medium | medium |
| Secrets/config | DB creds, salts, API keys | loose (but critical) | tight |

Notes:

  • WooCommerce sites usually need tighter DB RPO than brochure sites.
  • Config/secrets do not change often, but losing them blocks restores.

How RPO maps to backup frequency

If your DB dump runs every 6 hours, your best-case RPO is roughly 6 hours.

Reality is slightly worse because of:

  • job runtime
  • upload lag (offsite copies)
  • failures you did not notice

Practical rule:

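  • Keep the backup interval comfortably shorter than your RPO target (equivalently, the RPO target should be comfortably larger than the interval), so job runtime and upload lag still fit inside it.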
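
Of the items listed above, failures you did not notice are the easiest to miss, so it helps to check the age of the newest backup rather than trust that the job ran. The following is a minimal sketch, assuming GNU find and date and dumps named like the wp-db-*.sql.zst files used on this page. The function names and the /backups path in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report how old the newest matching backup is.

# Print the age (in whole minutes) of the newest file in DIR matching PATTERN.
# Returns 1 if nothing matches.
newest_backup_age_minutes() {
  local dir="$1" pattern="$2" newest now
  # %T@ prints the modification time as seconds since the epoch (GNU find).
  newest=$(find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@\n' | sort -n | tail -n 1)
  [ -n "$newest" ] || return 1
  now=$(date +%s)
  echo $(( (now - ${newest%.*}) / 60 ))
}

# Succeed only if the newest backup is at most MAX_AGE_MINUTES old.
backup_is_fresh() {
  local dir="$1" pattern="$2" max_age_minutes="$3" age
  age=$(newest_backup_age_minutes "$dir" "$pattern") || return 1
  [ "$age" -le "$max_age_minutes" ]
}
```

Example usage from cron or a monitoring check: `backup_is_fresh /backups 'wp-db-*.sql.zst' 420 || echo "DB dump is stale"`. The threshold here (420 minutes for a 6-hour dump cadence) is an illustrative choice that leaves room for job runtime.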

How RTO is built (what you actually spend time on)

RTO is not just "extract and import". It is the sum of:

  • detection time (someone notices)
  • decision time (choose restore point)
  • provisioning time (new VPS, packages)
  • data transfer time (download from offsite)
  • restore time (import DB, extract files)
  • validation time (health checks)
  • cutover time (DNS/LB)

If you want a low RTO, you must reduce these components.
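
To see which component dominates, you can add up per-step estimates and flag the largest one. This is a small sketch only: the function name `rto_breakdown` and the minute values in the usage line are illustrative, not measurements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum RTO components given as label=minutes pairs
# and report the total plus the single largest component.
rto_breakdown() {
  local pair label minutes total=0 worst_label="" worst=0
  for pair in "$@"; do
    label="${pair%%=*}"      # text before the first "="
    minutes="${pair#*=}"     # text after the first "="
    total=$(( total + minutes ))
    if [ "$minutes" -gt "$worst" ]; then
      worst="$minutes"
      worst_label="$label"
    fi
  done
  echo "total=${total}m largest=${worst_label}(${worst}m)"
}
```

For example, `rto_breakdown detection=15 decision=10 provisioning=20 transfer=34 restore=25 validation=10 cutover=5` prints `total=119m largest=transfer(34m)`, which suggests a local copy would help more than a faster import.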

Measure your real restore time (a restore drill)

Run timed drills. Your theoretical RTO is not your actual RTO.

Time the download

time-download-from-offsite.sh
time rclone copy remote:wp-backups/site-a /tmp/restore-input --include 'wp-files-2026-03-01.tar.zst'
time rclone copy remote:wp-backups/site-a /tmp/restore-input --include 'wp-db-2026-03-01.sql.zst'

Time the file restore

time-extract-files-archive.sh
sudo rm -rf /tmp/restore-test
sudo mkdir -p /tmp/restore-test

time sudo tar --use-compress-program=zstd -xf /tmp/restore-input/wp-files-2026-03-01.tar.zst -C /tmp/restore-test

Time the DB restore

time-restore-db-dump.sh
time zstd -dc /tmp/restore-input/wp-db-2026-03-01.sql.zst | mysql wordpress_restore

Capture these timings in a log and use them when you set RTO.
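
One way to capture the timings is a small wrapper that appends one line per step to a log file. This is a sketch under assumptions: the function name `timed_step`, the `DRILL_LOG` variable, and the default log path are hypothetical, and `date -Is` requires GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and append
# "<ISO timestamp> <label> <seconds>s exit=<code>" to a log file.
DRILL_LOG="${DRILL_LOG:-/tmp/restore-drill-timings.log}"

timed_step() {
  local label="$1"; shift
  local start end rc=0
  start=$(date +%s)
  "$@" || rc=$?            # keep going so a failed step is still logged
  end=$(date +%s)
  printf '%s %s %ss exit=%s\n' "$(date -Is)" "$label" "$(( end - start ))" "$rc" >> "$DRILL_LOG"
  return "$rc"
}
```

A plain command can be passed directly, for example `timed_step download rclone copy remote:wp-backups/site-a /tmp/restore-input`. A pipeline needs wrapping, for example `timed_step restore-db bash -c 'set -o pipefail; zstd -dc /tmp/restore-input/wp-db-2026-03-01.sql.zst | mysql wordpress_restore'`.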

Target tiers (a practical way to think)

These tiers are examples to calibrate expectations.

| Tier | Example RPO | Example RTO | Typical approach |
| --- | --- | --- | --- |
| Basic | 24h | 12h | daily backups, manual restore |
| Standard | 4-6h | 2-4h | frequent DB dumps + local copies + tested runbooks |
| High availability | minutes | minutes | replication + automated failover + continuous monitoring |

How backup design affects RPO and RTO

Local + offsite

Local copies reduce restore time (RTO) because you do not need to download.

Offsite copies reduce the chance of total loss (helps meet RPO in a disaster where local is gone).

See:

  • opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/local-vs-remote-backup.mdx
  • opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/offsite--local-redundancy.mdx

Encryption

Encryption protects data, but increases restore steps:

  • decrypt
  • then decompress
  • then restore

This can slightly increase RTO. Measure it.

time-decrypt-and-restore-db.sh
time gpg --decrypt /backups/wp-db-2026-03-01.sql.zst.gpg | zstd -dc | mysql wordpress_restore

Retention

Retention interacts with RPO:

  • if you keep only 7 days of daily dumps, you cannot restore to "last month"
  • if pruning deletes baselines, incremental restores may fail
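
Before any pruning job deletes files, it is worth listing what it would remove. The sketch below only prints candidates and deletes nothing. It assumes GNU find and the wp-db-*.sql.zst naming used on this page; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list DB dumps that a KEEP_DAYS retention policy
# would prune. Dry run only: nothing is deleted.
list_expired_dumps() {
  local dir="$1" keep_days="$2"
  # -mtime +N matches files whose age, truncated to whole days, is greater
  # than N, i.e. files at least N+1 days old.
  find "$dir" -maxdepth 1 -type f -name 'wp-db-*.sql.zst' -mtime +"$keep_days" -print | sort
}
```

For example, `list_expired_dumps /backups 7` shows what a 7-day policy would drop. If the list includes a baseline that incrementals depend on, adjust the policy before enabling deletion.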

See:

  • opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/rotation--retention-policies.mdx

Example: two-VPS redundant architecture (impact on RPO/RTO)

This is a common design when you want near-continuous service.

Assumptions:

  • VPS-A is production.
  • VPS-B is standby.
  • files are synced (rsync) on a schedule.
  • database changes are replicated.
  • a load balancer can switch traffic.

two-vps-architecture.txt
VPS-A (primary)
- WordPress files
- MySQL primary

VPS-B (standby)
- rsync copies of files
- MySQL replica

Traffic
- load balancer health checks
- failover to VPS-B

Expected RPO and RTO

| Metric | What drives it | Example outcome |
| --- | --- | --- |
| RPO | replication lag + rsync interval | minutes (if replication is healthy) |
| RTO | health check + failover + warm services | minutes (if automation is correct) |

Benefits

  • Very low downtime during a single-node failure.
  • Smaller restore operations (often failover instead of full restore).
  • Can be positioned as a higher tier offering to clients.

Limitations and risks

  • More moving parts: replication health, file sync health, LB health checks.
  • Split-brain risk during failback if you are not disciplined.
  • Higher costs (two servers, monitoring, paid LB features).
  • Higher security surface area (two hosts to harden).

Setup difficulty (engineering estimate)

This is intentionally conservative:

| Component | Difficulty | Why |
| --- | --- | --- |
| rsync file sync | medium | excludes, permissions, delete semantics, verification |
| DB replication | high | binlogs, lag monitoring, promotion/failback |
| automated failover | medium-to-high | health checks, TLS, caching, cutover behavior |
| operations | high | monitoring, drills, incident handling |

This design improves RPO/RTO, but it is closer to availability engineering than "backups".

Client communication (how to describe targets)

Keep language concrete:

  • "We take DB dumps every 6 hours" (backup frequency)
  • "We test restores monthly" (validation)
  • "We can restore from offsite within ~X hours" (measured RTO)

Avoid promising numbers you have not measured.

An RPO/RTO worksheet

Use this worksheet to define targets and constraints.

Inputs

| Input | Example | Notes |
| --- | --- | --- |
| Files archive size | 20 GB | affects download/extract time |
| DB dump size (compressed) | 800 MB | affects download/import time |
| Offsite bandwidth | 80 Mbps | affects download time |
| Restore operator | on-call | affects detection/decision |
| DNS/LB cutover | LB | affects traffic switch |
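
The size and bandwidth inputs give a lower bound on download time before you run any drill. This is a rough sketch that assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead, so real transfers will be slower. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ideal transfer time in seconds for SIZE_MB megabytes
# over a link of BANDWIDTH_MBPS megabits per second.
transfer_seconds() {
  local size_mb="$1" bandwidth_mbps="$2"
  # megabytes -> megabits (x8), then divide by Mbps, rounding up.
  echo $(( (size_mb * 8 + bandwidth_mbps - 1) / bandwidth_mbps ))
}
```

With the example inputs, `transfer_seconds 20000 80` prints `2000` (about 33 minutes for the 20 GB files archive) and `transfer_seconds 800 80` prints `80` for the DB dump.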

Measured times (fill in from drills)

| Step | Your measured time |
| --- | --- |
| download files archive | |
| extract files | |
| download DB dump | |
| import DB | |
| validation and smoke tests | |

Targets

| Metric | Target |
| --- | --- |
| RPO | |
| RTO | |

If your measured times exceed your target RTO, you must change the design (local copy, faster format, automation, or HA).
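
Once the worksheet is filled in, the comparison against the target is simple arithmetic. This is a sketch with a hypothetical function name; all values are in minutes and the numbers in the usage line are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print target RTO minus the sum of measured steps.
# A negative result means the measured restore exceeds the target.
rto_headroom_minutes() {
  local target="$1"; shift
  local total=0 m
  for m in "$@"; do
    total=$(( total + m ))
  done
  echo $(( target - total ))
}
```

For example, `rto_headroom_minutes 120 34 12 2 18 10` prints `44`, while the same measurements against a 60-minute target print `-16`.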

A restore drill script you can reuse

This creates a repeatable baseline for measuring RTO components.

restore-drill-measurement-script.sh
#!/usr/bin/env bash
set -euo pipefail

FILES_ARCHIVE="/backups/wp-files-2026-03-01.tar.zst"
DB_DUMP="/backups/wp-db-2026-03-01.sql.zst"

echo "[$(date -Is)] start restore drill"

echo "[$(date -Is)] extract files"
sudo rm -rf /tmp/restore-test
sudo mkdir -p /tmp/restore-test
time sudo tar --use-compress-program=zstd -xf "$FILES_ARCHIVE" -C /tmp/restore-test

echo "[$(date -Is)] restore db"
time zstd -dc "$DB_DUMP" | mysql wordpress_restore

echo "[$(date -Is)] verify layout"
sudo find /tmp/restore-test -maxdepth 3 -type d -name wp-content -print
sudo find /tmp/restore-test -maxdepth 3 -type f -name wp-config.php -print

echo "[$(date -Is)] done"

Warning

This imports into wordpress_restore. Do not point restore drills at production databases.

Common mistakes

  • Setting RPO/RTO without measuring restores.
  • Ignoring offsite copy lag.
  • Storing backups offsite without encryption.
  • Assuming "two servers" automatically means low RTO (failover must be tested).

RPO math (practical examples)

RPO is primarily limited by how often you capture data and how reliably you move it offsite.

Example: DB dump cadence

| DB dump interval | Best-case RPO | Realistic RPO (with failures) |
| --- | --- | --- |
| every 24h | ~24h | 24h+ (up to ~48h if one dump fails) |
| every 6h | ~6h | 6h+ |
| every 1h | ~1h | 1h+ |

If your dumps run every 6 hours but your offsite upload runs once per day, disaster RPO can still be close to 24 hours (because the VPS may die before the offsite copy completes).
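
The same reasoning can be written as a one-line rule: in a disaster that destroys the VPS, the slower of the two cadences sets the RPO. This sketch assumes each offsite upload runs right after a fresh dump. If the schedules are not aligned, the real figure is worse. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate disaster RPO (hours) as the larger of
# the dump interval and the offsite upload interval.
disaster_rpo_hours() {
  local dump_interval_h="$1" offsite_interval_h="$2"
  if [ "$dump_interval_h" -ge "$offsite_interval_h" ]; then
    echo "$dump_interval_h"
  else
    echo "$offsite_interval_h"
  fi
}
```

`disaster_rpo_hours 6 24` prints `24`, matching the example above. Uploading after every dump (`disaster_rpo_hours 6 6`) brings it back to `6`.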

Separate cadences for DB and files

This is common on WordPress:

  • database dumps: frequent
  • file snapshots: less frequent

Example:

example-separate-cadence.txt
DB dumps: every 2 hours
File snapshots: daily
Full baseline archive: weekly
Offsite upload: after every backup run

This keeps DB RPO tighter than file RPO, which is often acceptable.


How to reduce RTO (without changing business scope)

If your measured RTO is too high, you usually need to reduce one of these components:

Provisioning time

  • Keep an infrastructure checklist (packages, versions, configs).
  • Use scripts or automation to install your stack.
  • Keep your web server and PHP-FPM config under version control (not secrets).

Data transfer time

  • Keep at least one local copy (fast restores).
  • Use faster formats (zstd) for operational backups.
  • If offsite is required, ensure you have enough bandwidth and keep artifacts reasonably sized.

Restore execution time

  • Practice restores so your steps are deterministic.
  • Keep artifacts in predictable locations.
  • Avoid manual decision making under pressure.

High availability is not a backup

HA reduces downtime, but it does not automatically protect you from:

  • accidental deletion (deletion replicates)
  • corruption (corruption replicates)
  • malware (malware replicates)

Use HA to improve RTO, and backups to improve both RPO and recoverability.

ha-vs-backup.txt
Backups: restore to a previous point in time.
HA: keep service available across node failures.
You usually need both for critical sites.

Two-VPS design: operational runbook notes

If you implement primary/standby, document failover and failback.

Signals you need to monitor

  • file sync freshness (last rsync success time)
  • replication health (replica running, lag)
  • LB health checks
  • backup job success (still needed)

Example checks:

check-file-sync-freshness.sh
ssh backup@backup-host 'ls -lah /srv/wp-backups/site-a | sed -n "1,40p"'

Replication health checks vary by MySQL/MariaDB version, but you must track:

  • replica IO thread state
  • replica SQL thread state
  • seconds behind source (lag)

mysql-replication-status-check.sh
mysql -e "SHOW REPLICA STATUS\\G" | sed -n '1,120p'
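
To turn that output into a pass/fail signal, a filter can read the three fields and compare the lag to a threshold. This is a sketch, not a complete monitor. It accepts both the newer field names (Replica_IO_Running, Seconds_Behind_Source) and the older ones (Slave_IO_Running, Seconds_Behind_Master). The function name and the 60-second threshold in the usage line are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "SHOW REPLICA STATUS\G" output on stdin and
# succeed only if both threads run and lag is at most MAX_LAG seconds.
replica_is_healthy() {
  local max_lag="$1"
  awk -F': *' -v max="$max_lag" '
    { gsub(/^[ \t]+/, "", $1) }                       # strip left padding from the field name
    $1 ~ /^(Replica|Slave)_IO_Running$/     { io  = $2 }
    $1 ~ /^(Replica|Slave)_SQL_Running$/    { sql = $2 }
    $1 ~ /^Seconds_Behind_(Source|Master)$/ { lag = $2 }
    END {
      # A NULL or missing lag value counts as unhealthy.
      if (io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/ && lag + 0 <= max + 0) exit 0
      exit 1
    }'
}
```

Example usage: `mysql -e "SHOW REPLICA STATUS\G" | replica_is_healthy 60 || echo "replica unhealthy or lagging"`.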

Failover checklist

  1. Confirm primary is unhealthy (not just slow).
  2. Confirm standby has recent files and an acceptable replication lag.
  3. Promote standby (DB + app) using your documented steps.
  4. Switch traffic.
  5. Announce and log the incident.

Failback checklist

Failback is where split-brain happens if you are not careful.

  1. Decide which node is authoritative.
  2. Freeze writes on the old primary.
  3. Re-sync files in the correct direction.
  4. Rebuild DB replication in the correct direction.
  5. Only then allow the old primary to serve traffic.

Warning

If both nodes accept writes at the same time, you can create diverging databases and lose data. Document your promotion/failback steps and test them.


Point-in-time recovery (advanced)

If you need a very tight RPO for the database, full dumps may not be enough.

Options include:

  • MySQL/MariaDB replication (standby)
  • binary logs (replay changes between dumps)

This is a deeper operational topic, but the high-level model is:

pitr-model.txt
Full dump at T0
Binary logs capture changes after T0
Restore: import dump + replay binlogs up to target time

PITR improves RPO but increases complexity and requires careful testing.


Pricing and client value (how to think, not a quote)

If you sell managed recovery targets to clients, price the operational reality:

  • number of sites
  • data size and change rate
  • retention period
  • offsite storage costs
  • frequency of restore drills
  • on-call expectations

It is reasonable to offer tiers aligned to RPO/RTO targets. The important part is that the targets are measurable and tested.


Post-restore smoke checks (reduce surprise downtime)

After you restore, validate the things that commonly break:

  • HTTP returns expected status codes
  • database connectivity works
  • wp-admin login works
  • uploads load
  • cron is running (or intentionally disabled)

post-restore-smoke-checks.sh
curl -fsS -o /dev/null -w 'status=%{http_code} time_total=%{time_total}\n' http://127.0.0.1/
curl -fsS -o /dev/null -w 'status=%{http_code} time_total=%{time_total}\n' http://127.0.0.1/wp-login.php

mysql -e "SELECT 1" >/dev/null

Reference tier examples (inputs -> targets)

| Site profile | Suggested RPO | Suggested RTO | Notes |
| --- | --- | --- | --- |
| Personal blog | 24h | 12h | low change rate |
| Small business | 6h | 4h | basic operations |
| WooCommerce | 1h | 1-2h | transactions matter |
| Membership/news | 1h | 1h | frequent updates |

Treat these as starting points. Measure restores and adjust.

Next steps

  • Backup types and schedules: opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/full-vs-incremental-vs-differential.mdx.
  • 3-2-1 design: opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/321-backup-strategy-for-wp.mdx.
  • Disaster recovery workflow: opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/disaster-recovery-workflow.mdx.