# RPO vs RTO

RPO and RTO turn backup talk into measurable targets. RPO is how much data you can lose. RTO is how long you can be down. On a WordPress VPS, these two targets determine backup frequency, offsite strategy, and how much automation you need.
- RPO (Recovery Point Objective): maximum acceptable data loss measured in time.
- RTO (Recovery Time Objective): maximum acceptable downtime measured in time.
- RPO drives backup frequency and replication.
- RTO drives restore speed, local copies, and automation.
## Definitions (simple and strict)
| Metric | Question it answers | Measured in | Controlled by |
|---|---|---|---|
| RPO | "How much data can I lose?" | minutes/hours | backup interval, replication lag, offsite delay |
| RTO | "How long can I be offline?" | minutes/hours | restore speed, automation, availability design |
Examples:
- RPO = 1 hour means you accept losing up to 1 hour of content/orders.
- RTO = 2 hours means you must be back online within 2 hours.
## WordPress components and typical targets
Different parts of WordPress have different "pain" when lost.
| Component | Why it matters | Typical RPO | Typical RTO |
|---|---|---|---|
| Database | posts, users, orders, settings | tight | medium-to-tight |
| Uploads | user media and assets | medium | medium |
| Plugins/themes | code drift from updates | loose-to-medium | medium |
| Secrets/config | DB creds, salts, API keys | loose (but critical) | tight |
Notes:
- WooCommerce sites usually need tighter DB RPO than brochure sites.
- Config/secrets do not change often, but losing them blocks restores.
## How RPO maps to backup frequency
If your DB dump runs every 6 hours, the best RPO you can honestly promise is roughly 6 hours.
Reality is slightly worse because of:
- job runtime
- upload lag (offsite copies)
- failures you did not notice
Practical rule:
- RPO target should be comfortably larger than your backup interval.
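The rule above can be made concrete with a back-of-the-envelope calculation. Every number below is an illustrative assumption, not a measurement:

```bash
# Worst-case RPO is roughly: backup interval + job runtime + offsite upload lag.
# All values are illustrative; substitute your own measurements.
interval_min=360      # DB dump every 6 hours
job_runtime_min=10    # how long the dump itself takes
upload_lag_min=20     # delay until the copy lands offsite
worst_case_rpo_min=$(( interval_min + job_runtime_min + upload_lag_min ))
echo "worst-case RPO: ${worst_case_rpo_min} minutes (~$(( worst_case_rpo_min / 60 )) hours)"
```

With these assumptions, a 6-hour interval yields a worst-case RPO of 390 minutes, which is why the target should sit comfortably above the interval.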
## How RTO is built (what you actually spend time on)
RTO is not just "extract and import". It is the sum of:
- detection time (someone notices)
- decision time (choose restore point)
- provisioning time (new VPS, packages)
- data transfer time (download from offsite)
- restore time (import DB, extract files)
- validation time (health checks)
- cutover time (DNS/LB)
If you want a low RTO, you must reduce these components.
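Because the components are additive, budgeting them is simple arithmetic. A minimal sketch with assumed per-step times:

```bash
# Sum the RTO components (minutes). Every value here is an assumption;
# replace them with timings from your own drills.
detect_min=15; decide_min=10; provision_min=30; transfer_min=45
restore_min=40; validate_min=15; cutover_min=10
rto_min=$(( detect_min + decide_min + provision_min + transfer_min + restore_min + validate_min + cutover_min ))
echo "estimated RTO: ${rto_min} minutes"
```

Even with optimistic per-step numbers the total lands close to three hours, which is why shaving a single component rarely rescues a blown RTO on its own.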
## Measure your real restore time (a restore drill)

Run timed drills. Your theoretical RTO is not your actual RTO.

### Time the download

```bash
time rclone copy remote:wp-backups/site-a /tmp/restore-input --include 'wp-files-2026-03-01.tar.zst'
```

### Time the file restore

```bash
sudo rm -rf /tmp/restore-test
sudo mkdir -p /tmp/restore-test
time sudo tar --use-compress-program=zstd -xf /tmp/restore-input/wp-files-2026-03-01.tar.zst -C /tmp/restore-test
```

### Time the DB restore

```bash
time zstd -dc /tmp/restore-input/wp-db-2026-03-01.sql.zst | mysql wordpress_restore
```
Capture these timings in a log and use them when you set RTO.
## Target tiers (a practical way to think)
These tiers are examples to calibrate expectations.
| Tier | Example RPO | Example RTO | Typical approach |
|---|---|---|---|
| Basic | 24h | 12h | daily backups, manual restore |
| Standard | 4-6h | 2-4h | frequent DB dumps + local copies + tested runbooks |
| High availability | minutes | minutes | replication + automated failover + continuous monitoring |
## How backup design affects RPO and RTO

### Local + offsite
Local copies reduce restore time (RTO) because you do not need to download.
Offsite copies reduce the chance of total loss (helps meet RPO in a disaster where local is gone).
See:

- opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/local-vs-remote-backup.mdx
- opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/offsite--local-redundancy.mdx
### Encryption
Encryption protects data, but increases restore steps:
- decrypt
- then decompress
- then restore
This can slightly increase RTO. Measure it.
```bash
time gpg --decrypt /backups/wp-db-2026-03-01.sql.zst.gpg | zstd -dc | mysql wordpress_restore
```
### Retention
Retention interacts with RPO:
- if you keep only 7 days of daily dumps, you cannot restore to "last month"
- if pruning deletes baselines, incremental restores may fail
See:

- opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/rotation--retention-policies.mdx
## Example: two-VPS redundant architecture (impact on RPO/RTO)
This is a common design when you want near-continuous service.
Assumptions:
- VPS-A is production.
- VPS-B is standby.
- files are synced (rsync) on a schedule.
- database changes are replicated.
- a load balancer can switch traffic.
**VPS-A (primary)**

- WordPress files
- MySQL primary

**VPS-B (standby)**

- rsync copies of files
- MySQL replica

**Traffic**

- load balancer health checks
- failover to VPS-B
### Expected RPO and RTO
| Metric | What drives it | Example outcome |
|---|---|---|
| RPO | replication lag + rsync interval | minutes (if replication is healthy) |
| RTO | health check + failover + warm services | minutes (if automation is correct) |
### Benefits
- Very low downtime during a single-node failure.
- Smaller restore operations (often failover instead of full restore).
- Can be positioned as a higher tier offering to clients.
### Limitations and risks
- More moving parts: replication health, file sync health, LB health checks.
- Split-brain risk during failback if you are not disciplined.
- Higher costs (two servers, monitoring, paid LB features).
- Higher security surface area (two hosts to harden).
### Setup difficulty (engineering estimate)
This is intentionally conservative:
| Component | Difficulty | Why |
|---|---|---|
| rsync file sync | medium | excludes, permissions, delete semantics, verification |
| DB replication | high | binlogs, lag monitoring, promotion/failback |
| automated failover | medium-to-high | health checks, TLS, caching, cutover behavior |
| operations | high | monitoring, drills, incident handling |
This design improves RPO/RTO, but it is closer to availability engineering than "backups".
## Client communication (how to describe targets)
Keep language concrete:
- "We take DB dumps every 6 hours" (backup frequency)
- "We test restores monthly" (validation)
- "We can restore from offsite within ~X hours" (measured RTO)
Avoid promising numbers you have not measured.
## An RPO/RTO worksheet
Use this worksheet to define targets and constraints.
### Inputs
| Input | Example | Notes |
|---|---|---|
| Files archive size | 20 GB | affects download/extract time |
| DB dump size (compressed) | 800 MB | affects download/import time |
| Offsite bandwidth | 80 Mbps | affects download time |
| Restore operator | on-call | affects detection/decision |
| DNS/LB cutover | LB | affects traffic switch |
### Measured times (fill in from drills)
| Step | Your measured time |
|---|---|
| download files archive | |
| extract files | |
| download DB dump | |
| import DB | |
| validation and smoke tests | |
### Targets
| Metric | Target |
|---|---|
| RPO | |
| RTO | |
If your measured times exceed your target RTO, you must change the design (local copy, faster format, automation, or HA).
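Transfer time, in particular, can be estimated directly from the worksheet inputs. A sketch using the example values above (20 GB archive, 80 Mbps offsite bandwidth):

```bash
# Rough download-time estimate: size in GB -> megabits, divided by bandwidth.
# Ignores protocol overhead and throttling, so treat it as a lower bound.
size_gb=20
bandwidth_mbps=80
seconds=$(( size_gb * 8 * 1000 / bandwidth_mbps ))
echo "download estimate: ${seconds}s (~$(( seconds / 60 )) minutes)"
```

At 80 Mbps, the 20 GB files archive alone costs about 33 minutes of your RTO budget before extraction even starts.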
## A restore drill script you can reuse
This creates a repeatable baseline for measuring RTO components.
```bash
#!/usr/bin/env bash
set -euo pipefail

FILES_ARCHIVE="/backups/wp-files-2026-03-01.tar.zst"
DB_DUMP="/backups/wp-db-2026-03-01.sql.zst"

echo "[$(date -Is)] start restore drill"

echo "[$(date -Is)] extract files"
sudo rm -rf /tmp/restore-test
sudo mkdir -p /tmp/restore-test
time sudo tar --use-compress-program=zstd -xf "$FILES_ARCHIVE" -C /tmp/restore-test

echo "[$(date -Is)] restore db"
time zstd -dc "$DB_DUMP" | mysql wordpress_restore

echo "[$(date -Is)] verify layout"
sudo find /tmp/restore-test -maxdepth 3 -type d -name wp-content -print
sudo find /tmp/restore-test -maxdepth 3 -type f -name wp-config.php -print

echo "[$(date -Is)] done"
```

This imports into `wordpress_restore`. Do not point restore drills at production databases.
## Common mistakes
- Setting RPO/RTO without measuring restores.
- Ignoring offsite copy lag.
- Storing backups offsite without encryption.
- Assuming "two servers" automatically means low RTO (failover must be tested).
## RPO math (practical examples)
RPO is primarily limited by how often you capture data and how reliably you move it offsite.
### Example: DB dump cadence
| DB dump interval | Best-case RPO | Realistic RPO (with failures) |
|---|---|---|
| every 24h | ~24h | 24h+ (if one dump fails) |
| every 6h | ~6h | 6h+ |
| every 1h | ~1h | 1h+ |
If your dumps run every 6 hours but your offsite upload runs once per day, disaster RPO can still be close to 24 hours (because the VPS may die before the offsite copy completes).
### Separate cadences for DB and files
This is common on WordPress:
- database dumps: frequent
- file snapshots: less frequent
Example:

- DB dumps: every 2 hours
- File snapshots: daily
- Full baseline archive: weekly
- Offsite upload: after every backup run
This keeps DB RPO tighter than file RPO, which is often acceptable.
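One way to implement these cadences is plain cron. The script paths below are hypothetical placeholders, and each script is assumed to upload its artifact offsite as its final step:

```
# Illustrative crontab (paths are placeholders, not a real layout)
0 */2 * * * /usr/local/bin/wp-db-dump.sh          # DB dump every 2 hours
30 3 * * *  /usr/local/bin/wp-files-snapshot.sh   # file snapshot daily at 03:30
15 2 * * 0  /usr/local/bin/wp-full-baseline.sh    # full baseline weekly (Sunday)
```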
## How to reduce RTO (without changing business scope)
If your measured RTO is too high, you usually need to reduce one of these components:
### Provisioning time
- Keep an infrastructure checklist (packages, versions, configs).
- Use scripts or automation to install your stack.
- Keep your web server and PHP-FPM config under version control (not secrets).
### Data transfer time

- Keep at least one local copy (fast restores).
- Use faster formats (`zstd`) for operational backups.
- If offsite is required, ensure you have enough bandwidth and keep artifacts reasonably sized.
### Restore execution time
- Practice restores so your steps are deterministic.
- Keep artifacts in predictable locations.
- Avoid manual decision making under pressure.
## High availability is not a backup
HA reduces downtime, but it does not automatically protect you from:
- accidental deletion (deletion replicates)
- corruption (corruption replicates)
- malware (malware replicates)
Use HA to improve RTO, and backups to improve both RPO and recoverability.
- Backups: restore to a previous point in time.
- HA: keep service available across node failures.
You usually need both for critical sites.
## Two-VPS design: operational runbook notes
If you implement primary/standby, document failover and failback.
### Signals you need to monitor
- file sync freshness (last rsync success time)
- replication health (replica running, lag)
- LB health checks
- backup job success (still needed)
Example checks:
```bash
ssh backup@backup-host 'ls -lah /srv/wp-backups/site-a | sed -n "1,40p"'
```
Replication health checks vary by MySQL/MariaDB version, but you must track:
- replica IO thread state
- replica SQL thread state
- seconds behind source (lag)
```bash
mysql -e "SHOW REPLICA STATUS\G" | sed -n '1,120p'
```
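A small wrapper makes the lag check scriptable. This is a sketch that assumes the MySQL 8 field name `Seconds_Behind_Source` (older servers report `Seconds_Behind_Master`) and an example 60-second threshold:

```bash
# check_lag reads `SHOW REPLICA STATUS\G` output on stdin and reports state.
# Threshold and field name are assumptions; adjust for your server version.
check_lag() {
  max_lag=60
  lag=$(awk '/Seconds_Behind_Source/ {print $2}')
  if [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
    echo "CRITICAL: replication not running"
  elif [ "$lag" -gt "$max_lag" ]; then
    echo "WARNING: replica is ${lag}s behind"
  else
    echo "OK: lag ${lag}s"
  fi
}

# Usage (against a live replica):
#   mysql -e "SHOW REPLICA STATUS\G" | check_lag
```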
### Failover checklist
- Confirm primary is unhealthy (not just slow).
- Confirm standby has recent files and an acceptable replication lag.
- Promote standby (DB + app) using your documented steps.
- Switch traffic.
- Announce and log the incident.
### Failback checklist
Failback is where split-brain happens if you are not careful.
- Decide which node is authoritative.
- Freeze writes on the old primary.
- Re-sync files in the correct direction.
- Rebuild DB replication in the correct direction.
- Only then allow the old primary to serve traffic.
If both nodes accept writes at the same time, you can create diverging databases and lose data. Document your promotion/failback steps and test them.
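"Freeze writes" has a concrete form. A sketch for MySQL; note that `super_read_only` exists on MySQL 5.7+ but not on MariaDB, where `read_only` alone has to do:

```bash
# Run on the OLD primary before re-syncing anything.
mysql -e "SET GLOBAL read_only = ON"
# MySQL 5.7+ only: also block accounts with SUPER privileges.
mysql -e "SET GLOBAL super_read_only = ON"
```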
## Point-in-time recovery (advanced)
If you need a very tight RPO for the database, full dumps may not be enough.
Options include:
- MySQL/MariaDB replication (standby)
- binary logs (replay changes between dumps)
This is a deeper operational topic, but the high-level model is:

1. Take a full dump at T0.
2. Binary logs capture all changes after T0.
3. To restore, import the dump, then replay binlogs up to the target time.
PITR improves RPO but increases complexity and requires careful testing.
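The restore path can be sketched with standard tooling. The binlog file names and the cutoff timestamp below are illustrative; `mysqlbinlog --stop-datetime` replays events only up to the given time:

```bash
# 1) Import the last full dump taken before the incident.
zstd -dc /backups/wp-db-2026-03-01.sql.zst | mysql wordpress_restore

# 2) Replay binlog events recorded after the dump, stopping just before
#    the bad change. Paths and timestamp are examples.
mysqlbinlog --stop-datetime="2026-03-01 13:45:00" \
  /var/lib/mysql/binlog.000042 /var/lib/mysql/binlog.000043 \
  | mysql wordpress_restore
```

In practice the dump must also record its binlog position (for example with `mysqldump --source-data`) so you know which events to replay; test this end to end before relying on it.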
## Pricing and client value (how to think, not a quote)
If you sell managed recovery targets to clients, price the operational reality:
- number of sites
- data size and change rate
- retention period
- offsite storage costs
- frequency of restore drills
- on-call expectations
It is reasonable to offer tiers aligned to RPO/RTO targets. The important part is that the targets are measurable and tested.
## Post-restore smoke checks (reduce surprise downtime)
After you restore, validate the things that commonly break:
- HTTP returns expected status codes
- database connectivity works
- wp-admin login works
- uploads load
- cron is running (or intentionally disabled)
```bash
curl -fsS -o /dev/null -w 'status=%{http_code} time_total=%{time_total}\n' http://127.0.0.1/
curl -fsS -o /dev/null -w 'status=%{http_code} time_total=%{time_total}\n' http://127.0.0.1/wp-login.php
mysql -e "SELECT 1" >/dev/null
```
## Reference tier examples (inputs -> targets)
| Site profile | Suggested RPO | Suggested RTO | Notes |
|---|---|---|---|
| Personal blog | 24h | 12h | low change rate |
| Small business | 6h | 4h | basic operations |
| WooCommerce | 1h | 1-2h | transactions matter |
| Membership/news | 1h | 1h | frequent updates |
Treat these as starting points. Measure restores and adjust.
## Next steps

- Backup types and schedules: opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/full-vs-incremental-vs-differential.mdx
- 3-2-1 design: opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/321-backup-strategy-for-wp.mdx
- Disaster recovery workflow: opt/docker-data/apps/docusaurus/site/docs/server/linux-server/10-backup-disaster-recovery/disaster-recovery-workflow.mdx