Backup Runbook

Backups only become real when you can restore them on demand. This runbook is a step-by-step checklist you can follow during normal operations (routine verification) and during incidents (disaster recovery).

Quick Summary

Daily: produce backups and record success/failure.
Weekly: run a restore drill into a staging directory.
Monthly: validate retention, permissions, and offsite copies.
Incident: restore using a written sequence, not memory.

Minimum standard

For a WordPress VPS, a workable minimum is:

Site files archive (WordPress root including wp-content/).
Database dump.
Offsite copy of both artifacts.
A restore drill that you repeat.
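
As a rough illustration, the first two artifacts could be produced with helpers like the ones below. This is a minimal sketch, not a prescribed implementation: the function names `backup_files` and `backup_db`, the WordPress root, and the database name are assumptions, and `mysqldump` is assumed to read credentials from `~/.my.cnf`.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: archive a WordPress root into a gzip-compressed tarball.
# $1 = WordPress root (assumed path), $2 = output archive path.
backup_files() {
    wp_root=$1
    out=$2
    # Write to a temporary name first so a failed run never leaves a
    # partial file under the final "-latest" name.
    tar -czf "$out.tmp" -C "$(dirname "$wp_root")" "$(basename "$wp_root")"
    mv "$out.tmp" "$out"
}

# Hypothetical helper: dump one database with mysqldump and compress it.
# $1 = database name (assumed), $2 = output path.
# Credentials are assumed to come from ~/.my.cnf, not the command line.
backup_db() {
    db_name=$1
    out=$2
    # Dump and compress in two steps: in plain sh a failing mysqldump
    # inside a pipeline would be masked by gzip's exit status.
    mysqldump --single-transaction "$db_name" > "$out.sql.tmp"
    gzip -c "$out.sql.tmp" > "$out.tmp"
    rm -f "$out.sql.tmp"
    mv "$out.tmp" "$out"
}
```

A call might look like `backup_files /var/www/html /backups/wp-files-latest.tar.gz` followed by `backup_db wordpress /backups/wp-db-latest.sql.gz`, with both paths adjusted to your server.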

Daily run

Create backups (files and database).
Verify artifacts exist and are non-empty.
Log the run (timestamp, sizes, exit codes).
Sync offsite.

daily-backup-runbook-checks.sh
set -eu

ls -lh /backups | sed -n '1,60p'

# quick sanity checks
test -s /backups/wp-files-latest.tar.gz
test -s /backups/wp-db-latest.sql.gz
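
The "log the run" step can be sketched as one appended line per artifact. The helper name `log_artifact`, the log format, and any log path you pass in are assumptions for illustration, not part of this runbook.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: append one log line for a backup artifact.
# $1 = log file, $2 = artifact path, $3 = exit code of the step that produced it.
log_artifact() {
    log_file=$1
    artifact=$2
    exit_code=$3
    if [ -f "$artifact" ]; then
        # wc -c reports the size in bytes and behaves the same across systems.
        size=$(wc -c < "$artifact" | tr -d ' ')
    else
        size=missing
    fi
    # UTC timestamp, artifact path, size, and exit code on a single line.
    printf '%s artifact=%s size_bytes=%s exit_code=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$artifact" "$size" "$exit_code" >> "$log_file"
}
```

For example, `log_artifact /var/log/wp-backup.log /backups/wp-db-latest.sql.gz "$?"` run right after the dump step records whether it succeeded and how large the result was. A sudden drop in `size_bytes` between runs is a cheap signal of a partial backup.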

Weekly restore drill

Always restore into an empty staging directory.

weekly-restore-drill.sh
set -eu

sudo rm -rf /tmp/restore-test
sudo mkdir -p /tmp/restore-test

# extract files into staging
sudo tar -xzf /backups/wp-files-latest.tar.gz -C /tmp/restore-test

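# verify key WordPress paths exist
# (find exits 0 even when nothing matches, so grep fails the script on empty output)
sudo find /tmp/restore-test -maxdepth 3 -type f -name wp-config.php -print | grep .
sudo find /tmp/restore-test -maxdepth 3 -type d -name wp-content -print | grep .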

If the archive extracts cleanly and both paths are found, the files backup is likely usable. This drill does not exercise the database dump, so verify that artifact separately.
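
One way to extend the drill to the database artifact is a sanity check that never touches a live database. This is a sketch under assumptions: the helper name `check_db_dump` is hypothetical, and the check relies on `mysqldump` ending its output with a "Dump completed" comment, which holds only if comments were not disabled when the dump was taken.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: sanity-check a gzip-compressed SQL dump without loading it.
# $1 = path to the .sql.gz file.
check_db_dump() {
    dump=$1
    # gzip -t verifies that the compressed stream is complete and uncorrupted.
    gzip -t "$dump" || return 1
    # mysqldump normally ends with a "-- Dump completed" comment; its absence
    # suggests a truncated dump (assumes comments were not disabled).
    gzip -dc "$dump" | tail -n 5 | grep -q 'Dump completed' || return 1
    # A WordPress dump should define at least one table.
    gzip -dc "$dump" | grep -q 'CREATE TABLE' || return 1
}
```

Running `check_db_dump /backups/wp-db-latest.sql.gz` returns non-zero on a corrupt or truncated dump. It is weaker than importing the dump into a scratch database, which remains the more convincing test when you have one available.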

Monthly audit

Confirm retention policies are actually deleting old files.
Confirm offsite targets hold the same number of artifacts as the local backup directory.
Confirm permissions: backup directory is readable only by the backup user/root.
Review logs for failures and silent partial backups.
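
The first three audit items can be partly scripted. The sketch below assumes GNU `find`, artifacts that end in `.gz`, and an offsite copy that is reachable as a local or mounted directory; the helper names are hypothetical. For object storage you would substitute your provider's listing command.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: list artifacts older than the retention window.
# $1 = backup directory, $2 = retention in days.
# Any output means old files are not being pruned.
stale_artifacts() {
    find "$1" -type f -name '*.gz' -mtime +"$2" -print
}

# Hypothetical helper: count artifacts in a directory
# (the local backup directory or a mounted offsite copy).
count_artifacts() {
    find "$1" -type f -name '*.gz' | wc -l | tr -d ' '
}

# Hypothetical helper: succeed only if group and others have no access
# to the directory. -perm /077 (GNU find) matches when any of those bits is set.
dir_is_private() {
    [ -z "$(find "$1" -maxdepth 0 -perm /077 -print)" ]
}
```

A monthly check could compare `count_artifacts /backups` with the count for the offsite mirror, confirm that `stale_artifacts /backups 30` prints nothing for a 30-day policy, and confirm that `dir_is_private /backups` succeeds.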

Incident restore (high-level)

Warning: Restores can overwrite production data. If the server is still partially alive, snapshot first and restore into a staging directory to validate before you replace anything in place.

Stop web writes (maintenance mode or stop services).
Restore database.
Restore files.
Fix ownership/permissions.
Validate (HTTP + application-level checks).
Re-enable traffic.

Keep the exact commands in your environment-specific restore workflow.
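
A written sequence is easier to trust if it can be rehearsed without side effects. The sketch below wraps each step in a dry-run helper that prints commands instead of running them. Every service name, path, database name, and URL is a placeholder, and the mapping of commands to steps is one possible interpretation, not this runbook's prescribed commands.

```shell
#!/bin/sh
set -eu

# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'DRY-RUN: %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical restore sequence following the steps listed above.
# All names below are placeholders for your environment.
restore_sequence() {
    # 1. Stop web writes.
    run systemctl stop nginx
    # 2. Restore the database.
    run sh -c 'gunzip -c /backups/wp-db-latest.sql.gz | mysql wordpress'
    # 3. Restore files.
    run tar -xzf /backups/wp-files-latest.tar.gz -C /var/www
    # 4. Fix ownership.
    run chown -R www-data:www-data /var/www/html
    # 5. Validate the restored tables before traffic returns.
    run mysqlcheck wordpress
    # 6. Re-enable traffic, then confirm the site answers over HTTP.
    run systemctl start nginx
    run curl -fsS -o /dev/null https://example.com/
}
```

Calling `restore_sequence` with the default `DRY_RUN=1` prints the plan for review; setting `DRY_RUN=0` executes it. In this sketch the HTTP check runs only after the web server is back, so if you need HTTP validation before public traffic returns, that part depends on how your environment gates traffic.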
