Backup Runbook
Backups only become real when you can restore them on demand. This runbook is a step-by-step checklist you can follow during normal operations (routine verification) and during incidents (disaster recovery).
Quick Summary
- Daily: produce backups and record success/failure.
- Weekly: run a restore drill into a staging directory.
- Monthly: validate retention, permissions, and offsite copies.
- Incident: restore using a written sequence, not memory.
Minimum standard
For a WordPress VPS, a workable minimum is:
- Site files archive (WordPress root, including wp-content/).
- Database dump.
- Offsite copy of both artifacts.
- A restore drill that you repeat.
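The two minimum artifacts could be produced with functions along these lines. This is a sketch built on several assumptions:

- `mysqldump` can authenticate non-interactively, for example via `~/.my.cnf`.
- Each run writes timestamped files.
- The `-latest` names that the check and drill scripts read are symlinks to the newest successful run.

The function names, the stamp format and the symlink scheme are hypothetical. `--single-transaction` is a real mysqldump option that takes a consistent snapshot of InnoDB tables without locking them.

```shell
#!/usr/bin/env bash
# Sketch: produce the files archive and the database dump for one run.
# Hypothetical: function names, the stamp naming scheme, "-latest" symlinks.
set -euo pipefail   # pipefail: a failing mysqldump fails the whole pipeline

# Archive the WordPress root (including wp-content/) with paths relative to it,
# then repoint wp-files-latest.tar.gz at the new archive.
backup_files() {
  local wp_root=$1 backup_dir=$2 stamp=$3
  local archive="wp-files-$stamp.tar.gz"
  tar -czf "$backup_dir/$archive" -C "$wp_root" . \
    || { rm -f "$backup_dir/$archive"; return 1; }
  ln -sfn "$archive" "$backup_dir/wp-files-latest.tar.gz"
}

# Dump and compress the database; on failure, delete the partial file and
# leave wp-db-latest.sql.gz pointing at the previous good dump.
backup_db() {
  local db_name=$1 backup_dir=$2 stamp=$3
  local dump="wp-db-$stamp.sql.gz"
  mysqldump --single-transaction "$db_name" | gzip > "$backup_dir/$dump" \
    || { rm -f "$backup_dir/$dump"; return 1; }
  ln -sfn "$dump" "$backup_dir/wp-db-latest.sql.gz"
}
```

Example call with placeholder paths and database name: `stamp=$(date -u +%Y%m%dT%H%M%SZ); backup_files /var/www/html /backups "$stamp"; backup_db wordpress /backups "$stamp"`. The explicit `|| { ...; return 1; }` keeps the failure handling working even when the functions are called from an `if` or `||` context, where bash ignores `set -e`.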
Daily run
- Create backups (files and database).
- Verify artifacts exist and are non-empty.
- Log the run (timestamp, sizes, exit codes).
- Sync offsite.
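The logging and offsite steps might look like the sketch below. Several details are hypothetical:

- the `key=value` log format;
- the function names;
- the destination.

It also assumes `rsync` is installed and, for a remote `host:path` destination, can authenticate over SSH without a prompt.

```shell
#!/usr/bin/env bash
# Sketch: one log line per daily run, then an offsite copy.
# Hypothetical: function names and the key=value log format.
set -euo pipefail

# Print a file's size in bytes, or "missing" if it does not exist.
size_or_missing() {
  if [ -f "$1" ]; then
    wc -c < "$1" | tr -d ' '
  else
    printf 'missing\n'
  fi
}

# Append: UTC timestamp, exit code of each backup step, artifact sizes.
log_run() {
  local log_file=$1 files_rc=$2 db_rc=$3 files_archive=$4 db_dump=$5
  printf '%s files_rc=%s db_rc=%s files_bytes=%s db_bytes=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$files_rc" "$db_rc" \
    "$(size_or_missing "$files_archive")" "$(size_or_missing "$db_dump")" \
    >> "$log_file"
}

# Copy the backup directory to a local path or an rsync host:path target.
sync_offsite() {
  local backup_dir=$1 dest=$2
  rsync -a "$backup_dir/" "$dest/"
}
```

A daily run would capture each step's status without tripping `set -e` (for example `rc=0; some_step || rc=$?`), pass both codes to `log_run`, run the non-empty checks, and call `sync_offsite` only if they pass. Adding rsync's `--delete` makes the offsite copy mirror local retention, which the monthly count comparison expects. The cost is that accidental local deletions also propagate offsite.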
daily-backup-runbook-checks.sh
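#!/usr/bin/env bash
set -eu
# list the backup directory for a visual check (first 60 lines)
ls -lh /backups | sed -n '1,60p'
# quick sanity checks: test -s fails (and set -e aborts) if a file is missing or empty
test -s /backups/wp-files-latest.tar.gz
test -s /backups/wp-db-latest.sql.gz
echo "daily checks passed"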
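`test -s` only proves that a file is non-empty, so a truncated or corrupted archive still passes it. A stricter check also asks gzip to test each file and tar to list the archive. The sketch below uses a hypothetical `check_artifacts` function and the same `-latest` file names as the daily check.

```shell
#!/usr/bin/env bash
# Sketch: stricter daily check; artifacts must decompress, not just be non-empty.
# Hypothetical function name; file names match the -latest names used above.
set -euo pipefail

check_artifacts() {
  local dir=$1 f
  local files="$dir/wp-files-latest.tar.gz" db="$dir/wp-db-latest.sql.gz"
  for f in "$files" "$db"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  # gzip -t reads each whole stream and verifies it without writing output
  if ! gzip -t "$files" "$db"; then
    echo "gzip integrity check failed" >&2
    return 1
  fi
  # listing the archive confirms tar can walk every member header
  if ! tar -tzf "$files" > /dev/null; then
    echo "tar cannot list $files" >&2
    return 1
  fi
  echo "artifacts in $dir passed integrity checks"
}
```

Example: `check_artifacts /backups`. Decompressing the full files archive costs more time than `test -s`, but that is the cost of catching corruption before you need the backup.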
Weekly restore drill
Always restore into an empty staging directory.
weekly-restore-drill.sh
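#!/usr/bin/env bash
set -eu
STAGING=/tmp/restore-test
# start from an empty staging directory so leftovers from a previous drill cannot mask a bad restore
sudo rm -rf "$STAGING"
sudo mkdir -p "$STAGING"
# extract files into staging
sudo tar -xzf /backups/wp-files-latest.tar.gz -C "$STAGING"
# verify key WordPress paths exist; find exits 0 even with no match, so test its output instead
test -n "$(sudo find "$STAGING" -maxdepth 3 -type f -name wp-config.php -print -quit)"
test -n "$(sudo find "$STAGING" -maxdepth 3 -type d -name wp-content -print -quit)"
echo "restore drill: archive extracted and key paths found"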
If the archive extracts cleanly and both paths are found, the files backup is likely usable. This drill does not exercise the database dump, which needs its own check.
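One way to check the database dump without importing it is to confirm that it decompresses and ends with the completion comment that mysqldump writes by default. The sketch below assumes the dump was made by mysqldump with comments enabled. `--skip-comments` and `--compact` suppress that line, and this check would then report a false failure. The function name is hypothetical. Importing into a scratch database remains the stronger test.

```shell
#!/usr/bin/env bash
# Sketch: check a gzipped mysqldump without importing it.
# Assumes mysqldump ran with comments enabled (its default), so the output
# ends with a "-- Dump completed" line; --skip-comments/--compact remove it.
set -euo pipefail

check_db_dump() {
  local dump=$1 last_lines
  # 1. the gzip stream itself must be intact
  if ! gzip -t "$dump"; then
    echo "corrupt gzip: $dump" >&2
    return 1
  fi
  # 2. the completion comment is written last, so a truncated dump lacks it
  last_lines=$(gzip -dc "$dump" | tail -n 3)
  case $last_lines in
    *"-- Dump completed"*) echo "dump looks complete: $dump" ;;
    *) echo "no completion marker, possibly partial: $dump" >&2; return 1 ;;
  esac
}
```

Example: `check_db_dump /backups/wp-db-latest.sql.gz`. This also catches one kind of silent partial backup: a dump job that was interrupted but still left a valid, non-empty gzip file.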
Monthly audit
- Confirm retention policies are actually deleting old files.
- Confirm offsite targets hold the same number of artifacts as the local backup directory.
- Confirm permissions: backup directory is readable only by the backup user/root.
- Review logs for failures and silent partial backups.
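The first three audit items can be partly scripted. The sketch below makes several assumptions:

- GNU find and stat are available, as on typical Linux VPS images.
- The retention window is given in days.
- The offsite copy is reachable as a local path, such as a mounted volume. For an rsync or object-storage target you would count a remote listing instead.

The function names are hypothetical. The permissions check looks at mode bits only, not the owner, and log review stays manual.

```shell
#!/usr/bin/env bash
# Sketch of scriptable monthly audit checks (GNU find/stat assumed).
# Hypothetical: function names, the retention window, a locally mounted offsite copy.
set -euo pipefail

# Retention: fail if any artifact is more than N days old (find -mtime +N),
# which means pruning is not actually deleting old files.
check_retention() {
  local dir=$1 days=$2 stale
  stale=$(find "$dir" -maxdepth 1 -type f -name 'wp-*' -mtime +"$days" -print)
  if [ -n "$stale" ]; then
    printf 'older than %s days:\n%s\n' "$days" "$stale" >&2
    return 1
  fi
}

# Count regular artifact files (symlinks such as *-latest are not counted).
count_artifacts() {
  find "$1" -maxdepth 1 -type f \( -name 'wp-*.tar.gz' -o -name 'wp-*.sql.gz' \) \
    | wc -l | tr -d ' '
}

# Offsite parity: the same number of artifacts locally and offsite.
check_offsite_count() {
  local local_n offsite_n
  local_n=$(count_artifacts "$1")
  offsite_n=$(count_artifacts "$2")
  if [ "$local_n" != "$offsite_n" ]; then
    echo "artifact count mismatch: local=$local_n offsite=$offsite_n" >&2
    return 1
  fi
}

# Permissions: the backup directory should grant nothing to group or others.
check_permissions() {
  local mode
  mode=$(stat -c '%a' "$1")   # octal mode, e.g. 700
  if (( (8#$mode & 8#077) != 0 )); then
    echo "mode $mode on $1 grants group/other access" >&2
    return 1
  fi
}
```

Example: `check_retention /backups 30 && check_offsite_count /backups /mnt/offsite && check_permissions /backups`, where 30 and `/mnt/offsite` are placeholders.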
Incident restore (high-level)
Warning: Restores can overwrite production data. If the server is still partially running, take a snapshot first, then restore into a staging directory and validate there before replacing anything in place.
- Stop web writes (maintenance mode or stop services).
- Restore database.
- Restore files.
- Fix ownership/permissions.
- Validate (HTTP + application-level checks).
- Re-enable traffic.
Keep the exact commands in your environment-specific restore workflow.
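A skeleton for that workflow might follow the order above. It rests on these assumptions:

- The `run` wrapper, the argument order and the `www-data` owner are hypothetical placeholders.
- WP-CLI (`wp`) handles maintenance mode.
- `mysql` authenticates through an option file such as `~/.my.cnf`.
- The files archive stores paths relative to the WordPress root.

`DRY_RUN` defaults to 1, so the skeleton only prints commands until you deliberately set `DRY_RUN=0`.

WordPress maintenance mode answers web requests with HTTP 503, so validation here uses WP-CLI's `wp db check` and `wp core verify-checksums`, which do not go through the web server. The latter needs network access to wordpress.org. Add your HTTP check wherever your setup can reach the site behind maintenance mode.

```shell
#!/usr/bin/env bash
# Skeleton of the incident restore order. DRY_RUN=1 (default) only prints commands.
# Hypothetical placeholders: run(), argument order, the www-data owner.
set -euo pipefail

# Print the command in dry-run mode; otherwise execute it (set -e stops on failure).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

incident_restore() {
  local wp_root=$1 db_name=$2 files_archive=$3 db_dump=$4
  local -a wp=(wp --allow-root --path="$wp_root")

  # 1. stop web writes
  run "${wp[@]}" maintenance-mode activate
  # 2. restore database; bash -o pipefail so a gzip failure is not masked
  run bash -o pipefail -c 'gzip -dc "$1" | mysql "$2"' restore-db "$db_dump" "$db_name"
  # 3. restore files (archive paths assumed relative to the WordPress root)
  run tar -xzf "$files_archive" -C "$wp_root"
  # 4. fix ownership
  run chown -R www-data:www-data "$wp_root"
  # 5. validate with application-level checks that bypass HTTP
  run "${wp[@]}" db check
  run "${wp[@]}" core verify-checksums
  # 6. re-enable traffic
  run "${wp[@]}" maintenance-mode deactivate
}
```

If any step fails with `DRY_RUN=0`, `set -e` stops the script before maintenance mode is deactivated, which keeps visitors off a half-restored site. `--allow-root` is included because steps like `chown` usually need root, and WP-CLI refuses to run as root without that flag.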