Backup Runbook

Backups only become real when you can restore them on demand. This runbook is a step-by-step checklist you can follow during normal operations (routine verification) and during incidents (disaster recovery).

Quick Summary
  • Daily: produce backups and record success/failure.
  • Weekly: run a restore drill into a staging directory.
  • Monthly: validate retention, permissions, and offsite copies.
  • Incident: restore using a written sequence, not memory.

Minimum standard

For a WordPress VPS, a workable minimum is:

  • Site files archive (WordPress root including wp-content/).
  • Database dump.
  • Offsite copy of both artifacts.
  • A restore drill that you repeat on a schedule.

Daily run

  1. Create backups (files and database).
  2. Verify artifacts exist and are non-empty.
  3. Log the run (timestamp, sizes, exit codes).
  4. Sync offsite.

daily-backup-runbook-checks.sh
set -eu

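# list the backup directory (first 60 lines) so sizes and timestamps appear in the run output
ls -lh /backups | sed -n '1,60p'

# quick sanity checks: each artifact must exist and be non-empty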
test -s /backups/wp-files-latest.tar.gz
test -s /backups/wp-db-latest.sql.gz
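
The checks above cover step 2 only. For step 3 (logging the run), a minimal sketch might look like the following. The function name `log_artifact` and the log format are illustrative assumptions, not part of any existing tooling; the `gzip -t` call adds an integrity test on top of the non-empty check.

```shell
#!/bin/sh
# Sketch only: log_artifact and its log format are hypothetical.

log_artifact() {
    # $1 = artifact path, $2 = log file to append to
    artifact=$1
    log=$2
    rc=0
    if [ -s "$artifact" ]; then
        # gzip -t reads the whole file and fails on truncated or corrupt data
        gzip -t "$artifact" 2>/dev/null || rc=$?
        size=$(wc -c < "$artifact" | tr -d ' ')
    else
        # missing or empty artifact
        rc=1
        size=0
    fi
    # one line per artifact: UTC timestamp, path, size in bytes, exit code
    printf '%s %s size=%s rc=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$artifact" "$size" "$rc" >> "$log"
    return "$rc"
}
```

For example, `log_artifact /backups/wp-files-latest.tar.gz /var/log/backup-runbook.log` would append one line and return non-zero if the archive is missing, empty, or corrupt. The log path here is an assumption; use whatever location your environment already rotates.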

Weekly restore drill

Always restore into an empty staging directory.

weekly-restore-drill.sh
set -eu

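# start from an empty staging directory so leftovers from a previous drill cannot mask a bad archive
sudo rm -rf /tmp/restore-test
sudo mkdir -p /tmp/restore-test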

# extract files into staging
sudo tar -xzf /backups/wp-files-latest.tar.gz -C /tmp/restore-test

# verify key WordPress paths exist
sudo find /tmp/restore-test -maxdepth 3 -type f -name wp-config.php -print
sudo find /tmp/restore-test -maxdepth 3 -type d -name wp-content -print

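If the archive extracts cleanly and both paths are found, the files backup is likely usable. This drill does not exercise the database dump, so test that artifact separately before you treat the whole backup set as restorable.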
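
The drill above covers the files archive only. As a rough companion check for the database artifact, the sketch below tests the gzip stream and counts `CREATE TABLE` statements. It assumes a mysqldump-style SQL dump; `check_db_dump` is a hypothetical helper name. It is not a substitute for loading the dump into a scratch database.

```shell
#!/bin/sh
# Sketch only: check_db_dump is hypothetical and assumes a mysqldump-style dump.

check_db_dump() {
    # $1 = gzip-compressed SQL dump
    dump=$1
    test -s "$dump" || return 1
    # fail early on a truncated or corrupt gzip stream
    gzip -t "$dump" || return 1
    # a dump with no CREATE TABLE statements is probably empty or cut short
    tables=$(gzip -dc "$dump" | grep -c 'CREATE TABLE' || true)
    echo "tables=$tables"
    [ "$tables" -gt 0 ]
}
```

Running `check_db_dump /backups/wp-db-latest.sql.gz` prints the table count and returns non-zero when the dump is missing, corrupt, or contains no table definitions.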

Monthly audit

  • Confirm retention policies are actually deleting old files.
  • Confirm offsite targets hold the same artifacts as the local backup directory (compare counts and sizes).
  • Confirm permissions: backup directory is readable only by the backup user/root.
  • Review logs for failures and silent partial backups.
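
Two of the audit items above can be spot-checked from the shell. The sketch below is one possible approach; `stale_artifacts` and `dir_is_private` are hypothetical helper names, and the retention window is whatever your policy specifies. How you compare offsite contents depends on the sync tool, so that item is not covered here.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print artifacts older than the retention window.
# Output should be empty if retention is actually deleting old files.
stale_artifacts() {
    # $1 = backup directory, $2 = retention in days
    find "$1" -maxdepth 1 -type f -mtime +"$2" -print
}

# Succeed only if group and others have no access to the directory.
dir_is_private() {
    # characters 5-10 of the mode string are the group and other permission bits
    perms=$(ls -ld "$1" | cut -c5-10)
    [ "$perms" = "------" ]
}
```

For example, `stale_artifacts /backups 30` lists anything that a 30-day policy should already have removed, and `dir_is_private /backups` returns non-zero if the directory is readable by anyone other than its owner.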

Incident restore (high-level)

Warning: Restores can overwrite production data. If the server is still partially alive, snapshot first and restore into a staging directory to validate before you replace anything in place.

  1. Stop web writes (maintenance mode or stop services).
  2. Restore database.
  3. Restore files.
  4. Fix ownership/permissions.
  5. Validate (HTTP + application-level checks).
  6. Re-enable traffic.

Keep the exact commands in your environment-specific restore workflow.
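
As one illustration of the "validate in staging before replacing anything" rule from the warning, the sketch below extracts the files archive into a staging directory and refuses to report success unless the key WordPress paths are present. `stage_and_verify` is a hypothetical helper; it does not touch the live site, and the in-place swap is deliberately left to your environment-specific workflow.

```shell
#!/bin/sh
# Sketch only: stage_and_verify is hypothetical and never writes outside $2.

stage_and_verify() {
    # $1 = files archive, $2 = staging directory (should be empty)
    archive=$1
    staging=$2
    mkdir -p "$staging"
    tar -xzf "$archive" -C "$staging" || return 1
    # gate: both key WordPress paths must exist before any in-place restore
    [ -n "$(find "$staging" -maxdepth 3 -type f -name wp-config.php -print)" ] || return 1
    [ -n "$(find "$staging" -maxdepth 3 -type d -name wp-content -print)" ] || return 1
    echo "staging ok: $staging"
}
```

A restore script could call `stage_and_verify /backups/wp-files-latest.tar.gz /tmp/restore-test` and stop at the first non-zero return, so a bad archive is caught before step 3 overwrites anything.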