Case Study
This case study demonstrates a realistic "site is slow" incident on a WordPress VPS and how to approach it with a repeatable workflow.
Quick Summary
- Capture a snapshot first.
- Identify whether you are CPU bound, memory bound, disk I/O bound, or space bound.
- Fix the cause, then confirm improvement.
Scenario
- Users report the site hangs intermittently.
- SSH is responsive but pages take 10-30 seconds.
- The server runs Nginx, PHP-FPM, and MySQL.
Step 1: take a snapshot
case-study-snapshot.sh
date -Is
uptime
free -h
df -hT /
ps -eo pid,user,pcpu,pmem,etime,cmd --sort=-pcpu | head -n 15
vmstat 1 5
Interpretation example:
- Load is high.
- CPU utilization is not extremely high.
- vmstat shows high wa (I/O wait).
This points to storage pressure.
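The wa reading can also be extracted from the vmstat output instead of being read by eye. The following is a minimal sketch, assuming the procps-style vmstat layout in which a header row names the columns. The function name avg_iowait is hypothetical. It skips the first sample, which vmstat reports as an average since boot.

```shell
# avg_iowait: read `vmstat <interval> <count>` output on stdin and print the
# mean of the "wa" column, ignoring the first sample (averages since boot).
avg_iowait() {
    awk '
        # Header row: locate the "wa" column by name rather than by position.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) if ($i == "wa") col = i
            next
        }
        # Data rows start with a number (the run-queue length).
        col && $1 ~ /^[0-9]+$/ {
            if (++seen > 1) { sum += $col; n++ }
        }
        END {
            if (n == 0) exit 1
            printf "%.1f\n", sum / n
        }
    '
}
```

Usage: vmstat 1 5 | avg_iowait. A high average alongside low us and sy values matches the pattern described above.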
Step 2: confirm disk saturation
case-study-iostat.sh
iostat -x 1 10
Interpretation example:
- Device %util is near 100%.
- await is high.
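To flag saturated devices without scanning every report by hand, the iostat output can be filtered. This is a minimal sketch, assuming the sysstat iostat -x layout in which the device table starts with a header line beginning with "Device" and containing a %util column. The function name busy_devices and the default threshold of 90 are illustrative choices, not values from the incident.

```shell
# busy_devices: read `iostat -x` output on stdin and print each device whose
# peak %util reached the threshold (default 90), one "device %util" per line.
# Note: the first iostat report covers the time since boot and is included.
busy_devices() {
    threshold=${1:-90}
    awk -v t="$threshold" '
        # Device table header: find the %util column by name.
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # A blank line ends the device table; wait for the next header.
        NF == 0 { col = 0; next }
        # Device rows: remember the highest %util seen per device.
        col && NF >= col && ($col + 0) > (peak[$1] + 0) { peak[$1] = $col }
        END {
            for (dev in peak) if (peak[dev] + 0 >= t) print dev, peak[dev]
        }
    ' | sort
}
```

Usage: iostat -x 1 10 | busy_devices 90. An empty result suggests no device reached the threshold during the sampling window.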
Step 3: find the process doing I/O
case-study-iotop.sh
sudo iotop -o -P -a
Interpretation example:
- A backup job (tar + compression) is writing heavily.
- It started at the same time complaints began.
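Where iotop is not installed, the kernel's per-process counters in /proc/<pid>/io offer a rough fallback for confirming that a suspect process is writing heavily. This is a minimal sketch, assuming a Linux kernel with task I/O accounting enabled and enough privileges to read the target's file (usually root for another user's process). The function names are hypothetical.

```shell
# io_write_bytes: print the cumulative write_bytes counter from a
# /proc/<pid>/io-style file given as the first argument.
io_write_bytes() {
    awk '$1 == "write_bytes:" { print $2 }' "$1"
}

# write_rate: sample a PID twice and print its average write rate in bytes/s.
write_rate() {
    pid=$1
    interval=${2:-5}
    before=$(io_write_bytes "/proc/$pid/io") || return 1
    sleep "$interval"
    after=$(io_write_bytes "/proc/$pid/io") || return 1
    echo $(( (after - before) / interval ))
}
```

Usage: sudo sh -c '. ./write-rate.sh; write_rate 12345 5', where write-rate.sh is a hypothetical file holding the two functions and 12345 stands in for the suspect PID.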
Step 4: reduce impact safely
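Instead of killing the job immediately, lower its CPU and I/O priority. ionice only has an effect when the disk's I/O scheduler honors priorities (BFQ, or CFQ on older kernels), so check the result rather than assuming it worked: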
case-study-reduce-backup-impact.sh
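PID=12345                        # placeholder: use the PID that iotop reported
sudo renice +10 -p "$PID"        # lower CPU priority
sudo ionice -c2 -n7 -p "$PID"    # best-effort class, lowest I/O priority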
If the box recovers quickly, you have validated the primary culprit.
Step 5: fix the scheduling
Typical fixes:
- Move backups off-peak.
- Reduce compression level.
- Exclude caches and nested archives.
- Use a faster format (tar.zst) for operational backups.
Example: change a backup command to be less disruptive:
case-study-less-disruptive-backup.sh
nice -n 10 ionice -c2 -n7 tar -czf "/backups/site-$(date +%F).tar.gz" /var/www/html
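The exclusion and compression-level fixes listed above can be combined with the lowered priority in one wrapper. This is a minimal sketch, assuming GNU tar (for unanchored --exclude patterns and a --use-compress-program that accepts arguments). The function name backup_site and the wp-content/cache path are illustrative assumptions and should be adjusted to the caches the site actually uses.

```shell
# backup_site SRC_DIR DEST_FILE
# Archive SRC_DIR at low priority, skipping cache directories and existing
# archives, and compress with the fastest gzip level.
backup_site() {
    src=$1
    dest=$2
    prio="nice -n 10"
    # Add ionice only when it is installed and allowed to set the priority.
    if ionice -c2 -n7 true >/dev/null 2>&1; then
        prio="$prio ionice -c2 -n7"
    fi
    # $prio is intentionally unquoted so it splits into separate words.
    # The compressor is started by tar, so it inherits the lowered priority.
    $prio tar --exclude='wp-content/cache' --exclude='*.tar.gz' \
        --use-compress-program='gzip -1' \
        -cf "$dest" -C "$src" .
}
```

Usage: backup_site /var/www/html "/backups/site-$(date +%F).tar.gz". If zstd is installed, replacing gzip -1 with zstd -3 and using a .tar.zst suffix would follow the tar.zst suggestion above.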
Step 6: document and verify
Re-run the original snapshot and compare:
case-study-verify-improvement.sh
uptime
vmstat 1 5
If wa drops and page load improves, the incident is resolved.
What this case teaches
- High load does not always mean "CPU is the problem".
- Backups and compression are common I/O culprits.
- Deprioritizing work can be safer than killing it.
- Capturing snapshots before and after makes fixes defensible.