Production Patterns
Learning Focus
By the end of this lesson you will be able to deploy timer units with security sandboxing, overlap prevention, missed-run catch-up, fleet jitter, structured logging, error handling, and monitoring integration.
Hardened Production Template
Use this as a baseline for any production timer job:
/etc/systemd/system/safe-timer-job.timer
[Unit]
Description=Safe timer-driven job schedule
[Timer]
OnCalendar=02:15
Persistent=true
RandomizedDelaySec=5m
FixedRandomDelay=true
AccuracySec=1m
[Install]
WantedBy=timers.target
/etc/systemd/system/safe-timer-job.service
[Unit]
Description=Safe timer-driven job
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
User=www-data
Group=www-data
WorkingDirectory=/var/www/html
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/bin/flock -n /var/lock/myjob.lock /usr/local/bin/myjob.sh
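# Kill switch: RuntimeMaxSec= is ignored for Type=oneshot, so limit the run with TimeoutStartSec=
TimeoutStartSec=1h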
StandardOutput=append:/var/log/myjob.log
StandardError=append:/var/log/myjob.log
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/lock /mnt/backups
Overlap Prevention with flock
The Problem
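If a job takes longer than the timer interval, you might expect systemd to start a second copy. It does not: systemd never runs two instances of the same unit at once, so a timer that elapses while the service is still running does not launch another one. Overlaps come from outside that guarantee, such as a manual run of the same script, a cron entry, or another unit that works on the same data. Explicit file locking covers all of these cases, which makes it the safer pattern.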
The Solution: flock
flock-in-service.service
[Service]
Type=oneshot
ExecStart=/usr/bin/flock -n /var/lock/backup.lock /usr/local/bin/backup.sh
| Flag | Behavior |
|---|---|
| -n | Non-blocking: exit immediately if the lock is held (exit status 1 by default, so systemd records the skipped run as a failure) |
| -w 30 | Wait up to 30 seconds for the lock |
| -x | Exclusive lock (default) |
| -s | Shared lock (read lock) |
How It Works
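The mechanics can be seen in a small self-contained demo. It uses a throwaway temp file as the lock path (an assumption for the demo; the units above use /var/lock/backup.lock), and try_run is a hypothetical helper. While the first flock holds an exclusive lock, a second flock -n gives up at once with exit status 1 and never runs its command. The lock is tied to the open file descriptor, so it is released automatically when the holder exits, even if it crashes or is killed.

```shell
#!/usr/bin/env bash
# Demo of flock -n semantics using a throwaway lock file.
set -u

# try_run LOCKFILE: attempt a non-blocking "run", print the outcome,
# and return flock's exit status (1 means the lock was busy).
try_run() {
  local rc=0
  flock -n "$1" true || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "acquired lock, job would run"
  else
    echo "lock busy, skipped (exit $rc)"
  fi
  return "$rc"
}

lock=$(mktemp)

# First run: hold the lock for 2 seconds in the background.
flock -n "$lock" sleep 2 &
holder=$!
sleep 0.5            # give the holder time to acquire the lock

try_run "$lock"      # while the holder runs, this attempt should be skipped

wait "$holder"       # holder exits, its file descriptor closes, lock is released
try_run "$lock"      # now the lock is free, so this attempt should succeed

rm -f "$lock"
```

On a normally loaded machine the first attempt reports the lock as busy and the second one acquires it.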
When To Use flock
| Scenario | Use flock? | Why |
|---|---|---|
| Nightly backup (30 min job, 24h interval) | Yes | Safety net if backup is slow |
| Health check (2s job, 60s interval) | No | Job is always fast |
| Database export (variable duration) | Yes | Could exceed interval on large DBs |
| CDN sync (variable duration) | Yes | Network delays are unpredictable |
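Combining with a Timeout
flock-plus-timeout.service
[Service]
Type=oneshot
ExecStart=/usr/bin/flock -n /var/lock/backup.lock /usr/local/bin/backup.sh
# Kill the job if it runs longer than 2 hours
TimeoutStartSec=2h
The timeout is the kill switch; flock is the overlap preventer. Use both together. Note that RuntimeMaxSec= has no effect on Type=oneshot services, because a oneshot service is still activating for the whole time its ExecStart= command runs. For oneshot jobs the limit that applies is TimeoutStartSec=, which is disabled by default for this service type, so set it explicitly.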
Persistent Catch-Up
How It Works
[Timer]
OnCalendar=02:15
Persistent=true
- systemd stores the last trigger time on disk.
- After boot or a timer restart, systemd checks whether at least one run should have fired between the stored time and now.
- If so, systemd triggers the service once, immediately, no matter how many runs were missed.
- The catch-up run still honors RandomizedDelaySec=.
- Persistent= only has an effect on timers that use OnCalendar=.
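The catch-up check described in the list above can be modeled in a few lines of bash. This is a simplified sketch for a daily OnCalendar=HH:MM schedule, not systemd's implementation: the hypothetical helpers next_daily_after and missed_daily_run take the stored last-trigger time and the current time as Unix epochs and report whether a scheduled run fell between them. It assumes GNU date and evaluates times in UTC so that a day is always exactly 86400 seconds.

```shell
#!/usr/bin/env bash
set -euo pipefail
export TZ=UTC   # assumption: evaluate the schedule in UTC for reproducibility

# next_daily_after EPOCH HH:MM -> epoch of the first HH:MM strictly after EPOCH
next_daily_after() {
  local epoch=$1 hhmm=$2 day candidate
  day=$(date -d "@$epoch" +%F)
  candidate=$(date -d "$day $hhmm" +%s)
  if [ "$candidate" -le "$epoch" ]; then
    candidate=$((candidate + 86400))   # today's slot has passed; use tomorrow's
  fi
  echo "$candidate"
}

# missed_daily_run LAST_EPOCH NOW_EPOCH HH:MM
# Succeeds if at least one HH:MM run should have fired after LAST and by NOW.
# One success means one catch-up run, however many slots were missed.
missed_daily_run() {
  local next
  next=$(next_daily_after "$1" "$3")
  [ "$next" -le "$2" ]
}

# Example: last trigger 2024-01-01 02:15 UTC, machine back up 2024-01-02 09:00 UTC
if missed_daily_run 1704075300 1704186000 02:15; then
  echo "catch-up run due"   # prints: catch-up run due
fi
```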
When To Use
| Job Type | Use Persistent? | Why |
|---|---|---|
| Nightly backup | Yes | Must not be silently skipped |
| Database export | Yes | Data integrity depends on regular exports |
| Health check | No | A missed check is fine; the next one will fire |
| Cache warming | No | Stale cache is temporary |
| Log rotation | Yes | Logs must be rotated regularly |
| Retention purge | Yes | Must enforce storage limits |
State Management
persistent-state-commands.sh
# Check when the timer last fired
systemctl show wp-backup.timer -p LastTriggerUSec
# Reset stored state (force a fresh start); the unit must be stopped first
sudo systemctl stop wp-backup.timer
sudo systemctl clean --what=state wp-backup.timer
# Start the timer again
sudo systemctl start wp-backup.timer
Fleet Jitter
The Thundering Herd Problem
10 servers, all scheduled at 02:15:
- Without jitter: all 10 hit the backup target at 02:15:00.
- With jitter: spread across 02:15:00–02:20:00.
The Solution
jitter-config.timer
[Timer]
OnCalendar=02:15
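# Add 0 to 5 minutes of random delay
RandomizedDelaySec=5m
# Same offset each day (stable per unit)
FixedRandomDelay=true
# Coalescing window
AccuracySec=1m
Comments in unit files must sit on their own line; systemd does not strip a trailing # comment from a value, so it would become part of the setting and fail to parse.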
| Directive | Purpose |
|---|---|
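| RandomizedDelaySec=5m | Each run gets a random delay of 0 to 5 minutes |
| FixedRandomDelay=true | The offset is derived from the machine ID, the user, and the timer name, so it stays the same across runs and reboots but differs from machine to machine |
| AccuracySec=1m | systemd may coalesce the wake-up with other timers within this window (1m is also the default) |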
Example: 5 Servers
| Server | Base Schedule | FixedRandomDelay Offset | Actual Fire Time |
|---|---|---|---|
| server-01 | 02:15 | +47s | 02:15:47 |
| server-02 | 02:15 | +2m12s | 02:17:12 |
| server-03 | 02:15 | +3m55s | 02:18:55 |
| server-04 | 02:15 | +1m30s | 02:16:30 |
| server-05 | 02:15 | +4m08s | 02:19:08 |
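To see why a fixed per-machine offset spreads a fleet, here is an illustrative bash sketch that derives a stable offset by hashing a machine ID and the timer name with cksum. This is not systemd's algorithm (systemd uses its own internal hash); the fixed_offset helper and the sample machine IDs are made up. It only demonstrates the property the table relies on: the same inputs always produce the same offset, while different machines land at different points in the 300-second window.

```shell
#!/usr/bin/env bash
set -euo pipefail

# fixed_offset MACHINE_ID UNIT_NAME MAX_SECONDS
# Illustrative only: hash the inputs with POSIX cksum and reduce modulo the window.
fixed_offset() {
  local sum
  sum=$(printf '%s/%s' "$1" "$2" | cksum | cut -d' ' -f1)
  echo $(( sum % $3 ))
}

# Hypothetical machine IDs (real ones live in /etc/machine-id).
for id in 0a1b2c3d 4e5f6a7b 8c9d0e1f; do
  off=$(fixed_offset "$id" safe-timer-job.timer 300)
  printf 'machine %s: +%dm%02ds\n' "$id" $((off / 60)) $((off % 60))
done
```

Running it twice prints the same offsets both times, which is the "stable per unit" behavior; changing the ID shifts the offset, which is the fleet spread.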
Security Hardening
Graduated Hardening Levels
Level 1 — Basic (Every Production Service)
[Service]
NoNewPrivileges=true
PrivateTmp=true
Level 2 — Standard (Recommended)
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/lock /mnt/backups
Level 3 — Strict (Security-Sensitive Workloads)
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/lock /mnt/backups
PrivateDevices=true
ProtectKernelTunables=true
ProtectControlGroups=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
Security Audit
security-audit.sh
systemd-analyze security safe-timer-job.service
The exposure score runs from 0 to 10, and lower is better. Aim for a score below 3.0:
example-output.txt
→ Overall exposure level for safe-timer-job.service: 2.1 OK
Structured Logging
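Log to a File
Note that append: sends the service's stdout and stderr to the file instead of the journal, so journalctl -u shows only systemd's own start and stop messages for the unit. If you also want the output in journald, leave StandardOutput= at its default and write the file from the script.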
logging-service.service
[Service]
StandardOutput=append:/var/log/myjob.log
StandardError=append:/var/log/myjob.log
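One way to get the same lines into the journal and a file, assuming StandardOutput= is left at its default (journal) rather than set to append:, is to duplicate each log line with tee inside the script. This is a minimal sketch; the log function and the LOG_FILE variable are illustrative names, not part of any tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical path; point this at a file your logrotate config covers.
LOG_FILE=${LOG_FILE:-/var/log/myjob.log}

# log LEVEL MESSAGE...: write one timestamped line to stdout (which systemd
# sends to the journal by default) and append the same line to LOG_FILE.
log() {
  local level=$1; shift
  printf '[%s] [%s] %s\n' "$(date -Is)" "$level" "$*" | tee -a "$LOG_FILE"
}
```

In the job script, call it as log INFO "Backup started". Each call is synchronous, so the file and the journal always receive the lines in the same order.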
Structured Log Format in Scripts
/usr/local/bin/backup-with-logging.sh
#!/usr/bin/env bash
set -euo pipefail
# Compute the timestamp on every call, not once at startup,
# so each line shows when it was actually written
log() {
  local level=$1; shift
  echo "[$(date -Is)] [backup] [$level] $*"
}
log_info()  { log INFO "$@"; }
log_warn()  { log WARN "$@"; }
log_error() { log ERROR "$@" >&2; }
log_info "Backup started"
if /usr/local/bin/wp db export /mnt/backups/db.sql --path=/var/www/html; then
  log_info "Database exported successfully"
else
  log_error "Database export failed"
  exit 1
fi
log_info "Backup complete"
Log Rotation
/etc/logrotate.d/timer-jobs
/var/log/wp-backup.log
/var/log/wp-db-backup.log
/var/log/health-check.log
/var/log/media-sync.log
{
daily
rotate 14
compress
missingok
notifempty
create 0640 www-data www-data
}
Error Handling
OnFailure Notification
myjob.service
[Unit]
Description=My scheduled job
OnFailure=alert-failure@%n.service
/etc/systemd/system/alert-failure@.service
[Unit]
Description=Send alert for %i failure
[Service]
Type=oneshot
ExecStart=/usr/local/bin/alert-failure.sh %i
/usr/local/bin/alert-failure.sh
#!/usr/bin/env bash
UNIT="$1"
MSG="[$(date -Is)] ALERT: $UNIT failed on $(hostname)"
echo "$MSG"
# Send to Slack, PagerDuty, etc.
# curl -s -X POST "https://hooks.slack.com/..." -d "{\"text\": \"$MSG\"}"
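The commented-out curl line interpolates $MSG straight into JSON, which breaks if the message ever contains a double quote or a backslash. Below is a minimal sketch of a safer payload builder, assuming a Slack-style incoming webhook that accepts {"text": ...}; json_escape, build_payload, and WEBHOOK_URL are hypothetical names. If jq is installed, jq -n --arg text "$MSG" '{text: $text}' does the same job more robustly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# json_escape STRING: escape backslashes, double quotes, and common control
# characters so the string can sit inside a JSON string literal.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash -> \\
  s=${s//\"/\\\"}      # double quote -> \"
  s=${s//$'\n'/\\n}    # newline -> \n
  s=${s//$'\r'/\\r}    # carriage return -> \r
  s=${s//$'\t'/\\t}    # tab -> \t
  printf '%s' "$s"
}

# build_payload MESSAGE: emit a {"text": "..."} JSON object.
build_payload() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

# Usage inside alert-failure.sh (WEBHOOK_URL is a placeholder):
# curl -s -X POST -H 'Content-Type: application/json' \
#   -d "$(build_payload "$MSG")" "$WEBHOOK_URL"
```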
Script-Level Error Handling
robust-script.sh
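#!/usr/bin/env bash
# -E makes the ERR trap fire inside functions too
set -Eeuo pipefail
# The ERR trap runs at the failing command, so $LINENO points at that line
# (inside an EXIT trap function it would point at the trap function itself)
trap 'echo "[$(date -Is)] [ERROR] Script failed at line $LINENO" >&2' ERR
cleanup() {
  # Remove temp files, release resources, etc.; runs on every exit
  :
}
trap cleanup EXIT
# Your logic here...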
Environment Configuration
Environment Files
Keep environment-specific settings outside the unit file:
myjob.service
[Service]
EnvironmentFile=/etc/default/myjob
ExecStart=/usr/local/bin/myjob.sh
/etc/default/myjob
DB_HOST=localhost
DB_NAME=wordpress
BACKUP_DIR=/mnt/backups
S3_BUCKET=my-bucket
LOG_LEVEL=INFO
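A missing or mistyped key in /etc/default/myjob would otherwise surface as a confusing failure halfway through the job, so it helps to validate the variables at the top of myjob.sh. A minimal sketch; the require_vars helper is hypothetical, and the variable names are the ones from the example file above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# require_vars NAME...: report every named variable that is unset or empty,
# and fail if there was at least one.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "[$(date -Is)] [ERROR] required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Optional settings can fall back to a default instead.
LOG_LEVEL=${LOG_LEVEL:-INFO}
```

In myjob.sh, call require_vars DB_HOST DB_NAME BACKUP_DIR S3_BUCKET before doing any work; with set -e, a non-zero return stops the script, the service fails, and OnFailure= alerting fires.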
Per-Environment Overrides
create-staging-override.sh
sudo mkdir -p /etc/systemd/system/myjob.service.d/
sudo tee /etc/systemd/system/myjob.service.d/staging.conf > /dev/null <<'EOF'
[Service]
EnvironmentFile=
EnvironmentFile=/etc/default/myjob-staging
TimeoutStartSec=2h
EOF
sudo systemctl daemon-reload
Monitoring
Comprehensive Health Check
/usr/local/bin/check-timer-health.sh
#!/usr/bin/env bash
set -euo pipefail
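TIMERS=(wp-backup wp-db-backup backup-prune disk-check wp-cron-runner)
EXIT_CODE=0
for name in "${TIMERS[@]}"; do
  # is-active prints a state ("inactive", "failed", ...) even when it exits
  # non-zero, so only swallow the exit status; don't append a second word
  timer_active=$(systemctl is-active "${name}.timer" 2>/dev/null || true)
  if [ "$timer_active" != "active" ]; then
    echo "CRITICAL: ${name}.timer is ${timer_active:-unknown}"
    EXIT_CODE=2
    continue
  fi
  next=$(systemctl show "${name}.timer" -p NextElapseUSecRealtime --value 2>/dev/null)
  last=$(systemctl show "${name}.timer" -p LastTriggerUSec --value 2>/dev/null)
  # Result of the last run; assumes each timer triggers the same-named service
  result=$(systemctl show "${name}.service" -p Result --value 2>/dev/null)
  if [ "$result" != "success" ]; then
    echo "WARNING: ${name}.service last result: $result (last: $last)"
    if [ "$EXIT_CODE" -lt 1 ]; then EXIT_CODE=1; fi
  else
    echo "OK: ${name}.timer (last: $last, next: $next)"
  fi
done
exit "$EXIT_CODE"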
WordPress VPS Timer Reference
| Task | Schedule | Key Directives |
|---|---|---|
| WordPress cron events | *:0/15 | User=www-data, wp cron event run --due-now |
| Nightly full backup | 02:15 | flock, TimeoutStartSec=2h, Persistent=true |
| Database export | 02:30 | wp db export, flock, RandomizedDelaySec=5m |
| Backup retention purge | 04:00 | find -mtime +14 -delete |
| Object cache flush | 00/6:00:00 | wp cache flush, User=www-data |
| Media sync to S3 | 0/2:00:00 | rclone sync, User=www-data |
| SSL certificate renewal | 03:00 | certbot renew, RandomizedDelaySec=, FixedRandomDelay=true |
| PHP-FPM weekly restart | Sun 05:00 | systemctl restart php8.2-fpm |
| Weekly WP optimization | Mon 03:00 | wp db optimize |
| Peak-hours cache warm | 08..20:0/10 | Persistent=false |
Key Takeaways
- Use flock -n together with TimeoutStartSec= (the timeout that applies to Type=oneshot jobs) for overlap prevention plus a kill switch.
- Use Persistent=true for any job that must not be silently skipped.
- Use RandomizedDelaySec= with FixedRandomDelay=true for fleet safety.
- Apply at least Level 1 security hardening on every production service.
- Use OnFailure= for alerting on job failures.
- Use EnvironmentFile= and drop-in overrides for multi-environment setups.
What's Next
- Study Cases — real-world scenarios where systemd timers solve complex automation problems.