Production Patterns

Learning Focus

By the end of this lesson you will be able to deploy timer units with security sandboxing, overlap prevention, missed-run catch-up, fleet jitter, structured logging, error handling, and monitoring integration.

Hardened Production Template

Use this as a baseline for any production timer job:

/etc/systemd/system/safe-timer-job.timer
[Unit]
Description=Safe timer-driven job schedule

[Timer]
OnCalendar=02:15
Persistent=true
RandomizedDelaySec=5m
FixedRandomDelay=true
AccuracySec=1m

[Install]
WantedBy=timers.target

/etc/systemd/system/safe-timer-job.service
[Unit]
Description=Safe timer-driven job
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
User=www-data
Group=www-data
WorkingDirectory=/var/www/html
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/bin/flock -n /var/lock/myjob.lock /usr/local/bin/myjob.sh
# RuntimeMaxSec= has no effect on Type=oneshot; TimeoutStartSec= limits the run instead
TimeoutStartSec=1h
StandardOutput=append:/var/log/myjob.log
StandardError=append:/var/log/myjob.log

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/lock /mnt/backups

Overlap Prevention with `flock`

The Problem

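systemd never runs two instances of the same service unit at once: if the timer elapses while a Type=oneshot service is still running, no second instance is started. That protection only covers starts that go through the unit, though. A manual run of the script, a leftover cron entry, or a second unit calling the same script can still overlap with it, so the safer pattern is explicit file locking.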

The Solution: `flock`

flock-in-service.service
[Service]
Type=oneshot
ExecStart=/usr/bin/flock -n /var/lock/backup.lock /usr/local/bin/backup.sh

Flag	Behavior
`-n`	Non-blocking — exit immediately if lock is held
`-w 30`	Wait up to 30 seconds for lock
`-x`	Exclusive lock (default)
`-s`	Shared lock (read lock)

How It Works
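
flock opens the lock file, takes an advisory lock on it, and runs the command while holding that lock. The kernel releases the lock when the command exits, even if it crashes. With `-n`, a second invocation that finds the lock held exits with status 1 instead of waiting. The sketch below shows this behavior outside systemd; the temporary lock file and the `sleep` stand-in job are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Demonstrates flock -n: a second invocation fails while the lock is held.
lock="$(mktemp)"            # hypothetical lock file, stands in for /var/lock/backup.lock

# First "job": hold the lock for 2 seconds in the background
flock -n "$lock" sleep 2 &
holder=$!
sleep 0.5                   # give the first job time to take the lock

# Second "job": cannot get the lock, so flock exits without running the command
second_status=0
flock -n "$lock" echo "second run started" || second_status=$?
echo "second attempt exit status: $second_status"

wait "$holder"

# Once the first job has finished, the lock is free again
third_status=0
flock -n "$lock" echo "third run started" || third_status=$?
echo "third attempt exit status: $third_status"
rm -f "$lock"
```

Because the skipped run exits non-zero, systemd records it as a failed start. That is usually what you want for alerting, since an overlap means the previous run is slower than expected.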

When To Use `flock`

Scenario	Use flock?	Why
Nightly backup (30 min job, 24h interval)	Yes	Safety net if backup is slow
Health check (2s job, 60s interval)	No	Job is always fast
Database export (variable duration)	Yes	Could exceed interval on large DBs
CDN sync (variable duration)	Yes	Network delays are unpredictable

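Combining with TimeoutStartSec

flock-plus-timeout.service
[Service]
Type=oneshot
ExecStart=/usr/bin/flock -n /var/lock/backup.lock /usr/local/bin/backup.sh
# Kill the job if it runs longer than 2 hours
TimeoutStartSec=2h

RuntimeMaxSec= has no effect on Type=oneshot services, because a oneshot unit counts as "starting" for as long as its ExecStart= command runs. TimeoutStartSec= is the limit that applies instead, and it is disabled by default for oneshot units. Note also that systemd does not support trailing comments on a directive line, so comments go on their own line. TimeoutStartSec is the kill switch. flock is the overlap preventer. Use both together.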

Persistent Catch-Up

How It Works

[Timer]
OnCalendar=02:15
Persistent=true

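systemd stores the last trigger time on disk, as a stamp file under /var/lib/systemd/timers/.
When the timer is activated, after boot or a timer restart, systemd checks whether a scheduled run would have elapsed between the stored time and now.
If so, systemd triggers the service once, immediately. Several missed runs still produce a single catch-up run.
The catch-up run still honors RandomizedDelaySec=.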
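
The decision in the steps above can be sketched in a few lines of shell. This is an illustration of the logic for a daily 02:15 schedule, not systemd's implementation; the function names `next_due` and `needs_catch_up` and the sample dates are hypothetical, and GNU `date` is assumed.

```shell
#!/usr/bin/env bash
# Sketch of the Persistent= decision: was a daily 02:15 run due between
# the stored last-trigger time and now? (Illustration only, not systemd code.)
export TZ=UTC

# next_due <epoch>: epoch of the first 02:15 strictly after the given time
next_due() {
    local after=$1 day candidate
    day=$(date -d "@$after" +%F)
    candidate=$(date -d "$day 02:15:00" +%s)
    if [ "$candidate" -le "$after" ]; then
        candidate=$((candidate + 86400))   # move to 02:15 on the following day
    fi
    echo "$candidate"
}

# needs_catch_up <last_trigger_epoch> <now_epoch>: succeeds if a run was missed
needs_catch_up() {
    [ "$(next_due "$1")" -le "$2" ]
}

last=$(date -d "2024-03-01 02:15:00" +%s)      # last run, as read from the stamp
boot_a=$(date -d "2024-03-02 09:00:00" +%s)    # machine was off overnight
boot_b=$(date -d "2024-03-01 20:00:00" +%s)    # rebooted the same evening

if needs_catch_up "$last" "$boot_a"; then result_a="catch-up"; else result_a="wait"; fi
if needs_catch_up "$last" "$boot_b"; then result_b="catch-up"; else result_b="wait"; fi
echo "boot at 09:00 next day: $result_a"
echo "boot at 20:00 same day: $result_b"
```

The first boot missed the 02:15 slot on 2 March, so it triggers a catch-up run; the second boot happens before the next slot, so the timer simply waits.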

When To Use

Job Type	Use Persistent?	Why
Nightly backup	Yes	Must not be silently skipped
Database export	Yes	Data integrity depends on regular exports
Health check	No	A missed check is fine; the next one will fire
Cache warming	No	Stale cache is temporary
Log rotation	Yes	Logs must be rotated regularly
Retention purge	Yes	Must enforce storage limits

State Management

persistent-state-commands.sh
# Check when the timer last fired
systemctl show wp-backup.timer -p LastTriggerUSec

# Reset stored state (force fresh start); clean only works on a stopped unit
sudo systemctl stop wp-backup.timer
sudo systemctl clean --what=state wp-backup.timer

# Start the timer again so it picks up the reset
sudo systemctl start wp-backup.timer

Fleet Jitter

The Thundering Herd Problem

10 servers, all scheduled at 02:15:

Without jitter: all 10 hit the backup target at 02:15:00.
With jitter: spread across 02:15:00–02:20:00.

The Solution

jitter-config.timer
[Timer]
OnCalendar=02:15
# Add 0–5 minutes of random delay
RandomizedDelaySec=5m
# Reuse the same offset for every run of this timer on this machine
FixedRandomDelay=true
# Coalescing window
AccuracySec=1m

Directive	Purpose
`RandomizedDelaySec=5m`	Each run gets 0–5 minutes of random delay
`FixedRandomDelay=true`	The delay is derived from the machine ID, the service manager's user, and the unit name, so it is stable across runs and reboots but differs from server to server
`AccuracySec=1m`	systemd may coalesce with other timers within this window

Example: 5 Servers

Server	Base Schedule	FixedRandomDelay Offset	Actual Fire Time
server-01	02:15	+47s	02:15:47
server-02	02:15	+2m12s	02:17:12
server-03	02:15	+3m55s	02:18:55
server-04	02:15	+1m30s	02:16:30
server-05	02:15	+4m08s	02:19:08
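
To see why a fixed offset still spreads a fleet, it helps to derive one by hand. The sketch below hashes a machine identifier together with the unit name and reduces the result to the 0–300 second window. It is an illustration only: systemd uses its own internal hash, and `stable_offset` and the machine IDs are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: systemd uses its own internal hash, not sha256sum.
# stable_offset <machine_id> <unit_name> <max_seconds>
stable_offset() {
    local digest
    # First 8 hex digits of the hash, read as a base-16 number
    digest=$(printf '%s:%s' "$1" "$2" | sha256sum | cut -c1-8)
    echo $(( 16#$digest % $3 ))
}

unit="safe-timer-job.timer"
# Hypothetical machine IDs standing in for /etc/machine-id on each server
for machine in machine-id-aaaa machine-id-bbbb machine-id-cccc; do
    echo "$machine: +$(stable_offset "$machine" "$unit" 300)s"
done
```

Running the loop twice prints the same offsets both times, which is the property `FixedRandomDelay=true` provides: each server keeps its own slot, so run times are predictable per host while the fleet as a whole stays spread out.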

Security Hardening

Graduated Hardening Levels

Level 1 — Basic (Every Production Service)

[Service]
NoNewPrivileges=true
PrivateTmp=true

Level 2 — Standard (Recommended)

[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/lock /mnt/backups

Level 3 — Strict (Security-Sensitive Workloads)

[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/lock /mnt/backups
PrivateDevices=true
ProtectKernelTunables=true
ProtectControlGroups=true
MemoryDenyWriteExecute=true
RestrictRealtime=true

Security Audit

security-audit.sh
systemd-analyze security safe-timer-job.service

Aim for a score below 3.0:

example-output.txt
→ Overall exposure level for safe-timer-job.service: 2.1 OK
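
If you want to enforce the target in a deployment check, one option is to parse the summary line and compare the score against a threshold. The sketch below runs against the sample line above rather than a live system; `parse_exposure` and `exposure_below` are hypothetical helper names, and the exact output format may vary between systemd versions.

```shell
#!/usr/bin/env bash
# parse_exposure: read `systemd-analyze security` output on stdin, print the score
parse_exposure() {
    awk '/Overall exposure level/ && match($0, /: [0-9]+\.[0-9]+/) {
        print substr($0, RSTART + 2, RLENGTH - 2)
    }'
}

# exposure_below <score> <threshold>: succeeds if score < threshold
exposure_below() {
    awk -v s="$1" -v t="$2" 'BEGIN { exit !(s + 0 < t + 0) }'
}

# Sample line copied from the example output; in a real check you would pipe
# `systemd-analyze security safe-timer-job.service` into parse_exposure instead.
sample='→ Overall exposure level for safe-timer-job.service: 2.1 OK'
score=$(printf '%s\n' "$sample" | parse_exposure)
if exposure_below "$score" 3.0; then
    verdict="pass"
else
    verdict="fail"
fi
echo "exposure $score: $verdict"
```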

Structured Logging

Log to Both journald and File

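logging-service.service
[Service]
StandardOutput=append:/var/log/myjob.log
StandardError=append:/var/log/myjob.log

With append:, the script's own output goes only to the file. The journal still records systemd's lifecycle messages for the unit (start, exit status, failure), so journalctl -u shows whether a run succeeded while the file holds the details.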

Structured Log Format in Scripts

/usr/local/bin/backup-with-logging.sh
#!/usr/bin/env bash
set -euo pipefail

# Timestamp each line when it is written, not once at script start
log_info()  { echo "[$(date -Is)] [backup] [INFO]  $*"; }
log_error() { echo "[$(date -Is)] [backup] [ERROR] $*" >&2; }
log_warn()  { echo "[$(date -Is)] [backup] [WARN]  $*"; }

log_info "Backup started"
if /usr/local/bin/wp db export /mnt/backups/db.sql --path=/var/www/html; then
    log_info "Database exported successfully"
else
    log_error "Database export failed"
    exit 1
fi
log_info "Backup complete"

Log Rotation

/etc/logrotate.d/timer-jobs
/var/log/wp-backup.log
/var/log/wp-db-backup.log
/var/log/health-check.log
/var/log/media-sync.log
{
    daily
    rotate 14
    compress
    missingok
    notifempty
    create 0640 www-data www-data
}

Error Handling

OnFailure Notification

myjob.service
[Unit]
Description=My scheduled job
OnFailure=alert-failure@%n.service

/etc/systemd/system/alert-failure@.service
[Unit]
Description=Send alert for %i failure

[Service]
Type=oneshot
ExecStart=/usr/local/bin/alert-failure.sh %i

/usr/local/bin/alert-failure.sh
#!/usr/bin/env bash
UNIT="$1"
MSG="[$(date -Is)] ALERT: $UNIT failed on $(hostname)"
echo "$MSG"
# Send to Slack, PagerDuty, etc.
# curl -s -X POST "https://hooks.slack.com/..." -d "{\"text\": \"$MSG\"}"
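
The commented curl line interpolates $MSG straight into a JSON string, which produces invalid JSON if the message ever contains a double quote or a backslash. A minimal sketch of escaping the message first is shown below; `json_escape` is a hypothetical helper that covers only backslashes, double quotes, and newlines, not the full JSON escaping rules.

```shell
#!/usr/bin/env bash
# json_escape <string>: escape backslashes, double quotes and newlines
# (hypothetical helper, handles only these three cases)
json_escape() {
    local s=$1
    s=${s//\\/\\\\}        # backslash -> two backslashes
    s=${s//\"/\\\"}        # double quote -> backslash + double quote
    s=${s//$'\n'/\\n}      # newline -> backslash + n
    printf '%s' "$s"
}

msg='ALERT: "myjob.service" failed'
payload="{\"text\": \"$(json_escape "$msg")\"}"
echo "$payload"
```

The resulting `$payload` can then be passed to curl with `-d "$payload"` in place of the hand-built string.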

Script-Level Error Handling

robust-script.sh
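#!/usr/bin/env bash
# -E makes the ERR trap fire inside functions and subshells too
set -Eeuo pipefail

# Report the failing line. $LINENO is only meaningful in an ERR trap;
# inside an EXIT trap it no longer points at the command that failed.
on_error() {
    echo "[$(date -Is)] [ERROR] Script failed at line $1 (exit code $2)" >&2
}
trap 'on_error "$LINENO" "$?"' ERR

# Your logic here...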

Environment Configuration

Environment Files

Keep environment-specific settings outside the unit file:

myjob.service
[Service]
EnvironmentFile=/etc/default/myjob
ExecStart=/usr/local/bin/myjob.sh

/etc/default/myjob
DB_HOST=localhost
DB_NAME=wordpress
BACKUP_DIR=/mnt/backups
S3_BUCKET=my-bucket
LOG_LEVEL=INFO
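
systemd exports these values into the job's environment, so the script should fail early if one is missing rather than run with an empty path. The sketch below shows one way a job script might validate them; `load_config` is a hypothetical helper, and the choice of which variables are required is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of how a job script might validate values passed in via EnvironmentFile=.
# load_config is a hypothetical helper; variable names come from /etc/default/myjob.
load_config() {
    local var
    # Required settings: report the first missing variable and fail
    for var in DB_HOST DB_NAME BACKUP_DIR; do
        if [ -z "${!var:-}" ]; then
            echo "missing required setting: $var" >&2
            return 1
        fi
    done
    # Optional setting with a default
    LOG_LEVEL="${LOG_LEVEL:-INFO}"
}

# Simulate what systemd would export from the environment file
export DB_HOST=localhost DB_NAME=wordpress BACKUP_DIR=/mnt/backups
unset LOG_LEVEL
if load_config; then
    echo "config ok: $DB_NAME@$DB_HOST -> $BACKUP_DIR (LOG_LEVEL=$LOG_LEVEL)"
fi
```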

Per-Environment Overrides

create-staging-override.sh
sudo mkdir -p /etc/systemd/system/myjob.service.d/
sudo tee /etc/systemd/system/myjob.service.d/staging.conf > /dev/null <<'EOF'
[Service]
EnvironmentFile=
EnvironmentFile=/etc/default/myjob-staging
RuntimeMaxSec=2h
EOF
sudo systemctl daemon-reload

Monitoring

Comprehensive Health Check

/usr/local/bin/check-timer-health.sh
#!/usr/bin/env bash
set -euo pipefail

TIMERS=(wp-backup wp-db-backup backup-prune disk-check wp-cron-runner)
EXIT_CODE=0

for name in "${TIMERS[@]}"; do
    # is-active prints the state itself and exits non-zero when not active
    timer_active=$(systemctl is-active "${name}.timer" 2>/dev/null) || timer_active=${timer_active:-not-found}
    if [ "$timer_active" != "active" ]; then
        echo "CRITICAL: ${name}.timer is $timer_active"
        EXIT_CODE=2
    else
        next=$(systemctl show "${name}.timer" -p NextElapseUSecRealtime --value 2>/dev/null)
        last=$(systemctl show "${name}.timer" -p LastTriggerUSec --value 2>/dev/null)
        echo "OK: ${name}.timer (last: $last, next: $next)"
    fi
done

exit $EXIT_CODE
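
An active timer can still be unhealthy if it has not fired for longer than its schedule allows. The sketch below adds a staleness test that could be applied to the `LastTriggerUSec` value the script already reads. `is_stale` is a hypothetical helper, the 25-hour limit is an assumed threshold for a daily job, and GNU `date` is assumed for parsing the timestamp.

```shell
#!/usr/bin/env bash
# is_stale <last_trigger> <now_epoch> <max_age_seconds>
# Succeeds if the last trigger is older than the allowed age.
# Returns 2 if the timestamp cannot be parsed (e.g. the timer has never fired).
is_stale() {
    local last_epoch
    last_epoch=$(date -d "$1" +%s 2>/dev/null) || return 2
    [ $(( $2 - last_epoch )) -gt "$3" ]
}

# Sample value in the format printed by `systemctl show -p LastTriggerUSec --value`
last="Mon 2024-03-04 02:15:47 UTC"
max_age=$((25 * 3600))                         # assumed limit for a daily job

now_same_day=$(date -d "2024-03-04 12:00:00 UTC" +%s)
now_two_days=$(date -d "2024-03-06 02:15:47 UTC" +%s)

if is_stale "$last" "$now_same_day" "$max_age"; then same_day="STALE"; else same_day="fresh"; fi
if is_stale "$last" "$now_two_days" "$max_age"; then two_days="STALE"; else two_days="fresh"; fi
echo "checked same day: $same_day"
echo "checked two days later: $two_days"
```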

WordPress VPS Timer Reference

Task	Schedule	Key Directives
WordPress cron events	`*:0/15`	`User=www-data`, `wp cron event run --due-now`
Nightly full backup	`02:15`	`flock`, `TimeoutStartSec=2h`, `Persistent=true`
Database export	`02:30`	`wp db export`, `flock`, `RandomizedDelaySec=5m`
Backup retention purge	`04:00`	`find -mtime +14 -delete`
Object cache flush	`00/6:00:00`	`wp cache flush`, `User=www-data`
Media sync to S3	`0/2:00:00`	`rclone sync`, `User=www-data`
SSL certificate renewal	`03:00`	`certbot renew`, `RandomizedDelaySec=` with `FixedRandomDelay=true`
PHP-FPM weekly restart	`Sun 05:00`	`systemctl restart php8.2-fpm`
Weekly WP optimization	`Mon 03:00`	`wp db optimize`
Peak-hours cache warm	`08..20:0/10`	`Persistent=false`

Key Takeaways

Use flock -n for overlap prevention together with a runtime limit as the kill switch: TimeoutStartSec= for Type=oneshot jobs, since RuntimeMaxSec= has no effect on them.
Use Persistent=true for any job that must not be silently skipped.
Use RandomizedDelaySec= + FixedRandomDelay=true for fleet safety.
Apply at least Level 1 security hardening on every production service.
Use OnFailure= for alerting on job failures.
Use EnvironmentFile= and drop-in overrides for multi-environment setups.

What's Next

Study Cases — real-world scenarios where systemd timers solve complex automation problems.
