Production Patterns

Learning Focus

By the end of this lesson you will be able to deploy path units with full security sandboxing, rate limiting, timer fallback for reliability, structured logging, and monitoring integration — ready for production workloads.

Hardened Production Template

Use this as a baseline for any production path-triggered job. Adjust User=, ReadWritePaths=, and ExecStart= for each specific job.

The Path Unit

/etc/systemd/system/safe-path-job.path
[Unit]
Description=Watch for incoming files (hardened)
Documentation=man:systemd.path(5)
After=local-fs.target

[Path]
DirectoryNotEmpty=/var/www/drop
MakeDirectory=yes
DirectoryMode=0775

[Install]
WantedBy=paths.target

The Service Unit

/etc/systemd/system/safe-path-job.service
[Unit]
Description=Safe path-triggered job
After=network-online.target
Wants=network-online.target
# Rate limiting: max 10 starts in 60 seconds
StartLimitBurst=10
StartLimitIntervalSec=60

[Service]
Type=oneshot
User=www-data
Group=www-data
WorkingDirectory=/var/www/html
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/local/bin/safe-process.sh
# Kill runaway jobs after 30 minutes
RuntimeMaxSec=30m
# Log to both journald and file
StandardOutput=append:/var/log/safe-path-job.log
StandardError=append:/var/log/safe-path-job.log

# ── Security hardening ──
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop /var/lock /mnt/backups
PrivateDevices=true
ProtectKernelTunables=true
ProtectControlGroups=true
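
After installing both units, reload systemd and enable the path unit; the service itself is started on demand, so it does not need to be enabled separately. A quick status check confirms the watch is active:

enable-hardened-job.sh
sudo systemctl daemon-reload
sudo systemctl enable --now safe-path-job.path
systemctl status safe-path-job.path --no-pager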

Security Hardening Deep Dive

Why Harden Path-Triggered Services?

A path-triggered service often processes untrusted input — files uploaded by users, CI pipelines, or external systems. If a malicious file exploits a vulnerability in your processing script, security hardening limits the blast radius.

Security Directives Explained

Directive | What It Does | Impact
NoNewPrivileges=true | Prevents the process and its children from gaining new privileges (no setuid/setgid) | Blocks privilege escalation
PrivateTmp=true | Gives the service private /tmp and /var/tmp directories | Prevents tmp-based attacks
ProtectSystem=strict | Mounts the entire file system hierarchy read-only, except /dev, /proc, and /sys | Blocks system file modification
ProtectHome=read-only | Makes home directories read-only | Protects user data
ReadWritePaths= | Whitelists specific writable paths (required with ProtectSystem=strict) | Explicit write access
PrivateDevices=true | Presents a private /dev with pseudo-devices only, hiding physical devices | Blocks device access
ProtectKernelTunables=true | Makes kernel tunables under /proc/sys and /sys read-only | Blocks kernel parameter changes
ProtectControlGroups=true | Makes the cgroup file system read-only | Blocks cgroup manipulation
MemoryDenyWriteExecute=true | Prevents memory mappings that are both writable and executable | Blocks code injection
RestrictRealtime=true | Prevents acquiring real-time scheduling | Blocks priority manipulation

Graduated Hardening Levels

Start with Level 1 and increase as you gain confidence:

Level 1 — Basic (All Services Should Have This)

level-1-hardening.service
[Service]
NoNewPrivileges=true
PrivateTmp=true

Level 2 — Standard (Recommended Production Baseline)

level-2-hardening.service
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop

Level 3 — Strict (For Security-Sensitive Workloads)

level-3-hardening.service
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop
PrivateDevices=true
ProtectKernelTunables=true
ProtectControlGroups=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
RestrictSUIDSGID=true

Auditing Your Security Posture

Use systemd-analyze security to score your service:

security-audit.sh
systemd-analyze security safe-path-job.service
example-output.txt
NAME                   DESCRIPTION                              EXPOSURE
✓ NoNewPrivileges=     Service cannot gain new privileges      ...
✓ PrivateTmp=          Service has private /tmp                ...
✓ ProtectSystem=       Service has strict system protection    ...
...
→ Overall exposure level for safe-path-job.service: 2.1 OK

A score below 3.0 is considered good. Below 1.0 is excellent.
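
To keep an eye on scores across several jobs, a small loop over systemd-analyze works. A minimal sketch; the extra unit names here are illustrative:

audit-all-path-jobs.sh
#!/usr/bin/env bash
set -euo pipefail
# Print only the final "Overall exposure level" line for each service
for unit in safe-path-job.service process-uploads.service; do
  systemd-analyze security "$unit" | tail -n 1
done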


Rate Limiting

Why Rate Limit?

If a hot directory receives hundreds of files per second (e.g., a busy upload queue), the path unit would trigger the service hundreds of times per second. Rate limiting prevents resource exhaustion.

Service-Level Rate Limiting

Applied in the [Unit] section of the .service file:

rate-limited-service.service
[Unit]
Description=Rate-limited file processor
# Allow at most 5 starts within any 30-second window
StartLimitBurst=5
StartLimitIntervalSec=30

[Service]
Type=oneshot
ExecStart=/usr/local/bin/process.sh
# Wait 5 seconds before retry on failure
Restart=on-failure
RestartSec=5
how-it-works.txt
StartLimitBurst=5 → Allow at most 5 starts
StartLimitIntervalSec=30 → Within any 30-second window
Result: If more than 5 files arrive in 30s, excess triggers are throttled.
After 30s, the counter resets.

Path-Level Rate Limiting (systemd 250+)

On systemd 250+, you can also rate-limit at the path unit level:

rate-limited-path.path
[Path]
DirectoryNotEmpty=/var/www/drop
MakeDirectory=yes
# Limit triggers to 10 per 5 seconds
TriggerLimitBurst=10
TriggerLimitIntervalSec=5s
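
Older systemd versions ignore these directives (with a warning in the journal at unit load time), so confirm your version before relying on them:

check-systemd-version.sh
systemctl --version | head -n 1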

What Happens When Rate Limit Is Hit

When the rate limit is exceeded:

  1. systemd stops starting the service.
  2. The unit enters the failed state.
  3. The journal logs: Start request repeated too quickly.
  4. You must reset it manually: sudo systemctl reset-failed rate-limited-service.service.

To prevent this disruption, tune your limits based on expected traffic.
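
A quick way to confirm that a unit was throttled, and to bring it back, using the service from this example:

recover-from-throttle.sh
# Is the unit in the failed state?
systemctl is-failed rate-limited-service.service
# Look for the telltale journal message
journalctl -u rate-limited-service.service | grep -i 'start request repeated too quickly'
# Clear the failed state so path triggers work again
sudo systemctl reset-failed rate-limited-service.service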


Concurrency Safety

systemd provides built-in concurrency safety for Type=oneshot services:

Behavior | Description
No parallel runs | systemd will not start a second instance of a Type=oneshot service while the first is still running
Event queuing | Filesystem events that arrive during execution are queued
Re-trigger on exit | After the service exits, the path unit re-checks its condition and re-triggers if it still holds

This means your script does not need file locking or PID file management — systemd handles it.
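
Because triggers that arrive mid-run are coalesced rather than spawning parallel instances, a common design choice is to drain the entire queue on each run instead of processing one file per trigger. A minimal sketch, assuming the /var/www/drop queue from the template and a hypothetical /usr/local/bin/process binary:

drain-queue.sh
#!/usr/bin/env bash
set -euo pipefail

QUEUE=/var/www/drop
DONE=/var/www/done
mkdir -p "$DONE"

shopt -s nullglob   # an empty queue yields zero iterations, not a literal '*'
for f in "$QUEUE"/*; do
  /usr/local/bin/process "$f"   # hypothetical processor
  mv -- "$f" "$DONE/"
done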


Timer Fallback Pattern

For critical workflows, pair a .path unit (instant reaction) with a .timer unit (periodic fallback). This provides two layers of reliability:

  1. Primary: The .path unit triggers instantly on filesystem events.
  2. Fallback: The .timer catches anything missed due to inotify race conditions.

Both trigger the same .service unit.

/etc/systemd/system/process-uploads.path
[Unit]
Description=Watch for incoming uploads (instant trigger)

[Path]
DirectoryNotEmpty=/var/www/uploads/queue
MakeDirectory=yes

[Install]
WantedBy=paths.target
/etc/systemd/system/process-uploads-fallback.timer
[Unit]
Description=Fallback: process uploads every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true
Unit=process-uploads.service

[Install]
WantedBy=timers.target
/etc/systemd/system/process-uploads.service
[Unit]
Description=Process uploaded files

[Service]
Type=oneshot
User=www-data
ExecStart=/usr/local/bin/process-uploads.sh
enable-both.sh
sudo systemctl daemon-reload
sudo systemctl enable --now process-uploads.path
sudo systemctl enable --now process-uploads-fallback.timer
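To verify both layers are armed, check the path unit's state and the timer's next elapse time:

verify-both.sh
systemctl status process-uploads.path --no-pager
systemctl list-timers process-uploads-fallback.timer --no-pager
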
When To Use This Pattern
  • File processing pipelines where missing a file would cause business impact.
  • Environments with high inotify watch counts where events might be dropped.
  • Multi-server setups where NFS or network filesystems may not generate reliable inotify events.

Structured Logging Pattern

For production services, use structured log output that's easy to parse and monitor:

/usr/local/bin/structured-process.sh
#!/usr/bin/env bash
set -euo pipefail

# Compute the timestamp at call time so each line is stamped correctly
log_info()  { echo "[$(date -Is)] [process-uploads] [INFO] $*"; }
log_error() { echo "[$(date -Is)] [process-uploads] [ERROR] $*" >&2; }
log_warn()  { echo "[$(date -Is)] [process-uploads] [WARN] $*"; }

FILE=$(ls /var/www/uploads/queue/ 2>/dev/null | head -1)
if [ -z "$FILE" ]; then
  log_info "No files to process"
  exit 0
fi

log_info "Processing: $FILE"
if /usr/local/bin/process "$FILE"; then
  mv "/var/www/uploads/queue/$FILE" /var/www/uploads/done/
  log_info "Success: $FILE moved to done/"
else
  mv "/var/www/uploads/queue/$FILE" /var/www/uploads/failed/
  log_error "Failed: $FILE moved to failed/"
  exit 1
fi
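
With this prefix convention, errors are easy to pull out of either sink. For example (using the service name from the fallback pattern above):

query-errors.sh
# Errors logged today via journald
journalctl -u process-uploads.service --since today | grep '\[ERROR\]'
# Or from a file-based log
grep '\[ERROR\]' /var/log/safe-path-job.log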

Log Rotation

For file-based logs, set up logrotate:

/etc/logrotate.d/path-jobs
/var/log/safe-path-job.log
/var/log/csv-import.log
/var/log/image-optimize.log
{
    daily
    rotate 14
    compress
    missingok
    notifempty
    create 0640 www-data www-data
}
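
logrotate can dry-run a configuration before it goes live; -d prints what would happen without rotating anything:

test-logrotate.sh
sudo logrotate -d /etc/logrotate.d/path-jobs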

Error Handling and Recovery

Retry on Failure

retry-service.service
[Unit]
Description=Retrying file processor
StartLimitBurst=5
StartLimitIntervalSec=300

[Service]
Type=oneshot
ExecStart=/usr/local/bin/process.sh
Restart=on-failure
RestartSec=10

Dead Letter Queue Pattern

When a file fails processing, move it to a "failed" folder instead of leaving it in the queue (which would cause infinite retries):

/usr/local/bin/process-with-dlq.sh
#!/usr/bin/env bash
set -euo pipefail

QUEUE="/var/www/drop"
DONE="/var/www/done"
FAILED="/var/www/failed"

mkdir -p "$DONE" "$FAILED"

FILE=$(ls "$QUEUE/" 2>/dev/null | head -1)
[ -z "$FILE" ] && exit 0

if timeout 300 /usr/local/bin/process "$QUEUE/$FILE"; then
  mv "$QUEUE/$FILE" "$DONE/"
  echo "[$(date -Is)] OK: $FILE"
else
  mv "$QUEUE/$FILE" "$FAILED/"
  echo "[$(date -Is)] FAIL: $FILE → moved to $FAILED/" >&2
fi
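
A dead-letter queue only helps if someone notices it filling up. A minimal check that could be wired into the monitoring section below (paths match the script above):

check-dlq.sh
#!/usr/bin/env bash
set -euo pipefail
COUNT=$(find /var/www/failed -type f | wc -l)
if [ "$COUNT" -gt 0 ]; then
  echo "WARNING: $COUNT file(s) in the dead-letter queue"
  exit 1
fi
echo "OK: dead-letter queue is empty"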

Multi-Environment Configuration

Using Environment Files

Keep environment-specific settings outside the unit file:

/etc/systemd/system/process-uploads.service
[Service]
Type=oneshot
User=www-data
EnvironmentFile=/etc/default/process-uploads
ExecStart=/usr/local/bin/process-uploads.sh
/etc/default/process-uploads
QUEUE_DIR=/var/www/uploads/queue
DONE_DIR=/var/www/uploads/done
FAILED_DIR=/var/www/uploads/failed
LOG_LEVEL=INFO
S3_BUCKET=my-bucket
/usr/local/bin/process-uploads.sh
#!/usr/bin/env bash
set -euo pipefail
# Variables come from EnvironmentFile
FILE=$(ls "$QUEUE_DIR/" 2>/dev/null | head -1)
[ -z "$FILE" ] && exit 0
echo "[$(date -Is)] Processing: $FILE (bucket: $S3_BUCKET)"
# ...

Per-Environment Overrides

Use drop-in overrides for environment differences:

create-staging-override.sh
sudo mkdir -p /etc/systemd/system/process-uploads.service.d/
sudo tee /etc/systemd/system/process-uploads.service.d/staging.conf > /dev/null <<'EOF'
[Service]
# On staging, use a different environment file
EnvironmentFile=
EnvironmentFile=/etc/default/process-uploads-staging
# Relax the runtime limit
RuntimeMaxSec=1h
EOF
sudo systemctl daemon-reload
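
systemctl cat shows the unit together with any drop-ins, which makes it easy to confirm the override took effect; systemd-delta lists all overridden units system-wide:

verify-override.sh
systemctl cat process-uploads.service
systemd-delta --type=extended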

Monitoring Integration

Health Check Script

/usr/local/bin/check-path-units.sh
#!/usr/bin/env bash
set -euo pipefail

UNITS=(process-drop csv-import deploy-trigger image-queue)
EXIT_CODE=0

for name in "${UNITS[@]}"; do
  path_status=$(systemctl is-active "${name}.path" 2>/dev/null || echo "not-found")
  if [ "$path_status" != "active" ]; then
    echo "CRITICAL: ${name}.path is $path_status"
    EXIT_CODE=2
  else
    echo "OK: ${name}.path is active"
  fi
done

exit $EXIT_CODE
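
The health check itself can be scheduled with a timer pair. A sketch with hypothetical unit names:

/etc/systemd/system/check-path-units.service
[Unit]
Description=Health check for path units

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-path-units.sh
/etc/systemd/system/check-path-units.timer
[Unit]
Description=Run the path-unit health check every 5 minutes

[Timer]
OnCalendar=*:0/5

[Install]
WantedBy=timers.target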

Notification on Failure

Add an OnFailure= directive to send alerts when a path-triggered service fails:

/etc/systemd/system/process-drop.service
[Unit]
Description=Process drop folder
# %N expands to this unit's name without the .service suffix
OnFailure=alert-failure@%N.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/process-drop.sh
/etc/systemd/system/alert-failure@.service
[Unit]
Description=Send alert for %i failure

[Service]
Type=oneshot
ExecStart=/usr/local/bin/alert-failure.sh %i
/usr/local/bin/alert-failure.sh
#!/usr/bin/env bash
set -euo pipefail
UNIT="$1"
MSG="[$(date -Is)] ALERT: $UNIT failed on $(hostname)"
echo "$MSG"
# curl -s -X POST "https://hooks.slack.com/..." -d "{\"text\": \"$MSG\"}"
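
The alert path can be exercised without breaking any real unit by starting the template with a dummy instance name:

test-alert.sh
sudo systemctl start alert-failure@test.service
journalctl -u alert-failure@test.service -n 5 --no-pager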

WordPress VPS Production Reference

Common path unit patterns for a WordPress VPS:

Scenario | Directive and Watch Path | Triggered Action
Flush cache via SFTP touch-file | PathExists=/var/www/html/clear_cache.txt | wp cache flush + rm file
Import SQL dump on drop | PathExistsGlob=/mnt/import/*.sql | wp db import + move to archive
Reload Nginx after conf change | PathChanged=/etc/nginx/conf.d/ | systemctl reload nginx
Media upload sync to S3 | PathModified=/var/www/html/wp-content/uploads/ | rclone sync
Image optimization queue | DirectoryNotEmpty=/var/media/queue/ | Optimize + move to done
ClamAV scan on new uploads | PathChanged=/var/www/html/wp-content/uploads/ | clamscan --move=/quarantine
TLS cert reload | PathChanged=/etc/letsencrypt/live/.../fullchain.pem | systemctl reload nginx
PHP-FPM restart on php.ini edit | PathChanged=/etc/php/8.2/fpm/php.ini | systemctl restart php8.2-fpm
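
As a concrete example, the first row of the table might look like this. A sketch with hypothetical unit names, assuming WP-CLI is installed at /usr/local/bin/wp:

/etc/systemd/system/wp-cache-flush.path
[Unit]
Description=Flush WordPress cache when the touch-file appears

[Path]
PathExists=/var/www/html/clear_cache.txt

[Install]
WantedBy=paths.target
/etc/systemd/system/wp-cache-flush.service
[Unit]
Description=Flush WordPress cache and remove the touch-file

[Service]
Type=oneshot
User=www-data
WorkingDirectory=/var/www/html
ExecStart=/usr/local/bin/wp cache flush
# Removing the file re-arms the PathExists= trigger
ExecStartPost=/usr/bin/rm -f /var/www/html/clear_cache.txt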

Key Takeaways

  • Always apply at least Level 1 security hardening (NoNewPrivileges, PrivateTmp) to production services.
  • Use StartLimitBurst and StartLimitIntervalSec to rate-limit hot directories.
  • Pair critical path units with a timer fallback for maximum reliability.
  • Implement a dead-letter queue pattern to prevent infinite retries on failing files.
  • Use EnvironmentFile= and drop-in overrides for multi-environment setups.
  • Add OnFailure= for production monitoring and alerting.

What's Next