Production Patterns

Learning Focus

By the end of this lesson you will be able to deploy path units with full security sandboxing, rate limiting, timer fallback for reliability, structured logging, and monitoring integration — ready for production workloads.

Hardened Production Template

Use this as a baseline for any production path-triggered job. Adjust User=, ReadWritePaths=, and ExecStart= for each specific job.

The Path Unit

/etc/systemd/system/safe-path-job.path
[Unit]
Description=Watch for incoming files (hardened)
Documentation=man:systemd.path(5)
After=local-fs.target

[Path]
DirectoryNotEmpty=/var/www/drop
MakeDirectory=yes
DirectoryMode=0775

[Install]
WantedBy=paths.target

The Service Unit

/etc/systemd/system/safe-path-job.service
[Unit]
Description=Safe path-triggered job
After=network-online.target
Wants=network-online.target
# Rate limiting: max 10 starts in 60 seconds
StartLimitBurst=10
StartLimitIntervalSec=60

[Service]
Type=oneshot
User=www-data
Group=www-data
WorkingDirectory=/var/www/html
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/local/bin/safe-process.sh
# Kill runaway jobs after 30 minutes
RuntimeMaxSec=30m
# Log to both journald and file
StandardOutput=append:/var/log/safe-path-job.log
StandardError=append:/var/log/safe-path-job.log

# ── Security hardening ──
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop /var/lock /mnt/backups
PrivateDevices=true
ProtectKernelTunables=true
ProtectControlGroups=true
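
After installing both units, reload systemd and enable the path unit; the service itself is started on demand, so it does not need to be enabled separately. A quick status check confirms the watch is active:

enable-hardened-job.sh
sudo systemctl daemon-reload
sudo systemctl enable --now safe-path-job.path
systemctl status safe-path-job.path --no-pager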

Security Hardening Deep Dive

Why Harden Path-Triggered Services?

A path-triggered service often processes untrusted input — files uploaded by users, CI pipelines, or external systems. If a malicious file exploits a vulnerability in your processing script, security hardening limits the blast radius.

Security Directives Explained

Directive | What It Does | Impact
NoNewPrivileges=true | Prevents the process and its children from gaining new privileges (no setuid/setgid) | Blocks privilege escalation
PrivateTmp=true | Gives the service private /tmp and /var/tmp directories | Prevents tmp-based attacks
ProtectSystem=strict | Mounts the entire file system hierarchy read-only, except /dev, /proc, and /sys | Blocks system file modification
ProtectHome=read-only | Makes home directories read-only | Protects user data
ReadWritePaths= | Whitelists specific writable paths (required with ProtectSystem=strict) | Explicit write access
PrivateDevices=true | Presents a private /dev with pseudo-devices only, hiding physical devices | Blocks device access
ProtectKernelTunables=true | Makes kernel tunables under /proc/sys and /sys read-only | Blocks kernel parameter changes
ProtectControlGroups=true | Makes the cgroup file system read-only | Blocks cgroup manipulation
MemoryDenyWriteExecute=true | Prevents memory mappings that are both writable and executable | Blocks code injection
RestrictRealtime=true | Prevents acquiring real-time scheduling | Blocks priority manipulation

Graduated Hardening Levels

Start with Level 1 and increase as you gain confidence:

Level 1 — Basic (All Services Should Have This)

level-1-hardening.service
[Service]
NoNewPrivileges=true
PrivateTmp=true

Level 2 — Standard (Recommended Production Baseline)

level-2-hardening.service
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop

Level 3 — Strict (For Security-Sensitive Workloads)

level-3-hardening.service
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop
PrivateDevices=true
ProtectKernelTunables=true
ProtectControlGroups=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
RestrictSUIDSGID=true

Auditing Your Security Posture

Use systemd-analyze security to score your service:

security-audit.sh
systemd-analyze security safe-path-job.service
example-output.txt
NAME                   DESCRIPTION                              EXPOSURE
✓ NoNewPrivileges=     Service cannot gain new privileges      ...
✓ PrivateTmp=          Service has private /tmp                ...
✓ ProtectSystem=       Service has strict system protection    ...
...
→ Overall exposure level for safe-path-job.service: 2.1 OK

A score below 3.0 is considered good. Below 1.0 is excellent.
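
To keep an eye on scores across several jobs, a small loop over systemd-analyze works. A minimal sketch; the extra unit names here are illustrative:

audit-all-path-jobs.sh
#!/usr/bin/env bash
set -euo pipefail
# Print only the final "Overall exposure level" line for each service
for unit in safe-path-job.service process-uploads.service; do
  systemd-analyze security "$unit" | tail -n 1
done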


Rate Limiting

Why Rate Limit?

If a hot directory receives hundreds of files per second (e.g., a busy upload queue), the path unit would trigger the service hundreds of times per second. Rate limiting prevents resource exhaustion.

Service-Level Rate Limiting

Applied in the [Unit] section of the .service file:

rate-limited-service.service
[Unit]
Description=Rate-limited file processor
# Allow at most 5 starts within any 30-second window
StartLimitBurst=5
StartLimitIntervalSec=30

[Service]
Type=oneshot
ExecStart=/usr/local/bin/process.sh
# Wait 5 seconds before retry on failure
Restart=on-failure
RestartSec=5
how-it-works.txt
StartLimitBurst=5 → Allow at most 5 starts
StartLimitIntervalSec=30 → Within any 30-second window
Result: If more than 5 files arrive in 30s, excess triggers are throttled.
After 30s, the counter resets.

Path-Level Rate Limiting (systemd 250+)

On systemd 250+, you can also rate-limit at the path unit level:

rate-limited-path.path
[Path]
DirectoryNotEmpty=/var/www/drop
MakeDirectory=yes
# Limit triggers to 10 per 5 seconds
TriggerLimitBurst=10
TriggerLimitIntervalSec=5s
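
Older systemd versions ignore these directives (with a warning in the journal at unit load time), so confirm your version before relying on them:

check-systemd-version.sh
systemctl --version | head -n 1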

What Happens When Rate Limit Is Hit

When the rate limit is exceeded:

  1. systemd stops starting the service.
  2. The unit enters the failed state.
  3. The journal logs: Start request repeated too quickly.
  4. You must reset it manually: sudo systemctl reset-failed rate-limited-service.service.

To prevent this disruption, tune your limits based on expected traffic.
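
A quick way to confirm that a unit was throttled, and to bring it back, using the service from this example:

recover-from-throttle.sh
# Is the unit in the failed state?
systemctl is-failed rate-limited-service.service
# Look for the telltale journal message
journalctl -u rate-limited-service.service | grep -i 'start request repeated too quickly'
# Clear the failed state so path triggers work again
sudo systemctl reset-failed rate-limited-service.service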


Concurrency Safety

systemd provides built-in concurrency safety for Type=oneshot services:

Behavior | Description
No parallel runs | systemd will not start a second instance of a Type=oneshot service while the first is still running
Event queuing | Filesystem events that arrive during execution are queued
Re-trigger on exit | After the service exits, the path unit re-checks its condition and re-triggers if it still holds

This means your script does not need file locking or PID file management — systemd handles it.
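
Because triggers that arrive mid-run are coalesced rather than spawning parallel instances, a common design choice is to drain the entire queue on each run instead of processing one file per trigger. A minimal sketch, assuming the /var/www/drop queue from the template and a hypothetical /usr/local/bin/process binary:

drain-queue.sh
#!/usr/bin/env bash
set -euo pipefail

QUEUE=/var/www/drop
DONE=/var/www/done
mkdir -p "$DONE"

shopt -s nullglob   # an empty queue yields zero iterations, not a literal '*'
for f in "$QUEUE"/*; do
  /usr/local/bin/process "$f"   # hypothetical processor
  mv -- "$f" "$DONE/"
done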


Timer Fallback Pattern

For critical workflows, pair a .path unit (instant reaction) with a .timer unit (periodic fallback). This provides two layers of reliability:

  1. Primary: The .path unit triggers instantly on filesystem events.
  2. Fallback: The .timer catches anything missed due to inotify race conditions.

Both trigger the same .service unit.

/etc/systemd/system/process-uploads.path
[Unit]
Description=Watch for incoming uploads (instant trigger)

[Path]
DirectoryNotEmpty=/var/www/uploads/queue
MakeDirectory=yes

[Install]
WantedBy=paths.target
/etc/systemd/system/process-uploads-fallback.timer
[Unit]
Description=Fallback: process uploads every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true
Unit=process-uploads.service

[Install]
WantedBy=timers.target
/etc/systemd/system/process-uploads.service
[Unit]
Description=Process uploaded files

[Service]
Type=oneshot
User=www-data
ExecStart=/usr/local/bin/process-uploads.sh
enable-both.sh
sudo systemctl daemon-reload
sudo systemctl enable --now process-uploads.path
sudo systemctl enable --now process-uploads-fallback.timer
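To verify both layers are armed, check the path unit's state and the timer's next elapse time:

verify-both.sh
systemctl status process-uploads.path --no-pager
systemctl list-timers process-uploads-fallback.timer --no-pager
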
When To Use This Pattern
  • File processing pipelines where missing a file would cause business impact.
  • Environments with high inotify watch counts where events might be dropped.
  • Multi-server setups where NFS or network filesystems may not generate reliable inotify events.

Structured Logging Pattern

For production services, use structured log output that's easy to parse and monitor:

/usr/local/bin/structured-process.sh
#!/usr/bin/env bash
set -euo pipefail

# Compute the timestamp at call time so each line is stamped correctly
log_info()  { echo "[$(date -Is)] [process-uploads] [INFO] $*"; }
log_error() { echo "[$(date -Is)] [process-uploads] [ERROR] $*" >&2; }
log_warn()  { echo "[$(date -Is)] [process-uploads] [WARN] $*"; }

FILE=$(ls /var/www/uploads/queue/ 2>/dev/null | head -1)
if [ -z "$FILE" ]; then
  log_info "No files to process"
  exit 0
fi

log_info "Processing: $FILE"
if /usr/local/bin/process "$FILE"; then
  mv "/var/www/uploads/queue/$FILE" /var/www/uploads/done/
  log_info "Success: $FILE moved to done/"
else
  mv "/var/www/uploads/queue/$FILE" /var/www/uploads/failed/
  log_error "Failed: $FILE moved to failed/"
  exit 1
fi
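
With this prefix convention, errors are easy to pull out of either sink. For example (using the service name from the fallback pattern above):

query-errors.sh
# Errors logged today via journald
journalctl -u process-uploads.service --since today | grep '\[ERROR\]'
# Or from a file-based log
grep '\[ERROR\]' /var/log/safe-path-job.log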

Log Rotation

For file-based logs, set up logrotate:

/etc/logrotate.d/path-jobs
/var/log/safe-path-job.log
/var/log/csv-import.log
/var/log/image-optimize.log
{
    daily
    rotate 14
    compress
    missingok
    notifempty
    create 0640 www-data www-data
}
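
logrotate can dry-run a configuration before it goes live; -d prints what would happen without rotating anything:

test-logrotate.sh
sudo logrotate -d /etc/logrotate.d/path-jobs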

Error Handling and Recovery

Retry on Failure

retry-service.service
[Unit]
Description=Retrying file processor
StartLimitBurst=5
StartLimitIntervalSec=300

[Service]
Type=oneshot
ExecStart=/usr/local/bin/process.sh
Restart=on-failure
RestartSec=10

Dead Letter Queue Pattern

When a file fails processing, move it to a "failed" folder instead of leaving it in the queue (which would cause infinite retries):

/usr/local/bin/process-with-dlq.sh
#!/usr/bin/env bash
set -euo pipefail

QUEUE="/var/www/drop"
DONE="/var/www/done"
FAILED="/var/www/failed"

mkdir -p "$DONE" "$FAILED"

FILE=$(ls "$QUEUE/" 2>/dev/null | head -1)
[ -z "$FILE" ] && exit 0

if timeout 300 /usr/local/bin/process "$QUEUE/$FILE"; then
  mv "$QUEUE/$FILE" "$DONE/"
  echo "[$(date -Is)] OK: $FILE"
else
  mv "$QUEUE/$FILE" "$FAILED/"
  echo "[$(date -Is)] FAIL: $FILE → moved to $FAILED/" >&2
fi
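
A dead-letter queue only helps if someone notices it filling up. A minimal check that could be wired into the monitoring section below (paths match the script above):

check-dlq.sh
#!/usr/bin/env bash
set -euo pipefail
COUNT=$(find /var/www/failed -type f | wc -l)
if [ "$COUNT" -gt 0 ]; then
  echo "WARNING: $COUNT file(s) in the dead-letter queue"
  exit 1
fi
echo "OK: dead-letter queue is empty"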

Multi-Environment Configuration

Using Environment Files

Keep environment-specific settings outside the unit file:

/etc/systemd/system/process-uploads.service
[Service]
Type=oneshot
User=www-data
EnvironmentFile=/etc/default/process-uploads
ExecStart=/usr/local/bin/process-uploads.sh
/etc/default/process-uploads
QUEUE_DIR=/var/www/uploads/queue
DONE_DIR=/var/www/uploads/done
FAILED_DIR=/var/www/uploads/failed
LOG_LEVEL=INFO
S3_BUCKET=my-bucket
/usr/local/bin/process-uploads.sh
#!/usr/bin/env bash
set -euo pipefail
# Variables come from EnvironmentFile
FILE=$(ls "$QUEUE_DIR/" 2>/dev/null | head -1)
[ -z "$FILE" ] && exit 0
echo "[$(date -Is)] Processing: $FILE (bucket: $S3_BUCKET)"
# ...

Per-Environment Overrides

Use drop-in overrides for environment differences:

create-staging-override.sh
sudo mkdir -p /etc/systemd/system/process-uploads.service.d/
sudo tee /etc/systemd/system/process-uploads.service.d/staging.conf > /dev/null <<'EOF'
[Service]
# On staging, use a different environment file
EnvironmentFile=
EnvironmentFile=/etc/default/process-uploads-staging
# Relax the runtime limit
RuntimeMaxSec=1h
EOF
sudo systemctl daemon-reload
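
systemctl cat shows the unit together with any drop-ins, which makes it easy to confirm the override took effect; systemd-delta lists all overridden units system-wide:

verify-override.sh
systemctl cat process-uploads.service
systemd-delta --type=extended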

Monitoring Integration

Health Check Script

/usr/local/bin/check-path-units.sh
#!/usr/bin/env bash
set -euo pipefail

UNITS=(process-drop csv-import deploy-trigger image-queue)
EXIT_CODE=0

for name in "${UNITS[@]}"; do
  path_status=$(systemctl is-active "${name}.path" 2>/dev/null || echo "not-found")
  if [ "$path_status" != "active" ]; then
    echo "CRITICAL: ${name}.path is $path_status"
    EXIT_CODE=2
  else
    echo "OK: ${name}.path is active"
  fi
done

exit $EXIT_CODE
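
The health check itself can be scheduled with a timer pair. A sketch with hypothetical unit names:

/etc/systemd/system/check-path-units.service
[Unit]
Description=Health check for path units

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-path-units.sh
/etc/systemd/system/check-path-units.timer
[Unit]
Description=Run the path-unit health check every 5 minutes

[Timer]
OnCalendar=*:0/5

[Install]
WantedBy=timers.target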

Notification on Failure

Add an OnFailure= directive to send alerts when a path-triggered service fails:

/etc/systemd/system/process-drop.service
[Unit]
Description=Process drop folder
# %N expands to this unit's name without the .service suffix
OnFailure=alert-failure@%N.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/process-drop.sh
/etc/systemd/system/alert-failure@.service
[Unit]
Description=Send alert for %i failure

[Service]
Type=oneshot
ExecStart=/usr/local/bin/alert-failure.sh %i
/usr/local/bin/alert-failure.sh
#!/usr/bin/env bash
set -euo pipefail
UNIT="$1"
MSG="[$(date -Is)] ALERT: $UNIT failed on $(hostname)"
echo "$MSG"
# curl -s -X POST "https://hooks.slack.com/..." -d "{\"text\": \"$MSG\"}"
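
The alert path can be exercised without breaking any real unit by starting the template with a dummy instance name:

test-alert.sh
sudo systemctl start alert-failure@test.service
journalctl -u alert-failure@test.service -n 5 --no-pager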

WordPress VPS Production Reference

Common path unit patterns for a WordPress VPS:

Scenario | Directive and Watch Path | Triggered Action
Flush cache via SFTP touch-file | PathExists=/var/www/html/clear_cache.txt | wp cache flush + rm file
Import SQL dump on drop | PathExistsGlob=/mnt/import/*.sql | wp db import + move to archive
Reload Nginx after conf change | PathChanged=/etc/nginx/conf.d/ | systemctl reload nginx
Media upload sync to S3 | PathModified=/var/www/html/wp-content/uploads/ | rclone sync
Image optimization queue | DirectoryNotEmpty=/var/media/queue/ | Optimize + move to done
ClamAV scan on new uploads | PathChanged=/var/www/html/wp-content/uploads/ | clamscan --move=/quarantine
TLS cert reload | PathChanged=/etc/letsencrypt/live/.../fullchain.pem | systemctl reload nginx
PHP-FPM restart on php.ini edit | PathChanged=/etc/php/8.2/fpm/php.ini | systemctl restart php8.2-fpm
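
As a concrete example, the first row of the table might look like this. A sketch with hypothetical unit names, assuming WP-CLI is installed at /usr/local/bin/wp:

/etc/systemd/system/wp-cache-flush.path
[Unit]
Description=Flush WordPress cache when the touch-file appears

[Path]
PathExists=/var/www/html/clear_cache.txt

[Install]
WantedBy=paths.target
/etc/systemd/system/wp-cache-flush.service
[Unit]
Description=Flush WordPress cache and remove the touch-file

[Service]
Type=oneshot
User=www-data
WorkingDirectory=/var/www/html
ExecStart=/usr/local/bin/wp cache flush
# Removing the file re-arms the PathExists= trigger
ExecStartPost=/usr/bin/rm -f /var/www/html/clear_cache.txt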

Key Takeaways

  • Always apply at least Level 1 security hardening (NoNewPrivileges, PrivateTmp) to production services.
  • Use StartLimitBurst and StartLimitIntervalSec to rate-limit hot directories.
  • Pair critical path units with a timer fallback for maximum reliability.
  • Implement a dead-letter queue pattern to prevent infinite retries on failing files.
  • Use EnvironmentFile= and drop-in overrides for multi-environment setups.
  • Add OnFailure= for production monitoring and alerting.

What's Next