Production Patterns
By the end of this lesson you will be able to deploy path units with full security sandboxing, rate limiting, timer fallback for reliability, structured logging, and monitoring integration — ready for production workloads.
Hardened Production Template
Use this as a baseline for any production path-triggered job. Adjust User=, ReadWritePaths=, and ExecStart= for each specific job.
The Path Unit
[Unit]
Description=Watch for incoming files (hardened)
Documentation=man:systemd.path(5)
After=local-fs.target
[Path]
DirectoryNotEmpty=/var/www/drop
MakeDirectory=yes
DirectoryMode=0775
[Install]
WantedBy=paths.target
The Service Unit
[Unit]
Description=Safe path-triggered job
After=network-online.target
Wants=network-online.target
# Rate limiting: max 10 starts in 60 seconds
StartLimitBurst=10
StartLimitIntervalSec=60
[Service]
Type=oneshot
User=www-data
Group=www-data
WorkingDirectory=/var/www/html
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/local/bin/safe-process.sh
# Kill runaway jobs after 30 minutes
RuntimeMaxSec=30m
# Log to both journald and file
StandardOutput=append:/var/log/safe-path-job.log
StandardError=append:/var/log/safe-path-job.log
# ── Security hardening ──
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop /var/lock /mnt/backups
PrivateDevices=true
ProtectKernelTunables=true
ProtectControlGroups=true
Security Hardening Deep Dive
Why Harden Path-Triggered Services?
A path-triggered service often processes untrusted input — files uploaded by users, CI pipelines, or external systems. If a malicious file exploits a vulnerability in your processing script, security hardening limits the blast radius.
Security Directives Explained
| Directive | What It Does | Impact |
|---|---|---|
| NoNewPrivileges=true | Prevents the process from gaining new privileges (no setuid/setgid) | Blocks privilege escalation |
| PrivateTmp=true | Gives the service its own /tmp namespace | Prevents tmp-based attacks |
| ProtectSystem=strict | Mounts the entire file system hierarchy read-only, except /dev, /proc, and /sys | Blocks system file modification |
| ProtectHome=read-only | Makes home directories read-only | Protects user data |
| ReadWritePaths= | Whitelists specific writable paths (required with ProtectSystem=strict) | Explicit write access |
| PrivateDevices=true | Hides physical devices from the service | Blocks device access |
| ProtectKernelTunables=true | Makes /proc/sys and /sys read-only | Blocks kernel parameter changes |
| ProtectControlGroups=true | Makes the cgroup filesystem read-only | Blocks cgroup manipulation |
| MemoryDenyWriteExecute=true | Prevents creating writable-and-executable memory mappings | Blocks code injection |
| RestrictRealtime=true | Prevents acquiring real-time scheduling | Blocks priority manipulation |
Graduated Hardening Levels
Start with Level 1 and increase as you gain confidence:
Level 1 — Basic (All Services Should Have This)
[Service]
NoNewPrivileges=true
PrivateTmp=true
Level 2 — Standard (Recommended for Production)
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop
Level 3 — Strict (For Security-Sensitive Workloads)
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/log /var/www/drop
PrivateDevices=true
ProtectKernelTunables=true
ProtectControlGroups=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
RestrictSUIDSGID=true
Auditing Your Security Posture
Use systemd-analyze security to score your service:
systemd-analyze security safe-path-job.service
NAME DESCRIPTION EXPOSURE
✓ NoNewPrivileges= Service cannot gain new privileges ...
✓ PrivateTmp= Service has private /tmp ...
✓ ProtectSystem= Service has strict system protection ...
...
→ Overall exposure level for safe-path-job.service: 2.1 OK
A score below 3.0 is considered good. Below 1.0 is excellent.
Rate Limiting
Why Rate Limit?
If a hot directory receives hundreds of files per second (e.g., a busy upload queue), the path unit would trigger the service hundreds of times per second. Rate limiting prevents resource exhaustion.
Service-Level Rate Limiting
Applied in the [Unit] section of the .service file:
[Unit]
Description=Rate-limited file processor
# Allow at most 5 starts within any 30-second window
StartLimitBurst=5
StartLimitIntervalSec=30
[Service]
Type=oneshot
ExecStart=/usr/local/bin/process.sh
# Wait 5 seconds before retry on failure
Restart=on-failure
RestartSec=5
StartLimitBurst=5 → Allow at most 5 starts
StartLimitIntervalSec=30 → Within any 30-second window
Result: if more than 5 start requests arrive within 30 seconds, the excess requests are refused rather than queued.
After 30 seconds, the counter resets.
Path-Level Rate Limiting (systemd 250+)
On systemd 250+, you can also rate-limit at the path unit level:
[Path]
DirectoryNotEmpty=/var/www/drop
MakeDirectory=yes
# Limit triggers to 10 per 5 seconds
TriggerLimitBurst=10
TriggerLimitIntervalSec=5s
What Happens When Rate Limit Is Hit
When the rate limit is exceeded:
- systemd stops starting the service.
- The unit enters a failed state.
- The journal logs: start request repeated too quickly for the service.
- You must manually reset it: sudo systemctl reset-failed mytask.service
To prevent this disruption, tune your limits based on expected traffic.
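One low-risk way to tune the limits on a live system is a drop-in override rather than editing the unit file itself. A sketch, with an illustrative drop-in name and values:

```ini
# /etc/systemd/system/mytask.service.d/limits.conf (hypothetical drop-in)
[Unit]
# Allow bursts of up to 50 starts per minute before the unit fails
StartLimitBurst=50
StartLimitIntervalSec=60
```

Run sudo systemctl daemon-reload afterwards so the new limits take effect.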
Concurrency Safety
systemd provides built-in concurrency safety for Type=oneshot services:
| Behavior | Description |
|---|---|
| No parallel runs | systemd will not start a second instance of a Type=oneshot service while the first is running |
| Event queuing | If events arrive during execution, they are queued |
| Re-trigger on exit | After the service exits, the path unit re-checks the condition and re-triggers if needed |
This means your script does not need file locking or PID file management — systemd handles it.
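Because in-flight events coalesce into a single re-trigger, a common refinement is to drain the entire queue in one service run instead of handling a single file per invocation. A minimal sketch, assuming a drain_queue helper (the name and arguments are illustrative):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Drain-loop: process every file currently in the queue during one
# service run. Files that arrive mid-run are picked up by the outer
# loop instead of waiting for the next DirectoryNotEmpty= trigger.
drain_queue() {
    local queue="$1" done_dir="$2" file moved
    mkdir -p "$done_dir"
    moved=1
    while [ "$moved" -eq 1 ]; do
        moved=0
        for file in "$queue"/*; do
            [ -f "$file" ] || continue   # skips the literal glob when empty
            # Placeholder for real processing; here we simply move the file.
            mv "$file" "$done_dir/"
            moved=1
        done
    done
}
```

The outer loop re-scans until a full pass moves nothing, so a burst of arrivals is handled in a single invocation.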
Timer Fallback Pattern
For critical workflows, pair a .path unit (instant reaction) with a .timer unit (periodic fallback). This provides two layers of reliability:
- Primary: The .path unit triggers instantly on filesystem events.
- Fallback: The .timer unit catches anything missed due to inotify race conditions.
Both trigger the same .service unit.
process-uploads.path:
[Unit]
Description=Watch for incoming uploads (instant trigger)
[Path]
DirectoryNotEmpty=/var/www/uploads/queue
MakeDirectory=yes
[Install]
WantedBy=paths.target
process-uploads-fallback.timer:
[Unit]
Description=Fallback: process uploads every 15 minutes
[Timer]
OnCalendar=*:0/15
Persistent=true
Unit=process-uploads.service
[Install]
WantedBy=timers.target
process-uploads.service:
[Unit]
Description=Process uploaded files
[Service]
Type=oneshot
User=www-data
ExecStart=/usr/local/bin/process-uploads.sh
Enable both units:
sudo systemctl daemon-reload
sudo systemctl enable --now process-uploads.path
sudo systemctl enable --now process-uploads-fallback.timer
Use this pattern for:
- File processing pipelines where missing a file would cause business impact.
- Environments with high inotify watch counts where events might be dropped.
- Multi-server setups where NFS or network filesystems may not generate reliable inotify events.
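When several servers share the same queue, their fallback timers all fire at the same quarter-hour; adding a randomized delay staggers the runs. A sketch of the adjusted timer:

```ini
[Timer]
OnCalendar=*:0/15
# Spread each server's fallback run across a 2-minute window
RandomizedDelaySec=120
Persistent=true
Unit=process-uploads.service
```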
Structured Logging Pattern
For production services, use structured log output that's easy to parse and monitor:
#!/usr/bin/env bash
set -euo pipefail
# Compute the timestamp per message, not once at startup
log_info() { echo "[$(date -Is)] [process-uploads] [INFO] $*"; }
log_error() { echo "[$(date -Is)] [process-uploads] [ERROR] $*" >&2; }
log_warn() { echo "[$(date -Is)] [process-uploads] [WARN] $*"; }
FILE=$(ls /var/www/uploads/queue/ 2>/dev/null | head -1)
if [ -z "$FILE" ]; then
log_info "No files to process"
exit 0
fi
log_info "Processing: $FILE"
if /usr/local/bin/process "$FILE"; then
mv "/var/www/uploads/queue/$FILE" /var/www/uploads/done/
log_info "Success: $FILE moved to done/"
else
mv "/var/www/uploads/queue/$FILE" /var/www/uploads/failed/
log_error "Failed: $FILE moved to failed/"
exit 1
fi
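When a service logs to the journal rather than a file, lines prefixed with &lt;N&gt; are parsed as syslog priorities, which makes filtering like journalctl -p err -u myunit work. A sketch of priority-aware helpers (the function names are illustrative; the prefixes only make sense when output goes to journald, not to a plain log file):

```bash
#!/usr/bin/env bash
set -euo pipefail

# journald interprets a leading <N> as the syslog priority of the line:
# 3 = err, 4 = warning, 6 = info.
log_info()  { echo "<6>$*"; }
log_warn()  { echo "<4>$*"; }
log_error() { echo "<3>$*" >&2; }

log_info "queue drained"
```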
Log Rotation
For file-based logs, set up logrotate:
/var/log/safe-path-job.log
/var/log/csv-import.log
/var/log/image-optimize.log
{
daily
rotate 14
compress
missingok
notifempty
create 0640 www-data www-data
}
Error Handling and Recovery
Retry on Failure
[Unit]
Description=Retrying file processor
StartLimitBurst=5
StartLimitIntervalSec=300
[Service]
Type=oneshot
ExecStart=/usr/local/bin/process.sh
Restart=on-failure
RestartSec=10
Dead Letter Queue Pattern
When a file fails processing, move it to a "failed" folder instead of leaving it in the queue (which would cause infinite retries):
#!/usr/bin/env bash
set -euo pipefail
QUEUE="/var/www/drop"
DONE="/var/www/done"
FAILED="/var/www/failed"
mkdir -p "$DONE" "$FAILED"
FILE=$(ls "$QUEUE/" 2>/dev/null | head -1)
[ -z "$FILE" ] && exit 0
if timeout 300 /usr/local/bin/process "$QUEUE/$FILE"; then
mv "$QUEUE/$FILE" "$DONE/"
echo "[$(date -Is)] OK: $FILE"
else
mv "$QUEUE/$FILE" "$FAILED/"
echo "[$(date -Is)] FAIL: $FILE → moved to $FAILED/" >&2
# Exit 0 on purpose: the file is quarantined, so there is nothing to retry
fi
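A middle ground between immediate quarantine and infinite retries is to give each file a few attempts before dead-lettering it. One sketch, tracking the count in an .attempt-N filename suffix (the helper name, suffix format, and default of 3 attempts are all illustrative):

```bash
#!/usr/bin/env bash
set -euo pipefail

# On failure, rename the file with an incremented attempt counter;
# only move it to the failed/ directory once MAX attempts are reached.
handle_failure() {
    local queue="$1" failed="$2" file="$3" max="${4:-3}"
    local base attempts
    base="${file%%.attempt-*}"
    if [[ "$file" == *.attempt-* ]]; then
        attempts="${file##*.attempt-}"
    else
        attempts=0
    fi
    attempts=$((attempts + 1))
    mkdir -p "$failed"
    if [ "$attempts" -ge "$max" ]; then
        mv "$queue/$file" "$failed/$base"       # give up: dead-letter it
    else
        mv "$queue/$file" "$queue/$base.attempt-$attempts"  # retry later
    fi
}
```

Renaming the file keeps it in the queue, so the next trigger (or the timer fallback) retries it automatically.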
Multi-Environment Configuration
Using Environment Files
Keep environment-specific settings outside the unit file:
[Service]
Type=oneshot
User=www-data
EnvironmentFile=/etc/default/process-uploads
ExecStart=/usr/local/bin/process-uploads.sh
QUEUE_DIR=/var/www/uploads/queue
DONE_DIR=/var/www/uploads/done
FAILED_DIR=/var/www/uploads/failed
LOG_LEVEL=INFO
S3_BUCKET=my-bucket
#!/usr/bin/env bash
set -euo pipefail
# Variables come from EnvironmentFile
FILE=$(ls "$QUEUE_DIR/" 2>/dev/null | head -1)
[ -z "$FILE" ] && exit 0
echo "[$(date -Is)] Processing: $FILE (bucket: $S3_BUCKET)"
# ...
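Because set -u aborts with a terse message on the first unset variable, it helps to validate the EnvironmentFile variables up front. A sketch (require_env is an illustrative helper, not part of the unit configuration):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Fail fast with a readable message when a variable expected from the
# EnvironmentFile is missing; supply defaults only where that is safe.
require_env() {
    local name="$1"
    if [ -z "${!name:-}" ]; then
        echo "ERROR: required variable $name is not set (check EnvironmentFile)" >&2
        return 1
    fi
}

LOG_LEVEL="${LOG_LEVEL:-INFO}"   # safe default
```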
Per-Environment Overrides
Use drop-in overrides for environment differences:
sudo mkdir -p /etc/systemd/system/process-uploads.service.d/
sudo tee /etc/systemd/system/process-uploads.service.d/staging.conf > /dev/null <<'EOF'
[Service]
# On staging, use a different environment file
EnvironmentFile=
EnvironmentFile=/etc/default/process-uploads-staging
# Relax the runtime limit
RuntimeMaxSec=1h
EOF
sudo systemctl daemon-reload
Monitoring Integration
Health Check Script
#!/usr/bin/env bash
set -euo pipefail
UNITS=(process-drop csv-import deploy-trigger image-queue)
EXIT_CODE=0
for name in "${UNITS[@]}"; do
path_status=$(systemctl is-active "${name}.path" 2>/dev/null || echo "not-found")
if [ "$path_status" != "active" ]; then
echo "CRITICAL: ${name}.path is $path_status"
EXIT_CODE=2
else
echo "OK: ${name}.path is active"
fi
done
exit $EXIT_CODE
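To run the health check itself on a schedule, wrap it in a small timer/service pair. A sketch, with hypothetical unit names and script path:

```ini
# path-health.timer
[Unit]
Description=Check path units every 5 minutes
[Timer]
OnCalendar=*:0/5
[Install]
WantedBy=timers.target

# path-health.service
[Unit]
Description=Path unit health check
[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-path-units.sh
```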
Notification on Failure
Add an OnFailure= directive to send alerts when a path-triggered service fails:
process-drop.service:
[Unit]
Description=Process drop folder
OnFailure=alert-failure@%n.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/process-drop.sh
alert-failure@.service:
[Unit]
Description=Send alert for %i failure
[Service]
Type=oneshot
ExecStart=/usr/local/bin/alert-failure.sh %i
#!/usr/bin/env bash
set -euo pipefail
UNIT="$1"
MSG="[$(date -Is)] ALERT: $UNIT failed on $(hostname)"
echo "$MSG"
# curl -s -X POST "https://hooks.slack.com/..." -d "{\"text\": \"$MSG\"}"
WordPress VPS Production Reference
Common path unit patterns for a WordPress VPS:
| Scenario | Directive | Watch Path | Triggered Action |
|---|---|---|---|
| Flush cache via SFTP touch-file | PathExists= | /var/www/html/clear_cache.txt | wp cache flush + rm file |
| Import SQL dump on drop | PathExistsGlob= | /mnt/import/*.sql | wp db import + move to archive |
| Reload Nginx after conf change | PathChanged= | /etc/nginx/conf.d/ | systemctl reload nginx |
| Media upload sync to S3 | PathModified= | /var/www/html/wp-content/uploads/ | rclone sync |
| Image optimization queue | DirectoryNotEmpty= | /var/media/queue/ | Optimize + move to done |
| ClamAV scan on new uploads | PathChanged= | /var/www/html/wp-content/uploads/ | clamscan --move=/quarantine |
| TLS cert reload | PathChanged= | /etc/letsencrypt/live/.../fullchain.pem | systemctl reload nginx |
| PHP-FPM restart on php.ini edit | PathChanged= | /etc/php/8.2/fpm/php.ini | systemctl restart php8.2-fpm |
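As a worked example, the first row of the table (cache flush via an SFTP touch-file) could look roughly like this; the unit names and wp-cli invocation details are illustrative:

```ini
# clear-cache.path
[Path]
PathExists=/var/www/html/clear_cache.txt
[Install]
WantedBy=paths.target

# clear-cache.service
[Service]
Type=oneshot
User=www-data
WorkingDirectory=/var/www/html
ExecStart=/usr/bin/wp cache flush
ExecStartPost=/bin/rm -f /var/www/html/clear_cache.txt
```

Removing the touch-file in ExecStartPost= re-arms the PathExists= condition for the next flush.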
Key Takeaways
- Always apply at least Level 1 security hardening (NoNewPrivileges, PrivateTmp) to production services.
- Use StartLimitBurst and StartLimitIntervalSec to rate-limit hot directories.
- Pair critical path units with a timer fallback for maximum reliability.
- Implement a dead-letter queue pattern to prevent infinite retries on failing files.
- Use EnvironmentFile= and drop-in overrides for multi-environment setups.
- Add OnFailure= for production monitoring and alerting.
What's Next
- Debugging and Troubleshooting — step-by-step debugging workflow and common failure patterns.