You need to write a deploy script. Or a dev server launcher. Or a CI cleanup step that tears down background processes after a test run. Whatever it is, your script needs to find processes, check if they’re running, stop them, start new ones, and verify they came up healthy.
Here’s how to do it reliably – and where the common approaches fall apart.
The fragile pipeline
Most people start here:
ps aux | grep myapp | grep -v grep | awk '{print $2}' | xargs kill
It looks reasonable. It works on your machine, in your terminal, the first time you try it. Then it breaks.
Why it breaks
It matches too broadly. grep myapp is a substring match against the entire line, including arguments, paths, and environment. If someone is editing /home/deploy/myapp/config.yml in vim, that process matches. If another service has --upstream=myapp.internal in its command line, that matches too.
Race conditions. Between ps listing the PID and kill executing, the process can exit on its own. On a busy system, that PID can be reassigned to a new, unrelated process. You just killed something you didn’t intend to.
Different output across platforms. ps aux output varies between macOS and Linux. Column widths shift. The COMMAND column truncates differently. If your script runs in CI on Linux and you developed it on macOS, the awk field positions might not line up.
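The PID-reuse race can't be fully eliminated with these tools, but you can narrow the window. A sketch (`myapp` is a placeholder name): look up the PID, then re-verify the command name immediately before signaling.

```shell
# Narrow (but don't eliminate) the reuse window: confirm the PID still
# belongs to the command you expect right before sending the signal.
pid=$(pgrep -x myapp | head -n 1)
if [ -n "$pid" ] && [ "$(ps -o comm= -p "$pid")" = "myapp" ]; then
  kill "$pid"
fi
```

`ps -o comm= -p` prints only the command name for that one PID, so the comparison is exact rather than a substring match against the whole line.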
A real example of it going wrong
A deploy script at a startup used this to kill old app instances:
ps aux | grep "node app.js" | grep -v grep | awk '{print $2}' | xargs kill -9
One day, a developer was running less /var/log/node app.js.log in a tmux session on the deploy box. The grep matched. The deploy script killed their less process – no real harm there. But it also matched a monitoring agent whose arguments included --watch "node app.js". That monitoring agent stopped reporting, and nobody noticed the deploy had actually failed until users started complaining.
Substring matching on unstructured text is not process targeting. It’s hoping for the best.
Better patterns with standard tools
pgrep and pkill
pgrep and pkill exist specifically to replace the ps | grep pattern:
# Find PIDs by process name (not substring of entire line)
pgrep myapp
# Match against the full command line when you need it
pgrep -f "node server.js"
# Exact name match only
pgrep -x myapp
# Kill by name with SIGTERM
pkill myapp
# Kill with SIGKILL
pkill -9 -f "node server.js"
pgrep matches against the process name by default, not the full command line. This avoids the “matching vim editing a config file” problem. Use -f when you need full command line matching, and -x when you need exact name matching.
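A quick way to see the difference between the matching modes, using a background `sleep` as a stand-in for a real daemon:

```shell
sleep 30 &                                    # stand-in for a daemon
pid=$!
pgrep -x sleep      > /dev/null && echo "matched by exact name"
pgrep -f "sleep 30" > /dev/null && echo "matched by full command line"
kill "$pid"                                   # clean up
```

Both checks succeed here; the difference is what else each one would catch on a busy box.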
In scripts, use pgrep to check if something is running:
if pgrep -f "node server.js" > /dev/null 2>&1; then
echo "Server is already running"
exit 1
fi
PID files
For processes your script starts, PID files are the most reliable tracking method:
#!/bin/bash
PIDFILE="/var/run/myapp.pid"
start_app() {
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
echo "Already running (PID $(cat "$PIDFILE"))"
return 1
fi
./myapp &
echo $! > "$PIDFILE"
echo "Started (PID $!)"
}
stop_app() {
if [ ! -f "$PIDFILE" ]; then
echo "No PID file found"
return 1
fi
local pid
pid=$(cat "$PIDFILE")
if kill -0 "$pid" 2>/dev/null; then
kill "$pid"
wait "$pid" 2>/dev/null  # only works if this shell started the process
echo "Stopped (PID $pid)"
else
echo "Process $pid not running (stale PID file)"
fi
rm -f "$PIDFILE"
}
# Clean up on exit
trap 'stop_app' EXIT INT TERM
kill -0 is the key trick here: signal 0 doesn’t actually send a signal, but the kernel checks if the process exists and you have permission to signal it. It’s a safe “is this running?” check.
flock for preventing duplicate instances
If your script shouldn’t run concurrently with itself:
#!/bin/bash
LOCKFILE="/var/lock/myapp-deploy.lock"
exec 200>"$LOCKFILE"
if ! flock -n 200; then
echo "Another instance is already running"
exit 1
fi
# Rest of script runs with lock held
# Lock is released when script exits (fd 200 closes)
This is atomic. No race conditions. Two deploy scripts started simultaneously will not both proceed.
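You can demonstrate the atomicity directly. In this sketch (the `flock` command ships with util-linux on Linux; stock macOS doesn't have it, and the `/tmp` path is arbitrary), holder A grabs the lock, and B is refused while A still holds it:

```shell
LOCK=/tmp/flock-demo.lock
# A grabs the lock and holds it for a second
( exec 9>"$LOCK"; flock -n 9 && { echo "A holds the lock"; sleep 1; } ) &
sleep 0.2
# B tries while A still holds it and is refused immediately
( exec 9>"$LOCK"; flock -n 9 && echo "B got the lock" || echo "B was refused" )
wait
```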
Health check loops
After starting a service, don’t just assume it’s healthy:
start_and_wait() {
./myapp &
local pid=$!
echo $pid > "$PIDFILE"
local retries=30
while [ $retries -gt 0 ]; do
if curl -sf http://localhost:3000/health > /dev/null 2>&1; then
echo "Healthy (PID $pid)"
return 0
fi
# Make sure the process hasn't crashed
if ! kill -0 "$pid" 2>/dev/null; then
echo "Process died during startup"
rm -f "$PIDFILE"
return 1
fi
retries=$((retries - 1))
sleep 1
done
echo "Timed out waiting for health check"
kill "$pid" 2>/dev/null
rm -f "$PIDFILE"
return 1
}
The kill -0 check inside the loop catches the case where the process crashes immediately. Without it, you’d wait the full 30 seconds before discovering it was dead.
wait for background process management
If your script starts multiple background processes:
#!/bin/bash
pids=()
./worker-a &
pids+=($!)
./worker-b &
pids+=($!)
./worker-c &
pids+=($!)
# Wait for all to finish, track failures
failed=0
for pid in "${pids[@]}"; do
if ! wait "$pid"; then
echo "Process $pid failed"
failed=$((failed + 1))
fi
done
if [ $failed -gt 0 ]; then
echo "$failed process(es) failed"
exit 1
fi
wait with a specific PID gives you the exit code of that process. wait without arguments waits for all children but you lose individual exit status.
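A minimal illustration of per-PID exit codes, using subshells in place of real workers:

```shell
( exit 0 ) & ok_pid=$!
( exit 3 ) & bad_pid=$!
wait "$ok_pid";  echo "first exited with $?"    # 0
wait "$bad_pid"; echo "second exited with $?"   # 3
```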
Parsing process output with awk
People reach for awk because process tools produce tabular text, and awk is the natural way to slice tabular text. Here are the patterns worth knowing.
Common awk patterns
High CPU processes:
ps aux | awk '$3 > 80 {print $2, $11}'
$3 is the CPU percentage column. This prints the PID and command of anything over 80%.
High memory processes (RSS in KB):
ps aux | awk '$6 > 500000 {print $2, $6/1024"MB", $11}'
$6 is the RSS column. 500000 KB is roughly 488 MB.
All listening processes with ports (using lsof):
lsof -i -P -n | awk '/LISTEN/ {print $1, $9}'
Port and process from netstat:
netstat -tlnp 2>/dev/null | awk '/LISTEN/ {split($4,a,":"); print a[length(a)], $7}'
split($4,a,":") breaks the address field on colons. a[length(a)] gets the last element, which is the port number. This handles both 0.0.0.0:3000 and :::3000 (IPv6).
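The same extraction run on fixed input, so both address forms are easy to verify:

```shell
printf '0.0.0.0:3000\n:::8080\n' \
  | awk '{ split($1, a, ":"); print a[length(a)] }'
# prints 3000, then 8080
```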
Why awk parsing breaks
These patterns work in interactive use. They become liabilities in scripts that run across environments.
Column positions shift between OS versions. macOS ps and Linux ps use the same flags but produce subtly different output. Column widths change. Extra columns appear in some configurations.
Truncated process names. When writing to a terminal, ps truncates the COMMAND column to the window width. Behavior in non-interactive contexts varies by implementation: procps on Linux prints the full line when piped, but other implementations fall back to 80 columns. Either way, a pattern that depends on the full command line may silently stop matching.
Locale differences. Number formatting can change with locale settings. A decimal separator might be . or ,. If awk is comparing $3 > 80 and the CPU percentage is 80,5, the comparison silently does the wrong thing.
lsof’s output is especially fragile. Column alignment depends on the length of values in other rows. A long username or filename shifts everything.
The underlying problem is that these tools were designed for human eyes, not for programmatic consumption.
Structured output
The real problem with text parsing
Every awk one-liner in the previous section has implicit assumptions about column positions, field separators, and output format. These assumptions hold until they don’t, and the failure mode is silent: your script extracts the wrong value and acts on it.
JSON is better for automation. The structure is explicit. Fields are named. Parsers exist in every language.
jq patterns for process automation
jq is the standard tool for working with JSON on the command line:
# Extract a single field
echo '{"pid": 1234, "name": "node"}' | jq '.pid'
# Filter an array
echo '[{"pid":1,"cpu":5},{"pid":2,"cpu":90}]' | jq '.[] | select(.cpu > 50)'
# Extract into tab-separated values for further processing
echo '[{"pid":1,"name":"a"},{"pid":2,"name":"b"}]' | jq -r '.[] | [.pid, .name] | @tsv'
The problem is that the standard Unix process tools don’t speak JSON. ps has no --json flag. lsof has -F for “field mode” output, but it’s a custom format, not JSON. netstat and ss have no structured output at all.
So you’re left building fragile text parsers, or wrapping them in scripts that construct JSON manually:
# This works, but look at it
ps aux | awk 'NR>1 {printf "{\"pid\":%s,\"cpu\":%s,\"mem\":%s,\"cmd\":\"%s\"}\n",$2,$3,$4,$11}'
That awk-to-JSON bridge is itself fragile – it doesn’t handle quotes in command names, and it still has the column position problem.
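If you must bridge text to JSON, letting jq do the quoting is safer than printf-ing raw strings. A sketch on fixed input (the column-position caveat still applies when the fields come from ps):

```shell
# jq --arg passes values in safely; tonumber converts the numeric fields.
# Note the quote in the command name is escaped correctly.
printf '101 5.0 my"app\n' \
  | while read -r pid cpu cmd; do
      jq -cn --arg pid "$pid" --arg cpu "$cpu" --arg cmd "$cmd" \
        '{pid: ($pid | tonumber), cpu: ($cpu | tonumber), cmd: $cmd}'
    done
```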
Putting it together
Here’s a real deploy script skeleton. First, the traditional version using the patterns above:
Traditional version
#!/bin/bash
set -euo pipefail
APP_NAME="myapp"
APP_PORT=3000
APP_BIN="./target/release/myapp"
PIDFILE="/var/run/${APP_NAME}.pid"
HEALTH_URL="http://localhost:${APP_PORT}/health"
TIMEOUT=30
stop_old() {
# Try PID file first
if [ -f "$PIDFILE" ]; then
local pid
pid=$(cat "$PIDFILE")
if kill -0 "$pid" 2>/dev/null; then
echo "Stopping old process (PID $pid)..."
kill "$pid"
# Wait for graceful shutdown
local waited=0
while kill -0 "$pid" 2>/dev/null && [ $waited -lt 10 ]; do
sleep 1
waited=$((waited + 1))
done
# Force kill if still running
if kill -0 "$pid" 2>/dev/null; then
echo "Graceful shutdown timed out, sending SIGKILL..."
kill -9 "$pid"
sleep 1
fi
fi
rm -f "$PIDFILE"
fi
# Also check by port in case PID file is stale
local port_pid
port_pid=$(lsof -i :${APP_PORT} -t 2>/dev/null | head -1)
if [ -n "$port_pid" ]; then
echo "Found process $port_pid still on port ${APP_PORT}, killing..."
kill "$port_pid" 2>/dev/null
sleep 2
kill -9 "$port_pid" 2>/dev/null || true
fi
}
start_new() {
echo "Starting ${APP_NAME}..."
$APP_BIN &
echo $! > "$PIDFILE"
echo "Started (PID $!)"
}
wait_healthy() {
local retries=$TIMEOUT
while [ $retries -gt 0 ]; do
if curl -sf "$HEALTH_URL" > /dev/null 2>&1; then
echo "Health check passed"
return 0
fi
local pid
pid=$(cat "$PIDFILE" 2>/dev/null)
if [ -n "$pid" ] && ! kill -0 "$pid" 2>/dev/null; then
echo "Process died during startup"
return 1
fi
retries=$((retries - 1))
sleep 1
done
echo "Health check timed out after ${TIMEOUT}s"
return 1
}
# Main
stop_old
start_new
if ! wait_healthy; then
echo "Deploy failed"
exit 1
fi
echo "Deploy complete"
This works. It handles PID files, graceful shutdown, fallback to SIGKILL, port-based detection for stale state, and health checking. But it’s ~70 lines of defensive shell scripting, and the lsof fallback is a text-parsing step that could behave differently across environments.
Cleaner version
#!/bin/bash
set -euo pipefail
APP_BIN="./target/release/myapp"
APP_PORT=3000
PIDFILE="/var/run/myapp.pid"
HEALTH_URL="http://localhost:${APP_PORT}/health"
TIMEOUT=30
stop_old() {
if [ -f "$PIDFILE" ]; then
local pid
pid=$(cat "$PIDFILE")
if kill -0 "$pid" 2>/dev/null; then
echo "Stopping PID $pid..."
kill "$pid"
tail --pid="$pid" -f /dev/null 2>/dev/null &  # GNU tail; blocks until PID exits
local tail_pid=$!
( sleep 10; kill -9 "$pid" 2>/dev/null ) &    # force-kill watchdog
local watchdog=$!
wait "$tail_pid" 2>/dev/null || true
kill "$watchdog" 2>/dev/null || true          # cancel watchdog so it can't hit a reused PID
fi
rm -f "$PIDFILE"
fi
}
start_new() {
$APP_BIN &
echo $! > "$PIDFILE"
echo "Started PID $!"
}
wait_healthy() {
local i=0
while [ $i -lt $TIMEOUT ]; do
if curl -sf "$HEALTH_URL" > /dev/null 2>&1; then
return 0
fi
if ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
return 1
fi
i=$((i + 1))
sleep 1
done
return 1
}
stop_old
start_new
wait_healthy || { echo "Deploy failed"; exit 1; }
echo "Deploy complete"
Shorter, but still a meaningful amount of shell for what is fundamentally: stop old process, start new one, check it’s healthy.
proc’s JSON mode
If you have proc installed, the process inspection parts get simpler – and structured.
Check what’s on a port and get JSON back:
proc on :3000 --json | jq '.process.pid'
Find high-CPU processes without awk column gymnastics:
proc list --json | jq '.processes[] | select(.cpu_percent > 50) | {pid, name, cpu_percent}'
Check if a specific process is running by name:
if proc by myapp --json | jq -e '.count > 0' > /dev/null 2>&1; then
echo "myapp is running"
fi
Kill what’s on a port in a CI cleanup step:
proc kill :3000,:8080,:5432 --yes 2>/dev/null || true
The --json flag gives you named fields instead of positional columns. No awk, no column counting, no cross-platform differences in output format. And destructive commands like kill and stop support --yes for non-interactive use and --dry-run for testing.
Install
brew install yazeed/proc/proc # macOS
cargo install proc-cli # Rust
npm install -g proc-cli # npm/bun
See the GitHub repo for all installation options.