Skip to content
SP StackPractices
intermediate By StackPractices

Parallel Job Execution with Bash

Run shell commands and scripts in parallel safely using xargs, parallel, or background jobs with concurrency control.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

Modern servers have multiple cores, but a naive shell script runs one command at a time. Parallel job execution lets you process many files, URLs, or tasks at once, cutting runtime from hours to minutes. Bash gives you several tools for this: background jobs, xargs -P, and GNU parallel. Each option balances control, portability, and ease of use.

When to Use

Use this resource when:

  • You need to process many files or records in a batch.
  • A sequential loop is too slow for your workflow.
  • You want to control the maximum number of concurrent jobs.
  • You need to collect exit codes from every child process.

Solution

Bash parallel job execution

#!/usr/bin/env bash
set -euo pipefail

MAX_JOBS="${1:-4}"
INPUT_FILE="${2:-jobs.txt}"

# Option 1: xargs with parallel workers
process_task() {
    local task="$1"
    echo "Processing $task"
    sleep "$((RANDOM % 3 + 1))"
    echo "Done $task"
}
export -f process_task

cat "$INPUT_FILE" | xargs -P "$MAX_JOBS" -I {} bash -c 'process_task "{}"'

# Option 2: GNU parallel
# parallel -j "$MAX_JOBS" process_task {} < "$INPUT_FILE"

# Option 3: Background jobs with a semaphore
SEMAPHORE=0
while IFS= read -r task; do
    if [[ $SEMAPHORE -ge $MAX_JOBS ]]; then
        wait -n
        SEMAPHORE=$((SEMAPHORE - 1))
    fi
    process_task "$task" &
    SEMAPHORE=$((SEMAPHORE + 1))
done < "$INPUT_FILE"
wait

Explanation

The script shows three common approaches. xargs -P is portable and available on most systems, but less flexible than GNU parallel. GNU parallel offers better output handling, resuming, and progress display. The background-jobs approach uses wait -n to keep a maximum number of concurrent jobs without external tools. export -f makes the Bash function visible to subprocesses when using xargs -I {} bash -c.

Variants

ApproachToolProsCons
xargscoreutilsPortable, simpleLimited control, messy output
GNU parallelparallelPowerful, resumable, ordered outputExtra dependency
Background jobsbash builtinNo external depsManual bookkeeping, race-prone

Best Practices

  1. Cap concurrency to a tested limit. Too many jobs exhaust CPU, memory, or file descriptors.
  2. Make jobs idempotent. A retried job should produce the same result without side effects.
  3. Capture and aggregate exit codes. A single failed job should not silently hide among successful ones.
  4. Use a temporary directory per job. This prevents file collisions and makes cleanup easy.
  5. Log with the job identifier. Prefix output with the task name so you can trace failures.

Common Mistakes

  1. Unbounded parallelism. Launching every task in the background at once can crash the shell.
  2. Losing exit codes. xargs returns the last exit code by default; use -P with -t or GNU parallel to track each job.
  3. Ignoring shell quoting. Filenames with spaces break xargs unless you use -0 or -d.
  4. Writing to the same output file. Concurrent writes interleave output; use one file per job or lock the file.
  5. No timeout. A stuck job can block the whole batch; add timeout to each command.

Frequently Asked Questions

Q: What is the difference between xargs and GNU parallel? A: xargs is a coreutils tool with limited parallelism features. GNU parallel is designed for concurrency, offering better output ordering, resumability, and progress bars.

Q: How do I handle tasks with spaces in names? A: Use xargs -0 with find -print0 or GNU parallel with quoted arguments. Never pass unquoted filenames to shell commands.

Q: How do I limit memory usage? A: Reduce MAX_JOBS and run each job under systemd-run or ulimit to cap memory per process.