intermediate

Retry Pattern

Retry an operation that has failed with transient errors, using configurable strategies like fixed delay, exponential backoff, or circuit breaker integration.

Topics: design

Overview

The Retry Pattern is a resilience pattern that handles transient failures by retrying a failed operation. Transient failures stem from short-lived conditions such as network congestion, brief service unavailability, or timeouts. The pattern waits between attempts using a configurable strategy (fixed delay, linear backoff, or exponential backoff) so that the retries themselves do not overwhelm the target system.

When to Use

Use the Retry Pattern when:

  • Errors are transient and likely to resolve on retry (network timeouts, 503 Service Unavailable)
  • The operation is idempotent or can be safely repeated
  • You want to improve perceived reliability without user intervention
  • You need configurable backoff so that retries do not overload a struggling service
  • You can pair it with a Circuit Breaker so that retries stop when a service is clearly down

Solution

Python

import time
from functools import wraps

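def retry(max_attempts=3, delay=1, backoff=2, exceptions=(Exception,)):
    """Retry the decorated function on the given exceptions, growing the delay each time."""
    # Fail fast on a bad setting instead of silently returning None without calling func.
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            current_delay = delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    # Out of attempts: re-raise the last error to the caller.
                    if attempt == max_attempts:
                        raise
                    print(f"Attempt {attempt} failed: {e}. Retrying in {current_delay}s...")
                    time.sleep(current_delay)
                    # Grow the wait for the next attempt (backoff=1 keeps it fixed).
                    current_delay *= backoff
        return wrapper
    return decorator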

@retry(max_attempts=3, delay=1, backoff=2, exceptions=(ConnectionError,))
def fetch_data(url: str):
    import random
    if random.random() < 0.7:
        raise ConnectionError("Network error")
    return f"Data from {url}"

# Usage
try:
    result = fetch_data("https://api.example.com")
    print(result)
except ConnectionError:
    print("All retry attempts exhausted")

JavaScript

function retry(fn, { maxAttempts = 3, delay = 1000, backoff = 2, exceptions = [Error] } = {}) {
  return async function(...args) {
    let attempt = 1;
    let currentDelay = delay;

    while (attempt <= maxAttempts) {
      try {
        return await fn(...args);
      } catch (e) {
        const isRetryable = exceptions.some(ex => e instanceof ex);
        if (!isRetryable || attempt === maxAttempts) throw e;

        console.log(`Attempt ${attempt} failed: ${e.message}. Retrying in ${currentDelay}ms...`);
        await new Promise(r => setTimeout(r, currentDelay));
        currentDelay *= backoff;
        attempt++;
      }
    }
  };
}

async function fetchData(url) {
  if (Math.random() < 0.7) throw new Error("Network error");
  return `Data from ${url}`;
}

const retryFetch = retry(fetchData, { maxAttempts: 3, delay: 1000, backoff: 2 });

// Usage
retryFetch("https://api.example.com")
  .then(console.log)
  .catch(e => console.log("All retry attempts exhausted:", e.message));

Java

import java.util.function.Supplier;

public class Retry {
    public static <T> T execute(Supplier<T> action, int maxAttempts, long delayMs, double backoff) {
        int attempt = 1;
        long currentDelay = delayMs;

        while (attempt <= maxAttempts) {
            try {
                return action.get();
            } catch (Exception e) {
                if (attempt == maxAttempts) throw new RuntimeException("All retries exhausted", e);
                System.out.println("Attempt " + attempt + " failed. Retrying in " + currentDelay + "ms...");
                try {
                    Thread.sleep(currentDelay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("Interrupted during retry delay", ie);
                }
                currentDelay = (long)(currentDelay * backoff);
                attempt++;
            }
        }
        throw new IllegalStateException("Unreachable");
    }
}

// Usage
String result = Retry.execute(() -> {
    if (Math.random() < 0.7) throw new RuntimeException("Network error");
    return "Data fetched";
}, 3, 1000, 2.0);

Explanation

The Retry Pattern has four configurable dimensions:

  • Max Attempts: How many times to try before giving up (including the initial attempt)
  • Delay: The initial wait time between retries
  • Backoff Strategy: How the delay grows:
    • Fixed: Same delay every time
    • Linear: Delay increases by a fixed amount
    • Exponential: Delay is multiplied by a constant factor (often 2) after each failure; a good default for most scenarios
  • Exception Filter: Which exceptions are considered retryable
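
To make the three backoff strategies concrete, here is a minimal sketch that computes the waits between attempts. The function name delay_schedule and the step and factor parameters are illustrative assumptions, not part of the decorator shown earlier.

```python
def delay_schedule(strategy, max_attempts, delay=1.0, step=1.0, factor=2.0):
    """Return the waits (in seconds) between attempts for a given strategy."""
    retries = max_attempts - 1  # the final attempt is not followed by a wait
    if strategy == "fixed":
        return [delay for _ in range(retries)]
    if strategy == "linear":
        return [delay + step * i for i in range(retries)]
    if strategy == "exponential":
        return [delay * factor ** i for i in range(retries)]
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("fixed", "linear", "exponential"):
    print(name, delay_schedule(name, max_attempts=5))
# fixed [1.0, 1.0, 1.0, 1.0]
# linear [1.0, 2.0, 3.0, 4.0]
# exponential [1.0, 2.0, 4.0, 8.0]
```

With five attempts there are four waits, because "max attempts" counts the initial call.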

Variants

Variant | Description | Use Case
Fixed Delay | Constant wait between retries | Predictable load on target
Exponential Backoff | Delay doubles each retry | Avoids overwhelming recovering services
Jitter | Adds randomness to backoff | Prevents thundering herd after recovery
Circuit Breaker + Retry | Skip retries when breaker is open | Prevents wasted retry attempts
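
The Jitter variant can be sketched as follows. This uses the "full jitter" approach, where each wait is drawn uniformly between zero and the exponential ceiling. That choice, the cap parameter, and the name full_jitter_delay are assumptions made for illustration; other jitter schemes exist.

```python
import random


def full_jitter_delay(attempt, base=1.0, factor=2.0, cap=30.0, rng=random):
    """Pick a random wait in [0, min(cap, base * factor ** (attempt - 1))]."""
    ceiling = min(cap, base * factor ** (attempt - 1))
    return rng.uniform(0, ceiling)


rng = random.Random(42)  # seeded only so the demo is repeatable
for attempt in range(1, 6):
    print(attempt, round(full_jitter_delay(attempt, rng=rng), 2))
```

Because every client draws a different wait, clients that failed at the same moment are unlikely to retry at the same moment.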

Best Practices

  • Make operations idempotent before applying retries — retries can cause duplicate side effects
  • Use exponential backoff with jitter for distributed systems to avoid synchronized retries
  • Set a max total duration (deadline) in addition to max attempts
  • Log every retry attempt with context for debugging
  • Combine with Circuit Breaker — don’t retry when the target is clearly down
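
As one way to combine a deadline with a max-attempt limit, the sketch below gives up when either the attempts run out or the next wait would push past the deadline. The function retry_with_deadline and its injectable sleep and clock parameters are hypothetical helpers for this example.

```python
import time


def retry_with_deadline(func, max_attempts=5, delay=0.5, backoff=2.0,
                        deadline=10.0, exceptions=(Exception,),
                        sleep=time.sleep, clock=time.monotonic):
    """Call func until it succeeds, attempts run out, or the deadline would pass."""
    start = clock()
    current_delay = delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            elapsed = clock() - start
            # Stop on the last attempt, or when the next wait would overrun the deadline.
            if attempt == max_attempts or elapsed + current_delay > deadline:
                raise
            sleep(current_delay)
            current_delay *= backoff


# Usage: a call that fails twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(retry_with_deadline(flaky, delay=0.01, deadline=1.0,
                          exceptions=(ConnectionError,)))  # prints "ok"
```

time.monotonic is used rather than time.time so that system clock adjustments cannot shorten or extend the deadline.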

Common Mistakes

  • Retrying non-idempotent operations without deduplication mechanisms
  • Retrying immediately or with too little backoff, which can overwhelm a recovering service
  • Not setting a max retry limit, causing infinite loops
  • Retrying on non-transient errors (e.g., 400 Bad Request, authentication failures)
  • Ignoring retry storms — many clients retrying simultaneously after a brief outage
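
The first mistake above is often addressed with an idempotency key: the client generates a key once and reuses it on every retry, and the server returns the stored result for a key it has already seen. The following is a minimal in-memory sketch; PaymentService and its charge method are hypothetical and stand in for a real backend with persistent storage.

```python
import uuid


class PaymentService:
    """Hypothetical server that remembers results by idempotency key."""

    def __init__(self):
        self._results = {}
        self.charges_made = 0

    def charge(self, idempotency_key, amount):
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": self.charges_made, "amount": amount}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = service.charge(key, 25)
retried = service.charge(key, 25)  # e.g. the first response was lost in transit
print(first == retried, service.charges_made)  # True 1
```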

Frequently Asked Questions

Q: What is the difference between Retry and Circuit Breaker? A: Retry handles individual transient failures. Circuit Breaker prevents cascading failures by stopping requests to a failing service. They work best together: Retry handles temporary blips, Circuit Breaker handles prolonged outages.
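
To illustrate how the two patterns can cooperate, here is a deliberately small sketch in which the retry loop stops as soon as the breaker opens. The CircuitBreaker class, CircuitOpenError, and call_with_retry are simplified assumptions for this example, not a production design (there is no thread safety, for instance).

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_retry(breaker, func, max_attempts=3, delay=0.1, backoff=2.0,
                    sleep=time.sleep):
    current_delay = delay
    for attempt in range(1, max_attempts + 1):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # do not retry while the breaker is open
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(current_delay)
            current_delay *= backoff


# Usage: the breaker opens after two failures, so the third attempt is never made.
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)

def always_down():
    raise ConnectionError("service unavailable")

try:
    call_with_retry(breaker, always_down, max_attempts=5, delay=0.01)
except CircuitOpenError as e:
    print("Stopped early:", e)  # Stopped early: circuit is open
```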

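Q: Should I retry 500 Internal Server Errors? A: It depends. A 500 may indicate a transient server issue worth retrying, but 502, 503, and 504 are more clearly transient. Do not retry 4xx client errors such as 400, 401, 403, or 404 without fixing the request first. The usual exceptions are 408 Request Timeout and 429 Too Many Requests, which can be retried after a wait, ideally honoring the Retry-After header when the server sends one.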
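
The answer above can be expressed as a small status-code filter. This is a sketch under stated assumptions: should_retry is a hypothetical helper, retrying 500 is opt-in because it "depends", and treating 408 and 429 as retryable is a common convention assumed here.

```python
TRANSIENT_STATUSES = frozenset({502, 503, 504})
# Assumption: these two client errors are usually safe to retry after a wait.
RETRYABLE_CLIENT_STATUSES = frozenset({408, 429})


def should_retry(status, retry_on_500=False):
    """Decide whether an HTTP status code is worth retrying."""
    if status in TRANSIENT_STATUSES or status in RETRYABLE_CLIENT_STATUSES:
        return True
    if status == 500:
        return retry_on_500  # caller decides, since a 500 may or may not be transient
    return False


for status in (400, 401, 404, 429, 500, 503):
    print(status, should_retry(status))
# 400 False
# 401 False
# 404 False
# 429 True
# 500 False
# 503 True
```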