Rate Limiting | StackPractices

Overview

Rate limiting controls how many requests a client can make to your API in a given time window. It prevents abuse, ensures fair resource allocation, and protects downstream services from overload.

Common algorithms include fixed window, sliding window, token bucket, and leaky bucket. In distributed systems, Redis is often used as the shared counter store so that every application instance sees the same counts.
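To make the idea of counting requests per time window concrete, here is a minimal in-memory sketch of a fixed window counter. The class name, the `client` argument, and the explicit `now` parameter are assumptions made for illustration (passing the time in keeps the example deterministic); it is not meant for multi-instance deployments.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed window counter (single process only)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # Maps (client, window index) -> request count. Old windows are never
        # purged in this sketch; a real limiter would expire them.
        self.counts = defaultdict(int)

    def allow(self, client: str, now: float) -> bool:
        # Every timestamp inside the same window maps to the same index.
        window = int(now // self.window_seconds)
        self.counts[(client, window)] += 1
        return self.counts[(client, window)] <= self.limit


limiter = FixedWindowLimiter(limit=3, window_seconds=60)

# Three requests at the very end of one window, three at the start of the next.
late = [limiter.allow("client-a", now=59.0) for _ in range(3)]
early = [limiter.allow("client-a", now=60.0) for _ in range(3)]
# A fourth request in the second window exceeds the limit.
fourth = limiter.allow("client-a", now=61.0)
```

All six requests in `late` and `early` are allowed even though they arrive within about one second of each other, which is the window-boundary burst that fixed windows are known for.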

When to Use

Use this recipe when:

Protecting public APIs from abuse or DDoS
Enforcing tiered usage limits (free vs paid plans)
Preventing brute-force attacks on authentication endpoints
Managing capacity for resource-intensive operations
Implementing fair-use policies across users

Solution

Python (Token Bucket)

import time
from threading import Lock

class TokenBucket:
    """Thread-safe token bucket holding up to `capacity` tokens."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.refill_rate = refill_rate  # tokens added per second
        # monotonic() never jumps backwards when the system clock is adjusted
        self.last_refill = time.monotonic()
        self.lock = Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            # Add the tokens earned since the last call, capped at capacity
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

bucket = TokenBucket(capacity=10, refill_rate=1)
print(bucket.allow())  # True

JavaScript (Fixed Window with Redis)

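const redis = require('redis');

// node-redis v4+: commands return promises and the client must connect first
const client = redis.createClient();
client.on('error', (err) => console.error('Redis error:', err));
client.connect().catch((err) => console.error('Redis connect failed:', err));

async function rateLimit(key, limit, windowSeconds) {
  // All requests in the same window share one counter key
  const windowKey = `${key}:${Math.floor(Date.now() / 1000 / windowSeconds)}`;
  const current = await client.incr(windowKey);
  if (current === 1) {
    // First hit in this window: let the counter expire with the window
    await client.expire(windowKey, windowSeconds);
  }
  return current <= limit;
}

// Usage in Express middleware
async function limiter(req, res, next) {
  const windowSeconds = 60;
  try {
    const allowed = await rateLimit(`ratelimit:${req.ip}`, 100, windowSeconds);
    if (!allowed) {
      // Seconds left until the current window rolls over
      const retryAfter = windowSeconds - (Math.floor(Date.now() / 1000) % windowSeconds);
      res.set('Retry-After', String(retryAfter));
      return res.status(429).json({ error: 'Too many requests' });
    }
    next();
  } catch (err) {
    // Express 4 does not catch rejections from async middleware.
    // Passing the error on fails closed; call next() instead to fail open.
    next(err);
  }
}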

Java (Sliding Window)

import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindow {
    private final int capacity;
    private final long windowMs;
    // allow() is synchronized, so a plain ArrayDeque is enough and size() is O(1)
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindow(int capacity, long windowMs) {
        this.capacity = capacity;
        this.windowMs = windowMs;
    }

    public synchronized boolean allow() {
        long now = System.currentTimeMillis();
        // Evict timestamps that have aged out of the window
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() > windowMs) {
            timestamps.pollFirst();
        }
        // Admit the request only if the window still has room
        if (timestamps.size() < capacity) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}

Algorithm Comparison

Algorithm	Pros	Cons	Best For
Fixed Window	Simple, low memory	Burst at window boundary	Basic protection
Sliding Window	Smooth rate, no window-boundary bursts	Higher memory/compute (one timestamp per request)	Precise rate control
Token Bucket	Allows bursts up to capacity	Complex to implement correctly	APIs with burst tolerance
Leaky Bucket	Strict constant output rate	Adds queueing delay; drops requests when the queue is full	Downstream protection
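The leaky bucket is the only algorithm in the table without an example above, so here is a rough sketch of the queue-based variant. The class, its method names, and the explicit `now` arguments are assumptions made for illustration; a production version would run `drain` from a worker or timer rather than calling it by hand.

```python
from collections import deque


class LeakyBucket:
    """Bounded queue that releases items at a fixed rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity          # maximum number of queued requests
        self.interval = 1.0 / leak_rate   # seconds between two releases
        self.queue = deque()
        self.next_release = 0.0           # earliest time the next item may leave

    def offer(self, item, now: float) -> bool:
        """Queue the item, or return False if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # Idle time earns no credit: the schedule restarts at `now`.
            self.next_release = max(self.next_release, now)
        self.queue.append(item)
        return True

    def drain(self, now: float) -> list:
        """Release every queued item whose scheduled time has passed."""
        released = []
        while self.queue and self.next_release <= now:
            released.append(self.queue.popleft())
            self.next_release += self.interval
        return released


bucket = LeakyBucket(capacity=3, leak_rate=2.0)  # 2 releases per second

# Four requests arrive at once; the fourth overflows the queue.
accepted = [bucket.offer(name, now=0.0) for name in ["a", "b", "c", "d"]]
first = bucket.drain(now=0.0)   # only "a" is due immediately
later = bucket.drain(now=1.0)   # "b" was due at 0.5s, "c" at 1.0s
```

Unlike the token bucket, a burst is not passed through: accepted requests leave the queue no faster than one per `interval`, and anything beyond `capacity` is rejected.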

Best Practices

Return 429 status with Retry-After header when rate limited
Use Redis for distributed rate limiting across multiple servers
Differentiate by client: Use API key or user ID, not just IP
Set higher limits for authenticated users than anonymous traffic
Log rate limit events for security monitoring and abuse detection
Gradual backoff: Tell clients when they can retry (via Retry-After) so they can slow down, instead of blocking them outright
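Several of these practices can be combined in a few lines. The sketch below keys a token bucket on a client identifier, picks the limits from a tier, and computes a Retry-After value when the bucket is empty. The tier names, their numbers, the `check` function, and the client ID format are all hypothetical, and the in-memory dictionary stands in for a shared store.

```python
import math

# Hypothetical tiers: (burst capacity, tokens refilled per second).
TIERS = {"anonymous": (2, 0.5), "free": (10, 1.0), "paid": (100, 10.0)}

_buckets = {}  # client id -> (tokens, time of last refill)


def check(client_id: str, tier: str, now: float):
    """Return (status, headers) for one request from client_id."""
    capacity, rate = TIERS[tier]
    tokens, last = _buckets.get(client_id, (float(capacity), now))
    # Refill for the time that has passed, capped at the tier's capacity.
    tokens = min(capacity, tokens + (now - last) * rate)
    if tokens >= 1:
        _buckets[client_id] = (tokens - 1, now)
        return 200, {}
    _buckets[client_id] = (tokens, now)
    # Seconds until one whole token is available, rounded up.
    retry_after = math.ceil((1 - tokens) / rate)
    return 429, {"Retry-After": str(retry_after)}


# An anonymous client gets a burst of two requests, then a 429.
results = [check("ip:203.0.113.7", "anonymous", now=0.0) for _ in range(3)]
# After waiting the advertised two seconds, the next request succeeds.
after_wait = check("ip:203.0.113.7", "anonymous", now=2.0)
# A paid client keyed by API key is unaffected by the anonymous bucket.
paid = check("key:abc123", "paid", now=0.0)
```

This version is not thread-safe; it is only meant to show how the client key, the tier, and the Retry-After value fit together.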

Common Mistakes

Rate limiting by IP only, punishing shared NAT users
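Not handling Redis failures gracefully: decide up front whether the limiter fails open (allow traffic) or fails closed (reject traffic) when the store is unavailable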
Using in-memory counters in multi-instance deployments
Setting limits too aggressively, blocking legitimate users
Not documenting rate limits in API documentation
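The fail open versus fail closed decision can be made explicit with a small wrapper around whatever check talks to the store. This is a sketch under assumptions: `guarded_allow` and `broken_store_check` are made-up names, and it assumes the store client raises the built-in `ConnectionError` or `TimeoutError`. Real clients often raise their own exception types, which you would catch instead.

```python
import logging

logger = logging.getLogger("ratelimit")


def guarded_allow(check, key: str, fail_open: bool = True) -> bool:
    """Run check(key); if the store is unreachable, apply an explicit policy."""
    try:
        return check(key)
    except (ConnectionError, TimeoutError):
        # Log the failure so outages of the limiter itself are visible.
        logger.exception("rate limit store unavailable for %s", key)
        # fail_open=True lets traffic through; False rejects it.
        return fail_open


def broken_store_check(key: str) -> bool:
    # Stand-in for a store call that fails.
    raise ConnectionError("store unreachable")


open_result = guarded_allow(broken_store_check, "user:42", fail_open=True)
closed_result = guarded_allow(broken_store_check, "user:42", fail_open=False)
# A healthy check's answer is passed through unchanged.
healthy_result = guarded_allow(lambda key: False, "user:42")
```

Failing open usually suits general API traffic, while failing closed may be the safer choice for sensitive endpoints such as login.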

Frequently Asked Questions

Q: Should I rate limit at the edge or in the application? A: Both. Use edge/CDN (Cloudflare, AWS WAF) for DDoS protection and application-level limits for business logic.

Q: What HTTP status code should I return when rate limited? A: 429 Too Many Requests. Include a Retry-After header giving the number of seconds to wait (an HTTP date is also valid).

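Q: How do I rate limit without Redis in a distributed system? A: Use sticky sessions so each client always reaches the same instance and in-memory counters stay accurate (not ideal, since limits break when instances are added or removed), or implement a centralized counter in your existing database (slower but functional).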
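As a rough illustration of the database-backed option, the sketch below keeps a fixed window counter in a SQL table. It uses SQLite in memory only so that the example is self-contained; the table name, column names, and the `allow` function are hypothetical, and the upsert syntax shown is SQLite's (version 3.24 or later). Other databases offer a comparable upsert with their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rate_limits ("
    " client_key TEXT NOT NULL,"
    " window_start INTEGER NOT NULL,"
    " hits INTEGER NOT NULL,"
    " PRIMARY KEY (client_key, window_start))"
)


def allow(conn, client_key: str, limit: int, window_seconds: int, now: float) -> bool:
    # Start of the fixed window that contains `now`, in whole seconds.
    window_start = int(now // window_seconds) * window_seconds
    with conn:  # one transaction: commits on success, rolls back on error
        # Create the counter row, or increment it if it already exists.
        conn.execute(
            "INSERT INTO rate_limits (client_key, window_start, hits) "
            "VALUES (?, ?, 1) "
            "ON CONFLICT (client_key, window_start) DO UPDATE SET hits = hits + 1",
            (client_key, window_start),
        )
        (hits,) = conn.execute(
            "SELECT hits FROM rate_limits WHERE client_key = ? AND window_start = ?",
            (client_key, window_start),
        ).fetchone()
    return hits <= limit


# Limit of 2 per 60 seconds: the third request in the window is rejected.
results = [allow(conn, "user:42", 2, 60, now=0.0) for _ in range(3)]
# The next window starts with a fresh counter.
next_window = allow(conn, "user:42", 2, 60, now=60.0)
```

Rows for past windows accumulate in this sketch, so a real deployment would also need a periodic cleanup job.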