SP StackPractices
intermediate

Rate Limiting

How to implement API rate limiting using token bucket, sliding window, and fixed window algorithms across Python, JavaScript, and Java.

Topics: api

Overview

Rate limiting controls how many requests a client can make to your API in a given time window. It prevents abuse, ensures fair resource allocation, and protects downstream services from overload.

Common algorithms include fixed window, sliding window, token bucket, and leaky bucket. In distributed systems, Redis is often used as the shared counter store so that every server instance enforces the same limit.

When to Use

Use this recipe when:

  • Protecting public APIs from abuse or DDoS
  • Enforcing tiered usage limits (free vs paid plans)
  • Preventing brute-force attacks on authentication endpoints
  • Managing capacity for resource-intensive operations
  • Implementing fair-use policies across users

Solution

Python (Token Bucket)

import time
from threading import Lock

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum burst size, in tokens
        self.tokens = float(capacity)   # start with a full bucket
        self.refill_rate = refill_rate  # tokens added per second
        # Monotonic clock: elapsed time is unaffected by wall-clock adjustments.
        self.last_refill = time.monotonic()
        self.lock = Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1  # spend one token for this request
                return True
            return False

bucket = TokenBucket(capacity=10, refill_rate=1)
print(bucket.allow())  # True

JavaScript (Fixed Window with Redis)

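const redis = require('redis');
const client = redis.createClient();
// node-redis v4+ needs an explicit connect before commands can be issued.
client.connect().catch(console.error);

async function rateLimit(key, limit, windowSeconds) {
  // All requests that fall in the same window share one counter key.
  const windowKey = `${key}:${Math.floor(Date.now() / 1000 / windowSeconds)}`;
  const current = await client.incr(windowKey);
  if (current === 1) {
    // First hit in this window: set a TTL so old counters are cleaned up.
    // INCR and EXPIRE are separate commands; a crash between them leaves
    // this window's key without a TTL.
    await client.expire(windowKey, windowSeconds);
  }
  return current <= limit;
}

// Usage in Express middleware
async function limiter(req, res, next) {
  try {
    const key = `ratelimit:${req.ip}`;
    const allowed = await rateLimit(key, 100, 60);
    if (!allowed) {
      res.set('Retry-After', '60'); // upper bound: at most one full window
      return res.status(429).json({ error: 'Too many requests' });
    }
    next();
  } catch (err) {
    // Express 4 does not catch rejected promises from async middleware.
    next(err);
  }
}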
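
The window key above buckets requests by floor(now / windowSeconds), which is where the fixed-window boundary burst comes from. The following is a minimal in-memory Python sketch of the same bucketing, not the Redis implementation; the class name `FixedWindowCounter` and the explicit `now` argument are assumptions made so the timing can be controlled in the example.

```python
from collections import defaultdict


class FixedWindowCounter:
    """In-memory fixed-window limiter (illustrative; hypothetical name)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        # Same bucketing idea as the Redis key: floor(now / window).
        window_index = int(now // self.window_seconds)
        self.counts[(key, window_index)] += 1
        return self.counts[(key, window_index)] <= self.limit


limiter = FixedWindowCounter(limit=100, window_seconds=60)

# 100 requests at t=59s (end of window 0), then 100 more at t=60s (start of window 1).
late = sum(limiter.allow("client", 59.0) for _ in range(100))
early = sum(limiter.allow("client", 60.0) for _ in range(100))
print(late + early)  # 200
```

With a limit of 100 per 60 seconds, a client can still get 200 requests through in about one second by straddling the boundary, which is the trade-off a sliding window avoids.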

Java (Sliding Window)

import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindow {
    private final int capacity;
    private final long windowMs;
    // allow() is synchronized, so a plain ArrayDeque is enough (and size() is O(1)).
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindow(int capacity, long windowMs) {
        this.capacity = capacity;
        this.windowMs = windowMs;
    }

    public synchronized boolean allow() {
        // Monotonic milliseconds: unaffected by wall-clock adjustments.
        long now = System.nanoTime() / 1_000_000L;
        // Evict timestamps that have aged out of the window.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMs) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < capacity) {
            timestamps.addLast(now);  // record this accepted request
            return true;
        }
        return false;
    }
}

Algorithm Comparison

Algorithm | Pros | Cons | Best For
--- | --- | --- | ---
Fixed Window | Simple, low memory | Burst at window boundary | Basic protection
Sliding Window | Smooth rate, no bursts | Higher memory/compute | Precise rate control
Token Bucket | Allows bursts up to capacity | Complex to implement correctly | APIs with burst tolerance
Leaky Bucket | Strict constant output rate | Can drop requests | Downstream protection
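
The leaky bucket is the only algorithm in the table without an implementation in this recipe. Below is a minimal Python sketch of the "meter" variant, in which the bucket level drains at a constant rate and a request is dropped if it would overflow. The class name `LeakyBucket` and the injectable `clock` parameter are assumptions for illustration, and the sketch is not thread-safe. The queue variant, which holds requests and releases them at a fixed rate, is the one that gives a strictly constant output rate.

```python
import time


class LeakyBucket:
    """Leaky bucket as a meter (illustrative sketch; hypothetical name)."""

    def __init__(self, capacity: int, leak_rate: float, clock=time.monotonic):
        self.capacity = capacity    # how much the bucket can hold
        self.leak_rate = leak_rate  # units drained per second
        self.level = 0.0            # current fill level
        self.clock = clock          # injectable for testing (assumption)
        self.last_leak = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last_leak
        # Drain at a constant rate, never below empty.
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1  # this request fits in the bucket
            return True
        return False         # bucket would overflow: drop the request


# Usage with a fake clock so the timing is deterministic.
fake_now = [0.0]
bucket = LeakyBucket(capacity=2, leak_rate=1.0, clock=lambda: fake_now[0])
results = [bucket.allow(), bucket.allow(), bucket.allow()]  # third overflows
fake_now[0] = 1.0  # one second later, one unit has leaked out
results.append(bucket.allow())
print(results)  # [True, True, False, True]
```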

Best Practices

  • Return 429 Too Many Requests with a Retry-After header when a client is rate limited
  • Use Redis for distributed rate limiting across multiple servers
  • Identify clients by API key or user ID, not just by IP address
  • Set higher limits for authenticated users than for anonymous traffic
  • Log rate limit events for security monitoring and abuse detection
  • Tell clients when they can retry so they can back off gradually instead of hitting a hard block
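
As a rough illustration of the Retry-After advice, the wait time for a token bucket can be derived from how many tokens are missing and how fast they refill. The helper names `retry_after_seconds` and `rate_limited_response` are hypothetical, and the response is modelled as a plain tuple rather than any particular web framework's response object.

```python
import math


def retry_after_seconds(tokens: float, refill_rate: float) -> int:
    """Whole seconds until a token bucket holds at least one token."""
    if tokens >= 1:
        return 0
    # Missing tokens divided by refill rate, rounded up to a whole second.
    return math.ceil((1 - tokens) / refill_rate)


def rate_limited_response(tokens: float, refill_rate: float):
    """Build a (status, headers, body) tuple for a rejected request."""
    wait = retry_after_seconds(tokens, refill_rate)
    headers = {"Retry-After": str(wait)}
    body = {"error": "Too many requests"}
    return 429, headers, body


print(rate_limited_response(tokens=0.25, refill_rate=0.5))
# (429, {'Retry-After': '2'}, {'error': 'Too many requests'})
```

Rounding up keeps the client from retrying a fraction of a second too early and being rejected again.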

Common Mistakes

  • Rate limiting by IP only, which penalizes users who share an address behind NAT
  • Not deciding how to handle Redis failures (fail open and allow traffic, or fail closed and reject it)
  • Using in-memory counters in multi-instance deployments, where each instance enforces its own separate limit
  • Setting limits too aggressively, blocking legitimate users
  • Not documenting rate limits in API documentation
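
The fail-open versus fail-closed choice can be made explicit with a small wrapper around the limiter call. This is a sketch under assumptions: `check` stands for any function that consults the shared store, `check_with_fallback` is a hypothetical name, and the default `store_errors` tuple uses Python's built-in exceptions, so a real Redis client's own exception types would need to be passed in instead.

```python
from typing import Callable, Tuple, Type


def check_with_fallback(
    check: Callable[[str], bool],
    key: str,
    fail_open: bool = True,
    store_errors: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> bool:
    """Return the limiter's decision, or a fixed policy if the store is down."""
    try:
        return check(key)
    except store_errors:
        # Store unreachable: allow the request (fail open) or reject it (fail closed).
        return fail_open


def broken_store(key: str) -> bool:
    """Stand-in for a limiter whose backing store is unreachable."""
    raise ConnectionError("store unreachable")


print(check_with_fallback(broken_store, "user:42"))                         # True
print(check_with_fallback(broken_store, "login:user:42", fail_open=False))  # False
```

One plausible policy is to fail open for general API traffic and fail closed for sensitive endpoints such as login, but the right choice depends on the service.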

Frequently Asked Questions

Q: Should I rate limit at the edge or in the application? A: Both. Use edge/CDN (Cloudflare, AWS WAF) for DDoS protection and application-level limits for business logic.

Q: What HTTP status code should I return when rate limited? A: 429 Too Many Requests. Include a Retry-After header giving either the number of seconds to wait or an HTTP date after which the client may retry.

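Q: How do I rate limit without Redis in a distributed system? A: Use sticky sessions so each client always reaches the same instance (not ideal, since in-memory counts are lost when that instance restarts), or keep a centralized counter in your existing database (slower but functional).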
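
A centralized database counter might look like the following fixed-window sketch. It uses Python's built-in sqlite3 module purely as a stand-in for "your existing database"; the table name `rate_limits`, its columns, and the function names are assumptions, and a production database would also need old windows purged and its transaction isolation checked under concurrent writers.

```python
import sqlite3
import time


def init_db(conn: sqlite3.Connection) -> None:
    # One row per (client, window); hypothetical schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rate_limits ("
        " client_key TEXT NOT NULL,"
        " window_index INTEGER NOT NULL,"
        " hits INTEGER NOT NULL,"
        " PRIMARY KEY (client_key, window_index))"
    )


def allow(conn, client_key, limit, window_seconds, now=None) -> bool:
    now = time.time() if now is None else now
    window_index = int(now // window_seconds)
    with conn:  # one transaction: commit on success, roll back on error
        # Create the counter row for this window if it does not exist yet.
        conn.execute(
            "INSERT OR IGNORE INTO rate_limits (client_key, window_index, hits)"
            " VALUES (?, ?, 0)",
            (client_key, window_index),
        )
        conn.execute(
            "UPDATE rate_limits SET hits = hits + 1"
            " WHERE client_key = ? AND window_index = ?",
            (client_key, window_index),
        )
        (hits,) = conn.execute(
            "SELECT hits FROM rate_limits WHERE client_key = ? AND window_index = ?",
            (client_key, window_index),
        ).fetchone()
    return hits <= limit


conn = sqlite3.connect(":memory:")
init_db(conn)
results = [allow(conn, "user:42", limit=2, window_seconds=60, now=120.0) for _ in range(3)]
print(results)  # [True, True, False]
```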