Skip to content
SP StackPractices
intermediate By StackPractices

API Performance Budget Template

A template for setting and tracking API latency and throughput performance budgets.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

Every API endpoint consumes resources: CPU, memory, database connections, and network bandwidth. Without explicit budgets, teams add features until the system collapses under load. This template defines performance budgets as a contract between product requirements and infrastructure capacity, making trade-offs explicit before they become outages.

When to Use

Use this resource when:

  • Designing a new API or adding a new endpoint to an existing one
  • Planning capacity for a marketing campaign or seasonal traffic spike
  • Investigating why an API is slow under load

Solution

# API Performance Budget: `<API Name>`

## 1. Endpoint Budgets

### 1.1. Latency Budget (p95)

| Endpoint | Target | Current | Buffer | Notes |
|----------|--------|---------|--------|-------|
| `GET /health` | < 10ms | 8ms | 2ms | Must not depend on DB |
| `GET /users/{id}` | < 100ms | 85ms | 15ms | Single DB query allowed |
| `POST /orders` | < 250ms | 210ms | 40ms | Multi-table transaction |
| `GET /reports/aggregate` | < 2s | 1.8s | 200ms | Read replica allowed |

### 1.2. Throughput Budget

| Endpoint | RPS Target | Max Burst | Connection Pool | Notes |
|----------|------------|-----------|-----------------|-------|
| `GET /products` | 5,000 | 10,000 | 50 | Cached in Redis |
| `POST /checkout` | 500 | 1,000 | 20 | DB write, idempotent |

### 1.3. Payload Budget

| Direction | Max Size | Typical | Compression |
|-----------|----------|---------|-------------|
| Request body | 1 MB | 10 KB | Gzip |
| Response body | 5 MB | 50 KB | Gzip + Brotli |
| Header total | 16 KB | 1 KB | None |

### 1.4. Dependency Budget

| Dependency | Max Latency Contribution | Max Concurrent Calls | Fallback |
|------------|--------------------------|----------------------|----------|
| PostgreSQL | 40ms | 20 | Read replica |
| Redis | 5ms | 100 | Skip cache (slower) |
| Payment Gateway | 200ms | 10 | Queue for async |
| Search Service | 100ms | 50 | Basic LIKE query |

## 2. Resource Budgets

| Resource | Baseline | Headroom | Max | Alert At |
|----------|----------|----------|-----|----------|
| CPU / core | 40% | 30% | 70% | 60% |
| Memory / pod | 512 MB | 256 MB | 1 GB | 800 MB |
| DB connections | 50 | 30 | 80 | 70 |
| Network egress | 100 Mbps | 50 Mbps | 200 Mbps | 150 Mbps |

## 3. Budget Enforcement

- [ ] Latency assertions in CI (fail build if p95 exceeds budget)
- [ ] Load test gates in deployment pipeline
- [ ] Payload size validation at API gateway
- [ ] Circuit breaker on dependency calls exceeding max latency
- [ ] Autoscaling triggered at 60% CPU, not 80%

## 4. Review Cadence

| Trigger | Action | Owner |
|---------|--------|-------|
| Weekly | Review dashboards for drift above budget | SRE |
| Monthly | Adjust budgets based on traffic growth | Platform |
| Quarterly | Re-negotiate budgets with product team | Engineering Lead |
| Incident | Post-incident budget audit | Incident Commander |

Explanation

A latency budget is a chain: frontend (100ms) + network (50ms) + API (200ms) + database (40ms) = 390ms total. If any single link exceeds its allocation, the whole user experience degrades. The buffer column exists because real-world performance is noisy. Dependency budgets prevent a slow downstream service from consuming the entire API budget. Throughput budgets define connection pools and cache sizes. Resource budgets prevent one heavy endpoint from starving others on the same host.

Variants

ContextApproachNotes
GraphQLQuery complexity budgetLimit total field cost per query, not just depth
Streaming APIThroughput per shardBudget per partition, not global
Mobile SDKBundle size + call frequencyBudget includes payload size and number of parallel requests

Best Practices

  1. Set budgets during design, not after launch
  2. Include budgets in API design review checklists
  3. Fail CI builds when latency tests exceed budget by any margin
  4. Reserve 20-30% buffer in every budget for traffic spikes
  5. Document who can approve budget exceptions (usually engineering lead)

Common Mistakes

  1. Basing budgets on best-case lab conditions instead of production percentiles
  2. Ignoring dependency latency when calculating endpoint budgets
  3. Setting one global budget for all endpoints regardless of criticality
  4. Treating budgets as one-time decisions instead of living documents
  5. Failing to enforce budgets in CI or deployment gates

Frequently Asked Questions

What happens when an endpoint exceeds its budget?

First, optimize: add caching, denormalize queries, or paginate. If the endpoint fundamentally needs more time, escalate to the engineering lead who can either approve a budget increase or downgrade the SLO. Never silently ignore budget violations.

How do I budget for endpoints with unknown traffic patterns?

Start with conservative estimates based on similar endpoints. Monitor for 2 weeks after launch, then adjust. Use feature flags to ramp traffic gradually so you can observe real performance before committing to a budget.

Should budgets apply to batch / async endpoints?

Yes, but use different metrics. For async, budget on queue depth, processing rate (items/second), and end-to-end latency from enqueue to completion. Throughput matters more than response time for batch jobs.