API Performance Budget Template
A template for setting and tracking API latency and throughput performance budgets.
Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.
Overview
Every API endpoint consumes resources: CPU, memory, database connections, and network bandwidth. Without explicit budgets, teams add features until the system collapses under load. This template defines performance budgets as a contract between product requirements and infrastructure capacity, making trade-offs explicit before they become outages.
When to Use
Use this resource when:
- Designing a new API or adding a new endpoint to an existing one
- Planning capacity for a marketing campaign or seasonal traffic spike
- Investigating why an API is slow under load
Solution
# API Performance Budget: `<API Name>`
## 1. Endpoint Budgets
### 1.1. Latency Budget (p95)
| Endpoint | Target | Current | Buffer | Notes |
|----------|--------|---------|--------|-------|
| `GET /health` | < 10ms | 8ms | 2ms | Must not depend on DB |
| `GET /users/{id}` | < 100ms | 85ms | 15ms | Single DB query allowed |
| `POST /orders` | < 250ms | 210ms | 40ms | Multi-table transaction |
| `GET /reports/aggregate` | < 2s | 1.8s | 200ms | Read replica allowed |
### 1.2. Throughput Budget
| Endpoint | RPS Target | Max Burst | Connection Pool | Notes |
|----------|------------|-----------|-----------------|-------|
| `GET /products` | 5,000 | 10,000 | 50 | Cached in Redis |
| `POST /checkout` | 500 | 1,000 | 20 | DB write, idempotent |
### 1.3. Payload Budget
| Direction | Max Size | Typical | Compression |
|-----------|----------|---------|-------------|
| Request body | 1 MB | 10 KB | Gzip |
| Response body | 5 MB | 50 KB | Gzip + Brotli |
| Header total | 16 KB | 1 KB | None |
### 1.4. Dependency Budget
| Dependency | Max Latency Contribution | Max Concurrent Calls | Fallback |
|------------|--------------------------|----------------------|----------|
| PostgreSQL | 40ms | 20 | Read replica |
| Redis | 5ms | 100 | Skip cache (slower) |
| Payment Gateway | 200ms | 10 | Queue for async |
| Search Service | 100ms | 50 | Basic LIKE query |
## 2. Resource Budgets
| Resource | Baseline | Headroom | Max | Alert At |
|----------|----------|----------|-----|----------|
| CPU / core | 40% | 30% | 70% | 60% |
| Memory / pod | 512 MB | 256 MB | 1 GB | 800 MB |
| DB connections | 50 | 30 | 80 | 70 |
| Network egress | 100 Mbps | 50 Mbps | 200 Mbps | 150 Mbps |
## 3. Budget Enforcement
- [ ] Latency assertions in CI (fail build if p95 exceeds budget)
- [ ] Load test gates in deployment pipeline
- [ ] Payload size validation at API gateway
- [ ] Circuit breaker on dependency calls exceeding max latency
- [ ] Autoscaling triggered at 60% CPU, not 80%
## 4. Review Cadence
| Trigger | Action | Owner |
|---------|--------|-------|
| Weekly | Review dashboards for drift above budget | SRE |
| Monthly | Adjust budgets based on traffic growth | Platform |
| Quarterly | Re-negotiate budgets with product team | Engineering Lead |
| Incident | Post-incident budget audit | Incident Commander |
Explanation
A latency budget is a chain: frontend (100ms) + network (50ms) + API (200ms) + database (40ms) = 390ms total. If any single link exceeds its allocation, the whole user experience degrades. The buffer column exists because real-world performance is noisy. Dependency budgets prevent a slow downstream service from consuming the entire API budget. Throughput budgets define connection pools and cache sizes. Resource budgets prevent one heavy endpoint from starving others on the same host.
Variants
| Context | Approach | Notes |
|---|---|---|
| GraphQL | Query complexity budget | Limit total field cost per query, not just depth |
| Streaming API | Throughput per shard | Budget per partition, not global |
| Mobile SDK | Bundle size + call frequency | Budget includes payload size and number of parallel requests |
Best Practices
- Set budgets during design, not after launch
- Include budgets in API design review checklists
- Fail CI builds when latency tests exceed budget by any margin
- Reserve 20-30% buffer in every budget for traffic spikes
- Document who can approve budget exceptions (usually engineering lead)
Common Mistakes
- Basing budgets on best-case lab conditions instead of production percentiles
- Ignoring dependency latency when calculating endpoint budgets
- Setting one global budget for all endpoints regardless of criticality
- Treating budgets as one-time decisions instead of living documents
- Failing to enforce budgets in CI or deployment gates
Frequently Asked Questions
What happens when an endpoint exceeds its budget?
First, optimize: add caching, denormalize queries, or paginate. If the endpoint fundamentally needs more time, escalate to the engineering lead who can either approve a budget increase or downgrade the SLO. Never silently ignore budget violations.
How do I budget for endpoints with unknown traffic patterns?
Start with conservative estimates based on similar endpoints. Monitor for 2 weeks after launch, then adjust. Use feature flags to ramp traffic gradually so you can observe real performance before committing to a budget.
Should budgets apply to batch / async endpoints?
Yes, but use different metrics. For async, budget on queue depth, processing rate (items/second), and end-to-end latency from enqueue to completion. Throughput matters more than response time for batch jobs.
Related Resources
API Lifecycle Management Template
A checklist template for API deprecation, versioning, and sunset procedures.
DocAPI Monitoring & Alerting Template
A template for defining API SLA thresholds, error rate alerts, and monitoring dashboards.
DocMicroservice Contract Template
A template for defining service contracts and API agreements between microservices.
DocService Dependency Map Template
A template for documenting and visualizing service dependencies in distributed systems.
DocSystem Diagram Template
A template for creating C4 model and architecture diagram standards.