Capacity Planning — Forecast, Scale, and Optimize Infrastructure
A practical guide to capacity planning for cloud and on-premise infrastructure: demand forecasting, load testing, auto-scaling strategies, and avoiding over-provisioning.
Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.
Overview
Capacity planning ensures your infrastructure can handle current and future demand without wasting resources. It bridges the gap between reactive firefighting and proactive scaling, helping teams deliver reliable services while controlling costs.
This guide covers demand forecasting, load testing, scaling strategies, and cost-aware capacity decisions for cloud and on-premise systems.
When to Use
- You are preparing for a product launch, marketing campaign, or seasonal traffic spike
- Your service experiences recurring performance degradation during peak hours
- You want to reduce cloud infrastructure costs without impacting reliability
- You need to justify infrastructure budgets with data-driven projections
- You are migrating from on-premise to cloud and need to right-size resources
Core Concepts
| Concept | Description |
|---|---|
| Current Capacity | Maximum throughput your system can handle with existing resources |
| Headroom | Buffer above peak usage to absorb unexpected spikes (typically 20-30%) |
| Saturation Point | Utilization level beyond which performance degrades (common rules of thumb: about 70% CPU, 80% memory; confirm with your own measurements) |
| Scaling Lead Time | Time required to provision and deploy additional capacity |
| Demand Forecast | Projected future load based on historical trends and business events |
Step-by-Step Capacity Planning Process
1. Measure Current Baseline
Before planning growth, understand your current state:
```promql
# Collect metrics over a representative period (2-4 weeks)
# Key metrics: CPU, memory, disk I/O, network, request latency, error rate
# Example: CPU utilization (%) per instance, computed as 100 minus the idle share
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```
Metrics to track:
- Resource metrics: CPU, memory, disk, network
- Application metrics: Requests per second, latency percentiles (p50, p95, p99), error rates
- Business metrics: Active users, transactions per minute, data volume growth
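Latency percentiles matter because an average hides the slow tail. The sketch below computes nearest-rank p50/p95/p99 from raw samples so the idea is concrete. In production you would normally let the monitoring system compute them (for example, Prometheus `histogram_quantile` over histogram buckets). The function names and the sample data are hypothetical.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """Report the mean alongside p50/p95/p99 so tail latency stays visible."""
    summary = {"mean": sum(latencies_ms) / len(latencies_ms)}
    for p in (50, 95, 99):
        summary[f"p{p}"] = percentile(latencies_ms, p)
    return summary


# Hypothetical sample: 100 requests, mostly fast with a slow tail
latencies = [20.0] * 90 + [150.0] * 8 + [900.0] * 2
print(latency_summary(latencies))
```

Here the mean is 48 ms while p99 is 900 ms. A plan built on the mean would miss the requests that users actually notice.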
2. Identify Bottlenecks
Find the first resource that will saturate under load:
```python
# Example: Analyze which resource hits limits first
from dataclasses import dataclass


@dataclass
class ResourceLimit:
    name: str
    current_usage: float         # % utilization at current peak
    max_capacity: float          # 100 = the full resource
    saturation_threshold: float  # % at which performance starts to degrade

    def headroom(self) -> float:
        """Remaining room before saturation, as a % of the saturation threshold."""
        return (self.saturation_threshold - self.current_usage) / self.saturation_threshold * 100


# Evaluate headroom for each resource
resources = [
    ResourceLimit("CPU", 45, 100, 70),
    ResourceLimit("Memory", 60, 100, 80),
    ResourceLimit("Disk IOPS", 75, 100, 85),
    ResourceLimit("Network", 30, 100, 70),
]

# The resource with the least headroom saturates first
bottleneck = min(resources, key=lambda r: r.headroom())
print(f"Bottleneck: {bottleneck.name} with {bottleneck.headroom():.1f}% headroom")
# → Bottleneck: Disk IOPS with 11.8% headroom
```
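Headroom becomes easier to act on when it is expressed as time. The sketch below estimates how many weeks each resource has before it reaches its saturation threshold. It assumes that utilization grows in proportion to demand at a fixed compound weekly rate. The 5% rate and the function name are hypothetical. The usage and threshold figures are copied from the ResourceLimit example.

```python
import math


def weeks_until_saturation(current_usage: float, saturation_threshold: float,
                           weekly_growth_rate: float) -> float:
    """Weeks until utilization reaches the saturation threshold, assuming
    utilization grows in proportion to demand at a fixed compound rate."""
    if current_usage >= saturation_threshold:
        return 0.0  # already at or past saturation
    if weekly_growth_rate <= 0:
        return math.inf  # flat or shrinking demand never reaches the threshold
    return math.log(saturation_threshold / current_usage) / math.log(1 + weekly_growth_rate)


# Usage and thresholds from the ResourceLimit example; 5%/week growth is hypothetical
for name, usage, threshold in [("CPU", 45, 70), ("Memory", 60, 80),
                               ("Disk IOPS", 75, 85), ("Network", 30, 70)]:
    print(f"{name}: {weeks_until_saturation(usage, threshold, 0.05):.1f} weeks")
```

At that rate Disk IOPS saturates in roughly 2.6 weeks. That horizon, rather than the raw headroom percentage, is what you compare against your scaling lead time.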
3. Forecast Demand
Use historical data plus business context to project future load:
Techniques:
- Trend extrapolation: Extend historical growth curves
- Seasonal adjustment: Account for weekly, monthly, or annual patterns
- Event-driven forecasting: Factor in known traffic events (launches, campaigns)
- Business correlation: Link capacity to business metrics (new customers, revenue)
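Trend extrapolation and event-driven forecasting can be combined in a small sketch. It fits a compound weekly growth rate by least squares on log(peak) against week number, extends that trend, and then applies a multiplier for a known event. Seasonal adjustment is not modeled. The history, the 1.5x launch multiplier, and the function names are all hypothetical.

```python
import math


def fit_weekly_growth(weekly_peaks: list[float]) -> float:
    """Estimate a compound weekly growth rate by least-squares fitting a
    straight line to log(peak) against the week index."""
    n = len(weekly_peaks)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    xs = list(range(n))
    ys = [math.log(peak) for peak in weekly_peaks]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return math.exp(numerator / denominator) - 1  # log-slope -> growth rate


def forecast_peak(last_peak: float, weekly_growth: float, weeks: int,
                  event_multiplier: float = 1.0) -> float:
    """Extend the fitted trend `weeks` ahead, then scale for a known event
    (for example, 1.5 for a launch expected to add 50% on top of trend)."""
    return last_peak * (1 + weekly_growth) ** weeks * event_multiplier


# Hypothetical weekly peak QPS for the last five weeks
history = [4100, 4320, 4500, 4740, 5000]
growth = fit_weekly_growth(history)
print(f"Fitted growth: {growth:.1%} per week")
print(f"Week-13 peak with a 1.5x launch: {forecast_peak(history[-1], growth, 13, 1.5):,.0f} QPS")
```

The log-linear fit is used because compound growth becomes a straight line on a log scale. Fitting a straight line to the raw peaks would understate later weeks.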
```python
# Example: Demand forecast with headroom
peak_qps_current = 5000
weekly_growth_rate = 0.05  # 5% per week
headroom_percent = 0.30    # 30% buffer

# Forecast for 3 months (13 weeks) of compound growth
peak_qps_forecast = peak_qps_current * (1 + weekly_growth_rate) ** 13  # ≈ 9,428 QPS
required_capacity = peak_qps_forecast * (1 + headroom_percent)         # ≈ 12,257 QPS
```
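A forecast in QPS usually has to become a whole number of machines or replicas. The sketch below performs that conversion. It assumes that capacity scales linearly with instance count, and its per-instance figure (800 QPS) is hypothetical. In practice that number comes from the load test in step 4.

```python
import math


def instances_needed(peak_qps: float, weekly_growth_rate: float, weeks: int,
                     headroom: float, qps_per_instance: float) -> int:
    """Translate a demand forecast plus headroom into a whole number of instances,
    assuming throughput scales linearly with instance count."""
    forecast = peak_qps * (1 + weekly_growth_rate) ** weeks
    required = forecast * (1 + headroom)
    return math.ceil(required / qps_per_instance)  # round up: partial instances don't exist


# qps_per_instance is hypothetical; measure it with a load test
print(instances_needed(5000, 0.05, 13, 0.30, qps_per_instance=800))
```

With these inputs, about 12,257 QPS of required capacity rounds up to 16 instances. Rounding up, rather than to the nearest integer, ensures that the headroom is never silently reduced.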
4. Load Test to Validate
Verify your assumptions with controlled load tests:
```javascript
// Example: k6 load test script (capacity-test.js)
// Stage targets are virtual users (VUs), not requests per second.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 100 },  // Ramp up
    { duration: '10m', target: 100 }, // Steady state
    { duration: '5m', target: 200 },  // Stress test
    { duration: '5m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'], // 95% of requests must finish under 200 ms
    http_req_failed: ['rate<0.01'],   // fewer than 1% of requests may fail
  },
};

export default function () {
  const res = http.get('https://api.example.com/health');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  sleep(1);
}
```
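The number a load test is meant to produce is the highest throughput that still meets your thresholds. The sketch below works through hypothetical per-stage results (achieved RPS, p95 latency, error rate) and applies the same limits as the k6 thresholds: p95 under 200 ms and errors under 1%. It reports the last load level before the first breach. How you export those per-stage figures from your tool is left open here. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageResult:
    rps: float         # achieved requests per second at this load level
    p95_ms: float      # 95th percentile latency in milliseconds
    error_rate: float  # fraction of failed requests (0.01 = 1%)


def max_sustainable_rps(stages: list[StageResult], p95_limit_ms: float = 200.0,
                        max_error_rate: float = 0.01) -> Optional[float]:
    """Walk load levels from lowest to highest and return the throughput of
    the last level before the first threshold breach (None if even the
    lowest level fails)."""
    best: Optional[float] = None
    for stage in sorted(stages, key=lambda s: s.rps):
        if stage.p95_ms >= p95_limit_ms or stage.error_rate >= max_error_rate:
            break  # first breach: treat everything above as past saturation
        best = stage.rps
    return best


# Hypothetical results, one row per load level
results = [
    StageResult(rps=95, p95_ms=80, error_rate=0.0),
    StageResult(rps=190, p95_ms=140, error_rate=0.002),
    StageResult(rps=260, p95_ms=240, error_rate=0.03),
]
print(max_sustainable_rps(results))
```

The sketch stops at the first breach instead of taking the highest passing stage. A higher stage that passes once by chance beyond a failing one is not a capacity you can rely on.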
5. Choose Scaling Strategy
| Strategy | When to Use | Pros | Cons |
|---|---|---|---|
| Vertical scaling | Predictable, steady growth; database workloads | Simple, no code changes | Hard ceiling at the largest machine size, possible downtime during resize, cost rises steeply at the top end |
| Horizontal scaling | Variable, spiky traffic; stateless services | Elastic, fault-tolerant | Added architectural complexity; data consistency is harder to maintain |
| Auto-scaling | Unpredictable or cyclical demand | Cost-efficient, hands-off | Cold-start latency, configuration complexity |
| Reserved capacity | Predictable baseline load | Significant cost savings | Less flexible, upfront commitment |
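Most auto-scalers that track a target metric use some version of the arithmetic sketched below. It is modeled on the formula documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The min/max bounds and the 0.1 tolerance are illustrative parameters, and real controllers add further behavior (such as stabilization windows) that this sketch omits. The function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 2, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Target-tracking scale decision: size the fleet so the per-replica metric
    moves back toward the target, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: avoid flapping
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 84% CPU against a 60% target
print(desired_replicas(4, current_metric=84, target_metric=60))
```

Here the result is 6 replicas. Because of the tolerance band, small deviations (for example, 63% against a 60% target) leave the replica count unchanged.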
6. Plan for Headroom
Always maintain buffer capacity for unexpected events:
- Minimum headroom: 20% above expected peak
- Critical services: 30-40% headroom
- Cost-constrained environments: 15% with faster scaling triggers
- Seasonal businesses: Plan headroom around known peak seasons
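The headroom tiers above can be encoded directly so that every service is sized the same way. In this sketch the tier names are hypothetical, and "critical" uses 0.35, the midpoint of the 30-40% range. For a seasonal business, pass the forecast seasonal peak as `expected_peak`.

```python
# Headroom fractions from the guidance above; tier names are hypothetical and
# "critical" uses 0.35, the midpoint of the 30-40% range.
HEADROOM_BY_TIER = {
    "standard": 0.20,
    "critical": 0.35,
    "cost_constrained": 0.15,
}


def provisioned_capacity(expected_peak: float, tier: str) -> float:
    """Capacity to provision: the expected peak plus the tier's headroom."""
    if tier not in HEADROOM_BY_TIER:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(HEADROOM_BY_TIER)}")
    return expected_peak * (1 + HEADROOM_BY_TIER[tier])


for tier in HEADROOM_BY_TIER:
    print(f"{tier}: {provisioned_capacity(10_000, tier):,.0f} QPS")
```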
7. Document and Review
Create a capacity plan document that includes:
- Current baseline metrics and bottlenecks
- Demand forecast with assumptions
- Scaling strategy and triggers
- Cost projections
- Review schedule (monthly or quarterly)
Best Practices
- Start with data, not guesses. Collect at least 2 weeks of production metrics before forecasting.
- Test at scale. Load test at 2-3x expected peak to understand failure modes.
- Right-size continuously. Review instance types and reserved capacity quarterly.
- Correlate with business events. Link capacity to product launches, marketing, and seasonality.
- Automate monitoring. Set up alerts when utilization crosses review thresholds (e.g., 60% sustained).
- Plan for degradation. Define graceful degradation strategies when capacity is exceeded.
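"Sustained" is what separates a useful capacity alert from a noisy one. The sketch below flags a breach only when utilization stays above the threshold for several consecutive samples. It assumes evenly spaced samples; for example, 6 samples at a 5-minute spacing cover 30 minutes. Prometheus alerting rules express the same idea with a `for:` duration. The function name and defaults are hypothetical.

```python
def sustained_breach(samples: list[float], threshold: float = 60.0,
                     min_consecutive: int = 6) -> bool:
    """True if utilization stays above `threshold` for at least
    `min_consecutive` evenly spaced samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0  # any dip resets the run
        if run >= min_consecutive:
            return True
    return False


# Hypothetical 5-minute CPU samples (%)
steady_climb = [55, 62, 63, 64, 65, 66, 67]
brief_spikes = [70, 70, 50, 70, 70, 70, 70, 70]
print(sustained_breach(steady_climb), sustained_breach(brief_spikes))
```

The steady climb triggers a review, but the interrupted spikes do not. This matches the intent of alerting on sustained utilization rather than momentary bursts.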
Common Mistakes
- Planning for averages instead of peaks. Average load hides burst behavior.
- Ignoring scaling lead time. If new capacity takes 10 minutes to come online, scaling must start at least 10 minutes before the traffic it is meant to serve arrives.
- Over-provisioning “just in case.” Excess capacity is wasted money; use auto-scaling for variable loads.
- Forgetting downstream dependencies. Scaling the frontend without scaling the database only moves the bottleneck downstream.
- Not re-testing after changes. Architecture changes invalidate previous capacity assumptions.
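Scaling lead time can be folded into the scale-out trigger. If demand can keep rising during the minutes it takes to add capacity, scaling has to start below the saturation point. The sketch assumes that utilization scales linearly with demand and that demand compounds at a worst-case per-minute rate. All three inputs in the example are hypothetical.

```python
def scale_out_trigger(saturation_threshold: float, lead_time_min: float,
                      max_growth_per_min: float) -> float:
    """Utilization (%) at which to start scaling so that new capacity is online
    before saturation, if demand keeps climbing at `max_growth_per_min`
    (compounded each minute) for the whole lead time."""
    growth_during_lead_time = (1 + max_growth_per_min) ** lead_time_min
    return saturation_threshold / growth_during_lead_time


# Hypothetical: 70% CPU saturation, 10-minute lead time, demand rising up to 2%/minute
print(f"Start scaling at {scale_out_trigger(70, 10, 0.02):.1f}% CPU")
```

With these numbers, scaling must begin at about 57.4% CPU rather than 70%. A faster provisioning path raises that trigger and lowers the headroom you have to pay for.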
Variants
- Cloud-native capacity planning: Use managed auto-scaling, spot instances, and serverless for elastic workloads.
- On-premise capacity planning: Focus on hardware procurement cycles, virtualization density, and power/cooling constraints.
- Database capacity planning: Monitor query performance, connection limits, storage growth, and replication lag.
FAQ
Q: How far ahead should I forecast capacity? Forecast 3-6 months for cloud environments and 12-18 months for on-premise hardware procurement.
Q: What is the difference between capacity planning and performance tuning? Capacity planning determines how many resources you need. Performance tuning makes existing resources more efficient. Do both.
Q: How do I balance cost and reliability? Use auto-scaling for variable loads, reserved instances for baselines, and maintain 20-30% headroom. Review monthly.
Q: Should I plan capacity per service or globally? Plan per service, then aggregate. Each service has different scaling characteristics and bottlenecks.
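When per-service plans are aggregated onto shared infrastructure, the sum of per-service peaks can differ from the peak of the combined load, because services rarely peak at the same moment. The sketch below computes both from time-aligned series. The service names and figures are hypothetical. Individual service limits still need to be sized from each service's own peak.

```python
def aggregate_peaks(per_service: dict[str, list[float]]) -> tuple[float, float]:
    """Return (sum of per-service peaks, peak of the combined load).

    Every series must use the same time buckets: index i is the same hour
    (or minute) for every service."""
    if not per_service:
        raise ValueError("need at least one service")
    if len({len(series) for series in per_service.values()}) != 1:
        raise ValueError("all series must cover the same time buckets")
    sum_of_peaks = sum(max(series) for series in per_service.values())
    combined = [sum(bucket) for bucket in zip(*per_service.values())]
    return sum_of_peaks, max(combined)


# Hypothetical hourly CPU cores used by two services sharing one cluster
usage = {
    "api": [100, 300, 150],
    "batch": [250, 50, 100],
}
print(aggregate_peaks(usage))
```

Here the sum of peaks is 550 cores, while the combined load never exceeds 350. The gap is the saving from planning per service and then aggregating, rather than simply adding worst cases.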
Conclusion
Capacity planning is an ongoing practice, not a one-time exercise. Measure, forecast, test, and review regularly to keep your infrastructure aligned with business growth while controlling costs.