intermediate By StackPractices

Capacity Planning — Forecast, Scale, and Optimize Infrastructure

A practical guide to capacity planning for cloud and on-premise infrastructure: demand forecasting, load testing, auto-scaling strategies, and avoiding over-provisioning.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

Capacity planning ensures your infrastructure can handle current and future demand without wasting resources. It replaces reactive firefighting with proactive scaling, helping teams deliver reliable services while controlling costs.

This guide covers demand forecasting, load testing, scaling strategies, and cost-aware capacity decisions for cloud and on-premise systems.

When to Use

  • You are preparing for a product launch, marketing campaign, or seasonal traffic spike
  • Your service experiences recurring performance degradation during peak hours
  • You want to reduce cloud infrastructure costs without impacting reliability
  • You need to justify infrastructure budgets with data-driven projections
  • You are migrating from on-premise to cloud and need to right-size resources

Core Concepts

Concept | Description
Current Capacity | Maximum throughput your system can handle with existing resources
Headroom | Buffer above peak usage to absorb unexpected spikes (typically 20-30%)
Saturation Point | Resource utilization level where performance degrades (usually >70% CPU, >80% memory)
Scaling Lead Time | Time required to provision and deploy additional capacity
Demand Forecast | Projected future load based on historical trends and business events

Step-by-Step Capacity Planning Process

1. Measure Current Baseline

Before planning growth, understand your current state:

# Collect metrics over a representative period (2-4 weeks)
# Key metrics: CPU, memory, disk I/O, network, request latency, error rate

# Example: Prometheus query for CPU utilization (%) per instance
# Utilization is the share of time not spent idle, averaged across cores
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

Metrics to track:

  • Resource metrics: CPU, memory, disk, network
  • Application metrics: Requests per second, latency percentiles (p50, p95, p99), error rates
  • Business metrics: Active users, transactions per minute, data volume growth
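Once the raw samples are collected, it can help to reduce each metric to a few summary numbers. The following is a minimal sketch, assuming the samples have already been exported as a plain list; the sample values and the function names `percentile` and `summarize_baseline` are hypothetical.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize_baseline(samples):
    """Return the average, p95, and peak of one metric over the baseline window."""
    return {
        "avg": sum(samples) / len(samples),
        "p95": percentile(samples, 95),
        "peak": max(samples),
    }

# Hypothetical CPU utilization samples (%), one per collection interval
cpu_samples = [32, 35, 41, 38, 44, 52, 61, 58, 47, 39]
baseline = summarize_baseline(cpu_samples)
print(baseline)
```

Comparing the average with the p95 and the peak shows how much burst behavior the average hides.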

2. Identify Bottlenecks

Find the first resource that will saturate under load:

# Example: Analyze which resource hits limits first
from dataclasses import dataclass

@dataclass
class ResourceLimit:
    name: str
    current_usage: float
    max_capacity: float
    saturation_threshold: float

    def headroom(self) -> float:
        """Room left before the saturation threshold, as a percentage of that threshold."""
        return (self.saturation_threshold - self.current_usage) / self.saturation_threshold * 100

# Evaluate headroom for each resource
resources = [
    ResourceLimit("CPU", 45, 100, 70),
    ResourceLimit("Memory", 60, 100, 80),
    ResourceLimit("Disk IOPS", 75, 100, 85),
    ResourceLimit("Network", 30, 100, 70),
]

bottleneck = min(resources, key=lambda r: r.headroom())
print(f"Bottleneck: {bottleneck.name} with {bottleneck.headroom():.1f}% headroom")
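Headroom is easier to act on when expressed as time. The sketch below estimates how many weeks remain before each resource reaches its saturation threshold, assuming usage compounds at a constant weekly rate. The 5% rate and the function name `weeks_until_saturation` are assumptions for illustration; real usage rarely grows this smoothly.

```python
import math

def weeks_until_saturation(current_usage, saturation_threshold, weekly_growth_rate):
    """Weeks until usage reaches the threshold under constant compound weekly growth."""
    if current_usage >= saturation_threshold:
        return 0.0  # already at or past the threshold
    if weekly_growth_rate <= 0 or current_usage <= 0:
        return math.inf  # no growth means the threshold is never reached
    return math.log(saturation_threshold / current_usage) / math.log(1 + weekly_growth_rate)

# Same usage and threshold values as the bottleneck example: (current %, threshold %)
usage_by_resource = {
    "CPU": (45, 70),
    "Memory": (60, 80),
    "Disk IOPS": (75, 85),
    "Network": (30, 70),
}

for name, (usage, threshold) in usage_by_resource.items():
    weeks = weeks_until_saturation(usage, threshold, weekly_growth_rate=0.05)
    print(f"{name}: about {weeks:.1f} weeks until saturation")
```

The resource with the fewest weeks left is the one to address first, and that number can be compared directly with the scaling lead time.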

3. Forecast Demand

Use historical data plus business context to project future load:

Techniques:

  • Trend extrapolation: Extend historical growth curves
  • Seasonal adjustment: Account for weekly, monthly, or annual patterns
  • Event-driven forecasting: Factor in known traffic events (launches, campaigns)
  • Business correlation: Link capacity to business metrics (new customers, revenue)

# Example: Demand forecast with headroom
peak_qps_current = 5000
weekly_growth_rate = 0.05  # 5% per week, compounded
headroom_percent = 0.30    # 30% buffer

# Forecast for 3 months (13 weeks)
peak_qps_forecast = peak_qps_current * (1 + weekly_growth_rate) ** 13  # ≈ 9428
required_capacity = peak_qps_forecast * (1 + headroom_percent)         # ≈ 12257 QPS
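The forecast example takes the weekly growth rate as given. One way to derive it from history is trend extrapolation: fit a straight line to the logarithm of past weekly peaks. The sketch below does this with ordinary least squares using only the standard library. The function names and the synthetic history are hypothetical, and the method assumes growth is roughly exponential with no seasonality.

```python
import math

def fit_weekly_growth(weekly_peaks):
    """Least-squares fit of log(peak) against week index.

    Returns (fitted level at week 0, growth rate per week).
    """
    n = len(weekly_peaks)
    if n < 2:
        raise ValueError("need at least two weekly peaks")
    xs = range(n)
    ys = [math.log(p) for p in weekly_peaks]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), math.exp(slope) - 1

def forecast_peak(weekly_peaks, weeks_ahead):
    """Extend the fitted curve weeks_ahead weeks past the last observation."""
    base, growth = fit_weekly_growth(weekly_peaks)
    last_week = len(weekly_peaks) - 1
    return base * (1 + growth) ** (last_week + weeks_ahead)

# Synthetic history: six weeks of peaks growing exactly 5% per week
history = [5000 * 1.05 ** week for week in range(6)]
base, growth = fit_weekly_growth(history)
print(f"Fitted growth: {growth:.1%} per week")
print(f"Forecast peak in 13 weeks: {forecast_peak(history, 13):.0f} QPS")
```

With real data the fit will not be exact, so it is worth checking the residuals and layering seasonal or event-driven adjustments on top of the trend.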

4. Load Test to Validate

Verify your assumptions with controlled load tests:

// Example: k6 load test script
// capacity-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 100 },   // Ramp up
    { duration: '10m', target: 100 }, // Steady state
    { duration: '5m', target: 200 },  // Stress test
    { duration: '5m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function() {
  const res = http.get('https://api.example.com/health');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  sleep(1);
}
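After a run, the results need to be turned into a capacity number. The sketch below scans per-stage results in order of increasing load and returns the last stage that stayed within the thresholds used in the script above (p95 under 200 ms, error rate under 1%). The `LoadStep` type and the result values are hypothetical; they are not k6 output.

```python
from typing import NamedTuple, Optional

class LoadStep(NamedTuple):
    virtual_users: int
    p95_ms: float
    error_rate: float

def max_sustainable_step(steps, p95_limit_ms=200.0, error_limit=0.01) -> Optional[LoadStep]:
    """Return the highest-load step before the first threshold breach, or None."""
    best = None
    for step in sorted(steps, key=lambda s: s.virtual_users):
        if step.p95_ms >= p95_limit_ms or step.error_rate >= error_limit:
            break  # everything beyond this load level is past saturation
        best = step
    return best

# Hypothetical results from four load levels
results = [
    LoadStep(50, 85.0, 0.000),
    LoadStep(100, 120.0, 0.002),
    LoadStep(150, 180.0, 0.006),
    LoadStep(200, 340.0, 0.030),
]

best = max_sustainable_step(results)
print(f"Highest load within thresholds: {best.virtual_users} virtual users")
```

Stopping at the first breach, rather than taking the largest passing step overall, avoids trusting a lucky result measured beyond the saturation point.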

5. Choose Scaling Strategy

Strategy | When to Use | Pros | Cons
Vertical scaling | Predictable, steady growth; database workloads | Simple, no code changes | Hard limit, downtime risk, expensive
Horizontal scaling | Variable, spiky traffic; stateless services | Elastic, fault-tolerant | Added complexity, data consistency
Auto-scaling | Unpredictable or cyclical demand | Cost-efficient, hands-off | Cold start latency, configuration complexity
Reserved capacity | Predictable baseline load | Significant cost savings | Less flexible, upfront commitment
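To make the auto-scaling row concrete, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: scale the replica count by the ratio of observed to target utilization, round up, and clamp to configured bounds. This guide does not prescribe a specific autoscaler, so treat the rule as one common example; the function name and default bounds are assumptions.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=100):
    """Proportional scaling: ceil(current * observed / target), clamped to [min, max]."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))

# 4 replicas running at 90% CPU with a 60% target: scale out
print(desired_replicas(4, 90, 60))
# 4 replicas running at 30% CPU with a 60% target: scale in
print(desired_replicas(4, 30, 60))
```

Setting the target below the saturation point leaves room to absorb load while new replicas start, which is where cold start latency matters.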

6. Plan for Headroom

Always maintain buffer capacity for unexpected events:

  • Minimum headroom: 20% above expected peak
  • Critical services: 30-40% headroom
  • Cost-constrained environments: 15% with faster scaling triggers
  • Seasonal businesses: Plan headroom around known peak seasons
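The headroom percentage translates into an instance count once per-instance capacity is known. The sketch below reuses the forecast inputs from step 3 (5000 QPS growing 5% per week for 13 weeks) and assumes a hypothetical per-instance capacity of 800 QPS, a figure that would normally come from load testing.

```python
import math

def instances_needed(peak_qps, headroom_fraction, qps_per_instance):
    """Instances required to serve the peak plus headroom, rounded up."""
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    required_qps = peak_qps * (1 + headroom_fraction)
    return math.ceil(required_qps / qps_per_instance)

forecast_peak_qps = 5000 * 1.05 ** 13  # inputs from the forecast example
qps_per_instance = 800                 # hypothetical, measured via load test

for headroom in (0.15, 0.20, 0.30, 0.40):
    count = instances_needed(forecast_peak_qps, headroom, qps_per_instance)
    print(f"{headroom:.0%} headroom: {count} instances")
```

Running the same calculation at several headroom levels shows how much each extra increment of safety costs in instances.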

7. Document and Review

Create a capacity plan document that includes:

  • Current baseline metrics and bottlenecks
  • Demand forecast with assumptions
  • Scaling strategy and triggers
  • Cost projections
  • Review schedule (monthly or quarterly)

Best Practices

  • Start with data, not guesses. Collect at least 2 weeks of production metrics before forecasting.
  • Test at scale. Load test at 2-3x expected peak to understand failure modes.
  • Right-size continuously. Review instance types and reserved capacity quarterly.
  • Correlate with business events. Link capacity to product launches, marketing, and seasonality.
  • Automate monitoring. Set up alerts when utilization crosses review thresholds (e.g., 60% sustained).
  • Plan for degradation. Define graceful degradation strategies when capacity is exceeded.
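The "sustained" qualifier in the monitoring practice matters: a single spike above the review threshold should not raise an alert. A minimal sketch of that check follows, assuming utilization samples arrive at a fixed interval; the function name, the default of three consecutive samples, and the sample data are hypothetical.

```python
def sustained_breach(samples, threshold=60.0, min_consecutive=3):
    """Return True if at least min_consecutive samples in a row exceed the threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False

# Hypothetical utilization samples (%)
brief_spike = [55, 72, 58, 61, 59, 66, 50]
sustained = [55, 72, 58, 61, 63, 66, 59]

print(sustained_breach(brief_spike))
print(sustained_breach(sustained))
```

Monitoring systems usually offer this natively (for example, a "for" duration on an alert rule), so the function is mainly useful for analyzing exported data offline.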

Common Mistakes

  • Planning for averages instead of peaks. Average load hides burst behavior.
  • Ignoring scaling lead time. If it takes 10 minutes to add capacity, scaling must be triggered at least 10 minutes before the extra traffic arrives.
  • Over-provisioning “just in case.” Excess capacity is wasted money; use auto-scaling for variable loads.
  • Forgetting downstream dependencies. Scaling frontend without scaling database leads to new bottlenecks.
  • Not re-testing after changes. Architecture changes invalidate previous capacity assumptions.
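The lead-time mistake can be quantified. If traffic can climb by a known amount per minute and new capacity takes a known time to come online, scaling has to start while there is still enough spare capacity to cover that climb. The sketch below assumes a linear worst-case ramp; the capacity and ramp figures are hypothetical.

```python
def scale_out_trigger_qps(capacity_qps, ramp_qps_per_min, lead_time_min):
    """Load level at which scaling must start so new capacity is ready before saturation."""
    trigger = capacity_qps - ramp_qps_per_min * lead_time_min
    return max(0.0, trigger)  # zero means the fleet must be scaled ahead of time

capacity = 12000  # hypothetical QPS the current fleet can serve
ramp = 200        # hypothetical worst-case traffic growth, QPS per minute
lead_time = 10    # minutes to provision and deploy

trigger = scale_out_trigger_qps(capacity, ramp, lead_time)
print(f"Start scaling at {trigger:.0f} QPS ({trigger / capacity:.0%} of capacity)")
```

A trigger of zero signals that reactive scaling cannot keep up with the ramp, so capacity has to be added on a schedule before the event.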

Variants

  • Cloud-native capacity planning: Use managed auto-scaling, spot instances, and serverless for elastic workloads.
  • On-premise capacity planning: Focus on hardware procurement cycles, virtualization density, and power/cooling constraints.
  • Database capacity planning: Monitor query performance, connection limits, storage growth, and replication lag.

FAQ

Q: How far ahead should I forecast capacity?
A: Forecast 3-6 months for cloud environments and 12-18 months for on-premise hardware procurement.

Q: What is the difference between capacity planning and performance tuning?
A: Capacity planning determines how many resources you need. Performance tuning makes existing resources more efficient. Do both.

Q: How do I balance cost and reliability?
A: Use auto-scaling for variable loads, reserved instances for baselines, and maintain 20-30% headroom. Review monthly.

Q: Should I plan capacity per service or globally?
A: Plan per service, then aggregate. Each service has different scaling characteristics and bottlenecks.

Conclusion

Capacity planning is an ongoing practice, not a one-time exercise. Measure, forecast, test, and review regularly to keep your infrastructure aligned with business growth while controlling costs.