intermediate By StackPractices

Complete Guide to AWS Cost Optimization

Reduce AWS cloud spend by up to 40%. Covers EC2 right-sizing, Spot Instances, Reserved Instances, Savings Plans, S3 lifecycle, RDS optimization, networking, monitoring, and automation.

Introduction

AWS bills tend to grow unnoticed: unused resources, over-provisioned instances, and missing monitoring can inflate costs by 40% or more. This guide covers EC2 right-sizing, Spot and Reserved Instances, Savings Plans, S3 lifecycle policies, RDS optimization, networking costs, and automated cost monitoring.

Cost Explorer and Budgets

Analyzing spending

# Install AWS CLI (pip installs v1; AWS recommends the v2 installer for new setups)
pip install awscli

# Get cost breakdown by service
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Get cost by tag
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=TAG,Key=Environment

Setting up budgets

# Create a monthly budget alert
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "MonthlyBudget",
    "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80
      },
      "Subscribers": [{
        "SubscriptionType": "EMAIL",
        "Address": "ops@example.com"
      }]
    }
  ]'
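
To make the notification rule concrete, here is a minimal sketch of how an ACTUAL notification with a GREATER_THAN comparison and a threshold of 80 would evaluate, assuming the threshold is read as a percentage of the budget limit. The function name `should_alert` is hypothetical and is not part of any AWS SDK.

```python
def should_alert(actual_spend, budget_limit, threshold_pct=80):
    """Return True when actual spend exceeds threshold_pct percent of the budget."""
    if budget_limit <= 0:
        raise ValueError("budget_limit must be positive")
    # Compare without dividing to avoid floating-point surprises at the boundary
    return actual_spend * 100 > threshold_pct * budget_limit


# With a $5,000 monthly budget, the alert fires once spend passes $4,000.
print(should_alert(3900, 5000))  # False
print(should_alert(4100, 5000))  # True
```

Because the comparison is strictly greater-than, spend of exactly $4,000 would not trigger the alert in this sketch.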

EC2 Right-Sizing

Finding underutilized instances

# Get daily average CPU utilization for one instance (repeat per instance ID)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-02-01T00:00:00Z \
  --period 86400 \
  --statistics Average \
  --output table
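
The datapoints returned by that query can be summarized in a few lines. The sketch below assumes the JSON output shape, where each entry in `Datapoints` carries an `Average` value, and uses a 20% cutoff as an illustrative threshold only. `is_underutilized` is a hypothetical helper name.

```python
from statistics import mean


def is_underutilized(datapoints, avg_threshold=20.0):
    """Flag an instance whose mean daily-average CPU stays below avg_threshold percent."""
    averages = [dp["Average"] for dp in datapoints]
    if not averages:
        return False  # no data: do not guess
    return mean(averages) < avg_threshold


# Three days of daily averages, shaped like CloudWatch datapoints
sample = [{"Average": 4.8}, {"Average": 5.5}, {"Average": 6.1}]
print(is_underutilized(sample))  # True
```

CPU alone is a weak signal, so memory, network, and disk metrics are worth checking before acting on a flag like this.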

Right-sizing with AWS Compute Optimizer

# Opt the account in to Compute Optimizer
aws compute-optimizer update-enrollment-status --status Active

# Get recommendations for instances that are not right-sized
aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=Finding,values=Underprovisioned,Overprovisioned

Typical right-sizing actions

Current | Recommendation | Monthly Savings
m5.2xlarge (avg 5% CPU) | t3.large | ~$200
c5.4xlarge (avg 10% CPU) | c5.xlarge | ~$250
r5.xlarge (avg 15% CPU) | t3.medium | ~$150
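
The savings column comes from simple arithmetic on hourly prices. As a rough sketch, the hourly rates below are illustrative assumptions for the first row of the table, not quoted prices, so check the current On-Demand pricing for your Region before relying on them.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def monthly_savings(current_hourly, target_hourly, hours=HOURS_PER_MONTH):
    """Monthly saving from moving to a cheaper instance that runs all month."""
    return (current_hourly - target_hourly) * hours


# Assumed rates: m5.2xlarge at $0.384/hr, t3.large at $0.0832/hr
print(f"${monthly_savings(0.384, 0.0832):.2f}")  # $219.58
```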

Spot Instances

Spot Fleet for batch workloads

{
  "SpotFleetRequestConfig": {
    "AllocationStrategy": "diversified",
    "IamFleetRole": "arn:aws:iam::123456789012:role/SpotFleetRole",
    "SpotPrice": "0.10",
    "TargetCapacity": 10,
    "LaunchSpecifications": [
      {
        "InstanceType": "t3.medium",
        "ImageId": "ami-12345678",
        "SubnetId": "subnet-12345678"
      },
      {
        "InstanceType": "t3.large",
        "ImageId": "ami-12345678",
        "SubnetId": "subnet-87654321"
      }
    ]
  }
}

Spot with auto-scaling groups

# Terraform
resource "aws_autoscaling_group" "spot" {
  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 0
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.app.id
        version            = "$Latest"
      }

      override {
        instance_type = "t3.medium"
      }
      override {
        instance_type = "t3.large"
      }
    }
  }

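  # Subnets to launch into (assumes private subnets are defined elsewhere)
  vpc_zone_identifier = aws_subnet.private[*].id

  min_size         = 2
  max_size         = 10
  desired_capacity = 4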
}

Spot interruption handling

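import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def graceful_shutdown():
    # Placeholder: drain connections, save state, then stop the workload
    pass

def check_spot_interruption():
    """Return True if a Spot termination notice (about 2 minutes' warning) is pending."""
    try:
        # IMDSv2: fetch a short-lived session token before reading metadata
        token_req = urllib.request.Request(
            f"{IMDS}/api/token", method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        token = urllib.request.urlopen(token_req, timeout=2).read().decode()
        req = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        action = json.loads(urllib.request.urlopen(req, timeout=2).read())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return False
        raise
    except urllib.error.URLError:
        return False  # metadata service unreachable (not running on EC2)
    if action["action"] == "terminate":
        print(f"Spot interruption at {action['time']}")
        graceful_shutdown()
        return True
    return False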
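
Once a notice arrives, it helps to know how much of the two-minute window is left. The sketch below assumes the notice is a dict with `action` and `time` keys, with `time` as a UTC timestamp such as `2026-01-15T08:22:00Z`. `seconds_until_interruption` is a hypothetical helper, not an AWS API.

```python
from datetime import datetime, timezone


def seconds_until_interruption(notice, now=None):
    """Seconds left before the action time in a Spot instance-action notice."""
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never report a negative remaining time
    return max(0.0, (deadline - now).total_seconds())


notice = {"action": "terminate", "time": "2026-01-15T08:22:00Z"}
now = datetime(2026, 1, 15, 8, 20, 30, tzinfo=timezone.utc)
print(seconds_until_interruption(notice, now))  # 90.0
```

A shutdown routine could use this value as its budget for draining connections and saving state.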

Reserved Instances and Savings Plans

Reserved Instances

Type | Commitment | Discount | Best For
Standard RI | 1 or 3 years | Up to 72% | Steady-state workloads
Convertible RI | 1 or 3 years | Up to 54% | Workloads that may change
Scheduled RI | 1 year | Variable | Predictable time windows (no longer offered for new purchases)

Savings Plans

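# Find the offering ID for a 1-year, no-upfront Compute Savings Plan
aws savingsplans describe-savings-plans-offerings \
  --plan-types Compute \
  --durations 31536000 \
  --payment-options "No Upfront" \
  --currencies USD

# Purchase it; --commitment is the hourly commitment in USD (about $500/month here)
aws savingsplans create-savings-plan \
  --savings-plan-offering-id <offering-id> \
  --commitment 0.685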
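
The Savings Plans API takes the commitment as an hourly amount in USD, so a monthly figure has to be converted first. A minimal sketch of that conversion, assuming an average month of 730 hours:

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def hourly_commitment(monthly_usd):
    """Convert a monthly spend target into an hourly commitment, rounded to 3 decimals."""
    return round(monthly_usd / HOURS_PER_MONTH, 3)


print(hourly_commitment(500))  # 0.685
```

Passing a monthly number such as 500 directly as the commitment would commit to $500 per hour, so this conversion is worth double-checking before any purchase.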

When to use which

  • Steady-state EC2: Standard Reserved Instances (highest discount)
  • Flexible workloads: Compute Savings Plans (apply to any instance family)
  • Fargate/Lambda: Compute Savings Plans (cover Fargate and Lambda usage)
  • S3: no RIs or Savings Plans; use lifecycle policies and storage classes
  • DynamoDB: reserved capacity for steady provisioned throughput; otherwise pick the capacity mode that fits the traffic
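
Whether a commitment pays off depends on how many hours the workload actually runs. A commitment at discount d costs (1 - d) of the On-Demand rate for every hour of the term, while On-Demand is billed only for the hours used, so the commitment wins once utilization exceeds 1 - d. The sketch below encodes that rule of thumb; real discounts vary by instance type, Region, term, and payment option.

```python
def break_even_utilization(discount):
    """Fraction of hours a workload must run for a commitment to beat On-Demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def commitment_is_cheaper(utilization, discount):
    """True when running `utilization` of all hours makes the commitment cheaper."""
    return utilization > break_even_utilization(discount)


# With a 72% discount, the break-even point is about 28% utilization.
print(f"{break_even_utilization(0.72):.0%}")  # 28%
print(commitment_is_cheaper(0.50, 0.72))      # True
```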

S3 Cost Optimization

Lifecycle policies

{
  "Rules": [
    {
      "ID": "MoveToIAAfter30Days",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    },
    {
      "ID": "DeleteTempFiles",
      "Status": "Enabled",
      "Filter": { "Prefix": "temp/" },
      "Expiration": { "Days": 7 }
    }
  ]
}
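
To see what the first rule does to an object over time, the sketch below maps an object's age to the storage class it would sit in under the 30/90/365-day transitions. It is an approximation for reasoning about the policy only: S3 applies lifecycle actions asynchronously, so real transitions can lag the exact day count.

```python
# (minimum age in days, storage class), checked from oldest to newest
TRANSITIONS = [
    (365, "DEEP_ARCHIVE"),
    (90, "GLACIER"),
    (30, "STANDARD_IA"),
]


def storage_class_for_age(age_days):
    """Return the storage class an object of this age would have under the rule."""
    for min_days, storage_class in TRANSITIONS:
        if age_days >= min_days:
            return storage_class
    return "STANDARD"


for age in (10, 45, 120, 400):
    print(age, storage_class_for_age(age))
```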

S3 storage classes

Class | Cost vs Standard | Use Case
STANDARD | 1x | Frequently accessed
STANDARD_IA | 0.5x | Infrequently accessed (30+ days)
ONEZONE_IA | 0.4x | Infrequently accessed, non-critical
GLACIER | 0.17x | Archive (90+ days)
DEEP_ARCHIVE | 0.04x | Long-term archive (365+ days)
INTELLIGENT_TIERING | Variable | Unknown access patterns
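
The multipliers in the table translate directly into a rough monthly storage estimate. The sketch below assumes a STANDARD price of $0.023 per GB-month, which is an illustrative figure rather than a quoted rate, and it ignores retrieval fees, request charges, and minimum storage durations, all of which matter for the colder classes.

```python
STANDARD_PRICE_PER_GB = 0.023  # assumed USD per GB-month; check current pricing

# Relative storage cost taken from the table above
RELATIVE_COST = {
    "STANDARD": 1.0,
    "STANDARD_IA": 0.5,
    "ONEZONE_IA": 0.4,
    "GLACIER": 0.17,
    "DEEP_ARCHIVE": 0.04,
}


def monthly_storage_cost(gb, storage_class):
    """Approximate monthly storage cost in USD for `gb` gigabytes."""
    return gb * STANDARD_PRICE_PER_GB * RELATIVE_COST[storage_class]


for cls in RELATIVE_COST:
    print(f"{cls}: ${monthly_storage_cost(1000, cls):.2f}")
```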

Intelligent-Tiering

aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-bucket \
  --id MoveToArchive \
  --intelligent-tiering-configuration '{
    "Id": "MoveToArchive",
    "Status": "Enabled",
    "Tierings": [
      { "Days": 90, "AccessTier": "ARCHIVE_ACCESS" },
      { "Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS" }
    ]
  }'

RDS Optimization

Right-sizing RDS

# Check DB instance CPU
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=mydb \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-02-01T00:00:00Z \
  --period 86400 \
  --statistics Average,Maximum

RDS cost reduction strategies

  • Downsize instances: db.t4g instead of db.m5 if CPU stays below 20% (check memory and connections too)
  • Reserved Instances: up to 69% discount for a 3-year commitment
  • Stop non-prod at night: RDS can be stopped for up to 7 days at a time, after which it restarts automatically
  • Use Aurora Serverless: can scale to zero for intermittent workloads
  • Delete unused snapshots: old manual snapshots accumulate silently
  • Use read replicas wisely: each replica is billed as a separate instance
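
The payoff from stopping non-production databases outside working hours is easy to estimate. The sketch below computes the share of the week an instance is stopped for a given schedule, with a 12-hour, 5-day week as an assumed example. Storage and backups are still billed while an RDS instance is stopped, so the saving applies to instance hours only.

```python
HOURS_PER_WEEK = 168


def idle_fraction(hours_per_day, days_per_week):
    """Share of the week an instance is stopped under the given running schedule."""
    running = hours_per_day * days_per_week
    if not 0 <= running <= HOURS_PER_WEEK:
        raise ValueError("schedule exceeds the hours in a week")
    return 1 - running / HOURS_PER_WEEK


# Running 12 hours a day, Monday to Friday
print(f"{idle_fraction(12, 5):.0%}")  # 64%
```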

Automated snapshot cleanup

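import boto3
from datetime import datetime, timedelta, timezone

rds = boto3.client("rds")

def cleanup_old_snapshots(days=30):
    """Delete manual DB snapshots older than `days`; automated ones expire on their own."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    paginator = rds.get_paginator("describe_db_snapshots")

    # Only manual snapshots can be deleted with delete_db_snapshot
    for page in paginator.paginate(SnapshotType="manual"):
        for snap in page["DBSnapshots"]:
            created = snap.get("SnapshotCreateTime")  # absent while still creating
            if created is None or created >= cutoff:
                continue
            rds.delete_db_snapshot(DBSnapshotIdentifier=snap["DBSnapshotIdentifier"])
            print(f"Deleted: {snap['DBSnapshotIdentifier']}")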

Networking Costs

Data transfer optimization

Scenario | Cost
Inbound data transfer | Free
Same-AZ data transfer (private IPs) | Free
Cross-AZ data transfer | $0.01/GB in each direction
Cross-region data transfer | $0.02-0.09/GB
Internet egress | $0.09/GB
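
A quick estimate from these rates shows where transfer spend concentrates. The sketch below uses the per-GB figures from the table as flat rates, taking the low end of the cross-region range, and ignores free tiers and volume discounts. The scenario names are hypothetical labels chosen for this example.

```python
# USD per GB, copied from the table above (cross_region uses the low end of its range)
RATES_PER_GB = {
    "inbound": 0.0,
    "same_az": 0.0,
    "cross_az": 0.01,
    "cross_region": 0.02,
    "internet_egress": 0.09,
}


def monthly_transfer_cost(gb_by_scenario):
    """Sum the transfer cost for a mapping of scenario name to gigabytes moved."""
    return sum(RATES_PER_GB[name] * gb for name, gb in gb_by_scenario.items())


usage = {"cross_az": 5000, "internet_egress": 1000}
print(f"${monthly_transfer_cost(usage):.2f}")  # $140.00
```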

Reducing networking costs

  • Keep traffic in the same AZ: place dependent services together where availability requirements allow
  • Use VPC endpoints: avoid NAT Gateway charges for AWS service traffic
  • Use CloudFront: cache content at the edge and reduce origin data transfer
  • Compress responses: less data means less egress cost
  • Use S3 Transfer Acceleration selectively: it speeds up long-distance uploads but adds a per-GB charge, so it is not a cost saver

VPC endpoints

# Terraform — Gateway endpoint for S3 (free)
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.main.id
  service_name    = "com.amazonaws.us-east-1.s3"
  route_table_ids = [aws_route_table.private.id]
}

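# Gateway endpoint for DynamoDB (also free; prefer it over an Interface endpoint)
# Interface endpoints, used by most other services, cost about $0.01/hr per AZ plus per-GB processing
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}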

Automated Cost Monitoring

AWS Cost Anomaly Detection

# Create a cost anomaly monitor that tracks spend per AWS service
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "DailyAnomaly",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'

Cost reports with Lambda

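import json
import os
import urllib.request
from datetime import date, timedelta

import boto3

ce = boto3.client("ce")

def send_slack_notification(text):
    # Assumes an incoming-webhook URL in the SLACK_WEBHOOK_URL environment variable
    req = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

def lambda_handler(event, context):
    # Report on the previous full calendar month (End is exclusive)
    end = date.today().replace(day=1)
    start = (end - timedelta(days=1)).replace(day=1)
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    costs = []
    for group in response["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        if amount > 100:  # only report services above $100
            costs.append(f"{service}: ${amount:.2f}")

    # Send to Slack
    if costs:
        send_slack_notification("\n".join(costs))

    return {"statusCode": 200}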

Best Practices

  • Enable Cost Explorer: visibility is the first step to optimization
  • Set up budget alerts: catch overruns early, ideally with forecasted as well as actual thresholds
  • Right-size every 3 months: workloads change, instances should too
  • Use Spot for 70%+ of non-critical workloads: discounts of up to 90% over On-Demand
  • Commit to RIs/Savings Plans for baseline: cover 60-70% of steady-state usage
  • Use S3 lifecycle policies: move old data to cheaper tiers automatically
  • Stop non-prod at night: roughly 65% of non-prod hours are idle
  • Delete unused EBS volumes: they cost money even when unattached
  • Release unused Elastic IPs: every public IPv4 address is billed at $0.005/hr, in use or not
  • Use VPC endpoints: avoid NAT Gateway data processing charges
  • Tag everything: enable cost allocation by team/project
  • Review monthly: costs creep without regular review

Common Mistakes

  • Leaving EC2 instances running 24/7 in dev: use auto-stop schedules
  • Using S3 Standard for archive data: add lifecycle rules that move it to Glacier/Deep Archive
  • Over-provisioning RDS: downsize based on actual CPU/memory
  • Ignoring NAT Gateway costs: use VPC endpoints for AWS service traffic
  • Not using Spot for batch/cron jobs: up to 90% savings for interruptible workloads
  • Forgetting to delete old snapshots: they accumulate silently over months
  • Buying RIs without analyzing usage: the wrong instance family means a wasted commitment
  • Not tagging resources: no visibility into team/project costs
  • Using CloudWatch Logs with no retention policy: log volumes grow unbounded
  • Ignoring data transfer costs: cross-AZ and cross-region traffic adds up fast

Frequently Asked Questions

How much can I save with AWS cost optimization?

Typical savings range from 30-50% for organizations that have never optimized. The biggest wins come from right-sizing (15-20%), Spot instances (10-15% for eligible workloads), and RI/Savings Plans commitments (10-15% for steady-state). S3 lifecycle and networking optimization add another 5-10%.

Should I use Reserved Instances or Savings Plans?

Use Standard RIs when you have a stable, predictable EC2 workload on a specific instance family. Use Compute Savings Plans when you want flexibility across instance families, or when you use Fargate and Lambda. Savings Plans are the newer, more flexible option, and AWS recommends them for most new commitments.

How do I track costs by team or project?

Use AWS tags and Cost Allocation Tags. Tag every resource with Team, Project, and Environment. Enable the tags as Cost Allocation Tags in AWS Billing. Then use Cost Explorer to filter and group by these tags.
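
Tag-based reporting only works if the tags are actually present, so a periodic audit helps. The sketch below checks a mapping of resource IDs to their tags against the three keys named above. The input shape and function names are assumptions for illustration; in practice the data would come from an inventory export or an AWS tagging API.

```python
REQUIRED_TAGS = {"Team", "Project", "Environment"}


def missing_tags(resource_tags):
    """Return the required tag keys absent from one resource, sorted by name."""
    return sorted(REQUIRED_TAGS - set(resource_tags))


def untagged_report(resources):
    """Map each non-compliant resource ID to the tag keys it is missing."""
    report = {}
    for resource_id, tags in resources.items():
        missing = missing_tags(tags)
        if missing:
            report[resource_id] = missing
    return report


resources = {
    "i-0abc": {"Team": "data", "Project": "etl", "Environment": "prod"},
    "vol-0def": {"Team": "web"},
}
print(untagged_report(resources))  # {'vol-0def': ['Environment', 'Project']}
```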