Complete Guide to AWS Cost Optimization

Q: How much can I save with AWS cost optimization?

Typical savings range from 30-50% for organizations that have never optimized. The biggest wins come from right-sizing (15-20%), Spot instances (10-15% for eligible workloads), and RI/Savings Plans commitments (10-15% for steady-state). S3 lifecycle and networking optimization add another 5-10%.

Q: Should I use Reserved Instances or Savings Plans?

Use Standard RIs when you have a stable, predictable EC2 workload on a specific instance family. Use Compute Savings Plans when you want flexibility across instance families, or when you use Fargate and Lambda. Savings Plans are the newer, more flexible option — AWS recommends them for most new commitments.

Q: How do I track costs by team or project?

Use AWS tags and Cost Allocation Tags. Tag every resource with Team, Project, and Environment. Enable the tags as Cost Allocation Tags in AWS Billing. Then use Cost Explorer to filter and group by these tags.
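
Tagging only helps if it is applied consistently, so a small compliance check can be useful. The following is a minimal sketch, assuming tags arrive in the `[{"Key": ..., "Value": ...}]` shape that many AWS APIs return; the `missing_tags` helper and the sample resource are hypothetical.

```python
# Required tag keys, taken from the tagging scheme described above.
REQUIRED_TAGS = {"Team", "Project", "Environment"}


def missing_tags(resource_tags):
    """Return the required tag keys that a resource lacks, sorted by name."""
    present = {tag["Key"] for tag in resource_tags}
    return sorted(REQUIRED_TAGS - present)


# Hypothetical resource that has no Project tag.
example = [
    {"Key": "Team", "Value": "payments"},
    {"Key": "Environment", "Value": "prod"},
]
print(missing_tags(example))  # ['Project']
```

Running a check like this over an inventory of resources gives a list of what to fix before relying on tag-based reports.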

Introduction

AWS bills grow silently — unused resources, over-provisioned instances, and lack of monitoring can inflate costs by 40% or more. This guide covers EC2 right-sizing, Spot and Reserved Instances, Savings Plans, S3 lifecycle policies, RDS optimization, networking costs, and automated cost monitoring.

Cost Explorer and Budgets

Analyzing spending

# Install AWS CLI
pip install awscli

# Get cost breakdown by service
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Get cost by tag
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=TAG,Key=Environment
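
The JSON these commands return is nested, so ranking services by spend takes a little post-processing. This is a minimal sketch, assuming the response has already been parsed into a dict; `top_services` is a hypothetical helper and the sample amounts are made up for illustration.

```python
def top_services(response, limit=3):
    """Rank groups by total BlendedCost across all returned time periods."""
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            name = group["Keys"][0]
            amount = float(group["Metrics"]["BlendedCost"]["Amount"])
            totals[name] = totals.get(name, 0.0) + amount
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up sample in the shape returned by get-cost-and-usage.
sample = {
    "ResultsByTime": [
        {
            "Groups": [
                {"Keys": ["Amazon Simple Storage Service"],
                 "Metrics": {"BlendedCost": {"Amount": "310.25", "Unit": "USD"}}},
                {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
                 "Metrics": {"BlendedCost": {"Amount": "1840.10", "Unit": "USD"}}},
                {"Keys": ["Amazon Relational Database Service"],
                 "Metrics": {"BlendedCost": {"Amount": "920.00", "Unit": "USD"}}},
            ]
        }
    ]
}

for name, amount in top_services(sample, limit=2):
    print(f"{name}: ${amount:.2f}")
```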

Setting up budgets

# Create a monthly budget alert
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "MonthlyBudget",
    "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80
      },
      "Subscribers": [{
        "SubscriptionType": "EMAIL",
        "Address": "ops@example.com"
      }]
    }
  ]'
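
A single 80% alert often arrives late, so it may help to generate several thresholds at once. The sketch below builds the same notification structure used above for a list of percentages and shows the dollar amount at which each would fire; both helper names are hypothetical, and it assumes thresholds are percentages of the budget limit (the default threshold type).

```python
import json


def build_notifications(thresholds, email):
    """Build one ACTUAL-spend email notification per percentage threshold."""
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold,
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for threshold in thresholds
    ]


def trigger_amount(budget_limit, threshold_percent):
    """Dollar amount at which a percentage threshold is crossed."""
    return budget_limit * threshold_percent / 100


notifications = build_notifications([50, 80, 100], "ops@example.com")
print(json.dumps(notifications[1], indent=2))
print(trigger_amount(5000, 80))  # 4000.0
```

The printed JSON can be pasted into the `--notifications-with-subscribers` argument.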

EC2 Right-Sizing

Finding underutilized instances

# Get daily average CPU utilization for a single EC2 instance
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-02-01T00:00:00Z \
  --period 86400 \
  --statistics Average \
  --output table
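
Once the datapoints are in hand, deciding whether an instance is a right-sizing candidate is simple arithmetic. A minimal sketch follows, assuming the `Datapoints` list from the command above has been parsed from JSON output; the 10% cutoff and the `is_underutilized` name are assumptions for illustration, not an AWS rule.

```python
def is_underutilized(datapoints, avg_threshold=10.0):
    """Return True when the mean of the daily Average CPU values is below the threshold."""
    if not datapoints:
        return False  # No data: do not flag the instance
    mean_cpu = sum(point["Average"] for point in datapoints) / len(datapoints)
    return mean_cpu < avg_threshold


# Made-up datapoints in the shape CloudWatch returns.
quiet = [{"Average": 4.0}, {"Average": 6.0}, {"Average": 5.0}]
busy = [{"Average": 55.0}, {"Average": 62.5}]

print(is_underutilized(quiet))
print(is_underutilized(busy))
```

CPU alone can mislead for memory-bound or bursty workloads, so a check like this is better treated as a shortlist than a verdict.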

Right-sizing with AWS Compute Optimizer

# Opt in to Compute Optimizer
aws compute-optimizer update-enrollment-status --status Active

# Get recommendations
aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=Finding,values=Underprovisioned,Overprovisioned

Typical right-sizing actions

Current	Recommendation	Monthly Savings
m5.2xlarge (avg 5% CPU)	t3.large	~$200
c5.4xlarge (avg 10% CPU)	c5.xlarge	~$250
r5.xlarge (avg 15% CPU)	t3.medium	~$150

Spot Instances

Spot Fleet for batch workloads

{
  "SpotFleetRequestConfig": {
    "AllocationStrategy": "diversified",
    "IamFleetRole": "arn:aws:iam::123456789012:role/SpotFleetRole",
    "SpotPrice": "0.10",
    "TargetCapacity": 10,
    "LaunchSpecifications": [
      {
        "InstanceType": "t3.medium",
        "ImageId": "ami-12345678",
        "SubnetId": "subnet-12345678"
      },
      {
        "InstanceType": "t3.large",
        "ImageId": "ami-12345678",
        "SubnetId": "subnet-87654321"
      }
    ]
  }
}

Spot with auto-scaling groups

# Terraform
resource "aws_autoscaling_group" "spot" {
  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 0
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.app.id
        version            = "$Latest"
      }

      override {
        instance_type = "t3.medium"
      }
      override {
        instance_type = "t3.large"
      }
    }
  }

  min_size         = 2
  max_size         = 10
  desired_capacity = 4
}
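
The two `instances_distribution` settings interact, so it can help to work out the resulting split by hand. The sketch below is a simplified model: the base capacity is filled with On-Demand first, then the percentage applies to everything above it. The `capacity_split` helper is hypothetical, and rounding fractional results up is an assumption of this sketch.

```python
import math


def capacity_split(desired, on_demand_base, on_demand_pct_above_base):
    """Return (on_demand, spot) instance counts for a desired capacity."""
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Assumption: fractional On-Demand counts are rounded up.
    on_demand_above = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand


# Values from the Terraform example: base of 1, 0% above base, 4 desired.
print(capacity_split(4, 1, 0))  # (1, 3)
```

With the configuration above, one instance stays On-Demand as a safety floor and the remaining three run on Spot.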

Spot interruption handling

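import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def graceful_shutdown():
    # Placeholder: drain connections, save state, then exit
    pass

# A Spot interruption notice arrives about 2 minutes before the instance is reclaimed.
# Call this function every few seconds to catch the notice.
def check_spot_interruption():
    try:
        # IMDSv2 requires a session token before metadata can be read
        token_request = urllib.request.Request(
            f"{IMDS}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        token = urllib.request.urlopen(token_request, timeout=2).read().decode()
        request = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        action = json.loads(urllib.request.urlopen(request, timeout=2).read())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return  # No interruption is scheduled
        raise
    if action["action"] in ("terminate", "stop", "hibernate"):
        print(f"Spot {action['action']} scheduled at {action['time']}")
        graceful_shutdown()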

Reserved Instances and Savings Plans

Reserved Instances

Type	Commitment	Discount	Best For
Standard RI	1 or 3 year	Up to 72%	Steady-state workloads
Convertible RI	1 or 3 year	Up to 66%	Workloads that may change
Scheduled RI	1 year	Variable	Predictable time windows (no longer available for new purchases)

Savings Plans

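# Find the offering ID for a 1-year, no-upfront Compute Savings Plan
aws savingsplans describe-savings-plans-offerings \
  --plan-types Compute \
  --durations 31536000 \
  --payment-options "No Upfront"

# Purchase it; --commitment is the hourly commitment in USD
aws savingsplans create-savings-plan \
  --savings-plan-offering-id "$OFFERING_ID" \
  --commitment "$HOURLY_COMMITMENT_USD"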
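
A Savings Plan commitment is expressed in dollars per hour, so it is worth converting to a total before purchasing. This is a minimal sketch of that arithmetic, assuming 365-day years; `total_commitment` is a hypothetical helper.

```python
HOURS_PER_YEAR = 24 * 365


def total_commitment(hourly_usd, years=1):
    """Total spend committed over the term for a given hourly commitment."""
    return hourly_usd * HOURS_PER_YEAR * years


print(total_commitment(0.50))        # 4380.0
print(total_commitment(500))         # 4380000
print(total_commitment(0.50, years=3))
```

The second line shows why the unit matters: a commitment of 500 means $500 per hour, which is over four million dollars a year.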

When to use which

Steady-state EC2 — Standard Reserved Instances (highest discount)
Flexible workloads — Compute Savings Plans (apply to any instance family)
Fargate/Lambda — Compute Savings Plans (cover Fargate and Lambda)
S3 — No RIs available, use lifecycle policies and storage classes
DynamoDB — Reserved capacity for steady provisioned tables, on-demand mode for spiky traffic

S3 Cost Optimization

Lifecycle policies

{
  "Rules": [
    {
      "ID": "MoveToIAAfter30Days",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    },
    {
      "ID": "DeleteTempFiles",
      "Status": "Enabled",
      "Filter": { "Prefix": "temp/" },
      "Expiration": { "Days": 7 }
    }
  ]
}
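
To sanity-check a policy before applying it, the transitions can be modelled as a lookup from object age to storage class. The sketch below mirrors the `logs/` rule above; `storage_class_for_age` is a hypothetical helper, and it ignores the fact that S3 applies transitions asynchronously rather than at the exact day boundary.

```python
# (days, storage class) pairs copied from the logs/ rule above.
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]


def storage_class_for_age(age_days, transitions=TRANSITIONS):
    """Return the class an object of the given age should have reached."""
    current = "STANDARD"
    for days, storage_class in sorted(transitions):
        if age_days >= days:
            current = storage_class
    return current


for age in (10, 45, 120, 400):
    print(age, storage_class_for_age(age))
```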

S3 storage classes

Class	Cost vs Standard	Use Case
STANDARD	1x	Frequently accessed
STANDARD_IA	0.5x	Infrequently accessed (30+ days)
ONEZONE_IA	0.4x	Infrequently accessed, non-critical
GLACIER	0.17x	Archive (90+ days)
DEEP_ARCHIVE	0.04x	Long-term archive (365+ days)
INTELLIGENT_TIERING	Variable	Unknown access patterns
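
The multipliers in the table make rough comparisons easy. The following sketch estimates monthly storage cost from them; the `estimate_monthly_cost` helper is hypothetical, the Standard price of $0.023 per GB is an assumed input rather than a quoted rate, and retrieval fees, request fees, and minimum storage durations are left out.

```python
# Relative storage cost per class, taken from the table above.
RELATIVE_COST = {
    "STANDARD": 1.0,
    "STANDARD_IA": 0.5,
    "ONEZONE_IA": 0.4,
    "GLACIER": 0.17,
    "DEEP_ARCHIVE": 0.04,
}


def estimate_monthly_cost(gb_by_class, standard_price_per_gb):
    """Estimate monthly storage cost in USD from GB stored per class."""
    return sum(
        gb * RELATIVE_COST[storage_class] * standard_price_per_gb
        for storage_class, gb in gb_by_class.items()
    )


ASSUMED_STANDARD_PRICE = 0.023  # USD per GB-month; an assumption, check current pricing

before = estimate_monthly_cost({"STANDARD": 10_000}, ASSUMED_STANDARD_PRICE)
after = estimate_monthly_cost(
    {"STANDARD": 2_000, "STANDARD_IA": 3_000, "GLACIER": 5_000},
    ASSUMED_STANDARD_PRICE,
)
print(f"before: ${before:.2f}, after: ${after:.2f}")
```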

Intelligent-Tiering

aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-bucket \
  --id MoveToArchive \
  --intelligent-tiering-configuration '{
    "Id": "MoveToArchive",
    "Status": "Enabled",
    "Tierings": [
      { "Days": 90, "AccessTier": "ARCHIVE_ACCESS" },
      { "Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS" }
    ]
  }'

RDS Optimization

Right-sizing RDS

# Check DB instance CPU
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=mydb \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-02-01T00:00:00Z \
  --period 86400 \
  --statistics Average,Maximum

RDS cost reduction strategies

Downsize instances — db.t4g instead of db.m5 if CPU < 20%
Reserved Instances — up to 69% discount for 3-year commitment
Stop non-prod at night — RDS can be stopped for up to 7 days
Use Aurora Serverless — scales to zero for intermittent workloads
Delete unused snapshots — old snapshots accumulate silently
Use read replicas wisely — each replica costs the same as the primary
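
The saving from stopping non-production databases outside working hours comes down to the share of the week they stay off. A minimal sketch of that calculation follows; `idle_fraction` is a hypothetical helper, and it covers instance hours only, since storage is still billed while an RDS instance is stopped.

```python
HOURS_PER_WEEK = 24 * 7


def idle_fraction(hours_per_day, days_per_week):
    """Fraction of weekly instance hours avoided by running only on a schedule."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


# Running 12 hours a day on weekdays only.
print(f"{idle_fraction(12, 5):.0%}")  # 64%
```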

Automated snapshot cleanup

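import boto3
from datetime import datetime, timedelta, timezone

rds = boto3.client("rds")

def cleanup_old_snapshots(days=30):
    # SnapshotCreateTime is timezone-aware (UTC), so compare against an aware cutoff
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    # Only manual snapshots are deleted here; automated snapshots follow the
    # instance's backup retention period. Paginate to cover every snapshot.
    paginator = rds.get_paginator("describe_db_snapshots")
    for page in paginator.paginate(SnapshotType="manual"):
        for snap in page["DBSnapshots"]:
            created = snap.get("SnapshotCreateTime")  # Absent while still creating
            if created and created < cutoff:
                rds.delete_db_snapshot(DBSnapshotIdentifier=snap["DBSnapshotIdentifier"])
                print(f"Deleted: {snap['DBSnapshotIdentifier']}")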

Networking Costs

Data transfer optimization

Scenario	Cost
Inbound data transfer	Free
Same AZ data transfer	Free
Cross-AZ data transfer	$0.01/GB in each direction
Cross-region data transfer	$0.02-0.09/GB
Internet egress	$0.09/GB
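
These per-GB rates look small until they are multiplied by real traffic volumes. The sketch below applies the flat rates from the table to monthly volumes; `transfer_cost` is a hypothetical helper, cross-region traffic is omitted because the table gives a range, and the volumes are made up.

```python
# USD per GB, taken from the table above.
RATES_PER_GB = {
    "inbound": 0.0,
    "same_az": 0.0,
    "cross_az": 0.01,
    "internet_egress": 0.09,
}


def transfer_cost(gb_by_scenario):
    """Estimate monthly data transfer cost in USD from GB per scenario."""
    return sum(gb * RATES_PER_GB[scenario] for scenario, gb in gb_by_scenario.items())


# Made-up monthly volumes.
monthly = {"inbound": 8_000, "cross_az": 5_000, "internet_egress": 1_000}
print(f"${transfer_cost(monthly):.2f}")
```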

Reducing networking costs

Keep traffic in same AZ — place dependent services in the same AZ
Use VPC endpoints — avoid NAT Gateway charges for AWS service traffic
Use CloudFront — cache content at edge, reduce origin data transfer
Compress responses — less data = less egress cost
Use S3 Transfer Acceleration — for uploads, not downloads

VPC endpoints

# Terraform — Gateway endpoint for S3 (free)
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.main.id
  service_name    = "com.amazonaws.us-east-1.s3"
  route_table_ids = [aws_route_table.private.id]
}

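# Interface endpoint for DynamoDB (~$0.01/hr per AZ, plus per-GB data processing)
# DynamoDB also supports a free Gateway endpoint, which is usually the cheaper choice
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.dynamodb"
  vpc_endpoint_type = "Interface"
  subnet_ids        = aws_subnet.private[*].id
}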

Automated Cost Monitoring

AWS Cost Anomaly Detection

# Enable cost anomaly detection
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "DailyAnomaly",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'

Cost reports with Lambda

import boto3

ce = boto3.client("ce")

def send_slack_notification(text):
    # Placeholder: replace with a post to your Slack webhook
    print(text)

def lambda_handler(event, context):
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2026-06-01", "End": "2026-07-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    costs = []
    for group in response["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        if amount > 100:
            costs.append(f"{service}: ${amount:.2f}")

    # Send to Slack
    if costs:
        send_slack_notification("\n".join(costs))

    return {"statusCode": 200}
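
A scheduled report usually needs a rolling date range rather than hardcoded dates. This is a minimal sketch that computes the previous calendar month in the `Start`/`End` format Cost Explorer expects, where `End` is exclusive; `previous_month_period` is a hypothetical helper.

```python
from datetime import date, timedelta


def previous_month_period(today):
    """Return Start/End ISO dates covering the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return {
        "Start": last_of_previous.replace(day=1).isoformat(),
        "End": first_of_this_month.isoformat(),  # Exclusive end date
    }


print(previous_month_period(date(2026, 7, 15)))
```

In the handler above, the result could be passed as `TimePeriod=previous_month_period(date.today())`.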

Best Practices

Enable Cost Explorer — visibility is the first step to optimization
Set up budget alerts — catch overruns before they happen
Right-size every 3 months — workloads change, instances should too
Use Spot for 70%+ of non-critical workloads — up to 90% discount
Commit to RIs/Savings Plans for baseline — cover 60-70% of steady-state
Use S3 lifecycle policies — move old data to cheaper tiers automatically
Stop non-prod at night — 65% of non-prod hours are idle
Delete unused EBS volumes — they cost money even when unattached
Release unused Elastic IPs — every public IPv4 address costs $0.005/hr, attached or not
Use VPC endpoints — avoid NAT Gateway data processing charges
Tag everything — enable cost allocation by team/project
Review monthly — costs creep without regular review

Common Mistakes

Leaving EC2 instances running 24/7 in dev — use auto-stop schedules
Using S3 Standard for archive data — lifecycle to Glacier/Deep Archive
Over-provisioning RDS — downsize based on actual CPU/memory
Ignoring NAT Gateway costs — use VPC endpoints for AWS service traffic
Not using Spot for batch/cron jobs — up to 90% savings for interruptible workloads
Forgetting to delete old snapshots — accumulate silently over months
Buying RIs without analyzing usage — wrong instance family = wasted commitment
Not tagging resources — no visibility into team/project costs
Using CloudWatch Logs with no retention — log volumes grow unbounded
Ignoring data transfer costs — cross-AZ and cross-region add up fast
