Cost Optimization in AWS Using Spot, Reserved, and Auto Scaling

Cloud bills look complicated; they are mostly not. Compute dominates almost every AWS bill, and within compute, the gap between “what was provisioned” and “what was needed” accounts for most of the wasted spend. A typical engineering organization that has never optimized AWS compute is often paying 40% to 70% more than it needs to. The fix is not a vendor tool or a heroic re-architecture; it is a disciplined application of three mechanisms AWS has provided for years: Spot Instances, Savings Plans / Reserved capacity, and Auto Scaling. Used together, these can cut compute cost roughly in half on many workloads without sacrificing availability. Used naively, they introduce outages.

This post is about how to use them correctly.

The Compute Pricing Hierarchy

AWS compute is priced on a hierarchy that maps directly to commitment and interruptibility:

Pricing model	Discount vs. On-Demand	Commitment	Interruptible
On-Demand	0%	None	No
Compute Savings Plan	up to 66%	1 or 3 years	No
EC2 Instance Savings Plan	up to 72%	1 or 3 years, family-locked	No
Reserved Instances	up to 72%	1 or 3 years, type-locked	No
Spot Instances	50–90%	None	Yes (2-min notice)

The optimization strategy follows directly: cover your steady-state baseline with commitment-based pricing (Savings Plans), absorb burst and stateless workloads with Spot, and use On-Demand only for the diminishing slice that fits neither — which should be 5–15% of your fleet in steady state, not 100%.
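To see what that mix is worth, here is a minimal sketch of the blended-cost arithmetic. The discount figures and the 60/30/10 split are illustrative assumptions chosen from within the ranges in the table, not quoted AWS prices, and `blended_cost` is a hypothetical helper.

```python
def blended_cost(on_demand_rate, instance_hours, mix, discounts):
    """Cost of `instance_hours` of compute split across pricing models.

    mix: fraction of hours served by each model (must sum to 1).
    discounts: discount vs. On-Demand for each model, as a fraction.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    return sum(
        instance_hours * share * on_demand_rate * (1.0 - discounts[model])
        for model, share in mix.items()
    )

# Assumed discounts and fleet mix, for illustration only.
DISCOUNTS = {"on_demand": 0.0, "savings_plan": 0.40, "spot": 0.70}
MIX = {"savings_plan": 0.60, "spot": 0.30, "on_demand": 0.10}

baseline = blended_cost(0.10, 1000, {"on_demand": 1.0}, DISCOUNTS)
optimized = blended_cost(0.10, 1000, MIX, DISCOUNTS)
savings = 1.0 - optimized / baseline  # fraction saved vs. all On-Demand
```

Under these assumed numbers the blended bill comes out a little under half lower than running everything On-Demand; plug in your own rates and mix before drawing conclusions.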

Savings Plans: The Right Default

Savings Plans replaced Reserved Instances as the recommended commitment model for most workloads in 2019, and the recommendation has not changed. The two relevant types:

Compute Savings Plan. Commits to an hourly compute spend ($/hr) over 1 or 3 years. Applies across instance families, sizes, and regions, and across EC2, Fargate, and Lambda. Up to 66% discount (3-year, all-upfront).
EC2 Instance Savings Plan. Commits to a specific instance family in a specific region. Up to 72% discount, but locked to that family and region.

For most teams, Compute Savings Plans are the right answer. The flexibility — being able to move from m5 to m6i to Graviton without restructuring commitments — is worth the few percentage points of additional discount you’d get from EC2 ISPs.

A practical algorithm for sizing the commitment:

Pull 90 days of compute spend from Cost Explorer or CUR data.
Find the 5th percentile hourly spend over that window. This is your safe baseline — you ran at least this much for ~95% of hours.
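Commit to 80% of that 5th percentile for 1 year (no-upfront if cash flow matters; all-upfront for the maximum discount). Express the baseline at Savings Plan rates first: the commitment is denominated in discounted dollars, so $1/hr of commitment covers more than $1/hr of On-Demand usage.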
Top up to 100% of the baseline after observing 30 days of behavior under the commitment.
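The sizing steps above can be sketched in a few lines. This is an illustration under stated assumptions: the input is a list of hourly spend figures at On-Demand rates (as you might export from Cost Explorer or CUR), `sp_discount` is an assumed Savings Plan discount used to convert that spend into commitment dollars, and the percentile uses the simple nearest-rank method. The function names are hypothetical.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("need at least one data point")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def size_commitment(on_demand_hourly_spend, sp_discount, pct=5, initial_fraction=0.80):
    """Return the baseline plus the initial and full hourly commitments.

    sp_discount is an assumption: commitments are denominated in discounted
    dollars, so the On-Demand baseline is scaled by (1 - sp_discount).
    """
    baseline_od = percentile(on_demand_hourly_spend, pct)
    full = baseline_od * (1.0 - sp_discount)
    return {
        "baseline_on_demand": baseline_od,
        "initial_commitment": full * initial_fraction,  # 80% to start
        "full_commitment": full,                        # top-up target after 30 days
    }

# Hypothetical data: 100 hourly samples of $1..$100 at On-Demand rates.
plan = size_commitment(list(range(1, 101)), sp_discount=0.30)
```

With real data you would feed in about 2,160 hourly points (90 days) and use the discount that matches the term and payment option you intend to buy.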

The discipline: never commit to more than your demonstrated baseline. Unused commitment is real money lost. Cost Explorer’s Savings Plans recommendations, alongside AWS Compute Optimizer’s right-sizing findings, are decent starting points; verify them against your own data before signing the commitment.

Spot: Cheap Compute With a Catch

Spot Instances run on AWS’s spare capacity. There is no bidding in the current model: you pay the prevailing Spot price, optionally capped by a maximum price that defaults to the On-Demand rate. AWS allocates capacity if available and gives you a 2-minute interruption notice when it needs the capacity back. Typical discount is 60–80%, occasionally higher.

The catch is the interruption. Workloads that survive Spot are:

Stateless. No critical in-memory state; can be killed and restarted without data loss.
Idempotent. Restarting a task produces the same result; partial completion doesn’t corrupt anything.
Distributed. Killing one of N workers reduces capacity by 1/N, not 100%.
Tolerant of moderate restart latency. Pulling a new instance and starting work takes 1–3 minutes.

Examples: batch processing, CI runners, stateless web tier, ML training, video transcoding, asynchronous worker pools, dev/test environments. Not Spot candidates: stateful databases, license-bound software with slow boot, real-time systems that cannot survive a sub-fleet restart.

Modern Spot Practices

Spot has evolved significantly since the early days. The current best practices:

Use EC2 Auto Scaling Groups with Mixed Instances. Specify multiple instance types (at least 6–10) across multiple availability zones. The allocation strategy price-capacity-optimized is the one AWS recommends and is almost always the right choice: it picks the pool with the best combination of low price and low interruption risk.
Don’t tune the maximum price. Leave it at the default, which is the On-Demand rate. Spot is a capacity play, not a bidding strategy.
Spread across availability zones. Capacity availability varies by AZ; multi-AZ Spot ASGs have dramatically lower interruption rates.
Avoid niche instance types. c5.18xlarge has a small pool and high interruption rate. m5.large, c5.large, r5.large and their AMD/Graviton variants have deep pools.
Handle the interruption notice. Poll the instance metadata path spot/instance-action (or subscribe to the EventBridge “EC2 Spot Instance Interruption Warning” event); drain connections, checkpoint state, or shed work in the 2-minute window.
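Handling the notice from inside the instance can look roughly like the sketch below. It uses only the standard library and IMDSv2: a session token from `/latest/api/token`, then a GET on `/latest/meta-data/spot/instance-action`, which returns 404 until an interruption is scheduled. The `watch` loop, its parameters, and the `on_interrupt` callback are hypothetical names for illustration; the 5-second poll interval follows AWS's general guidance.

```python
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def fetch_instance_action(timeout=1.0):
    """Return the raw instance-action JSON, or None if no interruption is pending."""
    token_req = urllib.request.Request(
        IMDS + "/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        IMDS + "/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # normal case: nothing scheduled
            return None
        raise

def parse_instance_action(body):
    """Turn the JSON body into (action, time), or None when there is no notice."""
    if body is None:
        return None
    doc = json.loads(body)
    return doc["action"], doc["time"]

def watch(on_interrupt, fetch=fetch_instance_action, poll_seconds=5,
          sleep=time.sleep, max_polls=None):
    """Poll until a notice appears, then call on_interrupt(action, time) once."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = parse_instance_action(fetch())
        if notice is not None:
            on_interrupt(*notice)  # drain, checkpoint, or shed work here
            return notice
        polls += 1
        sleep(poll_seconds)
    return None
```

Run `watch` in a background thread or sidecar and have `on_interrupt` deregister the instance from its load balancer and stop accepting new work. `fetch` and `sleep` are injectable so the loop can be exercised off-instance.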

For Kubernetes workloads, Karpenter is now the de facto Spot/On-Demand mixer. It provisions nodes just in time based on pending pod requirements, mixes instance types automatically, and consolidates underutilized nodes onto fewer or cheaper ones. Karpenter has largely displaced Cluster Autoscaler for AWS-native EKS clusters.

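apiVersion: karpenter.sh/v1
kind: NodePool
metadata: { name: default }
spec:
  template:
    spec:
      requirements:
        # Allow both capacity types; Karpenter prefers Spot when it is available.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # m7g is arm64 (Graviton): pods landing there need multi-arch images.
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.large", "m6i.xlarge", "m6a.large", "m6a.xlarge",
                   "m7i.large", "m7i.xlarge", "m7g.large", "m7g.xlarge"]
      nodeClassRef:
        # The v1 API requires group, kind, and name.
        # "default" is assumed to be an existing EC2NodeClass.
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    # Remove empty nodes and repack underutilized ones after 30s of stability.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s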

Spot for Critical Services

Conventional wisdom says don’t run anything critical on Spot. Modern practice is more nuanced: critical fleets can run mostly on Spot if the fleet is large enough that the loss of any single instance is invisible.

A web tier of 100 instances behind a load balancer, with 70 on Spot and 30 on On-Demand: interruptions reduce capacity briefly, the load balancer routes around the dying instances, and the On-Demand baseline absorbs whatever Spot can’t cover. Capacity Rebalancing on ASGs further smooths this — the ASG launches a replacement before the interrupted instance dies.

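The math that makes this work: even at a 5% monthly interruption rate (high for diversified Spot), a 70-instance Spot tier loses about 3 to 4 instances over the whole month, and each is replaced within minutes, so average available capacity stays far above 95% of the fleet. The risk to plan for is a correlated reclaim of several instances at once. Budgeting for 5% of the fleet being gone at the same moment is already conservative, and covering it takes about 105% of needed capacity (1/0.95 ≈ 1.053).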
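A short sketch of that sizing arithmetic follows. It makes a deliberately pessimistic assumption: the loss figure is treated as the share of Spot capacity missing at one moment, not spread over a month. `fleet_size_for` and `expected_interruptions` are hypothetical helpers.

```python
import math

def fleet_size_for(needed, loss_fraction, spot_share=1.0):
    """Smallest fleet that still has `needed` instances after losing
    `loss_fraction` of its Spot portion at the same time."""
    surviving = 1.0 - spot_share * loss_fraction
    return math.ceil(needed / surviving)

def expected_interruptions(spot_instances, monthly_rate):
    """Expected number of interruptions per month across the Spot tier."""
    return spot_instances * monthly_rate

all_spot = fleet_size_for(needed=100, loss_fraction=0.05)                   # whole fleet exposed
mixed = fleet_size_for(needed=100, loss_fraction=0.05, spot_share=0.70)    # 70/30 Spot/On-Demand
per_month = expected_interruptions(spot_instances=70, monthly_rate=0.05)
```

Rounding up matters here: exactly 105 instances minus 5% leaves 99.75, a fraction short of 100, so the all-Spot case needs 106. The 70/30 fleet needs fewer spares because only the Spot portion is exposed.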

Auto Scaling: The Knob That Matters Most

Static provisioning is the largest source of waste in AWS. Most workloads have 4–10x variation between daily peak and trough; provisioning for peak leaves 75% or more of capacity idle at the trough, and a large share of it idle for most of the day.

The mechanisms:

EC2 Auto Scaling Groups (ASGs). Scale instance counts based on metrics. Target Tracking is the right default (e.g., “keep CPU at 50%”), with Step Scaling for more nuanced behavior.
Cluster Autoscaler / Karpenter for Kubernetes — scales nodes in response to pending pod resource requests.
ECS Service Auto Scaling and App Runner concurrency scaling for container services.
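Application Auto Scaling for DynamoDB, Aurora replicas, and other managed services. Aurora Serverless v2 scales its ACUs on its own, within the minimum and maximum you configure on the cluster.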

The rule that matters most: scale on the metric that actually limits your service. CPU is often wrong. Request count, queue depth, or custom application metrics (active connections, in-flight tasks) often produce better scaling decisions. Publishing CloudWatch custom metrics and feeding them to Target Tracking works; so does scaling on application-level signals via Step Scaling or scheduled actions.
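For a queue-driven worker pool, one common way to apply this is the "backlog per instance" approach: derive how many queued messages one worker can hold while still meeting a latency goal, then size the fleet from current queue depth. The sketch below shows only the arithmetic; the function names, bounds, and example numbers are assumptions, and in practice the ratio would be published as a custom metric for Target Tracking to act on.

```python
import math

def acceptable_backlog_per_instance(max_latency_s, avg_processing_s):
    """Messages one instance may have queued while still meeting the latency goal."""
    return max_latency_s / avg_processing_s

def desired_capacity(queue_depth, backlog_target, min_size, max_size):
    """Instances needed to keep per-instance backlog at or below the target,
    clamped to the group's min and max."""
    wanted = math.ceil(queue_depth / backlog_target)
    return max(min_size, min(max_size, wanted))

# Assumed workload: 0.5 s per message, 10 s acceptable queueing latency.
target = acceptable_backlog_per_instance(max_latency_s=10, avg_processing_s=0.5)
normal = desired_capacity(queue_depth=600, backlog_target=target, min_size=2, max_size=50)
idle = desired_capacity(queue_depth=0, backlog_target=target, min_size=2, max_size=50)
surge = desired_capacity(queue_depth=5000, backlog_target=target, min_size=2, max_size=50)
```

Notice that CPU never appears: the fleet grows because work is waiting, which is the condition that actually hurts a queue consumer.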

Predictive vs. Reactive Scaling

Reactive scaling (most ASGs) responds to load that has already arrived. There is always a lag — instances take 1–3 minutes to launch and warm up, during which the existing fleet is overloaded.

Predictive scaling (introduced in 2018) uses historical patterns to scale ahead of expected load. For workloads with daily or weekly cycles — most consumer-facing services — predictive scaling avoids the morning rush penalty entirely. Combine with reactive scaling for unexpected spikes.

Right-Sizing Is Continuous

The other half of cost optimization is making sure each instance is the right size. AWS Compute Optimizer analyzes CloudWatch metrics and recommends instance type changes for over-provisioned workloads. A typical environment has 20–40% of instances running at <20% CPU utilization — those are right-sizing candidates.

The discipline: review Compute Optimizer recommendations monthly. Approve right-sizing in waves; verify performance unchanged for a week before the next wave.

Graviton: The Free 20%

AWS’s Graviton processors (ARM-based; m7g, c7g, r7g) are priced up to roughly 20% below comparable x86 instances, and AWS cites up to 40% better price-performance; the actual speedup depends on the workload, so benchmark before assuming it. Most modern languages (Java 17+, Python, Go, Node.js, .NET 8+) run on Graviton without code changes. Multi-arch container builds (docker buildx) emit both x86 and ARM images from one CI build.

The migration friction is small for greenfield workloads and moderate for established ones (mostly: any binary dependency that ships x86-only). The cost win is large enough that “we’ll get to it eventually” is leaving 20% of compute spend on the table.

Storage and Data Transfer: The Other Half

Compute dominates the bill, but storage and data transfer waste is real and often invisible.

EBS snapshots accumulate. Implement lifecycle policies (Data Lifecycle Manager). It is common to find that a large share of EBS spend, sometimes the majority, sits in snapshots whose purpose nobody can identify.
S3 storage classes. Intelligent-Tiering for unknown access patterns; lifecycle rules to Glacier Deep Archive for known-cold data. Lifecycle rules pay for themselves within months on most buckets.
NAT Gateway traffic. $0.045/GB processed, plus $0.045/hour per gateway (us-east-1 rates). A microservice fleet talking to S3 through a NAT Gateway is expensive. Gateway endpoints for S3 and DynamoDB are free and take that traffic off the NAT path entirely; interface endpoints for ECR and other AWS services carry their own hourly and per-GB charges but are usually far cheaper than NAT processing at volume.
Cross-AZ traffic. $0.01/GB in each direction, so effectively $0.02/GB for traffic between two of your own instances. Architecturally minimize by colocating services that talk heavily.
CloudWatch Logs ingestion. $0.50/GB. Sample debug logs; route high-volume logs to S3 directly via Firehose; use Logs Insights queries instead of pulling logs to your laptop.
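To get a feel for the NAT and cross-AZ numbers above, a small estimator helps. The rates are the ones quoted in this post and vary by region; the 730-hour month is a common billing approximation, and the function names and traffic volumes are hypothetical.

```python
HOURS_PER_MONTH = 730  # approximation; actual months vary

def nat_gateway_monthly(gb_processed, gateways=1, per_gb=0.045, per_hour=0.045):
    """Monthly NAT Gateway cost: hourly charge per gateway plus data processing."""
    return gateways * per_hour * HOURS_PER_MONTH + gb_processed * per_gb

def cross_az_monthly(gb_transferred, per_gb_each_direction=0.01):
    """Monthly cross-AZ cost; each GB is billed on both the sending and receiving side."""
    return gb_transferred * per_gb_each_direction * 2

# Assumed traffic: 10 TB/month through three gateways (one per AZ), 10 TB cross-AZ.
nat_cost = nat_gateway_monthly(gb_processed=10_000, gateways=3)
az_cost = cross_az_monthly(gb_transferred=10_000)
```

At these assumed volumes the data-processing charge dwarfs the hourly charge, which is why moving S3 and DynamoDB traffic to gateway endpoints tends to be the first fix.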

Observability and Tagging

You cannot optimize what you cannot attribute. Tag every resource with at least team, environment, service, cost-center. Enforce tags via Service Control Policies or AWS Config. Without consistent tagging, Cost Explorer is opaque and conversations about who owns what spend are theological.
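A tag audit can start as a very small check run over an inventory export. The sketch below assumes resources are available as dictionaries with a `tags` mapping; that shape, the sample resource IDs, and the helper names are hypothetical, while the required keys are the four named above.

```python
REQUIRED_TAGS = ("team", "environment", "service", "cost-center")

def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank."""
    return [key for key in required if not str(tags.get(key, "")).strip()]

def untagged_report(resources, required=REQUIRED_TAGS):
    """Map resource id -> missing tag keys, for non-compliant resources only."""
    report = {}
    for res in resources:
        gaps = missing_tags(res.get("tags", {}), required)
        if gaps:
            report[res["id"]] = gaps
    return report

# Hypothetical inventory for illustration.
inventory = [
    {"id": "i-aaa", "tags": {"team": "search", "environment": "prod",
                             "service": "indexer", "cost-center": "1234"}},
    {"id": "i-bbb", "tags": {"team": "search", "environment": " "}},
    {"id": "vol-ccc"},
]
report = untagged_report(inventory)
```

A report like this is useful for finding the backlog; enforcement at creation time still belongs in policy, as described above.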

The minimum useful FinOps dashboard:

Daily spend per service (e.g., EC2, RDS, S3) and per environment.
Compute commitment coverage and utilization (Savings Plans should be near 100% utilized; coverage should be 70–90% of eligible spend).
Top 10 cost-growing resources week-over-week.
Spot interruption rate per workload.
Right-sizing recommendations from Compute Optimizer.
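The coverage and utilization line on that dashboard reduces to two ratios. The sketch below is illustrative: the 95% utilization floor is an assumed reading of "near 100%", the 70–90% coverage band comes from the list above, and the function names and finding labels are hypothetical.

```python
def utilization(commitment_per_hour, used_per_hour):
    """Share of the hourly commitment actually consumed."""
    return used_per_hour / commitment_per_hour

def coverage(covered_spend, uncovered_spend):
    """Share of eligible spend covered by commitments.
    Both inputs are On-Demand-equivalent dollars for the same period."""
    total = covered_spend + uncovered_spend
    return covered_spend / total if total else 0.0

def commitment_findings(util, cov, min_util=0.95, cov_band=(0.70, 0.90)):
    """Return a list of issues; an empty list means both ratios are in range."""
    findings = []
    if util < min_util:
        findings.append("over-committed")       # paying for unused commitment
    if cov < cov_band[0]:
        findings.append("under-covered")        # room to commit more
    if cov > cov_band[1]:
        findings.append("coverage above band")  # little headroom if load drops
    return findings

healthy = commitment_findings(utilization(10.0, 9.8), coverage(800.0, 200.0))
unhealthy = commitment_findings(utilization(10.0, 8.0), coverage(600.0, 400.0))
```

The two ratios pull in opposite directions: low utilization means you committed too much, low coverage means you committed too little, so the dashboard needs both.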

Trade-Offs

A few things to be honest about:

Spot is operationally heavier than On-Demand. Drain handling, multi-instance-type configuration, interruption monitoring. The cost win is large but the engineering investment is real.
Savings Plans are debt. A 3-year commitment is locked in. If your traffic drops 60% in year 2, you’re paying for capacity you don’t use. Prefer 1-year commitments at scale until you have multi-year capacity confidence.
Aggressive scaling creates cold starts. Scaling down at night saves money; scaling up in the morning has a latency penalty. Predictive scaling or scheduled minimum capacity smooths this.
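Auto Scaling on the wrong metric is worse than not scaling. A service that scales on CPU but is memory-bound will OOM under load while CPU stays below the scaling target, so the fleet never grows when it needs to. If the OOM restarts do push CPU up, the group scales to its maximum and still fails. Pick the metric that actually predicts saturation.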

Closing

AWS cost optimization is not a tooling problem; it is a discipline problem. The mechanisms — Compute Savings Plans for the baseline, Spot for the burst and the stateless, Auto Scaling for the diurnal, Graviton for the free 20% — are all well-documented and have existed for years. What separates teams that pay 50% less from teams that don’t is the operational hygiene: tagging that supports attribution, dashboards that show coverage and waste, regular right-sizing reviews, and the willingness to commit to Savings Plans based on demonstrated baselines rather than aspirational ones. Spot specifically is underused not because it doesn’t work but because the engineering investment to handle interruptions correctly is non-trivial — and for stateless, distributed fleets that investment pays back within weeks of going live. Treat compute cost as a tuned parameter of the system, not a fixed input, and the bill comes down without any sacrifice in reliability.