Kubernetes Cost Optimization in Production

Cost optimization in Kubernetes is mostly a scheduling and resource management problem. Teams usually overpay because of oversized requests, low node utilization, and inefficient workload patterns.

The safest wins usually come from measuring waste first, then changing requests, autoscaling, and node pools in small steps.

Measure Before You Optimize

Track these metrics first:

CPU and memory request vs actual usage
Cluster and node utilization
Cost by namespace and workload
Idle resource ratio (requested but unused)

Combine Prometheus and Grafana with your cloud provider's billing export to build cost dashboards broken down by team.
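As a rough sketch of how the idle resource ratio could be tracked, the Prometheus rule group below records requested CPU, used CPU, and the share of requested CPU that sits unused, per namespace. It assumes kube-state-metrics v2 and cAdvisor metrics are already being scraped; the group name and the recorded series names are illustrative, not an established convention.

```yaml
groups:
  - name: cost-efficiency            # hypothetical group name
    interval: 5m
    rules:
      # CPU cores requested per namespace (kube-state-metrics v2 metric).
      - record: namespace:cpu_requests_cores:sum
        expr: sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})

      # CPU cores actually used per namespace, averaged over 5 minutes (cAdvisor metric).
      # container!="" drops the pod-level aggregate series to avoid double counting.
      - record: namespace:cpu_usage_cores:rate5m
        expr: sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

      # Idle ratio: fraction of requested CPU that is not used.
      # clamp_min keeps the value at 0 when usage bursts above requests.
      - record: namespace:cpu_idle_ratio
        expr: |
          clamp_min(
            1 - (namespace:cpu_usage_cores:rate5m / namespace:cpu_requests_cores:sum),
            0
          )
```

A memory version would follow the same shape, using the resource="memory" requests series and container_memory_working_set_bytes for usage. Graphing the idle ratio per namespace in Grafana gives a first ranking of where requests are most oversized.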

Right-Size Resource Requests and Limits

Common anti-pattern:

requests are set close to peak, but peak happens only a few minutes per day.

Good practice:

set requests near steady-state usage
keep limits aligned with safe burst behavior
review requests and limits monthly against observed usage

resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

Use Vertical and Horizontal Autoscaling Together

HPA scales replicas for traffic changes.
VPA recommends or adjusts pod resource requests over time.
Cluster Autoscaler adds nodes when pods are unschedulable and removes nodes that stay underutilized.

A practical pattern:

VPA in recommendation mode for stateless services, so it does not compete with HPA over the same CPU or memory signal.
HPA for request-driven scaling.
Cluster Autoscaler with mixed on-demand and spot pools.
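The first two items of this pattern might look like the manifests below: a VPA that only publishes recommendations and an HPA that scales replicas on CPU utilization. This is a minimal sketch that assumes the VPA custom resources are installed in the cluster; the Deployment name web, the namespace prod, the replica bounds, and the 70% target are placeholders, not values from this article.

```yaml
# VPA in recommendation-only mode: it computes suggested requests
# but never evicts or resizes pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa            # hypothetical name
  namespace: prod          # hypothetical namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"
---
# HPA scales the same Deployment horizontally on average CPU utilization,
# measured relative to the pods' CPU requests.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2           # illustrative bounds
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The VPA recommendations can be read from the object's status (for example with kubectl describe vpa web-vpa -n prod) and applied to the Deployment's requests by hand during the regular review. Because HPA utilization is relative to requests, changing requests also shifts when HPA scales, so it is worth adjusting both together.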

Optimize Node Pools

Create node pools by workload type:

latency-sensitive pool (on-demand)
batch/asynchronous pool (spot/preemptible)
memory-heavy pool

Benefits:

better bin packing
reduced over-provisioning
clearer cost accountability
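To illustrate how a workload can be pinned to the batch pool, the Job below selects spot nodes with a node selector and tolerates the pool's taint. It is a sketch under assumptions: the node label pool=batch-spot, the matching NoSchedule taint, the image, and the resource values are all hypothetical and depend on how your node pools are actually labeled and tainted.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report                 # hypothetical workload
spec:
  backoffLimit: 4                      # retry if a spot node is reclaimed mid-run
  template:
    spec:
      restartPolicy: OnFailure
      # Only schedule onto nodes labeled as the spot batch pool (assumed label).
      nodeSelector:
        pool: batch-spot
      # Tolerate the taint that keeps other workloads off the pool (assumed taint).
      tolerations:
        - key: "pool"
          operator: "Equal"
          value: "batch-spot"
          effect: "NoSchedule"
      containers:
        - name: report
          image: registry.example.com/nightly-report:1.0   # placeholder image
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
```

The taint keeps latency-sensitive pods off interruptible nodes, while the selector keeps batch pods from landing on on-demand capacity, which is what makes cost per pool attributable.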

Improve Scheduling Efficiency

Avoid strict anti-affinity unless required.
Use topology spread constraints with balanced settings.
Keep pod requests realistic to improve bin packing.
Use taints/tolerations only when isolation is needed.

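Poor scheduling decisions can increase node count substantially, by 20-40% as a rough rule of thumb, so compare node count before and after any change to scheduling constraints.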
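One way to read "balanced settings" is a soft topology spread constraint, which prefers an even spread across zones but still lets the scheduler place a pod when a perfect spread is impossible. The fragment below is a sketch of a Deployment pod template; the app: web label is a placeholder.

```yaml
# Fragment of a Deployment spec (not a complete manifest).
spec:
  template:
    metadata:
      labels:
        app: web                       # hypothetical label
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                               # aim for at most 1 pod difference between zones
          topologyKey: topology.kubernetes.io/zone # well-known zone label
          whenUnsatisfiable: ScheduleAnyway        # soft: prefer spread, never block scheduling
          labelSelector:
            matchLabels:
              app: web
```

With whenUnsatisfiable set to DoNotSchedule instead, the constraint becomes hard and can force new nodes to be added just to satisfy the spread, which is the kind of cost that strict anti-affinity also tends to introduce.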

Scale Down Non-Production Environments

Many teams forget this easy win:

schedule dev/staging scale-down after working hours
pause preview environments automatically
expire old ephemeral namespaces

kubectl scale deploy -n staging --all --replicas=0

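Automate this with CI/CD or a nightly controller job, and restore replicas before the next working day. Note that the command above only scales Deployments; StatefulSets need their own kubectl scale call.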
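A nightly job of this kind could be sketched as a CronJob that runs the same kubectl command inside the cluster, with a ServiceAccount limited to scaling Deployments in the staging namespace. The image below is a placeholder for any image that ships kubectl, and the object names and the schedule are illustrative assumptions.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scale-down
  namespace: staging
---
# Least-privilege role: only read and scale Deployments in this namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scale-deployments
  namespace: staging
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/scale"]
    verbs: ["get", "list", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scale-down
  namespace: staging
subjects:
  - kind: ServiceAccount
    name: scale-down
    namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scale-deployments
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"        # 20:00 on weekdays (illustrative)
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scale-down
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: registry.example.com/tools/kubectl:1.30   # placeholder image with kubectl
              command: ["kubectl", "scale", "deploy", "-n", "staging", "--all", "--replicas=0"]
```

The schedule is interpreted in the controller's time zone, typically UTC, unless spec.timeZone is set. Scaling everything to zero discards the previous replica counts, so a matching morning job needs a source of truth for them, such as re-applying the manifests from CI/CD.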

Optimize Stateful Workloads

For databases and queues:

right-size storage class and IOPS tier
avoid oversized PVC defaults
apply retention policies for old snapshots
move cold data to cheaper storage tiers

Storage spend can silently exceed compute spend, because volumes and snapshots keep billing even when the pods that use them are scaled down.
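One way to avoid oversized PVC defaults is to use a storage class that allows volume expansion, so claims can start small and grow when usage justifies it. The sketch below assumes AWS with the EBS CSI driver and gp3 volumes; other providers use different provisioner names and parameters, and the class name, claim name, and size are hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-expandable          # hypothetical class name
provisioner: ebs.csi.aws.com         # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3                          # general-purpose tier; pick per workload
allowVolumeExpansion: true           # lets a PVC be grown later instead of oversized up front
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: queue-data                   # hypothetical claim
  namespace: prod
spec:
  storageClassName: standard-expandable
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi                  # start near measured usage, expand when needed
```

Expansion only goes one way: a claim can be grown by raising its storage request, but not shrunk, which is another reason to start from measured usage rather than a generous default.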

Add FinOps Guardrails

Enforce required labels: team, env, cost-center.
Block deployments without requests/limits.
Set namespace budgets and alerts.
Show cost impact in pull requests for large changes.
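Two of these guardrails can be sketched with built-in Kubernetes objects. The ResourceQuota below caps a namespace's total requests and limits; once a quota covers cpu and memory, pods that omit those values are rejected unless a LimitRange supplies defaults. The ValidatingAdmissionPolicy rejects Deployments that lack the three required labels. This assumes a cluster recent enough to serve the admissionregistration.k8s.io/v1 policy API (Kubernetes 1.30 or later); the namespace, object names, and quota sizes are placeholders.

```yaml
# Namespace budget: hard caps on total requests and limits.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-budget
  namespace: team-a                  # hypothetical namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
---
# Reject Deployments that are missing any of the required cost labels.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-cost-labels
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
  validations:
    - expression: >-
        has(object.metadata.labels) &&
        ['team', 'env', 'cost-center'].all(l, l in object.metadata.labels)
      message: "Deployments must carry team, env, and cost-center labels."
---
# The binding activates the policy; without matchResources it applies cluster-wide.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-cost-labels
spec:
  policyName: require-cost-labels
  validationActions: ["Deny"]
```

A cautious rollout would start the binding with validationActions set to Warn or Audit, review what would have been blocked, and switch to Deny once teams have labeled their workloads.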

Quarterly Cost Review Template

For each critical workload:

Current monthly cost
p50/p95 resource usage
Requested vs used ratio
Scaling policy effectiveness
Optimization actions and expected savings

High-Impact Actions (First 30 Days)

Fix top 20 overprovisioned workloads.
Tune autoscaler settings and move interruption-tolerant workloads to spot nodes.
Automatically scale down non-production environments outside office hours.
Add policy to require requests/limits and cost labels.

These changes usually reduce spend without putting SLOs at risk, because they remove waste before touching reliability-sensitive workloads.