Kubernetes Cost Optimization in Production


Cost optimization in Kubernetes is mostly a scheduling and resource management problem. Teams usually overpay because of oversized requests, low node utilization, and inefficient workload patterns.

This guide focuses on production-safe optimizations you can implement without hurting reliability.

1. Measure Before You Optimize

Track these metrics first:

  • CPU and memory requests vs. actual usage
  • Cluster and node utilization
  • Cost by namespace and workload
  • Idle resource ratio (requested but unused)

Use Prometheus + Grafana and cloud billing export to build cost dashboards by team.
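The idle-resource ratio above can be computed directly from standard metrics. As a sketch, assuming the Prometheus Operator is installed and kube-state-metrics plus cAdvisor metrics are scraped (the rule name and namespace are placeholders):

```yaml
# Hypothetical recording rule: per-namespace idle CPU ratio
# (1 - used/requested). Assumes kube-state-metrics and cAdvisor metrics.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-idle-ratio
  namespace: monitoring
spec:
  groups:
    - name: cost
      rules:
        - record: namespace:cpu_idle_ratio
          expr: |
            1 - (
              sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
              /
              sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
            )
```

A ratio near 1 means a namespace requests far more CPU than it uses, which is the first place to look for savings.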

2. Right-Size Resource Requests and Limits

Common anti-pattern:

  • requests are set close to peak, but peak happens only a few minutes per day.

Good practice:

  • set requests near steady-state usage
  • keep limits aligned with safe burst behavior
  • review monthly

Example:

resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

3. Use Vertical and Horizontal Autoscaling Together

  • HPA scales replicas for traffic changes.
  • VPA recommends or adjusts pod sizing over time.
  • Cluster Autoscaler scales nodes based on unschedulable pods.

A practical pattern:

  • VPA in recommendation mode for stateless services.
  • HPA for request-driven scaling.
  • Cluster Autoscaler with mixed on-demand and spot pools.
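The first two points of that pattern can be sketched as manifests. This assumes the VPA components are installed in the cluster; the Deployment name "web" and the scaling thresholds are placeholders:

```yaml
# VPA in recommendation mode: computes sizing suggestions, never evicts pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"   # recommend only
---
# HPA handles request-driven replica scaling on the same Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Keeping VPA in "Off" mode avoids the known conflict between VPA evictions and HPA replica decisions: you read the recommendations and fold them into the monthly request review instead.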

4. Optimize Node Pools

Create node pools by workload type:

  • latency-sensitive pool (on-demand)
  • batch/asynchronous pool (spot/preemptible)
  • memory-heavy pool

Benefits:

  • better bin packing
  • reduced over-provisioning
  • clearer cost accountability
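Steering a workload onto the right pool is done with node selectors and tolerations. A minimal sketch for the batch/spot pool, assuming a hypothetical "pool" node label and "spot" taint (substitute your provider's actual labels and taints):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report   # placeholder name
spec:
  template:
    spec:
      nodeSelector:
        pool: batch-spot        # assumed pool label
      tolerations:
        - key: "spot"           # assumed taint on the spot pool
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: report
          image: example/report:latest   # placeholder image
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
      restartPolicy: Never
```

The taint on the spot pool keeps latency-sensitive pods off preemptible capacity; the toleration plus selector opts batch work in.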

5. Improve Scheduling Efficiency

  • Avoid strict anti-affinity unless required.
  • Use topology spread constraints with balanced settings.
  • Keep pod requests realistic to improve bin packing.
  • Use taints/tolerations only when isolation is needed.
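A balanced topology spread looks like the following pod-spec fragment (the "app: web" label is a placeholder). `ScheduleAnyway` makes the spread a soft preference, so the scheduler can still pack existing nodes instead of forcing new ones:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # soft: prefer spreading, don't block packing
    labelSelector:
      matchLabels:
        app: web   # placeholder selector
```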

Poor scheduling decisions can increase node count by 20-40%.

6. Scale Down Non-Production Environments

Many teams forget this easy win:

  • schedule dev/staging scale-down after working hours
  • pause preview environments automatically
  • expire old ephemeral namespaces

For example, to zero out every deployment in staging:

kubectl scale deploy -n staging --all --replicas=0

Automate this with CI/CD or a nightly controller job.
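One way to sketch the nightly controller job is an in-cluster CronJob that runs the scale command. This assumes a ServiceAccount named "scaler" with RBAC permission to scale deployments in the staging namespace; the image and schedule are illustrative:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-down
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"   # 20:00 on weekdays; adjust to your hours
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # assumed SA with scale permissions
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - scale
                - deploy
                - --all
                - --replicas=0
                - -n
                - staging
          restartPolicy: OnFailure
```

A matching morning CronJob (or a scale-up on first CI deploy) restores the environment for the working day.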

7. Optimize Stateful Workloads

For databases and queues:

  • right-size storage class and IOPS tier
  • avoid oversized PVC defaults
  • apply retention policies for old snapshots
  • move cold data to cheaper storage tiers

Storage spend can silently exceed compute spend.
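Avoiding oversized PVC defaults mostly means sizing claims explicitly and choosing the class deliberately. A sketch, where "standard-hdd" is a placeholder class name (check `kubectl get storageclass` for what your cluster offers):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: queue-data   # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-hdd   # assumed cheaper tier; verify in your cluster
  resources:
    requests:
      storage: 20Gi   # sized to actual usage, not a generous default
```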

8. Add FinOps Guardrails

  • Enforce required labels: team, env, cost-center.
  • Block deployments without requests/limits.
  • Set namespace budgets and alerts.
  • Show cost impact in pull requests for large changes.
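Requests/limits can be enforced without extra tooling: once a namespace has a compute ResourceQuota, the API server rejects pods that omit the quota-constrained requests and limits. A sketch with placeholder namespace and values:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a   # placeholder namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
```

For label enforcement and richer rules, a policy engine is the usual next step, but the quota alone covers the "block deployments without requests/limits" guardrail.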

9. Quarterly Cost Review Template

For each critical workload:

  1. Current monthly cost
  2. p50/p95 resource usage
  3. Requested vs used ratio
  4. Scaling policy effectiveness
  5. Optimization actions and expected savings

High-Impact Actions (First 30 Days)

  1. Fix top 20 overprovisioned workloads.
  2. Tune the autoscalers and move spot-safe workloads onto spot capacity.
  3. Auto-scale down non-prod outside office hours.
  4. Add policy to require requests/limits and cost labels.

These actions are usually enough to reduce Kubernetes spend materially without compromising SLOs.