Kubernetes Cost Optimization in Production
Cost optimization in Kubernetes is mostly a scheduling and resource management problem. Teams usually overpay because of oversized requests, low node utilization, and inefficient workload patterns.
This guide focuses on production-safe optimizations you can implement without hurting reliability.
1. Measure Before You Optimize
Track these metrics first:
- CPU and memory request vs actual usage
- Cluster and node utilization
- Cost by namespace and workload
- Idle resource ratio (requested but unused)
Use Prometheus + Grafana and cloud billing export to build cost dashboards by team.
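As a starting point, the request-vs-usage gap can be queried directly. This is a sketch assuming kube-state-metrics and cAdvisor metrics are being scraped by Prometheus; metric names match the standard exporters:

```promql
# Ratio of actual CPU usage to CPU requests, per namespace.
# Values well below 1.0 indicate idle (paid-for but unused) capacity.
sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
/
sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
```

The same shape works for memory by swapping in `container_memory_working_set_bytes` and `resource="memory"`.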
2. Right-Size Resource Requests and Limits
Common anti-pattern:
- requests are set close to peak, but peak happens only a few minutes per day.
Good practice:
- set requests near steady-state usage
- keep limits aligned with safe burst behavior
- review monthly
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
3. Use Vertical and Horizontal Autoscaling Together
- HPA scales replicas for traffic changes.
- VPA recommends or adjusts pod sizing over time.
- Cluster Autoscaler scales nodes based on unschedulable pods.
A practical pattern:
- VPA in recommendation mode for stateless services.
- HPA for request-driven scaling.
- Cluster Autoscaler with mixed on-demand and spot pools.
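The request-driven HPA piece of this pattern looks like the following. A minimal sketch using the `autoscaling/v2` API; the Deployment name `web` and the 70% target are placeholders to tune per service:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web            # hypothetical service name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2       # keep a floor for availability
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% of requested CPU
```

Because HPA utilization targets are computed against requests, right-sizing requests (section 2) directly improves HPA behavior.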
4. Optimize Node Pools
Create node pools by workload type:
- latency-sensitive pool (on-demand)
- batch/asynchronous pool (spot/preemptible)
- memory-heavy pool
Benefits:
- better bin packing
- reduced over-provisioning
- clearer cost accountability
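To steer workloads onto the right pool, combine a node label with a taint. A sketch for a batch job targeting a spot pool; the `pool` label and `spot` taint key are naming assumptions, not Kubernetes defaults:

```yaml
# Pod spec fragment for a batch workload on the spot pool.
spec:
  nodeSelector:
    pool: batch-spot          # assumed label applied to spot/preemptible nodes
  tolerations:
    - key: "spot"             # assumed taint keeping other workloads off this pool
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```

The taint keeps latency-sensitive pods off cheap capacity; the nodeSelector keeps interruptible work off expensive capacity.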
5. Improve Scheduling Efficiency
- Avoid strict anti-affinity unless required.
- Use topology spread constraints with balanced settings.
- Keep pod requests realistic to improve bin packing.
- Use taints/tolerations only when isolation is needed.
Poor scheduling decisions can increase node count by 20-40%.
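A "balanced" spread constraint prefers even distribution without blocking scheduling. A sketch for a service labeled `app: web` (the label is a placeholder):

```yaml
# Pod spec fragment: spread across zones, but do not strand pods as Pending.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # soft constraint; DoNotSchedule would force extra nodes
    labelSelector:
      matchLabels:
        app: web
```

`ScheduleAnyway` is the cost-friendly setting here: a hard `DoNotSchedule` constraint can trigger node scale-up just to satisfy the spread.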
6. Scale Down Non-Production Environments
Many teams forget this easy win:
- schedule dev/staging scale-down after working hours
- pause preview environments automatically
- expire old ephemeral namespaces
kubectl scale deploy -n staging --all --replicas=0
Automate this with CI/CD or a nightly controller job.
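One way to automate the nightly scale-down is an in-cluster CronJob. A sketch assuming a `scaler` ServiceAccount with RBAC permission to scale Deployments in the `staging` namespace:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-down
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"        # 20:00 on weekdays; a paired job scales back up in the morning
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # assumed SA with scale permissions
          restartPolicy: OnFailure
          containers:
            - name: scale-down
              image: bitnami/kubectl:latest
              command: ["kubectl", "scale", "deploy", "--all", "--replicas=0", "-n", "staging"]
```

Pin the kubectl image tag to your cluster's minor version in practice.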
7. Optimize Stateful Workloads
For databases and queues:
- right-size storage class and IOPS tier
- avoid oversized PVC defaults
- apply retention policies for old snapshots
- move cold data to cheaper storage tiers
Storage spend can silently exceed compute spend.
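Making the storage class explicit on every PVC avoids silently landing on an expensive default. A minimal sketch; the class name `standard` varies by provider and is an assumption here:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: queue-data              # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard    # choose the cheapest tier that meets the IOPS requirement
  resources:
    requests:
      storage: 20Gi             # size to actual retention needs, not a generous default
```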
8. Add FinOps Guardrails
- Enforce required labels: team, env, cost-center.
- Block deployments without requests/limits.
- Set namespace budgets and alerts.
- Show cost impact in pull requests for large changes.
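The requests/limits guardrail can be enforced with an admission policy. A sketch using Kyverno (an assumption; Gatekeeper or a built-in ValidatingAdmissionPolicy works too), requiring requests and a memory limit on every container:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Enforce   # use Audit first to measure impact
  rules:
    - name: check-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU/memory requests and a memory limit are required."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    memory: "?*"
```

Deliberately not requiring a CPU limit is a common choice, since CPU throttling can hurt latency more than it saves money.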
9. Quarterly Cost Review Template
For each critical workload:
- Current monthly cost
- p50/p95 resource usage
- Requested vs used ratio
- Scaling policy effectiveness
- Optimization actions and expected savings
High-Impact Actions (First 30 Days)
- Fix top 20 overprovisioned workloads.
- Tune autoscaling and move interruption-tolerant workloads to spot capacity.
- Auto-scale down non-prod outside office hours.
- Add policy to require requests/limits and cost labels.
These actions are usually enough to reduce Kubernetes spend materially without compromising SLOs.