Cost optimization in Kubernetes is mostly a scheduling and resource management problem. Teams usually overpay because of oversized requests, low node utilization, and inefficient workload patterns.

The safest wins usually come from measuring waste first, then changing requests, autoscaling, and node pools in small steps.

Measure Before You Optimize

Track these metrics first:

  • CPU and memory request vs actual usage
  • Cluster and node utilization
  • Cost by namespace and workload
  • Idle resource ratio (requested but unused)

Use Prometheus and Grafana together with your cloud provider's billing export to build cost dashboards per team.
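
As a rough sketch of the request-versus-usage and idle-ratio metrics, the Prometheus recording rules below assume that kube-state-metrics and cAdvisor metrics are already being scraped. The group and record names are illustrative, not a standard.

```yaml
# Sketch of Prometheus recording rules; group and record names are hypothetical.
groups:
  - name: cost-efficiency
    rules:
      # CPU cores requested per namespace (metric exposed by kube-state-metrics).
      - record: namespace:cpu_requests_cores:sum
        expr: 'sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})'
      # CPU cores actually used per namespace, averaged over 5 minutes (cAdvisor).
      - record: namespace:cpu_usage_cores:rate5m
        expr: 'sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))'
      # Share of requested CPU that sits idle; clamped to 0 when usage exceeds requests.
      - record: namespace:cpu_idle_ratio:rate5m
        expr: 'clamp_min(1 - namespace:cpu_usage_cores:rate5m / namespace:cpu_requests_cores:sum, 0)'
```

The same pattern should carry over to memory by swapping in the memory request and working-set metrics. Joining these series with billing data by namespace label is left to the dashboard layer.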

Right-Size Resource Requests and Limits

Common anti-pattern:

  • requests are set close to peak, but peak happens only a few minutes per day.

Good practice:

  • set requests near steady-state usage
  • keep limits aligned with safe burst behavior
  • review monthly
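
Example container resources:

resources:
  requests:
    cpu: "250m"       # near steady-state usage; the scheduler reserves this much
    memory: "512Mi"
  limits:
    cpu: "1000m"      # headroom for short bursts; CPU is throttled above this
    memory: "1Gi"     # the container is OOM-killed if it exceeds this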

Use Vertical and Horizontal Autoscaling Together

  • HPA scales replicas for traffic changes.
  • VPA recommends or adjusts pod sizing over time.
  • Cluster Autoscaler adds nodes when pods are unschedulable and removes nodes that stay underutilized.

A practical pattern:

  • VPA in recommendation mode for stateless services, so it never competes with HPA over the same CPU or memory signal.
  • HPA for request-driven scaling.
  • Cluster Autoscaler with mixed on-demand and spot pools.
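
A minimal sketch of the first two items, assuming a Deployment named web (hypothetical) and a cluster with the VPA custom resources installed, since VPA is not part of core Kubernetes. All thresholds are placeholders to tune per workload.

```yaml
# Sketch only: the Deployment name "web" and all numbers are illustrative.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"   # recommendation mode: compute suggestions, never change pods
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request, averaged across pods
```

Because HPA utilization is measured against the CPU request, applying a VPA recommendation by hand changes the point at which HPA scales, so the two are worth reviewing together.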

Optimize Node Pools

Create node pools by workload type:

  • latency-sensitive pool (on-demand)
  • batch/asynchronous pool (spot/preemptible)
  • memory-heavy pool

Benefits:

  • better bin packing
  • reduced over-provisioning
  • clearer cost accountability
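
One way to route a batch workload onto the spot pool is a node selector plus a matching toleration. The fragment below is a sketch: the pool label and the taint key and value are hypothetical, and each cloud provider also publishes its own labels for spot or preemptible nodes.

```yaml
# Deployment pod template fragment; label and taint names are hypothetical.
spec:
  template:
    spec:
      nodeSelector:
        pool: batch-spot            # assumes spot nodes carry this label
      tolerations:
        - key: "workload-type"      # assumes spot nodes are tainted workload-type=batch:NoSchedule
          operator: "Equal"
          value: "batch"
          effect: "NoSchedule"
```

The taint keeps latency-sensitive pods off spot nodes by default, while the selector keeps batch pods from landing on more expensive on-demand nodes.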

Improve Scheduling Efficiency

  • Avoid strict anti-affinity unless required.
  • Use topology spread constraints with balanced settings.
  • Keep pod requests realistic to improve bin packing.
  • Use taints/tolerations only when isolation is needed.

Overly strict placement rules fragment capacity across nodes, and poor scheduling decisions can increase node count by 20-40%.
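
The first two points can be sketched as soft constraints rather than hard ones. The fragment below assumes pods labeled app: web (an illustrative label) and expresses a preference the scheduler may relax when capacity is tight.

```yaml
# Pod spec fragment; the label app: web is illustrative.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway   # soft: prefer balance, never block scheduling
      labelSelector:
        matchLabels:
          app: web
  affinity:
    podAntiAffinity:
      # Soft alternative to requiredDuringSchedulingIgnoredDuringExecution.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: web
```

With the required form of anti-affinity, every replica needs its own node, which can force a scale-up even when existing nodes have room.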

Scale Down Non-Production Environments

Many teams forget this easy win:

  • schedule dev/staging scale-down after working hours
  • pause preview environments automatically
  • expire old ephemeral namespaces

For example, to scale every Deployment in the staging namespace to zero replicas:

kubectl scale deploy -n staging --all --replicas=0

Automate this with a scheduled CI/CD pipeline or a nightly in-cluster job.
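
As a sketch of the in-cluster option, a CronJob can run the same scale-down command on a schedule. The image and service account names below are hypothetical: the image only needs to ship kubectl, and the service account needs RBAC permission to scale Deployments in the namespace.

```yaml
# Sketch only: image and service account names are hypothetical.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
  namespace: staging
spec:
  schedule: "0 19 * * 1-5"   # 19:00 Monday to Friday, controller time zone unless spec.timeZone is set
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: env-scaler             # needs rights to scale Deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: registry.example.com/kubectl:stable   # any image that contains kubectl
              command: ["kubectl", "scale", "deploy", "-n", "staging", "--all", "--replicas=0"]
```

A matching morning job, or an on-demand pipeline step, would restore the replica counts. Recording the original counts before scaling to zero is left out of this sketch.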

Optimize Stateful Workloads

For databases and queues:

  • right-size storage class and IOPS tier
  • avoid oversized PVC defaults
  • apply retention policies for old snapshots
  • move cold data to cheaper storage tiers

Storage spend can silently exceed compute spend, because volumes and snapshots keep billing after the pods that used them are gone.
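
One way to avoid oversized PVC defaults is to make volumes expandable, so claims can start small and grow when usage justifies it. The sketch below assumes the AWS EBS CSI driver purely as an example. The provisioner and parameters differ per cloud, and the names and sizes are illustrative.

```yaml
# Sketch assuming the AWS EBS CSI driver; swap provisioner and parameters for your cloud.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: general-expandable
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                    # general-purpose tier rather than provisioned IOPS
allowVolumeExpansion: true     # lets a PVC start small and be resized later
reclaimPolicy: Delete          # volume is removed when its claim is deleted
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: queue-data
spec:
  storageClassName: general-expandable
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi            # modest initial size; expand by editing this value
```

Expansion only goes one way: Kubernetes does not shrink a PVC, which is another reason to start from a small default.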

Add FinOps Guardrails

  • Enforce required labels: team, env, cost-center.
  • Block deployments without requests/limits.
  • Set namespace budgets and alerts.
  • Show cost impact in pull requests for large changes.
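
A namespace budget can be sketched with a ResourceQuota. The namespace name and numbers below are placeholders. A useful side effect is that once a quota covers CPU and memory requests and limits, pods that omit those fields are rejected in that namespace, unless a LimitRange supplies defaults.

```yaml
# Sketch only: namespace and quota values are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-budget
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"        # total CPU cores the namespace may request
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
```

Enforcing required labels needs an admission policy on top of this, which the quota alone does not provide.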

Quarterly Cost Review Template

For each critical workload:

  1. Current monthly cost
  2. p50/p95 resource usage
  3. Requested vs used ratio
  4. Scaling policy effectiveness
  5. Optimization actions and expected savings

High-Impact Actions (First 30 Days)

  1. Right-size the 20 most overprovisioned workloads.
  2. Tune autoscalers and move interruption-tolerant workloads to spot capacity.
  3. Automatically scale down non-production environments outside office hours.
  4. Add a policy that requires requests/limits and cost labels.

These changes usually reduce spend without putting SLOs at risk, because they remove waste before touching reliability-sensitive workloads.