[18/24] U is for Upgrades: Managing Cluster Lifecycle

📚 This is Post #18 in the Kubernetes A-to-Z Series

Reading Order: ← Previous: Quality Assurance → Next: Authentication →

Series Progress: 18/24 complete | Difficulty: Advanced | Time: 30 min | Part 5/6: Operations

Upgrading Kubernetes is notoriously scary. It’s a complex distributed system, and changing the engine while the car is driving down the highway requires precision.

In this post, we’ll cover the “U” of Kubernetes: Upgrades.

The Golden Rule of Upgrades

Never skip a minor version.

Kubernetes versions are expressed as x.y.z (e.g., 1.29.1).

x: Major version (1)
y: Minor version (29)
z: Patch version (1)

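You can upgrade from 1.28 to 1.29, but not directly from 1.28 to 1.30: you must go to 1.29 first, one minor version at a time. Patch releases are different; jumping from 1.29.1 straight to 1.29.5 is fine.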
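The step-by-step rule can be made concrete with a small Bash sketch that lists every minor version you have to pass through. The function name upgrade_path is hypothetical. It only does string arithmetic: it does not query a cluster or check which patch releases exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each minor version you must pass through,
# one per line, going from CURRENT (e.g. 1.28) to TARGET (e.g. 1.30).
upgrade_path() {
  local current=$1 target=$2
  local cur_major=${current%%.*} tgt_major=${target%%.*}
  local cur_minor=${current#*.} tgt_minor=${target#*.}
  # Drop any patch component ("28.3" -> "28")
  cur_minor=${cur_minor%%.*}
  tgt_minor=${tgt_minor%%.*}

  if [[ $cur_major != "$tgt_major" ]]; then
    echo "major version change not handled: $current -> $target" >&2
    return 1
  fi
  if (( tgt_minor <= cur_minor )); then
    echo "target $target is not newer than $current" >&2
    return 1
  fi

  # One upgrade round per minor version
  local m
  for (( m = cur_minor + 1; m <= tgt_minor; m++ )); do
    echo "$cur_major.$m"
  done
}
```

For example, upgrade_path 1.28 1.30 prints 1.29 and then 1.30, which means two separate upgrade rounds.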

Version Skew Policy

The version skew policy defines how far each component's version may drift from kube-apiserver:

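kube-apiserver: The reference point. In an HA control plane, all apiserver instances must be within 1 minor version of each other.
kubelet: Must not be newer than kube-apiserver, and can be up to 3 minor versions older (since Kubernetes 1.28; earlier releases allowed 2).
kubectl: Can be within 1 minor version (older or newer) of kube-apiserver.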

Because a kubelet must never be newer than the apiserver it talks to, you upgrade the control plane first, then the worker nodes.
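The skew rules can be expressed as a quick check. This Bash sketch uses hypothetical helper names (minor_of, skew_ok) and assumes major version 1. It takes version strings as arguments rather than reading them from a live cluster. It encodes the current policy: a kubelet may match kube-apiserver or trail it by up to three minor versions but never be newer, and kubectl may be one minor version either side.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume major version 1 for every component.

# Extract the minor number: "1.29.1" or "v1.29.1" -> 29
minor_of() {
  local v=${1#v}      # strip an optional leading "v"
  v=${v#*.}           # drop the major part -> "29.1"
  echo "${v%%.*}"     # drop the patch part -> "29"
}

# skew_ok COMPONENT APISERVER_VERSION COMPONENT_VERSION
# Returns 0 if the pair is within the skew policy, 1 otherwise.
skew_ok() {
  local component=$1 api comp diff
  api=$(minor_of "$2")
  comp=$(minor_of "$3")
  diff=$(( api - comp ))   # positive means the component is older
  case $component in
    kubelet) (( diff >= 0 && diff <= 3 )) ;;   # never newer, up to 3 older
    kubectl) (( diff >= -1 && diff <= 1 )) ;;  # +/- 1 minor version
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
}
```

skew_ok kubelet 1.28.0 1.29.0 fails. That is exactly the situation you would create by upgrading a worker before the control plane.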

Upgrade Strategies

1. In-Place Upgrade (Kubeadm)

This is the standard way for self-managed clusters.

Step 1: Upgrade Control Plane

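# On the first control plane node (assumes the pkgs.k8s.io v1.29 apt repo is configured)
sudo apt-mark unhold kubeadm
sudo apt-get update && sudo apt-get install -y kubeadm='1.29.0-*'
sudo apt-mark hold kubeadm
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.29.0
# On each additional control plane node, run this instead of "upgrade apply":
# sudo kubeadm upgrade node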

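Step 2: Upgrade Kubelet & Kubectl

Drain the control plane node first (see Node Draining below), then upgrade its kubelet and kubectl: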

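sudo apt-mark unhold kubelet kubectl
sudo apt-get update && sudo apt-get install -y kubelet='1.29.0-*' kubectl='1.29.0-*'
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Finally: kubectl uncordon <node-name>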
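After restarting the kubelet, you can confirm which nodes still report an old version. This sketch reads each node's status.nodeInfo.kubeletVersion through kubectl's JSONPath output. The function name nodes_not_at_version is hypothetical, and the sketch assumes your current kubectl context points at the cluster being upgraded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name<TAB>version" for every node whose kubelet
# does not report TARGET (e.g. v1.29.0). Uses the current kubectl context.
nodes_not_at_version() {
  local target=$1
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
    while IFS=$'\t' read -r name version; do
      # Only report nodes that still need an upgrade
      if [[ $version != "$target" ]]; then
        printf '%s\t%s\n' "$name" "$version"
      fi
    done
}
```

Running nodes_not_at_version v1.29.0 with empty output means every node reports the target version.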

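Step 3: Upgrade Worker Nodes (One by One)

This is where it gets tricky: you need to move workloads off each node before upgrading it. On each worker, upgrade the kubeadm package and run sudo kubeadm upgrade node. Then upgrade and restart the kubelet as in Step 2.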

2. Node Draining

Before upgrading a node (or rebooting it), you must drain it.

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

This command:

Cordons the node (marks it unschedulable).
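Evicts the remaining pods through the Eviction API, which honors PodDisruptionBudgets; pods owned by a controller (Deployment, StatefulSet, ReplicaSet) are recreated on other nodes.
Skips DaemonSet pods (--ignore-daemonsets), since the DaemonSet controller would simply recreate them on the same node.
Allows evicting pods that use emptyDir volumes (--delete-emptydir-data); their local data is lost, and without this flag drain refuses to evict them.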
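Because kubectl drain goes through the Eviction API, a PodDisruptionBudget that currently allows zero disruptions will block it. Drain keeps retrying those evictions (with no timeout by default). The hedged pre-flight sketch below lists such budgets cluster-wide. The function name blocking_pdbs is hypothetical, and the sketch does not check whether a listed budget actually covers pods on the node you are about to drain.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print namespace/name of every
# PodDisruptionBudget whose status.disruptionsAllowed is currently 0.
blocking_pdbs() {
  kubectl get pdb --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.disruptionsAllowed}{"\n"}{end}' |
    while IFS=$'\t' read -r pdb allowed; do
      # 0 means evicting one more covered pod would violate the budget
      if [[ $allowed == 0 ]]; then
        echo "$pdb"
      fi
    done
}
```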

Once the node is drained (only DaemonSet pods remain), upgrade the kubelet and OS, reboot if needed, and then uncordon it so the scheduler can place pods on it again.

kubectl uncordon node-1
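A rolling worker upgrade combines drain, upgrade, and uncordon, and might look like the Bash sketch below. It is an illustration under assumptions, not a supported tool:

- upgrade_node_packages, upgrade_workers, and PKG_VERSION are hypothetical names.
- The default hook assumes passwordless SSH and sudo to each node under its Kubernetes node name.
- The v1.29 pkgs.k8s.io apt repository must already be configured on every worker.

The remote steps follow the kubeadm order for workers: upgrade kubeadm, run kubeadm upgrade node, then upgrade and restart the kubelet.

```shell
#!/usr/bin/env bash
# Rolling worker upgrade sketch. All names here are hypothetical.
# Assumes passwordless SSH + sudo to each node by its node name, and the
# v1.29 pkgs.k8s.io apt repo already configured on every worker.
PKG_VERSION='1.29.0-*'

# Hook that upgrades packages on one node; override it to fit your tooling.
upgrade_node_packages() {
  local node=$1
  # kubeadm first, then "kubeadm upgrade node", then the kubelet
  ssh "$node" "set -e
    sudo apt-mark unhold kubeadm kubelet
    sudo apt-get update
    sudo apt-get install -y kubeadm='$PKG_VERSION'
    sudo kubeadm upgrade node
    sudo apt-get install -y kubelet='$PKG_VERSION'
    sudo apt-mark hold kubeadm kubelet
    sudo systemctl daemon-reload
    sudo systemctl restart kubelet"
}

# Drain, upgrade, and uncordon each node in turn; stop at the first failure.
upgrade_workers() {
  local node
  for node in "$@"; do
    echo "==> upgrading $node"
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
    # If the upgrade fails, the node stays cordoned and receives no new pods
    upgrade_node_packages "$node" || return 1
    kubectl uncordon "$node" || return 1
  done
}
```

Usage would be something like upgrade_workers node-1 node-2 node-3. The loop stops at the first failure and leaves that node cordoned, so a half-upgraded node never receives new pods until you fix it and uncordon it yourself.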

3. Blue/Green Clusters (The Cloud Way)

If you are using a managed service (EKS, GKE, AKS) or have good automation, it’s often safer to create a new cluster with the new version and switch traffic over.

Create Cluster B (v1.29).
Deploy apps to Cluster B.
Switch DNS/LoadBalancer to Cluster B.
Delete Cluster A (v1.28).

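Pros: The existing cluster is untouched if the new one misbehaves, and rolling back is just a matter of switching traffic back.
Cons: You pay for two clusters during the transition, and stateful apps (databases, persistent volumes) are hard to move.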
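Steps 2 and 3 of the blue/green approach can be gated on the new cluster actually being healthy. This sketch makes several assumptions:

- A kubeconfig context for the new cluster exists (hypothetically named cluster-b).
- The apps live in a single manifest directory and a single namespace.
- These are overridable through the hypothetical variables NEW_CTX, MANIFESTS, and NAMESPACE.

The DNS or load balancer switch itself is provider-specific and left out.

```shell
#!/usr/bin/env bash
# Blue/green readiness gate sketch. Context, directory, and namespace
# defaults are hypothetical; override them via the environment.
NEW_CTX=${NEW_CTX:-cluster-b}
MANIFESTS=${MANIFESTS:-./manifests}
NAMESPACE=${NAMESPACE:-default}

deploy_and_verify() {
  # Step 2: deploy the apps to the new cluster
  kubectl --context "$NEW_CTX" apply -n "$NAMESPACE" -f "$MANIFESTS" || return 1
  # Wait until every Deployment in the namespace is Available (or time out)
  kubectl --context "$NEW_CTX" wait deployment --all \
    -n "$NAMESPACE" --for=condition=Available --timeout=300s || return 1
  # Step 3 (switching traffic) is provider-specific and not done here
  echo "Cluster $NEW_CTX is ready; switch DNS/LoadBalancer now."
}
```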

Summary

Plan ahead: Read the release notes for breaking changes (API removals).
Back up etcd: Always take an etcd snapshot before upgrading.
Drain nodes: Respect your workloads.
One step at a time: Don’t skip minor versions.
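For the etcd backup, here is a minimal sketch for a kubeadm control plane node with stacked etcd. The certificate paths are kubeadm's defaults, so verify them on your node. The function name backup_etcd and the /var/backups output location are assumptions. It needs the etcdctl v3 client installed on the node and should run as root so it can read the keys.

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshot etcd before an upgrade on a kubeadm
# control plane node (stacked etcd, kubeadm default cert paths).
# Run as root so the etcd keys are readable.
backup_etcd() {
  local out=${1:-/var/backups/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db}
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$out" || return 1
  # Print the snapshot path so callers can record it
  echo "$out"
}
```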

Next Steps

We’re almost at the end of the alphabet! Next up: Y is for YAML, where we’ll master the language that defines it all.

Series Navigation:

Previous: Q is for Quality Assurance
Next: A is for Authentication