[18/24] U is for Upgrades: Managing Cluster Lifecycle
📚 This is Post #18 in the Kubernetes A-to-Z Series
Reading Order: ← Previous: Quality Assurance → Next: Authentication →
Series Progress: 18/24 complete | Difficulty: Advanced | Time: 30 min | Part 5/6: Operations
Upgrading Kubernetes is notoriously scary. It’s a complex distributed system, and changing the engine while the car is driving down the highway requires precision.
In this post, we’ll cover the “U” of Kubernetes: Upgrades.
The Golden Rule of Upgrades
Never skip a minor version.
Kubernetes versions are expressed as x.y.z (e.g., 1.29.1).
- x: Major version (1)
- y: Minor version (29)
- z: Patch version (1)
You can upgrade from 1.28 to 1.29, but not directly from 1.28 to 1.30. To reach 1.30 you must upgrade to 1.29 first, then to 1.30.
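The rule is easy to encode as a pre-flight check. The sketch below is only an illustration: `minor_of` and `can_upgrade` are hypothetical helper names, and it assumes both versions share the same major version (1).

```shell
# minor_of VERSION: print the minor component of "v1.29.1", "1.29.1" or "1.29".
minor_of() {
  v="${1#v}"          # strip an optional leading "v"
  rest="${v#*.}"      # drop the major component, leaving e.g. "29.1"
  echo "${rest%%.*}"  # keep what precedes the next dot, e.g. "29"
}

# can_upgrade CURRENT TARGET: succeed only if TARGET is on the same minor
# version as CURRENT (a patch upgrade) or exactly one minor version ahead.
# Assumption: both versions have the same major version.
can_upgrade() {
  cur=$(minor_of "$1")
  tgt=$(minor_of "$2")
  diff=$((tgt - cur))
  [ "$diff" -ge 0 ] && [ "$diff" -le 1 ]
}
```

For example, `can_upgrade 1.28.4 1.29.0 && echo ok` prints `ok`, while `can_upgrade 1.28.4 1.30.0` returns a non-zero status.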
Version Skew Policy
Kubernetes components have a specific compatibility matrix.
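- kube-apiserver: The reference point. Every other component's allowed version is defined relative to it.
- kubelet: Can be up to 3 minor versions older than apiserver (2 before v1.28), but never newer.
- kubectl: Can be within 1 minor version of apiserver, older or newer.
Because a kubelet may lag behind the apiserver but must never be ahead of it, you upgrade the Control Plane first, then the Worker Nodes.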
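To see why the order matters, here is a small sketch that encodes the limits listed above. The function names are hypothetical, and the arguments are plain minor version numbers (for example 29 for v1.29).

```shell
# kubelet_skew_ok APISERVER_MINOR KUBELET_MINOR
# The kubelet may be 0 to 3 minor versions older than the apiserver, never newer.
kubelet_skew_ok() {
  skew=$(( $1 - $2 ))
  [ "$skew" -ge 0 ] && [ "$skew" -le 3 ]
}

# kubectl_skew_ok APISERVER_MINOR KUBECTL_MINOR
# kubectl may be one minor version older or newer than the apiserver.
kubectl_skew_ok() {
  skew=$(( $1 - $2 ))
  [ "$skew" -ge -1 ] && [ "$skew" -le 1 ]
}
```

With the control plane already on 1.29, `kubelet_skew_ok 29 28` succeeds, so the workers can follow later. The reverse order, `kubelet_skew_ok 28 29`, fails because the kubelet would be newer than the apiserver. On a live cluster, `kubectl version` and the VERSION column of `kubectl get nodes` show the numbers to feed in.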
Upgrade Strategies
1. In-Place Upgrade (Kubeadm)
This is the standard way for self-managed clusters.
Step 1: Upgrade Control Plane
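# On the control plane node
# The '-*' glob matches whatever package revision your repository carries
sudo apt-get update && sudo apt-get install -y kubeadm='1.29.0-*'
# Review the planned changes, then apply them
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.29.0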
Step 2: Upgrade Kubelet & Kubectl
sudo apt-get install -y kubelet='1.29.0-*' kubectl='1.29.0-*'
sudo systemctl daemon-reload
sudo systemctl restart kubelet
Step 3: Upgrade Worker Nodes (One by One)
This is where it gets tricky. You need to move workloads off the node before upgrading it.
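On the worker itself, the sequence mirrors the control plane, except that `kubeadm upgrade node` replaces `kubeadm upgrade apply`. Here is a rough sketch, assuming a Debian/Ubuntu node whose packages may be pinned with `apt-mark hold`. The `run` and `upgrade_worker` helpers and the `DRY_RUN` switch are hypothetical conveniences, not part of kubeadm.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# upgrade_worker VERSION: run ON the worker node once it has been drained.
# VERSION is a plain version such as 1.29.0; the '-*' glob matches any
# package revision available in the configured apt repository.
upgrade_worker() {
  version="$1"
  run sudo apt-mark unhold kubeadm kubelet kubectl
  run sudo apt-get update
  run sudo apt-get install -y "kubeadm=${version}-*"
  run sudo kubeadm upgrade node          # refresh the local kubelet configuration
  run sudo apt-get install -y "kubelet=${version}-*" "kubectl=${version}-*"
  run sudo apt-mark hold kubeadm kubelet kubectl
  run sudo systemctl daemon-reload
  run sudo systemctl restart kubelet
}
```

Setting `DRY_RUN=1` before calling `upgrade_worker 1.29.0` prints the commands without touching the node, which is a cheap way to review the plan first.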
2. Node Draining
Before upgrading a node (or rebooting it), you must drain it.
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
This command:
- Cordons the node (marks it unschedulable).
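- Evicts the pods running on it so their controllers recreate them on other nodes. DaemonSet pods are left in place (--ignore-daemonsets), evictions respect PodDisruptionBudgets, and any emptyDir data on the node is deleted (--delete-emptydir-data).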
Once the node is empty, you upgrade the kubelet/OS, reboot, and then uncordon it.
kubectl uncordon node-1
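The drain and uncordon steps can be wrapped so that a blocked eviction doesn't hang forever and a node only rejoins once it is healthy. This is a minimal sketch: `drain_node`, `restore_node`, the `KUBECTL` override and the 300 second timeouts are assumptions chosen for illustration.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# drain_node NODE: cordon and evict, giving up after 5 minutes
# instead of waiting indefinitely on an eviction that cannot proceed.
drain_node() {
  "$KUBECTL" drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}

# restore_node NODE: wait until the node reports Ready, then allow scheduling again.
restore_node() {
  "$KUBECTL" wait --for=condition=Ready "node/$1" --timeout=300s &&
    "$KUBECTL" uncordon "$1"
}
```

A typical round trip would be `drain_node node-1`, then the upgrade and reboot, then `restore_node node-1`, repeated for one node at a time.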
3. Blue/Green Clusters (The Cloud Way)
If you are using a managed service (EKS, GKE, AKS) or have good automation, it’s often safer to create a new cluster with the new version and switch traffic over.
- Create Cluster B (v1.29).
- Deploy apps to Cluster B.
- Switch DNS/LoadBalancer to Cluster B.
- Delete Cluster A (v1.28).
Pros: The old cluster stays untouched, so a failed upgrade never affects running workloads and rollback is just switching traffic back.
Cons: Costs double during the transition; stateful apps are hard to move.
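Before switching traffic, it helps to confirm that everything running on Cluster A also exists on Cluster B. The sketch below compares Deployments only and assumes both clusters are reachable as kubeconfig contexts. The function names and the context names in the usage line are hypothetical.

```shell
# KUBECTL can be overridden, for example with a stub for testing.
KUBECTL="${KUBECTL:-kubectl}"

# list_deployments CONTEXT: print "namespace/name" for every Deployment, sorted.
list_deployments() {
  "$KUBECTL" --context "$1" get deployments --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
    awk '{print $1 "/" $2}' | sort
}

# missing_in_new OLD_CONTEXT NEW_CONTEXT: print Deployments that exist in the
# old cluster but not in the new one.
missing_in_new() {
  old_list=$(list_deployments "$1")
  new_list=$(list_deployments "$2")
  # -F fixed strings, -x whole-line match, -v invert: keep old entries
  # that do not appear verbatim in the new list.
  printf '%s\n' "$old_list" | grep -vxF -e "$new_list"
}
```

Running `missing_in_new cluster-a cluster-b` and getting no output suggests the Deployments line up. It says nothing about Services, ConfigMaps or data, so treat it as one check among several.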
Summary
- Plan ahead: Read the release notes for breaking changes (API removals).
- Back up etcd: Always take a snapshot before upgrading.
- Drain nodes: Respect your workloads.
- One step at a time: Don’t skip versions.
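For the etcd backup, a snapshot taken on a control plane node might look like the sketch below. It assumes a kubeadm-built cluster with a local etcd member and certificates in the default `/etc/kubernetes/pki/etcd` directory. The `backup_etcd` function and the `ETCDCTL` and `PKI_DIR` overrides are hypothetical.

```shell
# Both values can be overridden from the environment.
ETCDCTL="${ETCDCTL:-etcdctl}"
PKI_DIR="${PKI_DIR:-/etc/kubernetes/pki/etcd}"   # kubeadm's default etcd cert directory

# backup_etcd FILE: save a snapshot of the local etcd member to FILE.
backup_etcd() {
  snapshot="$1"
  ETCDCTL_API=3 "$ETCDCTL" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert="$PKI_DIR/ca.crt" \
    --cert="$PKI_DIR/server.crt" \
    --key="$PKI_DIR/server.key" \
    snapshot save "$snapshot"
}
```

A call such as `backup_etcd "/var/backups/etcd-$(date +%F).db"` (the path is just an example) gives you a dated file to restore from if the upgrade goes wrong. Copy it off the node before you start.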
Next Steps
We’re almost at the end of the alphabet! Next up is Y is for YAML, where we’ll master the language that defines it all.