Welcome to the twenty-fifth post in our Kubernetes A-to-Z Series! By now you have met Pods, ReplicaSets, Deployments, and Jobs in separate posts. This one zooms out and asks the bigger question: when you have an application to run, which workload resource should you actually pick? The answer depends on whether the app is stateless, stateful, scheduled, or node-local, and the wrong pick can cause subtle bugs in production.
What is a Workload?
In Kubernetes, a workload is an application running on your cluster. The actual process always runs inside a Pod, but you rarely create raw Pods yourself. Instead, you use a higher-level workload resource that creates and manages Pods on your behalf, gives them a lifecycle, and reacts when something goes wrong.
The built-in workload resources are:
- Pod: the raw primitive. One or more co-located containers sharing network and storage.
- ReplicaSet: keeps N identical Pods running.
- Deployment: declarative wrapper around ReplicaSets that adds rolling updates and rollbacks.
- StatefulSet: like a Deployment, but each Pod has a stable identity (name, network, storage).
- DaemonSet: runs exactly one Pod per node (or per matching node).
- Job: runs Pods until a fixed number of them succeed, then stops.
- CronJob: creates Jobs on a schedule.
Custom resources and their controllers (Operators, Argo Rollouts, Argo Workflows, KEDA ScaledJobs, Knative Services) extend this list. The ones that run applications ultimately produce Pods using the same mechanics, so understanding the built-ins is the foundation.
The Pod is Never the Final Answer
Raw Pod                     Workload Resource
┌─────────────────────┐     ┌────────────────────────────┐
│ Pod: web-server     │     │ Deployment: web-server     │
│ - dies on node      │     │ manages ReplicaSet         │
│   crash             │     │ manages 3 Pods             │
│ - no replacement    │     │ replaces dead Pods         │
│ - no scaling        │     │ rolling updates            │
└─────────────────────┘     └────────────────────────────┘
If you submit a raw Pod and the node it sits on dies, the Pod is gone. No controller will create a replacement. That is why production workloads always run under a controller.
Workload Types in Detail
Pod (Primitive Only)
A Pod is the smallest deployable unit. Use it directly only for short-lived debugging or one-off experiments. For anything that should outlive a node failure, wrap it.
# Quick debug shell; the Pod is deleted on exit and nothing recreates it
kubectl run debug --image=busybox --rm -it --restart=Never -- sh
See the P is for Pods post for the full anatomy.
ReplicaSet
A ReplicaSet keeps a fixed number of identical Pods running. It is the lowest-level controller that gives you self-healing.
You almost never write a ReplicaSet by hand. Deployments create ReplicaSets for you and use them as the unit of revision history. The R is for ReplicaSets post covers the standalone case.
Rule of thumb: if you find yourself writing kind: ReplicaSet in a YAML file, ask whether you really wanted kind: Deployment instead.
Deployment
A Deployment is the default choice for stateless applications: web servers, API gateways, stateless microservices, background workers that read from a queue and have no local state.
Key behaviors:
- Manages a ReplicaSet under the hood.
- Performs rolling updates when the Pod template changes. Old ReplicaSet scales down while new ReplicaSet scales up.
- Supports rollback via kubectl rollout undo.
- Pods get interchangeable identities. A Pod named web-7d4f-abc12 is functionally identical to web-7d4f-xyz99.
Deep dive in D is for Deployments.
Minimal example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
StatefulSet
A StatefulSet manages Pods that need a stable identity. Use it for databases, message brokers, distributed stores, and anything that needs to know “I am replica 0” or “I am replica 2”.
Differences from a Deployment:
- Pods get predictable names: mysql-0, mysql-1, mysql-2. The ordinal is part of the contract.
- Pods are created and terminated in strict order by default. Pod 0 is ready before Pod 1 starts. Scaling down removes the highest ordinal first.
- Each Pod gets a stable DNS name through a headless Service: mysql-0.mysql.default.svc.cluster.local.
- Each Pod gets its own PersistentVolumeClaim via volumeClaimTemplates. When mysql-0 is rescheduled to another node, it reattaches to the same volume.
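The strict ordering in the list above is the StatefulSet's default podManagementPolicy, OrderedReady. A minimal fragment, assuming the mysql StatefulSet used in this post (only the relevant field is shown), sketches where that default lives and what the Parallel alternative would look like. This field affects scaling only; rolling updates stay ordered either way.

```yaml
# Fragment of the mysql StatefulSet spec (other fields omitted)
spec:
  # Default: each Pod must be Running and Ready before the next ordinal
  # is created, and scale-down removes the highest ordinal first.
  podManagementPolicy: OrderedReady
  # Alternative: create and delete Pods without waiting on each other.
  # Only reasonable when replicas do not depend on peers at startup.
  # podManagementPolicy: Parallel
```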
Use cases:
- Relational databases (PostgreSQL, MySQL primary/replica).
- Distributed databases (Cassandra, MongoDB replica sets, Elasticsearch).
- Message brokers (Kafka, RabbitMQ cluster).
- Anything where a peer says “I trust the data on disk at pvc-0”.
Example:
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
    - port: 3306
      name: mysql
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.4
          ports:
            - containerPort: 3306
              name: mysql
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
          readinessProbe:
            exec:
              command: ["sh", "-c", "mysqladmin ping -h 127.0.0.1 -p$MYSQL_ROOT_PASSWORD"]
            initialDelaySeconds: 10
            periodSeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 20Gi
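Note the headless Service (clusterIP: None) paired with serviceName: mysql. That headless Service is what gives each Pod its stable DNS name. Skip it, and replicas cannot find each other reliably. Also note that this manifest starts three independent MySQL servers: replication between them still has to be configured separately, for example with init scripts or an Operator.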
DaemonSet
A DaemonSet runs one Pod per node. The controller watches the node list and ensures a Pod is scheduled on every node (or on every node matching a selector).
Use cases:
- Log shippers: Fluent Bit, Fluentd, Vector, Promtail.
- Node metrics agents: node-exporter, cAdvisor.
- Network plugins: Calico, Cilium, Flannel.
- Storage agents: CSI node drivers.
- Security agents: Falco, OSQuery, intrusion detection.
If the workload needs to inspect or expose something about the node itself, you want a DaemonSet.
Example: node-exporter DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      tolerations:
        - operator: "Exists"
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:v1.8.2
          args:
            - "--path.procfs=/host/proc"
            - "--path.sysfs=/host/sys"
            - "--web.listen-address=:9100"
          ports:
            - containerPort: 9100
              name: metrics
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
The tolerations: [{operator: "Exists"}] block tells the DaemonSet to schedule on every node, including control-plane nodes and nodes with custom taints. Without it, tainted nodes silently skip the DaemonSet, and you discover the gap only when a node stops shipping metrics.
Job
A Job runs Pods until a target number of them complete successfully, then stops. Unlike Deployments, a Job is finite.
Use cases:
- Database migrations on release.
- One-off data backfill.
- Batch processing: video transcode, ETL.
- CI/CD steps that run in-cluster.
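Three important fields:
- restartPolicy (in the Pod template): must be Never or OnFailure, never Always.
- backoffLimit: how many times to retry a failed Pod before marking the Job failed. Default is 6.
- completions and parallelism: for batch fan-out. completions is the number of successful Pods required; parallelism caps how many run at once.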
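To see how completions and parallelism work together for fan-out, here is a minimal sketch, assuming a hypothetical transcode-batch Job and a hypothetical myapp/transcoder image: the Job is done once 10 Pods succeed, and at most 3 run at the same time.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: transcode-batch            # hypothetical name
spec:
  completions: 10                  # total successful Pods required
  parallelism: 3                   # at most 3 Pods running at once
  backoffLimit: 4                  # retries across the whole Job
  # completionMode: Indexed        # optional: each Pod gets a JOB_COMPLETION_INDEX env var
  template:
    spec:
      restartPolicy: Never         # Never or OnFailure; Always is rejected for Jobs
      containers:
        - name: worker
          image: myapp/transcoder:v1.0.0   # hypothetical image
```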
Minimal example:
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3
  ttlSecondsAfterFinished: 600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myapp/migrator:v1.2.0
          command: ["./migrate", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
The ttlSecondsAfterFinished: 600 field cleans up the Job and its Pods 10 minutes after completion, so finished migration Pods do not pile up in your namespace.
For deeper coverage of Job patterns, see J is for Jobs and CronJobs.
CronJob
A CronJob creates a Job on a recurring schedule, using standard cron syntax.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 3 * * *"
  timeZone: "Asia/Ho_Chi_Minh"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: myapp/backup:v1.0.0
              # command overrides the image ENTRYPOINT; args alone would be passed to it
              command: ["/bin/sh", "-c", "/scripts/backup.sh"]
A few fields that matter in practice:
- concurrencyPolicy: Forbid: skip a run if the previous one is still going. Good for backups that should never overlap.
- concurrencyPolicy: Replace: kill the running one and start fresh.
- concurrencyPolicy: Allow (default): run them concurrently. Often the wrong choice.
- timeZone: stable as of Kubernetes 1.27. Without it, the schedule is interpreted in the kube-controller-manager’s time zone, which has surprised many teams.
Decision Tree: Picking the Right Workload
Walk through the questions in order. Stop at the first match.
Is the task short-lived and finite?
  Yes -> Need to schedule it repeatedly?
           Yes -> CronJob
           No  -> Job
  No  -> Continue.
Does the workload need a Pod on every node?
  Yes -> DaemonSet
  No  -> Continue.
Does each replica need a stable identity (name, network, storage)?
  Yes -> StatefulSet
  No  -> Deployment (default)
Concrete signals that push you toward each type:
| Signal | Workload |
|---|---|
| “Replicas read and write to a local disk that must survive a reschedule” | StatefulSet |
| “Replicas talk to each other by name to form a cluster” | StatefulSet |
| “Replicas are interchangeable, traffic comes through a Service” | Deployment |
| “I need to read /proc or /sys on the host” | DaemonSet |
| “I need to install something on every new node automatically” | DaemonSet |
| “Run once when we deploy a new release” | Job |
| “Run every night at 3am” | CronJob |
Lifecycle Differences That Trip People Up
Scaling
- Deployment: kubectl scale deploy/web --replicas=5. Pods come up in parallel, with no ordering.
- StatefulSet: kubectl scale sts/mysql --replicas=5. With the default OrderedReady policy, Pods come up one at a time, in order, and scaling down goes in reverse order. This is slow on purpose.
- DaemonSet: you do not scale it. The replica count is “number of matching nodes”. To change it, add or remove nodes, or change which nodes match via nodeSelector, affinity, or tolerations. Cordoning a node does not remove its DaemonSet Pod.
- Job: parallelism controls how many Pods run at once; completions controls the total successes needed.
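Because a DaemonSet's replica count is the number of matching nodes, the lever for changing it is which nodes match. A minimal fragment, assuming a hypothetical disktype: ssd node label (applied with kubectl label nodes <node-name> disktype=ssd), limits the DaemonSet to labeled nodes:

```yaml
# Fragment of a DaemonSet spec (other fields omitted)
spec:
  template:
    spec:
      # Only nodes carrying this (hypothetical) label get a Pod.
      # Labeling or unlabeling a node adds or removes its Pod.
      nodeSelector:
        disktype: ssd
```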
Rolling Update Strategy
- Deployment: RollingUpdate (default) or Recreate. RollingUpdate respects maxSurge and maxUnavailable.
- StatefulSet: RollingUpdate (default) updates from the highest ordinal down to 0. OnDelete means nothing happens until you manually delete a Pod, which is useful for databases where you want to control the rollout yourself.
- DaemonSet: RollingUpdate (default) or OnDelete. RollingUpdate uses maxUnavailable (default 1); recent Kubernetes versions also accept maxSurge, which briefly runs the old and new Pod side by side on a node.
- Job / CronJob: no rolling update. A Job's Pod template is essentially fixed once created; a change to a CronJob's jobTemplate applies to the next Job it creates.
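The non-default strategies mentioned above are each a small change to the spec. A minimal sketch of three fragments (other fields omitted), one per workload; the values are illustrative, not recommendations:

```yaml
# Deployment: terminate all old Pods before starting new ones (brief downtime)
spec:
  strategy:
    type: Recreate
---
# StatefulSet: do nothing until you delete a Pod yourself
spec:
  updateStrategy:
    type: OnDelete
---
# DaemonSet: replace the Pod on at most one node at a time
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
```

Note that the field name differs: Deployments use strategy, while StatefulSets and DaemonSets use updateStrategy.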
Pod Identity and Network
| Workload | Pod Name | DNS | Volume |
|---|---|---|---|
| Deployment | random suffix (web-7d4f-abc12) | load-balanced via Service | shared or per-Pod ephemeral |
| StatefulSet | ordinal (mysql-0, mysql-1) | per-Pod via headless Service | per-Pod stable PVC |
| DaemonSet | random suffix (one Pod per node) | per-node, often hostNetwork: true | usually hostPath |
| Job | random suffix | not addressable by default | per-Pod ephemeral |
Termination
- Deployment / StatefulSet / DaemonSet: Pods are restarted forever unless the workload is deleted.
- Job: Pods stop after the success count is reached. Set ttlSecondsAfterFinished to garbage collect.
- CronJob: keeps a configurable history via successfulJobsHistoryLimit and failedJobsHistoryLimit.
Common Pitfalls
1. Using a Deployment for Something Stateful
A common mistake: using a Deployment with a single replica and a PersistentVolumeClaim for a database.
The replica count works, but the default RollingUpdate strategy starts the new Pod before the old one terminates. If the new Pod lands on a different node, it cannot attach the ReadWriteOnce volume and stays stuck in ContainerCreating. Switching to the Recreate strategy removes that rollout overlap, but a node failure can still leave the volume attached to the lost node, so the replacement Pod hangs until the volume is released.
If the data matters, use a StatefulSet. The strict order and per-Pod PVC are designed for exactly this.
2. Forgetting the Headless Service for a StatefulSet
A StatefulSet without a matching headless Service (clusterIP: None) still runs, but the per-Pod DNS names do not resolve. Cluster members cannot find each other, and you get cryptic errors from the database init script.
Always create the headless Service first and reference it via serviceName: in the StatefulSet spec.
3. DaemonSet Skipping Tainted Nodes
By default, a DaemonSet only schedules on nodes whose taints it tolerates. Control-plane nodes typically have node-role.kubernetes.io/control-plane:NoSchedule. A logging DaemonSet without tolerations will silently skip them, and you lose logs from the control plane.
Either add explicit tolerations for the taints you care about, or add a wildcard:
tolerations:
  - operator: "Exists"
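If a wildcard feels too broad, the narrower option is to tolerate only the taint you care about. A sketch for the control-plane taint mentioned above; any other custom taints on your nodes would need their own entries:

```yaml
tolerations:
  # Matches node-role.kubernetes.io/control-plane:NoSchedule regardless of value
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```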
4. Job Retry Loops That Burn Money
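A misconfigured Job with backoffLimit: 6 (the default) and a container that fails on every run will retry up to six times, with exponential back-off between attempts, before giving up. If each attempt pulls a large image, requests a GPU, or reserves a large memory limit, this gets expensive. Note that a Pod stuck pulling an image that does not exist never counts as failed, so backoffLimit alone will not stop it.
Set a low backoffLimit (1 or 2) for migrations. Set activeDeadlineSeconds to put a hard upper bound on total runtime, which also catches Pods stuck in an image pull.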
spec:
  backoffLimit: 2
  activeDeadlineSeconds: 600
5. CronJob Schedules Running in the Wrong Time Zone
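Without a timeZone field, a CronJob schedule is interpreted in the kube-controller-manager’s time zone, which is usually UTC. A schedule: "0 3 * * *" you assumed was 3 AM local time was actually 3 AM UTC. The timeZone field is stable from Kubernetes 1.27 (earlier releases offered it only as a pre-stable feature), so set it explicitly on 1.27 or newer, and double-check the controller-manager’s time zone on older clusters.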
6. Overlapping CronJob Runs
The default concurrencyPolicy: Allow lets a CronJob spawn a new Job even if the previous one is still running. For backups, batch ingest, or anything that touches the same data, this can corrupt state. Use Forbid unless you specifically want overlap.
7. Deleting a StatefulSet Does Not Delete Its PVCs
By design, kubectl delete statefulset mysql removes the Pods but keeps the PersistentVolumeClaims. This protects your data, but it surprises new users who expect a clean slate. Use kubectl delete pvc -l app=mysql to delete the claims (whether the underlying volumes are deleted too depends on their reclaim policy). From Kubernetes 1.27, where it is beta and enabled by default, you can set persistentVolumeClaimRetentionPolicy on the StatefulSet to opt into automatic PVC deletion.
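A minimal sketch of that retention policy on the mysql StatefulSet (other fields omitted). The values shown delete the claims when the StatefulSet is deleted but keep them on scale-down; both fields default to Retain:

```yaml
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # remove PVCs together with the StatefulSet
    whenScaled: Retain    # keep PVCs of removed ordinals on scale-down
```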
Quick Workload Cheatsheet
| Workload | Scale | Rollout | Ordering | Identity | Use Case |
|---|---|---|---|---|---|
| Pod | none | none | none | fixed (you name it) | Debugging only |
| ReplicaSet | manual | none | none | random | Rarely used directly |
| Deployment | manual or HPA | rolling, rollback | parallel | random | Stateless services |
| StatefulSet | manual or HPA | rolling (ordered) | strict ordinal | stable name, stable PVC | Databases, brokers, distributed stores |
| DaemonSet | per-node, automatic | rolling | per-node | per-node | Log shippers, CNI, node agents |
| Job | parallelism + completions | none | parallel | random | One-shot batch tasks |
| CronJob | via jobTemplate | none | per-schedule | random | Scheduled batch tasks |
Useful kubectl Snippets
# List every workload type at once
kubectl get deploy,sts,ds,job,cj -A
# Watch a rolling update
kubectl rollout status deploy/web
# Pause a Deployment mid-rollout (e.g. to tweak the rollout)
kubectl rollout pause deploy/web
kubectl rollout resume deploy/web
# Roll back to the previous revision
kubectl rollout undo deploy/web
# See the revision history
kubectl rollout history deploy/web
# Scale a StatefulSet (one Pod at a time, in order)
kubectl scale sts/mysql --replicas=5
# Trigger a CronJob manually for testing
kubectl create job --from=cronjob/nightly-backup backup-manual-$(date +%s)
# Print the per-Pod DNS name of a StatefulSet member (assumes the default cluster.local domain)
kubectl get pod mysql-0 -o jsonpath='{.metadata.name}.{.spec.subdomain}.{.metadata.namespace}.svc.cluster.local{"\n"}'
Wrapping Up
Pick the workload that matches the shape of the application, not the shape of the YAML you remember writing last week.
- Stateless, horizontally scalable, traffic via Service: Deployment.
- Stateful, needs stable identity and per-replica storage: StatefulSet.
- One Pod per node, usually for observability or networking: DaemonSet.
- Finite task that runs to completion: Job.
- Finite task on a recurring schedule: CronJob.
Get this choice right and the rest of Kubernetes works with you. Get it wrong and you fight the platform every release.
Key Takeaways
- A workload is the high-level resource that manages Pods. You rarely manage Pods directly.
- Deployment is the default for stateless apps; StatefulSet is the default for stateful clusters.
- DaemonSet runs one Pod per node, for node-local concerns.
- Job and CronJob handle finite tasks, with backoffLimit and concurrencyPolicy as critical guard rails.
- The biggest pitfalls are using the wrong workload for stateful data, forgetting the headless Service for StatefulSets, and misconfigured retry or schedule policies.
Resources for Further Learning
- Official Kubernetes Workloads Documentation
- Deployment Reference
- StatefulSet Reference
- DaemonSet Reference
- Job Reference
- CronJob Reference
Next Steps
Now that you can pick the right workload for the job, the next post tackles X is for eXtensions, covering CustomResourceDefinitions, Operators, and how to extend Kubernetes when the built-in workloads are not enough.