[13/24] O is for Operators: Extending Kubernetes Functionality
This is Post #13 in the Kubernetes A-to-Z Series
Reading Order: Previous: Helm | Next: YAML
Series Progress: 13/24 complete | Difficulty: Advanced | Time: 35-40 min | Part 4/6: Advanced Concepts
Welcome to the thirteenth post in our Kubernetes A-to-Z Series! Now that you understand Helm, let’s explore Operators, a pattern for extending Kubernetes with domain-specific knowledge. Operators automate complex application lifecycle management using custom controllers.
What is an Operator?
An Operator is a method of packaging, deploying, and managing a Kubernetes application. It extends Kubernetes by adding Custom Resource Definitions (CRDs) and custom controllers that encode operational knowledge.
Traditional Approach:
┌─────────────────────────────────────────────────┐
│                 Human Operator                  │
│  - Deploy database                              │
│  - Configure replication                        │
│  - Handle backups                               │
│  - Manage failover                              │
│  - Scale cluster                                │
│  Manual, error-prone, time-consuming            │
└─────────────────────────────────────────────────┘
Kubernetes Operator:
┌─────────────────────────────────────────────────┐
│                Software Operator                │
│  ┌─────────────────────────────────────┐        │
│  │ Custom Controller                   │        │
│  │ - Watches Custom Resources          │        │
│  │ - Applies operational knowledge     │        │
│  │ - Automates day-2 operations        │        │
│  └─────────────────────────────────────┘        │
│  Automated, consistent, reliable                │
└─────────────────────────────────────────────────┘
Operator Benefits
- Automation: Encode human operational knowledge
- Consistency: Repeatable operations with less room for human error
- Self-Healing: Automatic recovery from failures
- Domain Knowledge: Application-specific logic
- Native Integration: Uses Kubernetes APIs
Operator Pattern
Control Loop
┌─────────────────────────────────────────────────┐
│              Operator Control Loop              │
│                                                 │
│   ┌─────────┐    ┌─────────┐    ┌───────────┐   │
│   │ Observe │───►│ Analyze │───►│    Act    │   │
│   │ (Watch) │    │ (Diff)  │    │(Reconcile)│   │
│   └────┬────┘    └─────────┘    └─────┬─────┘   │
│        │                              │         │
│        └──────────────────────────────┘         │
│                 Continuous Loop                 │
└─────────────────────────────────────────────────┘
1. Observe: Watch for changes to custom resources
2. Analyze: Compare desired state vs actual state
3. Act: Make changes to reach desired state
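The three steps above can be sketched in plain Go without any Kubernetes libraries. This is a minimal illustration, not how a real operator is built: State, Action, Analyze, Act, and Reconcile are hypothetical names, and the "state" is reduced to a single replica count.

```go
package main

import "fmt"

// State is a deliberately tiny stand-in for a resource's state:
// just a replica count. Real operators compare much richer objects.
type State struct {
	Replicas int
}

// Action describes the change needed to move actual toward desired.
type Action struct {
	Kind  string // "scale-up", "scale-down", or "none"
	Delta int    // how many replicas to add or remove
}

// Analyze compares desired and actual state and returns the action to take.
func Analyze(desired, actual State) Action {
	switch {
	case actual.Replicas < desired.Replicas:
		return Action{Kind: "scale-up", Delta: desired.Replicas - actual.Replicas}
	case actual.Replicas > desired.Replicas:
		return Action{Kind: "scale-down", Delta: actual.Replicas - desired.Replicas}
	default:
		return Action{Kind: "none"}
	}
}

// Act applies the action to the actual state and returns the new state.
func Act(actual State, a Action) State {
	switch a.Kind {
	case "scale-up":
		actual.Replicas += a.Delta
	case "scale-down":
		actual.Replicas -= a.Delta
	}
	return actual
}

// Reconcile runs one observe -> analyze -> act pass.
func Reconcile(desired, actual State) (State, Action) {
	a := Analyze(desired, actual)
	return Act(actual, a), a
}

func main() {
	desired := State{Replicas: 3}
	actual := State{Replicas: 1}
	// Run two passes: the first converges, the second finds nothing to do.
	for i := 0; i < 2; i++ {
		var a Action
		actual, a = Reconcile(desired, actual)
		fmt.Printf("pass %d: action=%s delta=%d replicas=%d\n", i+1, a.Kind, a.Delta, actual.Replicas)
	}
}
```

The second pass is a no-op because the actual state already matches the desired state. That idempotence is what makes it safe to run the loop continuously.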
Custom Resource Definitions (CRDs)
What is a CRD?
A CRD extends the Kubernetes API with new resource types. Once defined, you can create, read, update, and delete custom resources just like built-in resources.
Creating a CRD
# database-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required:
                - engine
                - version
              properties:
                engine:
                  type: string
                  enum: ["postgresql", "mysql", "mongodb"]
                version:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
                  default: 1
                storage:
                  type: object
                  properties:
                    size:
                      type: string
                      default: "10Gi"
                    storageClass:
                      type: string
                backup:
                  type: object
                  properties:
                    enabled:
                      type: boolean
                      default: false
                    schedule:
                      type: string
            status:
              type: object
              properties:
                phase:
                  type: string
                replicas:
                  type: integer
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                      lastTransitionTime:
                        type: string
      subresources:
        status: {}
      additionalPrinterColumns:
        - name: Engine
          type: string
          jsonPath: .spec.engine
        - name: Version
          type: string
          jsonPath: .spec.version
        - name: Replicas
          type: integer
          jsonPath: .spec.replicas
        - name: Status
          type: string
          jsonPath: .status.phase
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames:
      - db
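To make the schema rules concrete, here is a rough Go sketch of the checks the API server would perform for this CRD: the required fields, the engine enum, the replicas range, and the replicas default. The API server does this for you from the OpenAPI schema; the ApplyDefaults and Validate functions below are hypothetical and exist only to show what the schema means.

```go
package main

import (
	"errors"
	"fmt"
)

// DatabaseSpec mirrors a few fields of the CRD schema.
// Replicas is a pointer so "not set" can be told apart from an explicit value.
type DatabaseSpec struct {
	Engine   string
	Version  string
	Replicas *int
}

// ApplyDefaults fills in the schema default for replicas (default: 1).
func ApplyDefaults(s DatabaseSpec) DatabaseSpec {
	if s.Replicas == nil {
		one := 1
		s.Replicas = &one
	}
	return s
}

// Validate applies the rules the schema declares: engine and version are
// required, engine must be in the enum, and replicas must be within 1..10.
func Validate(s DatabaseSpec) error {
	if s.Engine == "" {
		return errors.New("spec.engine is required")
	}
	if s.Version == "" {
		return errors.New("spec.version is required")
	}
	switch s.Engine {
	case "postgresql", "mysql", "mongodb":
		// allowed by the enum
	default:
		return fmt.Errorf("spec.engine %q is not one of postgresql, mysql, mongodb", s.Engine)
	}
	if s.Replicas != nil && (*s.Replicas < 1 || *s.Replicas > 10) {
		return fmt.Errorf("spec.replicas %d is outside 1..10", *s.Replicas)
	}
	return nil
}

func main() {
	tooMany := 11
	specs := []DatabaseSpec{
		{Engine: "postgresql", Version: "14"},
		{Engine: "redis", Version: "7"},
		{Engine: "mysql", Version: "8", Replicas: &tooMany},
	}
	for _, s := range specs {
		s = ApplyDefaults(s)
		if err := Validate(s); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: %s %s x%d\n", s.Engine, s.Version, *s.Replicas)
	}
}
```

Because validation lives in the schema, an invalid Database is rejected at apply time, before the operator's controller ever sees it.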
Creating Custom Resources
# my-database.yaml
apiVersion: example.com/v1
kind: Database
metadata:
  name: production-db
  namespace: production
spec:
  engine: postgresql
  version: "14"
  replicas: 3
  storage:
    size: "100Gi"
    storageClass: fast-ssd
  backup:
    enabled: true
    schedule: "0 2 * * *"
# Apply CRD
kubectl apply -f database-crd.yaml
# Create custom resource
kubectl apply -f my-database.yaml
# List databases
kubectl get databases
kubectl get db # short name
# Describe database
kubectl describe database production-db
# Delete database
kubectl delete database production-db
Popular Operators
Database Operators
# PostgreSQL Operator (Zalando)
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  # The Zalando operator expects the cluster name to start with "<teamId>-"
  name: platform-production-cluster
spec:
  teamId: platform
  numberOfInstances: 3
  postgresql:
    version: "14"
  volume:
    size: 100Gi
    storageClass: fast-ssd
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
# MongoDB Operator
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-cluster
spec:
  members: 3
  type: ReplicaSet
  version: "6.0.5"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: admin
      db: admin
      passwordSecretRef:
        name: mongodb-admin-password
      roles:
        - name: clusterAdmin
          db: admin
Prometheus Operator
# ServiceMonitor for automatic scraping
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: webapp
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
---
# PrometheusRule for alerting
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: webapp-alerts
spec:
  groups:
    - name: webapp
      rules:
        - alert: HighErrorRate
          expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: High error rate detected
Cert-Manager Operator
# ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
---
# Certificate request
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: webapp-cert
  namespace: production
spec:
  secretName: webapp-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - www.example.com
    - api.example.com
Building Custom Operators
Operator SDK
# Install Operator SDK
brew install operator-sdk
# Create new operator project
operator-sdk init --domain example.com --repo github.com/example/database-operator
# Create API and controller
operator-sdk create api --group database --version v1 --kind Database --resource --controller
# Project structure
database-operator/
├── api/v1/
│ └── database_types.go # CRD types
├── controllers/
│ └── database_controller.go # Reconciliation logic
├── config/
│ ├── crd/ # CRD manifests
│ ├── rbac/ # RBAC rules
│ └── manager/ # Operator deployment
├── main.go
└── Makefile
Defining Types
// api/v1/database_types.go
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type DatabaseSpec struct {
	Engine   string      `json:"engine"`
	Version  string      `json:"version"`
	Replicas int32       `json:"replicas,omitempty"`
	Storage  StorageSpec `json:"storage,omitempty"`
}

type StorageSpec struct {
	Size         string `json:"size,omitempty"`
	StorageClass string `json:"storageClass,omitempty"`
}

type DatabaseStatus struct {
	Phase      string             `json:"phase,omitempty"`
	Replicas   int32              `json:"replicas,omitempty"`
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
type Database struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DatabaseSpec   `json:"spec,omitempty"`
	Status DatabaseStatus `json:"status,omitempty"`
}
Controller Logic
// controllers/database_controller.go
package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	databasev1 "github.com/example/database-operator/api/v1"
)

type DatabaseReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := ctrl.LoggerFrom(ctx)

	// Fetch the Database instance
	var database databasev1.Database
	if err := r.Get(ctx, req.NamespacedName, &database); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	log.Info("Reconciling Database", "name", database.Name)

	// Create StatefulSet if not exists
	// Create Service if not exists
	// Configure replication
	// Setup backups
	// Update status

	return ctrl.Result{}, nil
}

func (r *DatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&databasev1.Database{}).
		Complete(r)
}
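The "Update status" step in the reconciler above is left as a comment. As a rough idea of what it might compute, here is a library-free sketch that derives a phase and a Ready condition from the desired and ready replica counts. Everything in it is an assumption for illustration: Condition and Status are simplified stand-ins for metav1.Condition and DatabaseStatus, and the phase names (Pending, Scaling, Running) are invented, not part of any API.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True" or "False"
	Reason string
}

// Status is a simplified stand-in for DatabaseStatus.
type Status struct {
	Phase      string
	Replicas   int32
	Conditions []Condition
}

// ComputeStatus derives the status from the desired and ready replica counts.
// The phase names are hypothetical; a real operator defines its own.
func ComputeStatus(desired, ready int32) Status {
	st := Status{Replicas: ready}
	cond := Condition{Type: "Ready", Status: "False"}
	switch {
	case ready == 0:
		st.Phase = "Pending"
		cond.Reason = "NoReplicasReady"
	case ready < desired:
		st.Phase = "Scaling"
		cond.Reason = "ReplicasNotReady"
	default:
		st.Phase = "Running"
		cond.Status = "True"
		cond.Reason = "AllReplicasReady"
	}
	st.Conditions = []Condition{cond}
	return st
}

// NeedsRequeue reports whether the reconciler should look again later
// instead of waiting for the next change event.
func NeedsRequeue(st Status) bool {
	return st.Phase != "Running"
}

func main() {
	for _, ready := range []int32{0, 2, 3} {
		st := ComputeStatus(3, ready)
		fmt.Printf("ready=%d phase=%s ready-condition=%s requeue=%t\n",
			ready, st.Phase, st.Conditions[0].Status, NeedsRequeue(st))
	}
}
```

In a controller-runtime reconciler, the computed status would typically be written back through the status subresource with r.Status().Update(ctx, &database), and a "not ready yet" result would be expressed by returning a ctrl.Result with RequeueAfter set.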
Building and Deploying
# Generate manifests
make manifests
# Build operator image
make docker-build docker-push IMG=myregistry/database-operator:v1.0.0
# Deploy operator
make deploy IMG=myregistry/database-operator:v1.0.0
# Or install with OLM
operator-sdk olm install
operator-sdk run bundle myregistry/database-operator-bundle:v1.0.0
Operator Lifecycle Manager (OLM)
ClusterServiceVersion
# database-operator.clusterserviceversion.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: database-operator.v1.0.0
spec:
  displayName: Database Operator
  description: Manages database clusters on Kubernetes
  version: 1.0.0
  replaces: database-operator.v0.9.0
  keywords:
    - database
    - postgresql
  maintainers:
    - name: Platform Team
      email: [email protected]
  provider:
    name: Example Inc
  installModes:
    - type: OwnNamespace
      supported: true
    - type: SingleNamespace
      supported: true
    - type: AllNamespaces
      supported: true
  install:
    strategy: deployment
    spec:
      deployments:
        - name: database-operator
          spec:
            replicas: 1
            selector:
              matchLabels:
                name: database-operator
            template:
              metadata:
                # Pod labels must match the selector above
                labels:
                  name: database-operator
              spec:
                containers:
                  - name: operator
                    image: myregistry/database-operator:v1.0.0
  customresourcedefinitions:
    owned:
      - name: databases.example.com
        version: v1
        kind: Database
        displayName: Database
        description: A managed database cluster
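The replaces field links each ClusterServiceVersion to the one it supersedes, which is what lets OLM work out an upgrade path. The sketch below follows only those replaces links to list the versions between an installed CSV and the newest one. It is a simplification under stated assumptions: real OLM resolution also takes channels and skip rules into account, the CSV struct and UpgradePath function are hypothetical, and database-operator.v1.1.0 is an invented release used to make the chain longer.

```go
package main

import "fmt"

// CSV holds only the two fields this sketch needs.
type CSV struct {
	Name     string
	Replaces string // name of the CSV this one supersedes; empty for the first release
}

// UpgradePath returns the CSV names to step through, oldest first,
// to get from the installed CSV to the newest one reachable via replaces.
func UpgradePath(csvs []CSV, installed string) []string {
	// Index: replaced name -> name of the CSV that replaces it.
	next := make(map[string]string, len(csvs))
	for _, c := range csvs {
		if c.Replaces != "" {
			next[c.Replaces] = c.Name
		}
	}

	var path []string
	seen := map[string]bool{installed: true} // guards against cycles
	for cur := installed; ; {
		n, ok := next[cur]
		if !ok || seen[n] {
			break
		}
		path = append(path, n)
		seen[n] = true
		cur = n
	}
	return path
}

func main() {
	csvs := []CSV{
		{Name: "database-operator.v0.9.0"},
		{Name: "database-operator.v1.0.0", Replaces: "database-operator.v0.9.0"},
		// Hypothetical later release, added only to make the chain longer.
		{Name: "database-operator.v1.1.0", Replaces: "database-operator.v1.0.0"},
	}
	fmt.Println(UpgradePath(csvs, "database-operator.v0.9.0"))
}
```

Starting from the newest CSV returns an empty path, since nothing replaces it.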
Troubleshooting Operators
# Check operator deployment
kubectl get pods -n operators
kubectl logs -n operators deployment/database-operator
# Check CRDs
kubectl get crds | grep example.com
kubectl describe crd databases.example.com
# Check custom resources
kubectl get databases -A
kubectl describe database production-db
# Check operator events
kubectl get events --field-selector involvedObject.kind=Database
# Check RBAC
kubectl auth can-i create databases --as=system:serviceaccount:operators:database-operator
Commands Reference
# CRD Management
kubectl get crds
kubectl describe crd databases.example.com
kubectl delete crd databases.example.com
# Custom Resources
kubectl get databases
kubectl describe database my-db
kubectl delete database my-db
# Operator SDK
operator-sdk init --domain example.com
operator-sdk create api --group db --version v1 --kind Database
make manifests
make docker-build IMG=myrepo/operator:tag
make deploy IMG=myrepo/operator:tag
# OLM
operator-sdk olm install
operator-sdk olm status
Key Takeaways
- Operators encode operational knowledge in software
- CRDs extend Kubernetes with custom resource types
- Controllers implement reconciliation logic
- Popular operators exist for databases, monitoring, certificates
- Operator SDK simplifies building custom operators
- OLM manages operator lifecycle and upgrades
Next Steps
Now that you understand Operators, you’re ready to explore Logging and Monitoring in the next post. We’ll learn how to implement observability for Kubernetes applications.