[13/24] O is for Operators: Extending Kubernetes Functionality


This is Post #13 in the Kubernetes A-to-Z Series

Reading Order: Previous: Helm | Next: YAML

Series Progress: 13/24 complete | Difficulty: Advanced | Time: 35-40 min | Part 4/6: Advanced Concepts

Welcome back to our Kubernetes A-to-Z Series! Now that you understand Helm, let’s explore Operators, a pattern for extending Kubernetes with domain-specific knowledge. Operators automate complex application lifecycle management using custom controllers.

What is an Operator?

An Operator is a method of packaging, deploying, and managing a Kubernetes application. It extends Kubernetes by adding Custom Resource Definitions (CRDs) and custom controllers that encode operational knowledge.

Traditional Approach:
┌─────────────────────────────────────────────────┐
│  Human Operator                                 │
│  - Deploy database                              │
│  - Configure replication                        │
│  - Handle backups                               │
│  - Manage failover                              │
│  - Scale cluster                                │
│  Manual, error-prone, time-consuming            │
└─────────────────────────────────────────────────┘

Kubernetes Operator:
┌─────────────────────────────────────────────────┐
│  Software Operator                              │
│  ┌─────────────────────────────────────┐        │
│  │  Custom Controller                  │        │
│  │  - Watches Custom Resources         │        │
│  │  - Applies operational knowledge    │        │
│  │  - Automates day-2 operations       │        │
│  └─────────────────────────────────────┘        │
│  Automated, consistent, reliable                │
└─────────────────────────────────────────────────┘

Operator Benefits

  • Automation: Encode human operational knowledge
  • Consistency: Repeatable operations with fewer manual errors
  • Self-Healing: Automatic recovery from failures
  • Domain Knowledge: Application-specific logic
  • Native Integration: Uses Kubernetes APIs

Operator Pattern

Control Loop

┌─────────────────────────────────────────────────┐
│  Operator Control Loop                          │
│                                                 │
│  ┌─────────┐    ┌─────────┐    ┌───────────┐    │
│  │ Observe │───►│ Analyze │───►│    Act    │    │
│  │ (Watch) │    │ (Diff)  │    │(Reconcile)│    │
│  └────┬────┘    └─────────┘    └─────┬─────┘    │
│       │                              │          │
│       └──────────────────────────────┘          │
│              Continuous Loop                    │
└─────────────────────────────────────────────────┘

1. Observe: Watch for changes to custom resources
2. Analyze: Compare desired state vs actual state
3. Act: Make changes to reach desired state
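
The three steps above can be sketched in plain Go without any Kubernetes libraries. This is only an illustration of the loop's shape: the `State` type and the `diff`, `act`, and `reconcile` functions are hypothetical names invented for this example, and a real operator would watch the API server instead of looping over a local struct.

```go
package main

import "fmt"

// State is a hypothetical, simplified view of a database cluster.
type State struct {
	Replicas int
}

// diff (Analyze) lists the actions needed to move actual toward desired.
func diff(desired, actual State) []string {
	var actions []string
	for i := actual.Replicas; i < desired.Replicas; i++ {
		actions = append(actions, "add-replica")
	}
	for i := actual.Replicas; i > desired.Replicas; i-- {
		actions = append(actions, "remove-replica")
	}
	return actions
}

// act (Act) applies a single action and returns the new actual state.
func act(actual State, action string) State {
	switch action {
	case "add-replica":
		actual.Replicas++
	case "remove-replica":
		actual.Replicas--
	}
	return actual
}

// reconcile runs one pass of the loop and reports whether anything changed.
func reconcile(desired, actual State) (State, bool) {
	actions := diff(desired, actual)
	for _, a := range actions {
		actual = act(actual, a)
	}
	return actual, len(actions) > 0
}

func main() {
	desired := State{Replicas: 3}
	actual := State{Replicas: 1} // Observe: pretend this came from the cluster
	for {
		next, changed := reconcile(desired, actual)
		fmt.Printf("actual=%d desired=%d changed=%v\n", actual.Replicas, desired.Replicas, changed)
		actual = next
		if !changed {
			break // converged; a real controller would keep watching
		}
	}
}
```

Note that `reconcile` compares whole states rather than reacting to individual events, so running it again after convergence is a no-op.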

Custom Resource Definitions (CRDs)

What is a CRD?

A CRD extends the Kubernetes API with new resource types. Once defined, you can create, read, update, and delete custom resources just like built-in resources.

Creating a CRD

# database-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required:
            - engine
            - version
            properties:
              engine:
                type: string
                enum: ["postgresql", "mysql", "mongodb"]
              version:
                type: string
              replicas:
                type: integer
                minimum: 1
                maximum: 10
                default: 1
              storage:
                type: object
                properties:
                  size:
                    type: string
                    default: "10Gi"
                  storageClass:
                    type: string
              backup:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: false
                  schedule:
                    type: string
          status:
            type: object
            properties:
              phase:
                type: string
              replicas:
                type: integer
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    lastTransitionTime:
                      type: string
    subresources:
      status: {}
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Version
      type: string
      jsonPath: .spec.version
    - name: Replicas
      type: integer
      jsonPath: .spec.replicas
    - name: Status
      type: string
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames:
    - db
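
To make the schema rules above concrete, here is a rough Go sketch of the checks the API server performs for you once the CRD is installed. It is not how Kubernetes implements validation; `Spec`, `applyDefaults`, and `validate` are hypothetical names, and the sketch simplifies two things: it treats an empty string as a missing required field, and it applies the storage size default even when no `storage` object was given (the real API server only defaults fields inside objects that are present).

```go
package main

import (
	"errors"
	"fmt"
)

// Spec is a hypothetical, flattened mirror of the CRD's spec schema.
type Spec struct {
	Engine      string
	Version     string
	Replicas    *int // nil means "not set", so the default can apply
	StorageSize string
}

// applyDefaults fills in the defaults declared in the schema.
func applyDefaults(s Spec) Spec {
	if s.Replicas == nil {
		one := 1
		s.Replicas = &one
	}
	if s.StorageSize == "" {
		s.StorageSize = "10Gi"
	}
	return s
}

// validate mirrors the required, enum, minimum, and maximum rules.
func validate(s Spec) error {
	switch s.Engine {
	case "postgresql", "mysql", "mongodb":
	case "":
		return errors.New("spec.engine is required")
	default:
		return fmt.Errorf("spec.engine %q is not one of postgresql, mysql, mongodb", s.Engine)
	}
	if s.Version == "" {
		return errors.New("spec.version is required")
	}
	if s.Replicas != nil && (*s.Replicas < 1 || *s.Replicas > 10) {
		return fmt.Errorf("spec.replicas %d is outside the range 1 to 10", *s.Replicas)
	}
	return nil
}

func main() {
	s := applyDefaults(Spec{Engine: "postgresql", Version: "14"})
	fmt.Println(*s.Replicas, s.StorageSize, validate(s))

	eleven := 11
	fmt.Println(validate(Spec{Engine: "postgresql", Version: "14", Replicas: &eleven}))
}
```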

Creating Custom Resources

# my-database.yaml
apiVersion: example.com/v1
kind: Database
metadata:
  name: production-db
  namespace: production
spec:
  engine: postgresql
  version: "14"
  replicas: 3
  storage:
    size: "100Gi"
    storageClass: fast-ssd
  backup:
    enabled: true
    schedule: "0 2 * * *"

# Apply CRD
kubectl apply -f database-crd.yaml

# Create the namespace used by the example, then the custom resource
kubectl create namespace production
kubectl apply -f my-database.yaml

# List databases
kubectl get databases -n production
kubectl get db -n production  # short name

# Describe database
kubectl describe database production-db -n production

# Delete database
kubectl delete database production-db -n production

Database Operators

# PostgreSQL Operator (Zalando)
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: platform-production-cluster  # the Zalando operator expects the name to start with "<teamId>-"
spec:
  teamId: platform
  numberOfInstances: 3
  postgresql:
    version: "14"
  volume:
    size: 100Gi
    storageClass: fast-ssd
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
---
# MongoDB Community Operator
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-cluster
spec:
  members: 3
  type: ReplicaSet
  version: "6.0.5"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
  - name: admin
    db: admin
    passwordSecretRef:
      name: mongodb-admin-password
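    roles:
    - name: clusterAdmin
      db: admin
    scramCredentialsSecretName: mongodb-admin-scram  # expected by the community operator's CRD; the name is a placeholder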

Prometheus Operator

# ServiceMonitor for automatic scraping
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: webapp
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
---
# PrometheusRule for alerting
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: webapp-alerts
spec:
  groups:
  - name: webapp
    rules:
    - alert: HighErrorRate
      expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: High error rate detected

Cert-Manager Operator

# ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com  # placeholder; use your own contact address
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          class: nginx
---
# Certificate request
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: webapp-cert
  namespace: production
spec:
  secretName: webapp-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - www.example.com
  - api.example.com

Building Custom Operators

Operator SDK

# Install Operator SDK
brew install operator-sdk

# Create new operator project
operator-sdk init --domain example.com --repo github.com/example/database-operator

# Create API and controller
operator-sdk create api --group database --version v1 --kind Database --resource --controller

# Project structure
database-operator/
├── api/v1/
│   └── database_types.go      # CRD types
├── controllers/
│   └── database_controller.go # Reconciliation logic
├── config/
│   ├── crd/                   # CRD manifests
│   ├── rbac/                  # RBAC rules
│   └── manager/               # Operator deployment
├── main.go
└── Makefile

Defining Types

// api/v1/database_types.go
package v1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DatabaseSpec is the desired state declared by the user.
type DatabaseSpec struct {
    Engine   string      `json:"engine"`
    Version  string      `json:"version"`
    Replicas int32       `json:"replicas,omitempty"`
    Storage  StorageSpec `json:"storage,omitempty"`
}

type StorageSpec struct {
    Size         string `json:"size,omitempty"`
    StorageClass string `json:"storageClass,omitempty"`
}

type DatabaseStatus struct {
    Phase      string             `json:"phase,omitempty"`
    Replicas   int32              `json:"replicas,omitempty"`
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

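// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// Database is the Schema for the databases API.
type Database struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   DatabaseSpec   `json:"spec,omitempty"`
    Status DatabaseStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// DatabaseList contains a list of Database objects.
type DatabaseList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []Database `json:"items"`
}

func init() {
    // SchemeBuilder comes from the scaffolded groupversion_info.go in this package.
    SchemeBuilder.Register(&Database{}, &DatabaseList{})
}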
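
The struct tags above control how a custom resource is serialized. The following standalone sketch copies `DatabaseSpec` and `StorageSpec` and round-trips them through `encoding/json`; `decodeSpec` and `encodeSpec` are hypothetical helpers written for this example, not part of the Operator SDK. One detail worth seeing in practice: `omitempty` drops a zero `Replicas`, but it does not drop an empty nested struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type StorageSpec struct {
	Size         string `json:"size,omitempty"`
	StorageClass string `json:"storageClass,omitempty"`
}

type DatabaseSpec struct {
	Engine   string      `json:"engine"`
	Version  string      `json:"version"`
	Replicas int32       `json:"replicas,omitempty"`
	Storage  StorageSpec `json:"storage,omitempty"`
}

// decodeSpec parses the JSON form of a Database spec into the typed struct.
func decodeSpec(raw string) (DatabaseSpec, error) {
	var spec DatabaseSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

// encodeSpec serializes a spec back to JSON using the struct tags.
func encodeSpec(spec DatabaseSpec) (string, error) {
	out, err := json.Marshal(spec)
	return string(out), err
}

func main() {
	raw := `{"engine":"postgresql","version":"14","replicas":3,"storage":{"size":"100Gi","storageClass":"fast-ssd"}}`
	spec, err := decodeSpec(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.Engine, spec.Replicas, spec.Storage.Size)

	// Replicas is zero and is omitted; the empty Storage struct is still emitted as {}.
	out, err := encodeSpec(DatabaseSpec{Engine: "mysql", Version: "8.0"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

If an empty `storage` object should be omitted entirely, a common choice is to make the field a pointer (`*StorageSpec`), since `omitempty` does drop nil pointers.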

Controller Logic

// controllers/database_controller.go
package controllers

import (
    "context"

    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    databasev1 "github.com/example/database-operator/api/v1"
)

type DatabaseReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := ctrl.LoggerFrom(ctx)

    // Fetch the Database instance
    var database databasev1.Database
    if err := r.Get(ctx, req.NamespacedName, &database); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    log.Info("Reconciling Database", "name", database.Name)

    // Create StatefulSet if not exists
    // Create Service if not exists
    // Configure replication
    // Setup backups
    // Update status

    return ctrl.Result{}, nil
}

func (r *DatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&databasev1.Database{}).
        Complete(r)
}
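
The commented steps inside `Reconcile` ("create if not exists", "update status") all rely on the function being idempotent. The sketch below shows that property with an in-memory stand-in for the API server; `Cluster`, `Database`, and `reconcileDatabase` are hypothetical names for this example only, and a real controller would use the `client.Client` calls from controller-runtime instead of maps.

```go
package main

import "fmt"

// Cluster is a hypothetical in-memory stand-in for the API server.
type Cluster struct {
	StatefulSets map[string]int32 // name -> replicas
	Services     map[string]bool  // name -> exists
}

// Database is a trimmed-down custom resource for this sketch.
type Database struct {
	Name     string
	Replicas int32
	Phase    string
}

// reconcileDatabase creates missing objects, corrects replica drift, and
// updates the status phase. It returns the names of objects it created.
func reconcileDatabase(c *Cluster, db *Database) []string {
	var created []string

	current, exists := c.StatefulSets[db.Name]
	if !exists {
		c.StatefulSets[db.Name] = db.Replicas
		created = append(created, "statefulset/"+db.Name)
	} else if current != db.Replicas {
		c.StatefulSets[db.Name] = db.Replicas // correct drift
	}

	if !c.Services[db.Name] {
		c.Services[db.Name] = true
		created = append(created, "service/"+db.Name)
	}

	db.Phase = "Ready"
	return created
}

func main() {
	c := &Cluster{StatefulSets: map[string]int32{}, Services: map[string]bool{}}
	db := &Database{Name: "production-db", Replicas: 3}

	fmt.Println(reconcileDatabase(c, db)) // first pass creates both objects
	fmt.Println(reconcileDatabase(c, db)) // second pass creates nothing

	c.StatefulSets["production-db"] = 1 // simulate someone scaling it down by hand
	reconcileDatabase(c, db)
	fmt.Println(c.StatefulSets["production-db"], db.Phase)
}
```

Because every pass starts from the observed state, the controller can be restarted or triggered repeatedly without creating duplicates.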

Building and Deploying

# Generate manifests
make manifests

# Build operator image
make docker-build docker-push IMG=myregistry/database-operator:v1.0.0

# Deploy operator
make deploy IMG=myregistry/database-operator:v1.0.0

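# Or install with OLM (build and push the bundle image first)
operator-sdk olm install
make bundle IMG=myregistry/database-operator:v1.0.0
make bundle-build bundle-push BUNDLE_IMG=myregistry/database-operator-bundle:v1.0.0
operator-sdk run bundle myregistry/database-operator-bundle:v1.0.0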

Operator Lifecycle Manager (OLM)

ClusterServiceVersion

# database-operator.clusterserviceversion.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: database-operator.v1.0.0
spec:
  displayName: Database Operator
  description: Manages database clusters on Kubernetes
  version: 1.0.0
  replaces: database-operator.v0.9.0
  keywords:
  - database
  - postgresql
  maintainers:
  - name: Platform Team
    email: platform@example.com  # placeholder address
  provider:
    name: Example Inc
  installModes:
  - type: OwnNamespace
    supported: true
  - type: SingleNamespace
    supported: true
  - type: AllNamespaces
    supported: true
  install:
    strategy: deployment
    spec:
      deployments:
      - name: database-operator
        spec:
          replicas: 1
          selector:
            matchLabels:
              name: database-operator
          template:
            metadata:
              labels:
                name: database-operator  # must match spec.selector.matchLabels
            spec:
              containers:
              - name: operator
                image: myregistry/database-operator:v1.0.0
  customresourcedefinitions:
    owned:
    - name: databases.example.com
      version: v1
      kind: Database
      displayName: Database
      description: A managed database cluster

Troubleshooting Operators

# Check operator deployment
kubectl get pods -n operators
kubectl logs -n operators deployment/database-operator

# Check CRDs
kubectl get crds | grep example.com
kubectl describe crd databases.example.com

# Check custom resources
kubectl get databases -A
kubectl describe database production-db -n production

# Check operator events
kubectl get events -A --field-selector involvedObject.kind=Database

# Check RBAC
kubectl auth can-i create databases --as=system:serviceaccount:operators:database-operator

Commands Reference

# CRD Management
kubectl get crds
kubectl describe crd databases.example.com
kubectl delete crd databases.example.com

# Custom Resources
kubectl get databases
kubectl describe database my-db
kubectl delete database my-db

# Operator SDK
operator-sdk init --domain example.com
operator-sdk create api --group database --version v1 --kind Database --resource --controller
make manifests
make docker-build IMG=myrepo/operator:tag
make deploy IMG=myrepo/operator:tag

# OLM
operator-sdk olm install
operator-sdk olm status

Key Takeaways

  • Operators encode operational knowledge in software
  • CRDs extend Kubernetes with custom resource types
  • Controllers implement reconciliation logic
  • Popular operators exist for databases, monitoring, certificates
  • Operator SDK simplifies building custom operators
  • OLM manages operator lifecycle and upgrades

Next Steps

Now that you understand Operators, you’re ready to explore Logging and Monitoring in the next post. We’ll learn how to implement observability for Kubernetes applications.
