[20/24] F is for Federation: Multi-Cluster Management
This is Post #20 in the Kubernetes A-to-Z Series
Reading Order: Previous: Authentication and RBAC | Next: Zero-Downtime Deployments
Series Progress: 20/24 complete | Difficulty: Advanced | Time: 35-40 min | Part 6/6: Security & Production
Welcome to the twentieth post in our Kubernetes A-to-Z Series! Now that you understand cluster security, let’s explore Federation: strategies for managing multiple Kubernetes clusters that enable disaster recovery, geographic distribution, and organizational isolation.
Why Multi-Cluster?
┌─────────────────────────────────────────────────┐
│ Multi-Cluster Use Cases │
│ │
│ 1. High Availability │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ US │ │ EU │ │Asia │ Geographic DR │
│ └─────┘ └─────┘ └─────┘ │
│ │
│ 2. Isolation │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Prod │ │Stage│ │ Dev │ Environment │
│ └─────┘ └─────┘ └─────┘ │
│ │
│ 3. Compliance │
│ ┌─────┐ ┌─────┐ │
│ │GDPR │ │HIPAA│ Data Residency │
│ └─────┘ └─────┘ │
│ │
│ 4. Scale │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Team1│ │Team2│ │Team3│ Organizational │
│ └─────┘ └─────┘ └─────┘ │
└─────────────────────────────────────────────────┘
Multi-Cluster Patterns
1. Replicated Pattern
┌─────────────────────────────────────────────────┐
│ Replicated: Same workloads everywhere │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Cluster A │ │Cluster B │ │Cluster C │ │
│ │┌────────┐│ │┌────────┐│ │┌────────┐│ │
│ ││ App A ││ ││ App A ││ ││ App A ││ │
│ ││ App B ││ ││ App B ││ ││ App B ││ │
│ │└────────┘│ │└────────┘│ │└────────┘│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ │ │
│ Global Load Balancer │
└─────────────────────────────────────────────────┘
2. Partitioned Pattern
┌─────────────────────────────────────────────────┐
│ Partitioned: Different workloads per cluster │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ US-East  │ │ EU-West  │ │ Asia-Pac │          │
│ │┌────────┐│ │┌────────┐│ │┌────────┐│ │
│ ││US Users││ ││EU Users││ ││AP Users││ │
│ ││US Data ││ ││EU Data ││ ││AP Data ││ │
│ │└────────┘│ │└────────┘│ │└────────┘│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Data sovereignty + Latency optimization │
└─────────────────────────────────────────────────┘
Kubernetes Federation v2 (KubeFed)
Note: the KubeFed project has since been archived by SIG Multicluster, but its template/placement/overrides model still shapes how most multi-cluster tooling distributes resources, so it remains worth understanding.
Installing KubeFed
# Install KubeFed
helm repo add kubefed-charts https://raw.githubusercontent.com/kubernetes-sigs/kubefed/master/charts
helm install kubefed kubefed-charts/kubefed \
  --namespace kube-federation-system \
  --create-namespace

# Join clusters
kubefedctl join cluster1 \
  --cluster-context cluster1-context \
  --host-cluster-context host-context \
  --v=2

kubefedctl join cluster2 \
  --cluster-context cluster2-context \
  --host-cluster-context host-context \
  --v=2
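Before creating federated resources, it is worth confirming that the host cluster can reach both members; the cluster names below are the ones joined above:

```shell
# Each joined cluster should report READY=True
kubectl get kubefedclusters -n kube-federation-system
```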
Federated Deployment
# federated-deployment.yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: webapp
  namespace: production
spec:
  template:
    metadata:
      labels:
        app: webapp
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: webapp
      template:
        metadata:
          labels:
            app: webapp
        spec:
          containers:
          - name: webapp
            image: myapp:v1.0
            ports:
            - containerPort: 8080
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
    - name: cluster3
  overrides:
  - clusterName: cluster1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 5
  - clusterName: cluster2
    clusterOverrides:
    - path: "/spec/replicas"
      value: 3
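Per-cluster overrides pin exact replica counts. If you instead want KubeFed to divide a total across clusters, a ReplicaSchedulingPreference does this by weight. A sketch (the total and weights are illustrative):

```yaml
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: webapp            # must match the FederatedDeployment name
  namespace: production
spec:
  targetKind: FederatedDeployment
  totalReplicas: 10       # split across clusters by weight
  clusters:
    cluster1:
      weight: 2           # receives roughly 2/3 of replicas
    cluster2:
      weight: 1
```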
Federated Service
# federated-service.yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedService
metadata:
  name: webapp
  namespace: production
spec:
  template:
    spec:
      selector:
        app: webapp
      ports:
      - port: 80
        targetPort: 8080
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
    - name: cluster3
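Once these federated resources are applied on the host cluster, a plain Deployment and Service should appear in every placed member. A quick check, assuming the kubectl contexts used when joining:

```shell
# The propagated resources should exist in each placed cluster
kubectl --context cluster1-context get deployment,svc webapp -n production
kubectl --context cluster2-context get deployment,svc webapp -n production
```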
Service Mesh for Multi-Cluster
Istio Multi-Cluster
# istio-multicluster.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-multicluster
spec:
  profile: default
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
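For the meshes to discover each other's endpoints, each control plane needs read access to the other cluster's API server. istioctl generates that credential as a "remote secret"; the context names here are assumptions matching the earlier examples:

```shell
# Give cluster1's istiod read access to cluster2's API server
istioctl create-remote-secret \
  --context=cluster2-context \
  --name=cluster2 | \
  kubectl apply -f - --context=cluster1-context
```

Repeat in the opposite direction for a symmetric multi-primary setup.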
Cross-Cluster Service Discovery
# service-entry.yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-webapp
spec:
  hosts:
  - webapp.production.svc.cluster2.local
  location: MESH_INTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  endpoints:
  - address: cluster2-istio-ingressgateway.istio-system.svc.cluster.local
    ports:
      http: 15443
Linkerd Multi-Cluster
# Link clusters
linkerd multicluster link --cluster-name cluster2 | kubectl apply -f -
# Export service
kubectl label svc webapp -n production mirror.linkerd.io/exported=true
# Access from other cluster
# webapp-cluster2.production.svc.cluster.local
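To confirm the link is healthy and the gateway is passing traffic, Linkerd ships dedicated checks:

```shell
# Validate the multicluster link and list per-gateway health and latency
linkerd multicluster check
linkerd multicluster gateways
```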
GitOps Multi-Cluster with ArgoCD
ArgoCD ApplicationSet
# applicationset.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: webapp
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - cluster: cluster1
        url: https://cluster1.example.com
        env: production
      - cluster: cluster2
        url: https://cluster2.example.com
        env: production
      - cluster: cluster3
        url: https://cluster3.example.com
        env: staging
  template:
    metadata:
      name: 'webapp-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/webapp
        targetRevision: HEAD
        path: 'deploy/{{env}}'
      destination:
        server: '{{url}}'
        namespace: webapp
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
Cluster Generator
# cluster-generator.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: webapp-all-clusters
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          env: production
  template:
    metadata:
      name: 'webapp-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/webapp
        path: deploy/production
        targetRevision: HEAD
      destination:
        server: '{{server}}'
        namespace: webapp
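The clusters generator matches against labels on ArgoCD's cluster secrets in the argocd namespace. A sketch of registering a cluster and labeling it so the selector above picks it up (the context name is an assumption; in practice you would label one secret by name rather than every cluster secret):

```shell
# Register the cluster with ArgoCD; this creates a cluster secret in argocd
argocd cluster add cluster2-context --name cluster2

# Label the cluster secret(s) so matchLabels env=production selects them
kubectl label secret -n argocd \
  -l argocd.argoproj.io/secret-type=cluster \
  env=production
```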
Disaster Recovery
Velero Backup and Restore
# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket my-backup-bucket \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2

# Create backup
velero backup create production-backup \
  --include-namespaces production \
  --ttl 720h

# Schedule backups
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production
# velero-backup-schedule.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - database
    storageLocation: default
    volumeSnapshotLocations:
    - default
    ttl: 720h0m0s
Cross-Cluster Restore
# Restore to a different cluster
velero restore create production-restore \
  --from-backup production-backup \
  --namespace-mappings production:production-dr

# Check restore status
velero restore describe production-restore
velero restore logs production-restore
Global Load Balancing
DNS-Based Load Balancing
# external-dns.yaml
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: webapp-global
  namespace: production
spec:
  endpoints:
  - dnsName: webapp.example.com
    recordTTL: 60
    recordType: A
    targets:
    - 203.0.113.1  # Cluster 1 IP
    - 203.0.113.2  # Cluster 2 IP
    - 203.0.113.3  # Cluster 3 IP
    setIdentifier: global
    providerSpecific:
    - name: aws/geolocation-country-code
      value: "*"
Multi-Cluster Ingress
# gke-multicluster-ingress.yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: webapp-ingress
  namespace: production
spec:
  template:
    spec:
      backend:
        serviceName: webapp-mcs
        servicePort: 80
      rules:
      - host: webapp.example.com
        http:
          paths:
          - path: /
            backend:
              serviceName: webapp-mcs
              servicePort: 80
---
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: webapp-mcs
  namespace: production
spec:
  template:
    spec:
      selector:
        app: webapp
      ports:
      - port: 80
        targetPort: 8080
  clusters:
  - link: "us-east1/cluster1"
  - link: "europe-west1/cluster2"
Cluster API
Managing Clusters Declaratively
# cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: production-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production-cluster
spec:
  region: us-west-2
  sshKeyName: my-ssh-key
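Cluster API resources are applied to a management cluster, which then provisions the workload cluster. A typical workflow with clusterctl (the AWS provider choice matches the manifest above; other providers work the same way):

```shell
# One-time: install the Cluster API controllers and the AWS provider
# into the management cluster
clusterctl init --infrastructure aws

# Apply the cluster definition and watch provisioning progress
kubectl apply -f cluster.yaml
kubectl get clusters
clusterctl describe cluster production-cluster

# Once provisioned, retrieve a kubeconfig for the new cluster
clusterctl get kubeconfig production-cluster > production-cluster.kubeconfig
```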
Multi-Cluster Commands
# KubeFed
kubefedctl join cluster-name --cluster-context ctx --host-cluster-context host-ctx
kubefedctl unjoin cluster-name --cluster-context ctx --host-cluster-context host-ctx
kubectl get kubefedclusters -n kube-federation-system
# Velero
velero backup create backup-name --include-namespaces ns1,ns2
velero restore create restore-name --from-backup backup-name
velero schedule create schedule-name --schedule="0 2 * * *"
# ArgoCD
argocd cluster add cluster-context --name cluster-name
argocd cluster list
argocd app sync webapp-cluster1   # sync one generated per-cluster app
# Context switching
kubectl config get-contexts
kubectl config use-context cluster1-context
kubectl config current-context
Best Practices
- Consistent Configuration: Use GitOps for all clusters
- Network Connectivity: Ensure clusters can communicate
- Identity Federation: Centralized authentication
- Observability: Unified monitoring across clusters
- Backup Strategy: Regular cross-cluster backups
- DNS Strategy: Global DNS with health checks
- Service Mesh: For secure cross-cluster communication
Key Takeaways
- Multi-cluster enables HA, compliance, and scale
- KubeFed federates resources across clusters
- Service mesh enables cross-cluster communication
- GitOps with ArgoCD manages multi-cluster deployments
- Velero provides backup and disaster recovery
- Global load balancing distributes traffic geographically
- Cluster API manages cluster lifecycle declaratively
Next Steps
Now that you understand multi-cluster management, you’re ready for the final post: Zero-Downtime Deployments, covering advanced deployment strategies.
Resources for Further Learning
Series Navigation:
- Previous: A is for Authentication and RBAC
- Next: Z is for Zero-Downtime Deployments
Complete Series: Kubernetes A-to-Z Series Overview