One Kubernetes cluster is often enough until it is not. Disaster recovery, data residency, team isolation, and regional latency all push teams toward multiple clusters. Federation is the set of patterns and tools that keeps those clusters manageable instead of turning them into separate hand-maintained islands.

Why Multi-Cluster?

┌─────────────────────────────────────────────────┐
│  Multi-Cluster Use Cases                        │
│                                                 │
│  1. High Availability                           │
│     ┌─────┐ ┌─────┐ ┌─────┐                    │
│     │ US  │ │ EU  │ │Asia │  Geographic DR     │
│     └─────┘ └─────┘ └─────┘                    │
│                                                 │
│  2. Isolation                                   │
│     ┌─────┐ ┌─────┐ ┌─────┐                    │
│     │Prod │ │Stage│ │ Dev │  Environment       │
│     └─────┘ └─────┘ └─────┘                    │
│                                                 │
│  3. Compliance                                  │
│     ┌─────┐ ┌─────┐                            │
│     │GDPR │ │HIPAA│        Data Residency      │
│     └─────┘ └─────┘                            │
│                                                 │
│  4. Scale                                       │
│     ┌─────┐ ┌─────┐ ┌─────┐                    │
│     │Team1│ │Team2│ │Team3│  Organizational    │
│     └─────┘ └─────┘ └─────┘                    │
└─────────────────────────────────────────────────┘

Multi-Cluster Patterns

1. Replicated Pattern

┌─────────────────────────────────────────────────┐
│  Replicated: Same workloads everywhere          │
│                                                 │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐  │
│  │Cluster A │    │Cluster B │    │Cluster C │  │
│  │┌────────┐│    │┌────────┐│    │┌────────┐│  │
│  ││ App A  ││    ││ App A  ││    ││ App A  ││  │
│  ││ App B  ││    ││ App B  ││    ││ App B  ││  │
│  │└────────┘│    │└────────┘│    │└────────┘│  │
│  └──────────┘    └──────────┘    └──────────┘  │
│        │              │              │          │
│        └──────────────┼──────────────┘          │
│                       │                         │
│              Global Load Balancer               │
└─────────────────────────────────────────────────┘

2. Partitioned Pattern

┌─────────────────────────────────────────────────┐
│  Partitioned: Different workloads per cluster   │
│                                                 │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐  │
│  │  US-East │    │  EU-West │    │   APAC   │  │
│  │┌────────┐│    │┌────────┐│    │┌────────┐│  │
│  ││US Users││    ││EU Users││    ││AP Users││  │
│  ││US Data ││    ││EU Data ││    ││AP Data ││  │
│  │└────────┘│    │└────────┘│    │└────────┘│  │
│  └──────────┘    └──────────┘    └──────────┘  │
│                                                 │
│  Data sovereignty + Latency optimization        │
└─────────────────────────────────────────────────┘

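Kubernetes Federation v2 (KubeFed)

KubeFed is the kubernetes-sigs federation project. Its repository was archived in 2023, so treat the manifests in this section as a reference for the federation model and check the project's status before adopting it.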

Installing KubeFed

# Install KubeFed
helm repo add kubefed-charts https://raw.githubusercontent.com/kubernetes-sigs/kubefed/master/charts
helm install kubefed kubefed-charts/kubefed \
  --namespace kube-federation-system \
  --create-namespace

# Join clusters
kubefedctl join cluster1 \
  --cluster-context cluster1-context \
  --host-cluster-context host-context \
  --v=2

kubefedctl join cluster2 \
  --cluster-context cluster2-context \
  --host-cluster-context host-context \
  --v=2

Federated Deployment

# federated-deployment.yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: webapp
  namespace: production
spec:
  template:
    metadata:
      labels:
        app: webapp
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: webapp
      template:
        metadata:
          labels:
            app: webapp
        spec:
          containers:
          - name: webapp
            image: myapp:v1.0
            ports:
            - containerPort: 8080
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
    - name: cluster3
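  # Per-cluster patches applied on top of the template.
  # cluster3 has no override, so it keeps the template's 3 replicas.
  overrides:
  - clusterName: cluster1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 5
  # Same value as the template; listed only to make the intent explicit.
  - clusterName: cluster2
    clusterOverrides:
    - path: "/spec/replicas"
      value: 3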
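
Hand-written replica overrides work for a few clusters, but KubeFed also offers a ReplicaSchedulingPreference that splits a total replica count by weight. The following is a minimal sketch, assuming the `webapp` FederatedDeployment above. The weights and the total are illustrative values, not taken from the original setup. As far as the KubeFed user guide describes it, the preference must share the name and namespace of the federated resource it targets, and it then manages the per-cluster replica values itself.

```yaml
# replica-scheduling-preference.yaml (hypothetical file name)
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: webapp              # same name as the FederatedDeployment
  namespace: production
spec:
  targetKind: FederatedDeployment
  totalReplicas: 8          # illustrative total across all clusters
  clusters:
    cluster1:
      weight: 2             # should receive about twice the share of the others
    cluster2:
      weight: 1
    cluster3:
      weight: 1
```

With these weights the controller would be expected to place roughly half of the replicas on cluster1 and split the rest between cluster2 and cluster3.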

Federated Service

# federated-service.yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedService
metadata:
  name: webapp
  namespace: production
spec:
  template:
    spec:
      selector:
        app: webapp
      ports:
      - port: 80
        targetPort: 8080
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
    - name: cluster3
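
Both federated resources above live in the `production` namespace, and that namespace has to exist in each member cluster before anything can be placed there. In KubeFed this is normally handled by federating the namespace itself. A minimal sketch, assuming the same three member clusters:

```yaml
# federated-namespace.yaml (hypothetical file name)
apiVersion: types.kubefed.io/v1beta1
kind: FederatedNamespace
metadata:
  name: production
  namespace: production     # a FederatedNamespace lives inside the namespace it federates
spec:
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
    - name: cluster3
```

Apply it to the host cluster before the FederatedDeployment and FederatedService so that the namespace is already present in the members when those resources are propagated.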

Service Mesh for Multi-Cluster

Istio Multi-Cluster

# istio-multicluster.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-multicluster
spec:
  profile: default
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
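
The manifest above configures only cluster1. Every other cluster in the mesh needs its own IstioOperator with the same `meshID` and a distinct `clusterName`. The sketch below shows what the cluster2 counterpart might look like. The value `network2` is an assumption that cluster2 sits on a separate network; on a shared flat network it would stay `network1`.

```yaml
# istio-multicluster-cluster2.yaml (hypothetical file name)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-multicluster
spec:
  profile: default
  values:
    global:
      meshID: mesh1              # must match cluster1 so both join the same mesh
      multiCluster:
        clusterName: cluster2    # unique per cluster
      network: network2          # assumption: cluster2 is on its own network
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
```

Installing the operator config alone does not connect the clusters. Each control plane also needs credentials for the other cluster's API server, and both clusters need a shared root of trust for mutual TLS.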

Cross-Cluster Service Discovery

# service-entry.yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-webapp
spec:
  hosts:
  - webapp.production.svc.cluster2.local
  location: MESH_INTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  endpoints:
  - address: cluster2-istio-ingressgateway.istio-system.svc.cluster.local
    ports:
      http: 15443
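
The ServiceEntry above sends traffic to port 15443 on a gateway in cluster2, so that gateway has to accept it. The sketch below follows the pattern used in Istio's multi-network samples: a Gateway that passes mutual TLS traffic through by SNI. The selector label is an assumption. It matches a dedicated east-west gateway, and would need to be changed to `istio: ingressgateway` if the regular ingress gateway serves this port.

```yaml
# cross-network-gateway.yaml (hypothetical file name), applied in cluster2
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway     # assumption: label of the gateway that exposes 15443
  servers:
  - port:
      number: 15443
      name: tls
      protocol: TLS
    tls:
      mode: AUTO_PASSTHROUGH   # route by SNI without terminating mTLS
    hosts:
    - "*.local"
```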

Linkerd Multi-Cluster

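# Link clusters: generate the link from cluster2's credentials, then apply it to cluster1
# (the context names cluster1-context and cluster2-context are assumed)
linkerd --context=cluster2-context multicluster link --cluster-name cluster2 \
  | kubectl --context=cluster1-context apply -f -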

# Export service
kubectl label svc webapp -n production mirror.linkerd.io/exported=true

# Access from other cluster
# webapp-cluster2.production.svc.cluster.local

GitOps Multi-Cluster with ArgoCD

ArgoCD ApplicationSet

# applicationset.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: webapp
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - cluster: cluster1
        url: https://cluster1.example.com
        env: production
      - cluster: cluster2
        url: https://cluster2.example.com
        env: production
      - cluster: cluster3
        url: https://cluster3.example.com
        env: staging
  template:
    metadata:
      name: 'webapp-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/webapp
        targetRevision: HEAD
        path: 'deploy/{{env}}'
      destination:
        server: '{{url}}'
        namespace: webapp
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
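
The template above places every generated Application in the `default` project, which by default permits any source repository and any destination. A dedicated AppProject can narrow that to the clusters and repository actually in use. The following is a sketch only: the project name `webapp` is hypothetical, and the ApplicationSet template would need `project: webapp` to use it.

```yaml
# appproject.yaml (hypothetical file name)
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: webapp                 # hypothetical project name
  namespace: argocd
spec:
  description: Webapp deployments across clusters
  sourceRepos:
  - https://github.com/org/webapp
  destinations:                # only these cluster/namespace pairs are allowed
  - server: https://cluster1.example.com
    namespace: webapp
  - server: https://cluster2.example.com
    namespace: webapp
  - server: https://cluster3.example.com
    namespace: webapp
```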

Cluster Generator

# cluster-generator.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: webapp-all-clusters
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          env: production
  template:
    metadata:
      name: 'webapp-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/webapp
        path: deploy/production
        targetRevision: HEAD
      destination:
        server: '{{server}}'
        namespace: webapp
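
The cluster generator's selector matches labels on the Secrets in which Argo CD stores registered clusters, not labels on the clusters themselves. A cluster Secret that the selector above would match might look like the sketch below. The Secret name and the credential placeholders are assumptions; real values normally come from `argocd cluster add` or a secrets manager.

```yaml
# cluster1-secret.yaml (hypothetical file name)
apiVersion: v1
kind: Secret
metadata:
  name: cluster1-secret                        # hypothetical name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster    # marks this Secret as a cluster definition
    env: production                            # matched by the generator's selector
type: Opaque
stringData:
  name: cluster1                               # becomes {{name}} in the template
  server: https://cluster1.example.com         # becomes {{server}} in the template
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-certificate>"
      }
    }
```

Adding the `env: production` label to another cluster's Secret is then enough for the ApplicationSet to generate an Application for it.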

Disaster Recovery

Velero Backup and Restore

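# Install Velero (the credentials file path is an assumption; adjust it to your setup)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket my-backup-bucket \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2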

# Create backup
velero backup create production-backup \
  --include-namespaces production \
  --ttl 720h

# Schedule backups
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production

# velero-backup-schedule.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - database
    storageLocation: default
    volumeSnapshotLocations:
    - default
    ttl: 720h0m0s

Cross-Cluster Restore

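# Restore to a different cluster: run this against the DR cluster,
# whose Velero installation must read from the same backup bucket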
velero restore create production-restore \
  --from-backup production-backup \
  --namespace-mappings production:production-dr

# Check restore status
velero restore describe production-restore
velero restore logs production-restore
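
A cross-cluster restore only works if Velero in the target cluster can see the backups written by the source cluster. One common arrangement, sketched here under the assumption that both clusters use the bucket and region from the install command above, is to point the DR cluster's storage location at the same bucket in read-only mode, then express the restore as a resource:

```yaml
# dr-restore.yaml (hypothetical file name), applied in the DR cluster
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-backup-bucket
  config:
    region: us-west-2
  accessMode: ReadOnly         # the DR cluster cannot modify or delete the source's backups
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: production-restore
  namespace: velero
spec:
  backupName: production-backup
  namespaceMapping:
    production: production-dr  # same mapping as the CLI example
```

Read-only access is a design choice: it guards against the DR cluster's retention settings deleting backups that the primary still depends on.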

Global Load Balancing

DNS-Based Load Balancing

# external-dns.yaml
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: webapp-global
  namespace: production
spec:
  endpoints:
  - dnsName: webapp.example.com
    recordTTL: 60
    recordType: A
    targets:
    - 203.0.113.1  # Cluster 1 IP
    - 203.0.113.2  # Cluster 2 IP
    - 203.0.113.3  # Cluster 3 IP
    setIdentifier: global
    providerSpecific:
    - name: aws/geolocation-country-code
      value: "*"
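
The DNSEndpoint above keeps all three cluster IPs in one centrally managed record. An alternative is for each cluster to publish its own record for the shared hostname through annotations on its LoadBalancer Service, so a cluster that disappears takes its record with it. The sketch below uses external-dns annotations for Route 53 weighted routing as they are described in the external-dns AWS documentation. The weight and identifier values are illustrative.

```yaml
# webapp-service-cluster1.yaml (hypothetical file name), applied in cluster 1
apiVersion: v1
kind: Service
metadata:
  name: webapp
  namespace: production
  annotations:
    external-dns.alpha.kubernetes.io/hostname: webapp.example.com
    external-dns.alpha.kubernetes.io/ttl: "60"
    external-dns.alpha.kubernetes.io/set-identifier: cluster1   # must be unique per cluster
    external-dns.alpha.kubernetes.io/aws-weight: "100"          # illustrative weight
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
  - port: 80
    targetPort: 8080
```

The other clusters would apply the same manifest with their own `set-identifier` value.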

Multi-Cluster Ingress

# gke-multicluster-ingress.yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: webapp-ingress
  namespace: production
spec:
  template:
    spec:
      backend:
        serviceName: webapp-mcs
        servicePort: 80
      rules:
      - host: webapp.example.com
        http:
          paths:
          - path: /
            backend:
              serviceName: webapp-mcs
              servicePort: 80
---
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: webapp-mcs
  namespace: production
spec:
  template:
    spec:
      selector:
        app: webapp
      ports:
      - port: 80
        targetPort: 8080
  clusters:
  - link: "us-east1/cluster1"
  - link: "europe-west1/cluster2"

Cluster API

Managing Clusters Declaratively

# cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: production-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production-cluster
spec:
  region: us-west-2
  sshKeyName: my-ssh-key
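
The Cluster and AWSCluster objects above describe the control plane and the network, but no worker nodes. In Cluster API those usually come from a MachineDeployment. The sketch below assumes a KubeadmConfigTemplate and an AWSMachineTemplate, both named `production-workers`, exist in the same namespace. Those names, the replica count, and the Kubernetes version are hypothetical.

```yaml
# machine-deployment.yaml (hypothetical file name)
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: production-workers
  namespace: default
spec:
  clusterName: production-cluster
  replicas: 3                          # illustrative worker count
  selector:
    matchLabels: {}                    # left empty so Cluster API fills in its defaults
  template:
    spec:
      clusterName: production-cluster
      version: v1.28.0                 # assumption: pick the version you actually run
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: production-workers     # hypothetical template, not shown here
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: production-workers       # hypothetical template, not shown here
```

Scaling the worker pool then becomes a change to `replicas` in Git rather than a manual operation on the cloud provider.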

Multi-Cluster Commands

# KubeFed
kubefedctl join cluster-name --cluster-context ctx --host-cluster-context host-ctx
kubefedctl unjoin cluster-name --cluster-context ctx --host-cluster-context host-ctx
kubectl get kubefedclusters -n kube-federation-system

# Velero
velero backup create backup-name --include-namespaces ns1,ns2
velero restore create restore-name --from-backup backup-name
velero schedule create schedule-name --schedule="0 2 * * *"

# ArgoCD
argocd cluster add cluster-context --name cluster-name
argocd cluster list
argocd app sync webapp-cluster-name  # sync by Application name; the ApplicationSet creates one webapp-<cluster> app per cluster

# Context switching
kubectl config get-contexts
kubectl config use-context cluster1-context
kubectl config current-context

Best Practices

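  1. Consistent Configuration: Keep every cluster's manifests in Git and let a GitOps controller apply them, so no cluster drifts through manual changes
  2. Network Connectivity: Confirm that clusters can reach each other through the required gateways and firewall rules before enabling cross-cluster traffic
  3. Identity Federation: Authenticate users and workloads against one identity provider instead of per-cluster accounts
  4. Observability: Aggregate metrics, logs, and traces from all clusters into one view, labeled by cluster
  5. Backup Strategy: Schedule backups to storage outside the cluster and regularly test restoring them into a different cluster
  6. DNS Strategy: Use global DNS with health checks so traffic stops reaching an unhealthy cluster
  7. Service Mesh: Use a mesh for encrypted, authenticated cross-cluster service calls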

What matters in practice

  • Multi-cluster enables HA, compliance, and scale
  • KubeFed federates resources across clusters
  • Service mesh enables cross-cluster communication
  • GitOps with ArgoCD manages multi-cluster deployments
  • Velero provides backup and disaster recovery
  • Global load balancing distributes traffic geographically
  • Cluster API manages cluster lifecycle declaratively

Where to go next

After multi-cluster management, the next concern is keeping releases calm while traffic is still flowing. That leads into Zero-Downtime Deployments and advanced deployment strategies.

Further reading