One Kubernetes cluster is often enough until it is not. Disaster recovery, data residency, team isolation, and regional latency all push teams toward multiple clusters. Federation is the set of patterns and tools that keeps those clusters manageable instead of turning them into separate hand-maintained islands.
Why Multi-Cluster?
┌─────────────────────────────────────────────────┐
│             Multi-Cluster Use Cases             │
│                                                 │
│  1. High Availability                           │
│     ┌─────┐  ┌─────┐  ┌─────┐                   │
│     │ US  │  │ EU  │  │Asia │  Geographic DR    │
│     └─────┘  └─────┘  └─────┘                   │
│                                                 │
│  2. Isolation                                   │
│     ┌─────┐  ┌─────┐  ┌─────┐                   │
│     │Prod │  │Stage│  │ Dev │  Environment      │
│     └─────┘  └─────┘  └─────┘                   │
│                                                 │
│  3. Compliance                                  │
│     ┌─────┐  ┌─────┐                            │
│     │GDPR │  │HIPAA│           Data Residency   │
│     └─────┘  └─────┘                            │
│                                                 │
│  4. Scale                                       │
│     ┌─────┐  ┌─────┐  ┌─────┐                   │
│     │Team1│  │Team2│  │Team3│  Organizational   │
│     └─────┘  └─────┘  └─────┘                   │
└─────────────────────────────────────────────────┘
Multi-Cluster Patterns
1. Replicated Pattern
┌─────────────────────────────────────────────────┐
│  Replicated: Same workloads everywhere          │
│                                                 │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐    │
│   │Cluster A │   │Cluster B │   │Cluster C │    │
│   │┌────────┐│   │┌────────┐│   │┌────────┐│    │
│   ││ App A  ││   ││ App A  ││   ││ App A  ││    │
│   ││ App B  ││   ││ App B  ││   ││ App B  ││    │
│   │└────────┘│   │└────────┘│   │└────────┘│    │
│   └──────────┘   └──────────┘   └──────────┘    │
│        │              │              │          │
│        └──────────────┼──────────────┘          │
│                       │                         │
│              Global Load Balancer               │
└─────────────────────────────────────────────────┘
2. Partitioned Pattern
┌─────────────────────────────────────────────────┐
│  Partitioned: Different workloads per cluster   │
│                                                 │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐ │
│  │  US-East   │  │  EU-West   │  │Asia-Pacific│ │
│  │┌──────────┐│  │┌──────────┐│  │┌──────────┐│ │
│  ││ US Users ││  ││ EU Users ││  ││ AP Users ││ │
│  ││ US Data  ││  ││ EU Data  ││  ││ AP Data  ││ │
│  │└──────────┘│  │└──────────┘│  │└──────────┘│ │
│  └────────────┘  └────────────┘  └────────────┘ │
│                                                 │
│     Data sovereignty + Latency optimization     │
└─────────────────────────────────────────────────┘
Kubernetes Federation v2 (KubeFed)
KubeFed has since been archived upstream and is no longer actively maintained, so the configuration in this section is best read as an illustration of the federation model (template, placement, overrides).
Installing KubeFed
# Install KubeFed on the host cluster
helm repo add kubefed-charts https://raw.githubusercontent.com/kubernetes-sigs/kubefed/master/charts
helm install kubefed kubefed-charts/kubefed \
  --namespace kube-federation-system \
  --create-namespace
# Join clusters (repeat for every member cluster)
kubefedctl join cluster1 \
  --cluster-context cluster1-context \
  --host-cluster-context host-context \
  --v=2
kubefedctl join cluster2 \
  --cluster-context cluster2-context \
  --host-cluster-context host-context \
  --v=2
Federated Deployment
# federated-deployment.yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: webapp
  namespace: production
spec:
  # template: the Deployment to create in each selected cluster
  template:
    metadata:
      labels:
        app: webapp
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: webapp
      template:
        metadata:
          labels:
            app: webapp
        spec:
          containers:
            - name: webapp
              image: myapp:v1.0
              ports:
                - containerPort: 8080
  # placement: which member clusters receive the Deployment
  placement:
    clusters:
      - name: cluster1
      - name: cluster2
      - name: cluster3  # must also be joined with kubefedctl
  # overrides: per-cluster changes applied on top of the template
  overrides:
    - clusterName: cluster1
      clusterOverrides:
        - path: "/spec/replicas"
          value: 5
    - clusterName: cluster2
      clusterOverrides:
        - path: "/spec/replicas"
          value: 3
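One way to confirm that placement and overrides took effect is to read the replica count of the propagated Deployment in each member cluster. The following is a minimal sketch, not part of KubeFed itself: the function name `replicas_per_cluster` and the `KUBECTL` override variable are hypothetical, and the context names in the usage line are assumed to match the ones used when joining.

```shell
# Command used to reach the clusters; override KUBECTL for dry runs or tests.
KUBECTL="${KUBECTL:-kubectl}"

# Print "<context> <replicas>" for the webapp Deployment in each given context.
# (Hypothetical helper; pass the kubeconfig contexts of your member clusters.)
replicas_per_cluster() {
  for ctx in "$@"; do
    count=$("$KUBECTL" --context "$ctx" -n production \
      get deployment webapp -o jsonpath='{.spec.replicas}')
    echo "$ctx $count"
  done
}

# Usage (left commented out so the sketch has no side effects when sourced):
# replicas_per_cluster cluster1-context cluster2-context
```

With the manifest above, one would expect cluster1 to report 5 and cluster2 to report 3, and any cluster without an override to fall back to the template value of 3.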
Federated Service
# federated-service.yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedService
metadata:
  name: webapp
  namespace: production
spec:
  template:
    spec:
      selector:
        app: webapp
      ports:
        - port: 80
          targetPort: 8080
  placement:
    clusters:
      - name: cluster1
      - name: cluster2
      - name: cluster3
Service Mesh for Multi-Cluster
Istio Multi-Cluster
# istio-multicluster.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-multicluster
spec:
  profile: default
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
Cross-Cluster Service Discovery
# service-entry.yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-webapp
spec:
  hosts:
    - webapp.production.svc.cluster2.local
  location: MESH_INTERNAL
  ports:
    - number: 80
      name: http
      protocol: HTTP
  resolution: DNS
  endpoints:
    - address: cluster2-istio-ingressgateway.istio-system.svc.cluster.local
      ports:
        http: 15443
Linkerd Multi-Cluster
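# Link clusters: generate the link with cluster2's credentials, apply it in cluster1
linkerd --context=cluster2-context multicluster link --cluster-name cluster2 \
  | kubectl --context=cluster1-context apply -f -
# Export service (run against cluster2, where webapp lives)
kubectl --context=cluster2-context label svc webapp -n production mirror.linkerd.io/exported=true
# Access from other cluster: the mirrored service appears in cluster1 as
# webapp-cluster2.production.svc.cluster.local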
GitOps Multi-Cluster with ArgoCD
ArgoCD ApplicationSet
# applicationset.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: webapp
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: cluster1
            url: https://cluster1.example.com
            env: production
          - cluster: cluster2
            url: https://cluster2.example.com
            env: production
          - cluster: cluster3
            url: https://cluster3.example.com
            env: staging
  template:
    metadata:
      name: 'webapp-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/webapp
        targetRevision: HEAD
        path: 'deploy/{{env}}'
      destination:
        server: '{{url}}'
        namespace: webapp
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
Cluster Generator
# cluster-generator.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: webapp-all-clusters
  namespace: argocd
spec:
  generators:
    # Matches every cluster registered in Argo CD whose Secret carries this label
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    metadata:
      name: 'webapp-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/webapp
        path: deploy/production
        targetRevision: HEAD
      destination:
        server: '{{server}}'
        namespace: webapp
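The cluster generator selects clusters by the labels on the Secrets that Argo CD stores for each registered cluster, so a cluster is only picked up once its Secret carries `env: production`. The sketch below shows one way to inspect and label those Secrets with kubectl. The helper names and the `KUBECTL` override are hypothetical, and the Secret name in the usage line is a placeholder, since the real name depends on how the cluster was registered.

```shell
# Command used to reach the Argo CD cluster; override KUBECTL for dry runs.
KUBECTL="${KUBECTL:-kubectl}"

# List the Secrets that Argo CD treats as registered clusters.
list_cluster_secrets() {
  "$KUBECTL" get secrets -n argocd \
    -l argocd.argoproj.io/secret-type=cluster --show-labels
}

# Set env=<value> on a cluster Secret so a matchLabels selector can find it.
# $1 = Secret name (placeholder), $2 = value for the env label
label_cluster_env() {
  "$KUBECTL" label secret "$1" -n argocd "env=$2" --overwrite
}

# Usage (commented out so the sketch has no side effects when sourced):
# list_cluster_secrets
# label_cluster_env my-cluster-secret production
```

Labeling the Secret keeps the ApplicationSet itself unchanged when clusters are added or retired; membership becomes a property of the cluster registration.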
Disaster Recovery
Velero Backup and Restore
# Install Velero
# (the credentials file path is an assumption; with IAM roles, pass --no-secret instead)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket my-backup-bucket \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2
# Create backup (kept for 30 days)
velero backup create production-backup \
  --include-namespaces production \
  --ttl 720h
# Schedule backups (daily at 02:00)
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production
# velero-backup-schedule.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - production
      - database
    storageLocation: default
    volumeSnapshotLocations:
      - default
    ttl: 720h0m0s
Cross-Cluster Restore
# Restore to different cluster
# (run against the DR cluster; its Velero install must read the same backup bucket)
velero restore create production-restore \
  --from-backup production-backup \
  --namespace-mappings production:production-dr
# Check restore status
velero restore describe production-restore
velero restore logs production-restore
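A cross-cluster restore only works if the DR cluster's Velero can see the backup, and it is safer if that cluster cannot modify or expire backups while restoring. A minimal sketch of those two preparatory steps follows. The helper names and the `KUBECTL`/`VELERO` override variables are hypothetical, and the storage location name `default` is assumed from the schedule shown earlier.

```shell
# Commands used against the DR cluster; override for dry runs or tests.
KUBECTL="${KUBECTL:-kubectl}"
VELERO="${VELERO:-velero}"

# Switch a BackupStorageLocation to read-only so the DR cluster cannot
# create or delete backup objects in the shared bucket during the restore.
# $1 = BackupStorageLocation name
set_bsl_read_only() {
  "$KUBECTL" patch backupstoragelocation "$1" \
    --namespace velero --type merge \
    --patch '{"spec":{"accessMode":"ReadOnly"}}'
}

# Confirm that the backup has been synced into the DR cluster.
# $1 = backup name
backup_visible() {
  "$VELERO" backup describe "$1"
}

# Usage (commented out so the sketch has no side effects when sourced):
# set_bsl_read_only default
# backup_visible production-backup
```

Once the restore has been verified, the storage location can be patched back to `ReadWrite` if the DR cluster is going to take its own backups.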
Global Load Balancing
DNS-Based Load Balancing
# external-dns.yaml
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: webapp-global
  namespace: production
spec:
  endpoints:
    - dnsName: webapp.example.com
      recordTTL: 60
      recordType: A
      targets:
        - 203.0.113.1  # Cluster 1 IP
        - 203.0.113.2  # Cluster 2 IP
        - 203.0.113.3  # Cluster 3 IP
      setIdentifier: global
      providerSpecific:
        - name: aws/geolocation-country-code
          value: "*"
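A static target list like the one above keeps returning an address even when that cluster is down, which is why DNS-based balancing is normally paired with health checks. The sketch below illustrates the idea only: it filters a list of addresses down to those that answer a probe. The function names, the `CHECK` override variable, and the `/healthz` path are all assumptions, and in production this job usually belongs to the DNS provider's own health checks, not a script.

```shell
# Default probe: succeeds if the address answers HTTP on /healthz within 3s.
# (The /healthz path is an assumption; use whatever your ingress exposes.)
check_http() {
  curl -fsS --max-time 3 -o /dev/null "http://$1/healthz"
}

# Name of the probe command; override CHECK to plug in a different check.
CHECK="${CHECK:-check_http}"

# Print only the addresses that pass the probe, one per line.
healthy_targets() {
  for ip in "$@"; do
    if "$CHECK" "$ip"; then
      echo "$ip"
    fi
  done
}

# Usage (commented out so the sketch makes no network calls when sourced):
# healthy_targets 203.0.113.1 203.0.113.2 203.0.113.3
```

The resulting list is what the `targets` field would ideally contain at any moment; the 60-second TTL in the record bounds how long clients keep using a removed address.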
Multi-Cluster Ingress
# gke-multicluster-ingress.yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: webapp-ingress
  namespace: production
spec:
  template:
    spec:
      backend:
        serviceName: webapp-mcs
        servicePort: 80
      rules:
        - host: webapp.example.com
          http:
            paths:
              - path: /
                backend:
                  serviceName: webapp-mcs
                  servicePort: 80
---
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: webapp-mcs
  namespace: production
spec:
  template:
    spec:
      selector:
        app: webapp
      ports:
        - port: 80
          targetPort: 8080
  clusters:
    - link: "us-east1/cluster1"
    - link: "europe-west1/cluster2"
Cluster API
Managing Clusters Declaratively
# cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: production-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production-cluster
spec:
  region: us-west-2
  sshKeyName: my-ssh-key
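After the manifests are applied to the management cluster, the usual next steps are to watch provisioning and then fetch a kubeconfig for the new workload cluster. The following is a small sketch using `clusterctl`. The helper names and the `CLUSTERCTL` override variable are hypothetical, and the `default` namespace is assumed from the Cluster manifest above.

```shell
# Command used against the management cluster; override for dry runs.
CLUSTERCTL="${CLUSTERCTL:-clusterctl}"

# Show the provisioning status tree for a workload cluster.
# $1 = cluster name, $2 = namespace (defaults to "default")
describe_cluster() {
  "$CLUSTERCTL" describe cluster "$1" --namespace "${2:-default}"
}

# Print the workload cluster's kubeconfig to stdout.
# $1 = cluster name, $2 = namespace (defaults to "default")
print_kubeconfig() {
  "$CLUSTERCTL" get kubeconfig "$1" --namespace "${2:-default}"
}

# Usage (commented out so the sketch has no side effects when sourced):
# describe_cluster production-cluster
# print_kubeconfig production-cluster > production-cluster.kubeconfig
```

The saved kubeconfig can then be registered with the other tools in this article, for example as a new context for `argocd cluster add`.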
Multi-Cluster Commands
# KubeFed
kubefedctl join cluster-name --cluster-context ctx --host-cluster-context host-ctx
kubefedctl unjoin cluster-name --cluster-context ctx --host-cluster-context host-ctx
kubectl get kubefedclusters -n kube-federation-system
# Velero
velero backup create backup-name --include-namespaces ns1,ns2
velero restore create restore-name --from-backup backup-name
velero schedule create schedule-name --schedule="0 2 * * *"
# ArgoCD
argocd cluster add cluster-context --name cluster-name
argocd cluster list
argocd app sync webapp-cluster-name   # each cluster has its own Application; sync it by name
# Context switching
kubectl config get-contexts
kubectl config use-context cluster1-context
kubectl config current-context
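Switching contexts by hand gets error-prone with more than a few clusters, because the active context is global state that is easy to forget. A common alternative is to pass `--context` explicitly and loop over every context in the kubeconfig. This is a minimal sketch: `for_each_context` and the `KUBECTL` override are hypothetical names, and it assumes context names contain no whitespace.

```shell
# Command used to reach the clusters; override KUBECTL for dry runs or tests.
KUBECTL="${KUBECTL:-kubectl}"

# Run the same kubectl arguments against every context in the kubeconfig,
# printing a header line before each cluster's output.
for_each_context() {
  for ctx in $("$KUBECTL" config get-contexts -o name); do
    echo "== $ctx =="
    "$KUBECTL" --context "$ctx" "$@"
  done
}

# Usage (commented out so the sketch has no side effects when sourced):
# for_each_context get nodes
# for_each_context -n production get deployment webapp
```

Because the loop never calls `use-context`, it leaves the current context untouched, so it is safe to run from a shell that is pointed at production.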
Best Practices
- Consistent Configuration: Use GitOps for all clusters
- Network Connectivity: Ensure clusters can communicate
- Identity Federation: Centralized authentication
- Observability: Unified monitoring across clusters
- Backup Strategy: Regular cross-cluster backups
- DNS Strategy: Global DNS with health checks
- Service Mesh: For secure cross-cluster communication
What matters in practice
- Multi-cluster enables HA, compliance, and scale
- KubeFed federates resources across clusters
- Service mesh enables cross-cluster communication
- GitOps with ArgoCD manages multi-cluster deployments
- Velero provides backup and disaster recovery
- Global load balancing distributes traffic geographically
- Cluster API manages cluster lifecycle declaratively
Where to go next
After multi-cluster management, the next concern is keeping releases calm while traffic is still flowing. That leads into Zero-Downtime Deployments and advanced deployment strategies.