[20/24] F is for Federation: Multi-Cluster Management
This is Post #20 in the Kubernetes A-to-Z Series
Reading Order: Previous: Authentication and RBAC | Next: Zero-Downtime Deployments
Series Progress: 20/24 complete | Difficulty: Advanced | Time: 35-40 min | Part 6/6: Security & Production
Welcome to the twentieth post in our Kubernetes A-to-Z Series! Now that you understand cluster security, let’s explore Federation: strategies for managing multiple Kubernetes clusters that enable disaster recovery, geographic distribution, and organizational isolation.
Why Multi-Cluster?
┌─────────────────────────────────────────────────┐
│ Multi-Cluster Use Cases │
│ │
│ 1. High Availability │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ US │ │ EU │ │Asia │ Geographic DR │
│ └─────┘ └─────┘ └─────┘ │
│ │
│ 2. Isolation │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Prod │ │Stage│ │ Dev │ Environment │
│ └─────┘ └─────┘ └─────┘ │
│ │
│ 3. Compliance │
│ ┌─────┐ ┌─────┐ │
│ │GDPR │ │HIPAA│ Data Residency │
│ └─────┘ └─────┘ │
│ │
│ 4. Scale │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Team1│ │Team2│ │Team3│ Organizational │
│ └─────┘ └─────┘ └─────┘ │
└─────────────────────────────────────────────────┘
Multi-Cluster Patterns
1. Replicated Pattern
┌─────────────────────────────────────────────────┐
│ Replicated: Same workloads everywhere │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Cluster A │ │Cluster B │ │Cluster C │ │
│ │┌────────┐│ │┌────────┐│ │┌────────┐│ │
│ ││ App A ││ ││ App A ││ ││ App A ││ │
│ ││ App B ││ ││ App B ││ ││ App B ││ │
│ │└────────┘│ │└────────┘│ │└────────┘│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ │ │
│ Global Load Balancer │
└─────────────────────────────────────────────────┘
2. Partitioned Pattern
┌─────────────────────────────────────────────────┐
│ Partitioned: Different workloads per cluster │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ US-East  │ │ EU-West  │ │ Asia-Pac │          │
│ │┌────────┐│ │┌────────┐│ │┌────────┐│ │
│ ││US Users││ ││EU Users││ ││AP Users││ │
│ ││US Data ││ ││EU Data ││ ││AP Data ││ │
│ │└────────┘│ │└────────┘│ │└────────┘│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Data sovereignty + Latency optimization │
└─────────────────────────────────────────────────┘
Kubernetes Federation v2 (KubeFed)
Note: the KubeFed project has since been archived by SIG Multicluster, but its template/placement/overrides model still shapes how most multi-cluster tooling distributes resources, so it remains worth understanding.
Installing KubeFed
# Install KubeFed
helm repo add kubefed-charts https://raw.githubusercontent.com/kubernetes-sigs/kubefed/master/charts
helm install kubefed kubefed-charts/kubefed \
  --namespace kube-federation-system \
  --create-namespace

# Join clusters
kubefedctl join cluster1 \
  --cluster-context cluster1-context \
  --host-cluster-context host-context \
  --v=2

kubefedctl join cluster2 \
  --cluster-context cluster2-context \
  --host-cluster-context host-context \
  --v=2
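Before creating federated resources, it is worth confirming that the host cluster can reach both members; the cluster names below are the ones joined above:

```shell
# Each joined cluster should report READY=True
kubectl get kubefedclusters -n kube-federation-system
```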
Federated Deployment
# federated-deployment.yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: webapp
  namespace: production
spec:
  template:
    metadata:
      labels:
        app: webapp
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: webapp
      template:
        metadata:
          labels:
            app: webapp
        spec:
          containers:
          - name: webapp
            image: myapp:v1.0
            ports:
            - containerPort: 8080
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
    - name: cluster3
  overrides:
  - clusterName: cluster1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 5
  - clusterName: cluster2
    clusterOverrides:
    - path: "/spec/replicas"
      value: 3
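Per-cluster overrides pin exact replica counts. If you instead want KubeFed to divide a total across clusters, a ReplicaSchedulingPreference does this by weight. A sketch (the total and weights are illustrative):

```yaml
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: webapp            # must match the FederatedDeployment name
  namespace: production
spec:
  targetKind: FederatedDeployment
  totalReplicas: 10       # split across clusters by weight
  clusters:
    cluster1:
      weight: 2           # receives roughly 2/3 of replicas
    cluster2:
      weight: 1
```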
Federated Service
# federated-service.yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedService
metadata:
  name: webapp
  namespace: production
spec:
  template:
    spec:
      selector:
        app: webapp
      ports:
      - port: 80
        targetPort: 8080
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
    - name: cluster3
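Once these federated resources are applied on the host cluster, a plain Deployment and Service should appear in every placed member. A quick check, assuming the kubectl contexts used when joining:

```shell
# The propagated resources should exist in each placed cluster
kubectl --context cluster1-context get deployment,svc webapp -n production
kubectl --context cluster2-context get deployment,svc webapp -n production
```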
Service Mesh for Multi-Cluster
Istio Multi-Cluster
# istio-multicluster.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-multicluster
spec:
  profile: default
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
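For the meshes to discover each other's endpoints, each control plane needs read access to the other cluster's API server. istioctl generates that credential as a "remote secret"; the context names here are assumptions matching the earlier examples:

```shell
# Give cluster1's istiod read access to cluster2's API server
istioctl create-remote-secret \
  --context=cluster2-context \
  --name=cluster2 | \
  kubectl apply -f - --context=cluster1-context
```

Repeat in the opposite direction for a symmetric multi-primary setup.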
Cross-Cluster Service Discovery
# service-entry.yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-webapp
spec:
  hosts:
  - webapp.production.svc.cluster2.local
  location: MESH_INTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  endpoints:
  - address: cluster2-istio-ingressgateway.istio-system.svc.cluster.local
    ports:
      http: 15443
Linkerd Multi-Cluster
# Link clusters
linkerd multicluster link --cluster-name cluster2 | kubectl apply -f -
# Export service
kubectl label svc webapp -n production mirror.linkerd.io/exported=true
# Access from other cluster
# webapp-cluster2.production.svc.cluster.local
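To confirm the link is healthy and the gateway is passing traffic, Linkerd ships dedicated checks:

```shell
# Validate the multicluster link and list per-gateway health and latency
linkerd multicluster check
linkerd multicluster gateways
```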
GitOps Multi-Cluster with ArgoCD
ArgoCD ApplicationSet
# applicationset.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: webapp
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - cluster: cluster1
        url: https://cluster1.example.com
        env: production
      - cluster: cluster2
        url: https://cluster2.example.com
        env: production
      - cluster: cluster3
        url: https://cluster3.example.com
        env: staging
  template:
    metadata:
      name: 'webapp-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/webapp
        targetRevision: HEAD
        path: 'deploy/{{env}}'
      destination:
        server: '{{url}}'
        namespace: webapp
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
Cluster Generator
# cluster-generator.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: webapp-all-clusters
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          env: production
  template:
    metadata:
      name: 'webapp-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/webapp
        path: deploy/production
        targetRevision: HEAD
      destination:
        server: '{{server}}'
        namespace: webapp
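The clusters generator matches against labels on ArgoCD's cluster secrets in the argocd namespace. A sketch of registering a cluster and labeling it so the selector above picks it up (the context name is an assumption; in practice you would label one secret by name rather than every cluster secret):

```shell
# Register the cluster with ArgoCD; this creates a cluster secret in argocd
argocd cluster add cluster2-context --name cluster2

# Label the cluster secret(s) so matchLabels env=production selects them
kubectl label secret -n argocd \
  -l argocd.argoproj.io/secret-type=cluster \
  env=production
```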
Disaster Recovery
Velero Backup and Restore
# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket my-backup-bucket \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2

# Create backup
velero backup create production-backup \
  --include-namespaces production \
  --ttl 720h

# Schedule backups
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production
# velero-backup-schedule.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - database
    storageLocation: default
    volumeSnapshotLocations:
    - default
    ttl: 720h0m0s
Cross-Cluster Restore
# Restore to a different cluster
velero restore create production-restore \
  --from-backup production-backup \
  --namespace-mappings production:production-dr

# Check restore status
velero restore describe production-restore
velero restore logs production-restore
Global Load Balancing
DNS-Based Load Balancing
# external-dns.yaml
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: webapp-global
  namespace: production
spec:
  endpoints:
  - dnsName: webapp.example.com
    recordTTL: 60
    recordType: A
    targets:
    - 203.0.113.1  # Cluster 1 IP
    - 203.0.113.2  # Cluster 2 IP
    - 203.0.113.3  # Cluster 3 IP
    setIdentifier: global
    providerSpecific:
    - name: aws/geolocation-country-code
      value: "*"
Multi-Cluster Ingress
# gke-multicluster-ingress.yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: webapp-ingress
  namespace: production
spec:
  template:
    spec:
      backend:
        serviceName: webapp-mcs
        servicePort: 80
      rules:
      - host: webapp.example.com
        http:
          paths:
          - path: /
            backend:
              serviceName: webapp-mcs
              servicePort: 80
---
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: webapp-mcs
  namespace: production
spec:
  template:
    spec:
      selector:
        app: webapp
      ports:
      - port: 80
        targetPort: 8080
  clusters:
  - link: "us-east1/cluster1"
  - link: "europe-west1/cluster2"
Cluster API
Managing Clusters Declaratively
# cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: production-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production-cluster
spec:
  region: us-west-2
  sshKeyName: my-ssh-key
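Cluster API resources are applied to a management cluster, which then provisions the workload cluster. A typical workflow with clusterctl (the AWS provider choice matches the manifest above; other providers work the same way):

```shell
# One-time: install the Cluster API controllers and the AWS provider
# into the management cluster
clusterctl init --infrastructure aws

# Apply the cluster definition and watch provisioning progress
kubectl apply -f cluster.yaml
kubectl get clusters
clusterctl describe cluster production-cluster

# Once provisioned, retrieve a kubeconfig for the new cluster
clusterctl get kubeconfig production-cluster > production-cluster.kubeconfig
```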
Multi-Cluster Commands
# KubeFed
kubefedctl join cluster-name --cluster-context ctx --host-cluster-context host-ctx
kubefedctl unjoin cluster-name --cluster-context ctx --host-cluster-context host-ctx
kubectl get kubefedclusters -n kube-federation-system
# Velero
velero backup create backup-name --include-namespaces ns1,ns2
velero restore create restore-name --from-backup backup-name
velero schedule create schedule-name --schedule="0 2 * * *"
# ArgoCD
argocd cluster add cluster-context --name cluster-name
argocd cluster list
argocd app sync webapp-cluster1   # sync one generated per-cluster app
# Context switching
kubectl config get-contexts
kubectl config use-context cluster1-context
kubectl config current-context
Best Practices
- Consistent Configuration: Use GitOps for all clusters
- Network Connectivity: Ensure clusters can communicate
- Identity Federation: Centralized authentication
- Observability: Unified monitoring across clusters
- Backup Strategy: Regular cross-cluster backups
- DNS Strategy: Global DNS with health checks
- Service Mesh: For secure cross-cluster communication
Key Takeaways
- Multi-cluster enables HA, compliance, and scale
- KubeFed federates resources across clusters
- Service mesh enables cross-cluster communication
- GitOps with ArgoCD manages multi-cluster deployments
- Velero provides backup and disaster recovery
- Global load balancing distributes traffic geographically
- Cluster API manages cluster lifecycle declaratively
Next Steps
Now that you understand multi-cluster management, you’re ready for the final post: Zero-Downtime Deployments, covering advanced deployment strategies.
Resources for Further Learning
Series Navigation:
- Previous: A is for Authentication and RBAC
- Next: Z is for Zero-Downtime Deployments
Complete Series: Kubernetes A-to-Z Series Overview