Building Internal Developer Platforms on Kubernetes: A Comprehensive Guide
The rise of Platform Engineering has transformed how organizations enable their developers. Internal Developer Platforms (IDPs) abstract infrastructure complexity, provide self-service capabilities, and establish “golden paths” for common development tasks.
This comprehensive guide explores how to build an effective Internal Developer Platform on Kubernetes, focusing on practical implementation with tools like Backstage and modern platform engineering practices.
What is an Internal Developer Platform?
An Internal Developer Platform (IDP) is a curated set of tools, services, and workflows that enable developers to build, deploy, and manage applications without requiring deep infrastructure expertise.
Traditional Approach:
┌──────────────────────────────────────────────┐
│ Developer needs to:                          │
│ - Learn Kubernetes YAML                      │
│ - Understand networking                      │
│ - Configure CI/CD pipelines                  │
│ - Set up monitoring                          │
│ - Manage infrastructure                      │
│ - Handle security policies                   │
│ ↓ Result: Low velocity, high cognitive load  │
└──────────────────────────────────────────────┘
With Internal Developer Platform:
┌──────────────────────────────────────────────┐
│ Developer uses:                              │
│ - Self-service portal                        │
│ - Template-based deployment                  │
│ - Automated CI/CD                            │
│ - Built-in observability                     │
│ - Standardized workflows                     │
│ - Integrated security                        │
│ ↓ Result: High velocity, focus on business   │
└──────────────────────────────────────────────┘
Platform Engineering vs DevOps
Platform Engineering is an evolution of DevOps that treats internal platforms as products:
| Aspect | DevOps | Platform Engineering |
|---|---|---|
| Focus | Process & Culture | Product & Experience |
| Approach | "You build it, you run it" | "We enable you to build it" |
| Responsibility | Developers own everything | Platform team enables developers |
| Abstraction | Low-level tools | High-level self-service |
| Goal | Automation | Developer experience |
| Team Structure | Cross-functional teams | Dedicated platform team |
Core IDP Components
1. Developer Portal (Backstage)
Purpose: Unified interface for developers
- Service catalog
- Documentation hub
- Software templates
- Kubernetes integration
- Plugins ecosystem
2. Self-Service Infrastructure
Purpose: Automated provisioning
- Environment creation
- Database provisioning
- Secret management
- Resource quotas
3. Golden Paths
Purpose: Standardized workflows
- Application templates
- CI/CD pipelines
- Deployment strategies
- Best practices
4. Observability
Purpose: Insights and debugging
- Centralized logging
- Metrics and dashboards
- Distributed tracing
- Cost visibility
5. Security and Governance
Purpose: Policy enforcement
- RBAC and authentication
- Network policies
- Compliance checks
- Audit trails
Installing Backstage on Kubernetes
Prerequisites
# Install Node.js (v18+)
# macOS
brew install node@18
# Install Backstage CLI
npm install -g @backstage/cli
Create Backstage Application
# Create new Backstage app
npx @backstage/create-app@latest
# Follow prompts:
# ? Enter a name for the app: my-platform
# ? Select database: PostgreSQL
cd my-platform
# Install dependencies
yarn install
# Run locally (development)
yarn dev
Backstage Configuration
# app-config.yaml
app:
  title: My Internal Developer Platform
  baseUrl: https://platform.mycompany.com

organization:
  name: My Company

backend:
  baseUrl: https://platform.mycompany.com
  listen:
    port: 7007
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

catalog:
  import:
    entityFilename: catalog-info.yaml
  rules:
    - allow: [Component, System, API, Resource, Location]
  locations:
    # Entity definitions hosted in Git
    - type: url
      target: https://github.com/myorg/backstage-entities/blob/main/catalog-info.yaml

kubernetes:
  serviceLocatorMethod:
    type: 'multiTenant'
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        - url: https://kubernetes.default.svc
          name: production
          authProvider: 'serviceAccount'
          skipTLSVerify: false
          skipMetricsLookup: false

auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${AUTH_GITHUB_CLIENT_ID}
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}
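The `${VAR}` placeholders above are substituted from the environment at startup, and a missing variable tends to fail in confusing ways deep inside the database client. A small preflight check catches this early. This is a sketch of a hypothetical helper, not part of Backstage itself; the variable list mirrors the config above:

```typescript
// Fail fast if required configuration is missing from the environment.
const REQUIRED_VARS = [
  'POSTGRES_HOST',
  'POSTGRES_PORT',
  'POSTGRES_USER',
  'POSTGRES_PASSWORD',
] as const;

function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter(name => !env[name]);
}

function requireEnv(env: Record<string, string | undefined>): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}
```

Calling `requireEnv(process.env)` at the top of the backend entry point turns a late runtime failure into an immediate, readable error.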
Deploy to Kubernetes
# backstage-deployment.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: backstage
---
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secrets
  namespace: backstage
type: Opaque
stringData:
  POSTGRES_USER: backstage
  POSTGRES_PASSWORD: changeme  # placeholder: replace with a real secret before deploying
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: backstage
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: backstage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-secrets
                  key: POSTGRES_USER
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secrets
                  key: POSTGRES_PASSWORD
            - name: POSTGRES_DB
              value: backstage
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-storage
          persistentVolumeClaim:
            claimName: postgres-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: backstage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backstage
  namespace: backstage
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backstage
  template:
    metadata:
      labels:
        app: backstage
    spec:
      serviceAccountName: backstage
      containers:
        - name: backstage
          image: myregistry/backstage:latest
          ports:
            - containerPort: 7007
          env:
            - name: POSTGRES_HOST
              value: postgres
            - name: POSTGRES_PORT
              value: "5432"
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-secrets
                  key: POSTGRES_USER
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secrets
                  key: POSTGRES_PASSWORD
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: backstage
  namespace: backstage
spec:
  selector:
    app: backstage
  ports:
    - port: 80
      targetPort: 7007
  type: LoadBalancer
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backstage
  namespace: backstage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backstage-reader
rules:
  - apiGroups:
      - "*"
    resources:
      - "*"
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backstage-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: backstage-reader
subjects:
  - kind: ServiceAccount
    name: backstage
    namespace: backstage
Build and Push Backstage Image
# Build Docker image
yarn build:backend
# Create Dockerfile
cat > packages/backend/Dockerfile <<'EOF'
FROM node:18-bullseye-slim
WORKDIR /app
# Install dependencies
COPY package.json yarn.lock ./
COPY packages/backend/package.json packages/backend/
RUN yarn install --frozen-lockfile --production
# Copy built backend
COPY packages/backend/dist packages/backend/dist
CMD ["node", "packages/backend/dist/index.js"]
EOF
# Build and push
docker build -t myregistry/backstage:latest -f packages/backend/Dockerfile .
docker push myregistry/backstage:latest
# Deploy to Kubernetes
kubectl apply -f backstage-deployment.yaml
Software Templates (Golden Paths)
Creating a Service Template
# templates/nodejs-service/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: nodejs-service
  title: Node.js Microservice
  description: Create a new Node.js microservice with CI/CD
  tags:
    - nodejs
    - recommended
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Information
      required:
        - name
        - description
        - owner
      properties:
        name:
          title: Name
          type: string
          description: Unique name for the service
          pattern: '^[a-z0-9-]+$'
        description:
          title: Description
          type: string
          description: What does this service do?
        owner:
          title: Owner
          type: string
          description: Team or person responsible
          ui:field: OwnerPicker
          ui:options:
            allowedKinds:
              - Group
              - User
    - title: Configuration
      properties:
        database:
          title: Database
          type: string
          enum:
            - postgres
            - mysql
            - mongodb
            - none
          default: postgres
        cache:
          title: Cache
          type: boolean
          description: Enable Redis cache
          default: true
  steps:
    - id: fetch-base
      name: Fetch Base
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          database: ${{ parameters.database }}
          cache: ${{ parameters.cache }}
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        allowedHosts: ['github.com']
        description: ${{ parameters.description }}
        repoUrl: github.com?owner=myorg&repo=${{ parameters.name }}
        defaultBranch: main
        repoVisibility: private
    - id: create-argocd-app
      name: Create ArgoCD Application
      action: argocd:create-app
      input:
        name: ${{ parameters.name }}
        namespace: ${{ parameters.name }}
        repoUrl: https://github.com/myorg/${{ parameters.name }}
        path: kubernetes/
    - id: register
      name: Register Component
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'
  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Open in catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
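The `pattern: '^[a-z0-9-]+$'` constraint on the name parameter is plain JSON Schema, so its behavior is easy to reason about and test outside Backstage. A small sketch of what it accepts, plus a hypothetical helper (not part of the scaffolder) that suggests a compliant name from free-form input:

```typescript
// The template's name parameter is validated against this JSON Schema pattern.
const NAME_PATTERN = /^[a-z0-9-]+$/;

function isValidServiceName(name: string): boolean {
  return NAME_PATTERN.test(name);
}

// Hypothetical convenience: derive a compliant name from free-form input.
function suggestServiceName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, '-') // collapse runs of invalid chars into hyphens
    .replace(/^-+|-+$/g, '');     // trim leading/trailing hyphens
}
```

So `checkout-service` passes, while `Checkout_Service` is rejected at form submission time rather than failing later in Kubernetes, where the name becomes a namespace and repo name.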
Template Skeleton Structure
templates/nodejs-service/skeleton/
├── catalog-info.yaml
├── package.json
├── src/
│ └── index.js
├── Dockerfile
├── kubernetes/
│ ├── deployment.yaml
│ ├── service.yaml
│ └── ingress.yaml
├── .github/
│ └── workflows/
│ └── ci.yaml
└── README.md
Catalog Info Template
# templates/nodejs-service/skeleton/catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  description: ${{ values.description }}
  annotations:
    github.com/project-slug: myorg/${{ values.name }}
    backstage.io/kubernetes-id: ${{ values.name }}
    argocd/app-name: ${{ values.name }}
  tags:
    - nodejs
    - microservice
  links:
    - url: https://github.com/myorg/${{ values.name }}
      title: Repository
      icon: github
spec:
  type: service
  lifecycle: production
  owner: ${{ values.owner }}
  system: platform
  dependsOn:
    {%- if values.database != 'none' %}
    - resource:${{ values.database }}
    {%- endif %}
    {%- if values.cache %}
    - resource:redis
    {%- endif %}
  providesApis:
    - ${{ values.name }}-api
Kubernetes Plugin Integration
Installing Kubernetes Plugin
# Install plugin
yarn add --cwd packages/app @backstage/plugin-kubernetes
# Install backend plugin
yarn add --cwd packages/backend @backstage/plugin-kubernetes-backend
Backend Configuration
// packages/backend/src/plugins/kubernetes.ts
import { KubernetesBuilder } from '@backstage/plugin-kubernetes-backend';
import { Router } from 'express';
import { PluginEnvironment } from '../types';
import { CatalogClient } from '@backstage/catalog-client';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const catalogApi = new CatalogClient({ discoveryApi: env.discovery });
  const { router } = await KubernetesBuilder.createBuilder({
    logger: env.logger,
    config: env.config,
    catalogApi,
    permissions: env.permissions,
  }).build();
  return router;
}
Frontend Integration
// packages/app/src/components/catalog/EntityPage.tsx
import { EntityLayout } from '@backstage/plugin-catalog';
import { EntityKubernetesContent } from '@backstage/plugin-kubernetes';

const serviceEntityPage = (
  <EntityLayout>
    <EntityLayout.Route path="/kubernetes" title="Kubernetes">
      <EntityKubernetesContent refreshIntervalMs={30000} />
    </EntityLayout.Route>
  </EntityLayout>
);
Self-Service Database Provisioning
Database Operator Template
# templates/database/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: postgresql-database
  title: PostgreSQL Database
  description: Provision a new PostgreSQL database
spec:
  owner: platform-team
  type: resource
  parameters:
    - title: Database Configuration
      required:
        - name
        - owner
      properties:
        name:
          title: Database Name
          type: string
          pattern: '^[a-z0-9-]+$'
        owner:
          title: Owner
          type: string
          ui:field: OwnerPicker
        size:
          title: Storage Size
          type: string
          enum:
            - 10Gi
            - 50Gi
            - 100Gi
          default: 10Gi
        backup:
          title: Enable Backups
          type: boolean
          default: true
  steps:
    - id: create-postgres
      name: Create PostgreSQL Instance
      action: kubernetes:apply
      input:
        namespaced: true
        namespace: databases
        manifest: |
          apiVersion: postgresql.cnpg.io/v1
          kind: Cluster
          metadata:
            name: ${{ parameters.name }}
          spec:
            instances: 3
            storage:
              size: ${{ parameters.size }}
            backup:
              enabled: ${{ parameters.backup }}
              retentionPolicy: "30d"
    - id: create-secret
      name: Create Database Credentials Secret
      action: kubernetes:apply
      input:
        manifest: |
          apiVersion: v1
          kind: Secret
          metadata:
            name: ${{ parameters.name }}-credentials
            namespace: databases
          stringData:
            database: ${{ parameters.name }}
            username: app
            password: ${{ generatePassword() }}
    - id: register
      name: Register Resource
      action: catalog:register
      input:
        catalogInfoContent: |
          apiVersion: backstage.io/v1alpha1
          kind: Resource
          metadata:
            name: ${{ parameters.name }}
            description: PostgreSQL database
            annotations:
              backstage.io/kubernetes-id: ${{ parameters.name }}
          spec:
            type: database
            owner: ${{ parameters.owner }}
            system: platform
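Note that `${{ generatePassword() }}` in the secret step is illustrative: the scaffolder's templating has no such built-in, so a custom action would need to supply the value. A minimal generator using Node's `crypto` module, which such an action could wrap, might look like this (the modulo mapping introduces a slight character bias, acceptable for this sketch):

```typescript
import { randomBytes } from 'crypto';

// Generate a random alphanumeric password of the requested length.
// Illustrative only: a real custom scaffolder action would wrap this
// and write the value into the step's output.
function generatePassword(length: number = 24): string {
  const alphabet =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  const bytes = randomBytes(length);
  let out = '';
  for (let i = 0; i < length; i++) {
    out += alphabet[bytes[i] % alphabet.length];
  }
  return out;
}
```

In practice many teams skip this entirely and let the CloudNativePG operator generate and store credentials itself, which avoids ever materializing a password in template state.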
CI/CD Integration with ArgoCD
ArgoCD Plugin for Backstage
# Install ArgoCD plugin
yarn add --cwd packages/app @roadiehq/backstage-plugin-argo-cd
yarn add --cwd packages/backend @roadiehq/backstage-plugin-argo-cd-backend
ArgoCD Backend Setup
// packages/backend/src/plugins/argocd.ts
import { createRouter } from '@roadiehq/backstage-plugin-argo-cd-backend';
import { Router } from 'express';
import { PluginEnvironment } from '../types';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  return await createRouter({
    logger: env.logger,
    config: env.config,
  });
}
Configuration
# app-config.yaml
argocd:
  baseUrl: https://argocd.mycompany.com
  username: ${ARGOCD_USERNAME}
  password: ${ARGOCD_PASSWORD}
  appLocatorMethods:
    - type: 'config'
      instances:
        - name: production
          url: https://argocd.mycompany.com
          token: ${ARGOCD_TOKEN}
Cost Visibility with Kubecost
Kubecost Integration
# kubecost-values.yaml
global:
  prometheus:
    enabled: false
    fqdn: http://prometheus-server.prometheus.svc:80

kubecostProductConfigs:
  clusterName: production

ingress:
  enabled: true
  hosts:
    - cost.mycompany.com
# Install Kubecost
helm install kubecost cost-analyzer \
--repo https://kubecost.github.io/cost-analyzer/ \
--namespace kubecost \
--create-namespace \
-f kubecost-values.yaml
Kubecost Plugin for Backstage
// Custom plugin for cost visibility
import React from 'react';
import { useEntity } from '@backstage/plugin-catalog-react';
import { InfoCard } from '@backstage/core-components';

export const CostWidget = () => {
  const { entity } = useEntity();
  const namespace =
    entity.metadata.annotations?.['backstage.io/kubernetes-namespace'];

  return (
    <InfoCard title="Monthly Cost">
      <iframe
        title="Kubecost allocation"
        src={`https://cost.mycompany.com/allocation?namespace=${namespace}`}
        width="100%"
        height="400px"
      />
    </InfoCard>
  );
};
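Embedding the Kubecost UI in an iframe is the quickest route; an alternative is to query Kubecost's allocation API from the backend and render the numbers natively. The response shape below (a map of allocation name to an object with a `totalCost` field) is an assumption for illustration, so check the Allocation API schema of your Kubecost version before relying on it:

```typescript
// Summarize per-namespace cost from a Kubecost-style allocation response.
// The AllocationEntry shape is assumed for illustration, not taken from
// the Kubecost API reference.
interface AllocationEntry {
  totalCost: number;
}

function monthlyCostByNamespace(
  allocations: Record<string, AllocationEntry>,
): { namespace: string; cost: number }[] {
  return Object.entries(allocations)
    .map(([namespace, a]) => ({ namespace, cost: a.totalCost }))
    .sort((x, y) => y.cost - x.cost); // most expensive first
}
```

A native table built from this data can then link each row back to the owning team via the catalog, which the iframe approach cannot do.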
Developer Portal Features
Service Catalog
# catalog-info.yaml - System
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: ecommerce
  description: E-commerce platform
spec:
  owner: platform-team
---
# catalog-info.yaml - Component
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-service
  description: Handles checkout process
  annotations:
    github.com/project-slug: myorg/checkout-service
    backstage.io/kubernetes-id: checkout-service
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: production
  owner: checkout-team
  system: ecommerce
  dependsOn:
    - resource:postgres-checkout
    - resource:redis-cache
  providesApis:
    - checkout-api
---
# catalog-info.yaml - API
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: checkout-api
  description: Checkout API
spec:
  type: openapi
  lifecycle: production
  owner: checkout-team
  system: ecommerce
  definition:
    $text: https://github.com/myorg/checkout-service/blob/main/openapi.yaml
TechDocs Integration
# Install TechDocs
yarn add --cwd packages/app @backstage/plugin-techdocs
yarn add --cwd packages/backend @backstage/plugin-techdocs-backend
# mkdocs.yml
site_name: 'Checkout Service Documentation'
nav:
  - Home: index.md
  - Architecture: architecture.md
  - API Reference: api.md
  - Runbooks: runbooks.md
plugins:
  - techdocs-core
Platform Metrics and Monitoring
Golden Signals Dashboard
# grafana-dashboard-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: platform-dashboard
  namespace: monitoring
data:
  platform-metrics.json: |
    {
      "dashboard": {
        "title": "Platform Health",
        "panels": [
          {
            "title": "Deployment Frequency",
            "targets": [{ "expr": "rate(argocd_app_sync_total[1d])" }]
          },
          {
            "title": "Lead Time for Changes",
            "targets": [{ "expr": "argocd_app_sync_duration_seconds" }]
          },
          {
            "title": "Mean Time to Recovery",
            "targets": [{ "expr": "avg(time() - kube_pod_created)" }]
          },
          {
            "title": "Change Failure Rate",
            "targets": [{ "expr": "rate(argocd_app_sync_failed_total[1d])" }]
          }
        ]
      }
    }
DORA Metrics Tracking
// Custom backend plugin for DORA metrics
export class DORAMetricsCollector {
  async getDeploymentFrequency(timeRange: string): Promise<number> {
    const syncs = await this.argocd.getSyncs(timeRange);
    return syncs.length / this.getDays(timeRange);
  }

  async getLeadTimeForChanges(): Promise<number> {
    const commits = await this.github.getCommits();
    const deployments = await this.argocd.getDeployments();
    return this.calculateAverageTime(commits, deployments);
  }

  async getMTTR(): Promise<number> {
    const incidents = await this.incidents.getResolved();
    return this.calculateAverageResolutionTime(incidents);
  }

  async getChangeFailureRate(): Promise<number> {
    const deployments = await this.argocd.getDeployments();
    if (deployments.length === 0) return 0; // avoid division by zero
    const failed = deployments.filter(d => d.status === 'Failed');
    return (failed.length / deployments.length) * 100;
  }
}
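The collector above leans on hypothetical clients (`this.argocd`, `this.github`, `this.incidents`) for data fetching, but the arithmetic itself is easy to isolate and unit test. A sketch of the pure calculations, with an assumed `DeploymentRecord` shape:

```typescript
// Pure DORA arithmetic, separated from data fetching for easy testing.
// DeploymentRecord is an assumed shape for illustration.
interface DeploymentRecord {
  status: 'Succeeded' | 'Failed';
  startedAt: number;  // epoch ms when the change was committed
  finishedAt: number; // epoch ms when the deployment completed
}

function deploymentFrequency(deployments: DeploymentRecord[], days: number): number {
  return days > 0 ? deployments.length / days : 0;
}

function changeFailureRate(deployments: DeploymentRecord[]): number {
  if (deployments.length === 0) return 0;
  const failed = deployments.filter(d => d.status === 'Failed').length;
  return (failed / deployments.length) * 100;
}

function meanLeadTimeMs(deployments: DeploymentRecord[]): number {
  if (deployments.length === 0) return 0;
  const total = deployments.reduce((sum, d) => sum + (d.finishedAt - d.startedAt), 0);
  return total / deployments.length;
}
```

Keeping the math pure means the API clients can be swapped (ArgoCD, Flux, a CI system) without touching the metric definitions.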
Security and Compliance
Policy Enforcement with OPA
# opa-policy.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: platform-policies
  namespace: opa
data:
  policy.rego: |
    package platform.admission

    import future.keywords.if

    deny[msg] if {
      input.request.kind.kind == "Deployment"
      not input.request.object.spec.template.spec.securityContext
      msg := "Deployments must define securityContext"
    }

    deny[msg] if {
      input.request.kind.kind == "Deployment"
      container := input.request.object.spec.template.spec.containers[_]
      not container.resources.limits
      msg := sprintf("Container %v must define resource limits", [container.name])
    }

    deny[msg] if {
      input.request.kind.kind == "Deployment"
      container := input.request.object.spec.template.spec.containers[_]
      container.image
      not startswith(container.image, "myregistry.com/")
      msg := sprintf("Container %v uses unauthorized registry", [container.name])
    }
Automated Security Scanning
# .github/workflows/security-scan.yaml
name: Security Scan
on: [push]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'myregistry.com/${{ github.repository }}:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'
      - name: Upload to Security Tab
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'
      - name: Policy Check
        run: |
          conftest test kubernetes/ --policy opa-policies/
Best Practices for IDP Success
1. Start with User Research
## Developer Survey Questions
1. What are your biggest pain points in deploying applications?
2. How much time do you spend on infrastructure tasks?
3. What would make your development workflow better?
4. What documentation is missing or unclear?
5. What repetitive tasks would you like automated?
2. Establish Golden Paths
Golden Path Characteristics:
✅ Opinionated but flexible
✅ Well-documented
✅ Automated end-to-end
✅ Secure by default
✅ Observable out-of-the-box
✅ Cost-optimized
3. Measure Platform Adoption
# Platform metrics to track
metrics:
  adoption:
    - services_using_templates
    - self_service_provisioning_rate
    - documentation_usage
  efficiency:
    - time_to_first_deployment
    - deployment_frequency
    - pr_to_production_time
  satisfaction:
    - nps_score
    - support_ticket_volume
    - developer_survey_results
4. Treat Platform as a Product
- Assign product manager
- Regular user feedback sessions
- Roadmap planning
- Feature prioritization
- Marketing and communication
- Training and onboarding
Common Pitfalls to Avoid
❌ Building Everything In-House
Problem: Reinventing the wheel.
Solution: Leverage existing tools (Backstage, ArgoCD, etc.).
❌ Too Much Abstraction
Problem: Hiding too much complexity.
Solution: Provide escape hatches for advanced users.
❌ Lack of Documentation
Problem: Low adoption due to unclear usage.
Solution: Comprehensive, up-to-date documentation.
❌ Ignoring Developer Feedback
Problem: Building features nobody wants.
Solution: Regular feedback loops and user research.
❌ No Metrics
Problem: Can’t prove platform value.
Solution: Track DORA metrics, adoption, and satisfaction.
Measuring Platform Success
Key Performance Indicators
| Metric | Target | Measurement |
|---|---|---|
| Time to First Deploy | < 1 day | Onboarding to production |
| Deployment Frequency | Multiple/day | GitOps sync rate |
| MTTR | < 1 hour | Incident to resolution |
| Platform Adoption | > 80% | Services on platform |
| Developer Satisfaction | NPS > 50 | Quarterly surveys |
| Self-Service Rate | > 90% | Automated vs manual |
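The adoption and self-service targets in the table reduce to simple ratios, so they are worth computing consistently wherever they are reported. A sketch of the two calculations as used here (function names are illustrative):

```typescript
// Percentage of services running on the platform, per the KPI table.
function adoptionRate(servicesOnPlatform: number, totalServices: number): number {
  return totalServices > 0 ? (servicesOnPlatform / totalServices) * 100 : 0;
}

// Percentage of provisioning requests fulfilled without manual work.
function selfServiceRate(automated: number, manual: number): number {
  const total = automated + manual;
  return total > 0 ? (automated / total) * 100 : 0;
}
```

So 80 of 100 services on the platform meets the 80% adoption target, and 90 automated against 10 manual provisioning requests meets the 90% self-service target.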
Key Takeaways
- Platform Engineering treats internal platforms as products
- Backstage provides foundation for developer portals
- Golden Paths standardize common workflows
- Self-Service reduces toil and increases velocity
- Templates enable consistent, secure deployments
- Observability built-in from day one
- Metrics prove platform value and guide improvements
- Developer Experience is the primary focus
Resources for Further Learning
- Backstage Documentation
- Platform Engineering Guide
- Team Topologies
- CNCF Platforms White Paper
- Backstage Community
- Platform Engineering Slack
Building an Internal Developer Platform is a journey, not a destination. Start small, focus on developer needs, measure success, and iterate continuously. The investment in platform engineering pays dividends through increased developer productivity, reduced cognitive load, and faster time to market.