Building Internal Developer Platforms on Kubernetes: A Comprehensive Guide


The rise of Platform Engineering has transformed how organizations enable their developers. Internal Developer Platforms (IDPs) abstract infrastructure complexity, provide self-service capabilities, and establish “golden paths” for common development tasks.

This guide walks through building an Internal Developer Platform on Kubernetes, with a practical focus on Backstage and the platform engineering practices that surround it.

What is an Internal Developer Platform?

An Internal Developer Platform (IDP) is a curated set of tools, services, and workflows that enable developers to build, deploy, and manage applications without requiring deep infrastructure expertise.

Traditional Approach:
┌──────────────────────────────────────────────┐
│  Developer needs to:                         │
│  - Learn Kubernetes YAML                     │
│  - Understand networking                     │
│  - Configure CI/CD pipelines                 │
│  - Set up monitoring                         │
│  - Manage infrastructure                     │
│  - Handle security policies                  │
│  ↓ Result: Low velocity, high cognitive load │
└──────────────────────────────────────────────┘

With Internal Developer Platform:
┌──────────────────────────────────────────────┐
│  Developer uses:                             │
│  - Self-service portal                       │
│  - Template-based deployment                 │
│  - Automated CI/CD                           │
│  - Built-in observability                    │
│  - Standardized workflows                    │
│  - Integrated security                       │
│  ↓ Result: High velocity, focus on business  │
└──────────────────────────────────────────────┘

Platform Engineering vs DevOps

Platform Engineering is an evolution of DevOps that treats internal platforms as products:

Aspect         | DevOps                     | Platform Engineering
---------------|----------------------------|---------------------------------
Focus          | Process & Culture          | Product & Experience
Approach       | “You build it, you run it” | “We enable you to build it”
Responsibility | Developers own everything  | Platform team enables developers
Abstraction    | Low-level tools            | High-level self-service
Goal           | Automation                 | Developer experience
Team Structure | Cross-functional teams     | Dedicated platform team

Core IDP Components

1. Developer Portal (Backstage)

Purpose: Unified interface for developers

  • Service catalog
  • Documentation hub
  • Software templates
  • Kubernetes integration
  • Plugins ecosystem

2. Self-Service Infrastructure

Purpose: Automated provisioning

  • Environment creation
  • Database provisioning
  • Secret management
  • Resource quotas

3. Golden Paths

Purpose: Standardized workflows

  • Application templates
  • CI/CD pipelines
  • Deployment strategies
  • Best practices

4. Observability

Purpose: Insights and debugging

  • Centralized logging
  • Metrics and dashboards
  • Distributed tracing
  • Cost visibility

5. Security and Governance

Purpose: Policy enforcement

  • RBAC and authentication
  • Network policies
  • Compliance checks
  • Audit trails

Installing Backstage on Kubernetes

Prerequisites

# Install Node.js (v18+)
# macOS
brew install node@18

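# Backstage projects use Yarn; Corepack (bundled with Node.js) can enable it
corepack enable

# Optional: create-app also installs the Backstage CLI per project
npm install -g @backstage/cli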

Create Backstage Application

# Create new Backstage app
npx @backstage/create-app@latest

# Follow prompts:
# ? Enter a name for the app: my-platform
# ? Select database: PostgreSQL

cd my-platform

# Install dependencies
yarn install

# Run locally (development)
yarn dev

Backstage Configuration

# app-config.yaml
app:
  title: My Internal Developer Platform
  baseUrl: https://platform.mycompany.com

organization:
  name: My Company

backend:
  baseUrl: https://platform.mycompany.com
  listen:
    port: 7007
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

catalog:
  import:
    entityFilename: catalog-info.yaml
  rules:
    - allow: [Component, System, API, Resource, Location]

  locations:
    # Static location: a single catalog file in a GitHub repository
    - type: url
      target: https://github.com/myorg/backstage-entities/blob/main/catalog-info.yaml

kubernetes:
  serviceLocatorMethod:
    type: 'multiTenant'
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        - url: https://kubernetes.default.svc
          name: production
          authProvider: 'serviceAccount'
          skipTLSVerify: false
          skipMetricsLookup: false

auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${AUTH_GITHUB_CLIENT_ID}
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}

Deploy to Kubernetes

# backstage-deployment.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: backstage
---
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secrets
  namespace: backstage
type: Opaque
stringData:
  POSTGRES_USER: backstage
  # Placeholder only: replace before applying, or source it from a secret manager
  POSTGRES_PASSWORD: changeme
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: backstage
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: backstage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secrets
              key: POSTGRES_USER
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secrets
              key: POSTGRES_PASSWORD
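        - name: POSTGRES_DB
          value: backstage
        # Keep the data directory in a subdirectory so initdb does not trip over
        # files such as lost+found at the root of the mounted volume
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata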
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: postgres-storage
        persistentVolumeClaim:
          claimName: postgres-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: backstage
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backstage
  namespace: backstage
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backstage
  template:
    metadata:
      labels:
        app: backstage
    spec:
      serviceAccountName: backstage
      containers:
      - name: backstage
        image: myregistry/backstage:latest
        ports:
        - containerPort: 7007
        env:
        - name: POSTGRES_HOST
          value: postgres
        - name: POSTGRES_PORT
          value: "5432"
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secrets
              key: POSTGRES_USER
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secrets
              key: POSTGRES_PASSWORD
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: backstage
  namespace: backstage
spec:
  selector:
    app: backstage
  ports:
  - port: 80
    targetPort: 7007
  type: LoadBalancer
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backstage
  namespace: backstage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backstage-reader
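# Read-only, but on every resource in every API group, which includes Secrets.
# Where possible, narrow this to the resource types the Kubernetes plugin displays.
rules: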
- apiGroups:
  - "*"
  resources:
  - "*"
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backstage-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: backstage-reader
subjects:
- kind: ServiceAccount
  name: backstage
  namespace: backstage
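The ClusterRole above grants read access to every resource, which also covers Secrets. The sketch below shows one way to flag such rules and what a narrower rule set could look like. The resource list is an assumption based on common workload types, and `overbroadRules` is a hypothetical helper; adjust both to what your plugins actually display.

```typescript
interface PolicyRule {
  apiGroups: string[];
  resources: string[];
  verbs: string[];
}

// Returns the rules that use a wildcard for resources.
function overbroadRules(rules: PolicyRule[]): PolicyRule[] {
  return rules.filter(rule => rule.resources.indexOf('*') !== -1);
}

const READ_VERBS = ['get', 'list', 'watch'];

// The rule from the manifest above.
const currentRules: PolicyRule[] = [
  { apiGroups: ['*'], resources: ['*'], verbs: READ_VERBS },
];

// Assumed narrower alternative: only workload-related resources.
const narrowedRules: PolicyRule[] = [
  { apiGroups: [''], resources: ['pods', 'services', 'configmaps', 'limitranges', 'resourcequotas'], verbs: READ_VERBS },
  { apiGroups: ['apps'], resources: ['deployments', 'replicasets', 'statefulsets', 'daemonsets'], verbs: READ_VERBS },
  { apiGroups: ['autoscaling'], resources: ['horizontalpodautoscalers'], verbs: READ_VERBS },
  { apiGroups: ['networking.k8s.io'], resources: ['ingresses'], verbs: READ_VERBS },
  { apiGroups: ['batch'], resources: ['jobs', 'cronjobs'], verbs: READ_VERBS },
];

console.log(overbroadRules(currentRules).length);  // 1
console.log(overbroadRules(narrowedRules).length); // 0
```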

Build and Push Backstage Image

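# Type-check, then build the backend bundle
# (writes bundle.tar.gz and skeleton.tar.gz to packages/backend/dist)
yarn tsc
yarn build:backend

# Create Dockerfile (this overwrites the one generated by create-app)
cat > packages/backend/Dockerfile <<'EOF'
FROM node:18-bullseye-slim

WORKDIR /app

# Install production dependencies from the workspace skeleton.
# Assumes Yarn 1; newer apps use "yarn workspaces focus --all --production".
# Native modules may also need build tools (python3, g++, make) installed first.
COPY package.json yarn.lock packages/backend/dist/skeleton.tar.gz ./
RUN tar xzf skeleton.tar.gz && rm skeleton.tar.gz
RUN yarn install --frozen-lockfile --production

# Copy the built backend bundle and the configuration files
COPY packages/backend/dist/bundle.tar.gz app-config*.yaml ./
RUN tar xzf bundle.tar.gz && rm bundle.tar.gz

CMD ["node", "packages/backend", "--config", "app-config.yaml"]
EOF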

# Build and push
docker build -t myregistry/backstage:latest -f packages/backend/Dockerfile .
docker push myregistry/backstage:latest

# Deploy to Kubernetes
kubectl apply -f backstage-deployment.yaml

Software Templates (Golden Paths)

Creating a Service Template

# templates/nodejs-service/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: nodejs-service
  title: Node.js Microservice
  description: Create a new Node.js microservice with CI/CD
  tags:
    - nodejs
    - recommended
spec:
  owner: platform-team
  type: service

  parameters:
    - title: Service Information
      required:
        - name
        - description
        - owner
      properties:
        name:
          title: Name
          type: string
          description: Unique name for the service
          pattern: '^[a-z0-9-]+$'
        description:
          title: Description
          type: string
          description: What does this service do?
        owner:
          title: Owner
          type: string
          description: Team or person responsible
          ui:field: OwnerPicker
          ui:options:
            allowedKinds:
              - Group
              - User

    - title: Configuration
      properties:
        database:
          title: Database
          type: string
          enum:
            - postgres
            - mysql
            - mongodb
            - none
          default: postgres
        cache:
          title: Cache
          type: boolean
          description: Enable Redis cache
          default: true

  steps:
    - id: fetch-base
      name: Fetch Base
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          database: ${{ parameters.database }}
          cache: ${{ parameters.cache }}

    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        allowedHosts: ['github.com']
        description: ${{ parameters.description }}
        repoUrl: github.com?owner=myorg&repo=${{ parameters.name }}
        defaultBranch: main
        repoVisibility: private

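    # Not a built-in scaffolder action: it has to be provided by a custom or
    # third-party scaffolder module, and its inputs depend on that module
    - id: create-argocd-app
      name: Create ArgoCD Application
      action: argocd:create-app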
      input:
        name: ${{ parameters.name }}
        namespace: ${{ parameters.name }}
        repoUrl: https://github.com/myorg/${{ parameters.name }}
        path: kubernetes/

    - id: register
      name: Register Component
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'

  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Open in catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
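The `publish:github` step receives its target as a `host?owner=...&repo=...` string, and the template restricts service names to `^[a-z0-9-]+$`. The sketch below shows how that string can be built and parsed; `buildRepoUrl` and `parseRepoUrl` are hypothetical helpers written for illustration, not Backstage APIs.

```typescript
// Same pattern as the `name` parameter in the template.
const SERVICE_NAME_PATTERN = /^[a-z0-9-]+$/;

function buildRepoUrl(host: string, owner: string, repo: string): string {
  if (!SERVICE_NAME_PATTERN.test(repo)) {
    throw new Error(`invalid service name: ${repo}`);
  }
  return `${host}?owner=${encodeURIComponent(owner)}&repo=${encodeURIComponent(repo)}`;
}

function parseRepoUrl(repoUrl: string): { host: string; owner: string; repo: string } {
  const queryStart = repoUrl.indexOf('?');
  if (queryStart === -1) throw new Error('repoUrl has no query string');
  const fields: Record<string, string> = {};
  for (const pair of repoUrl.slice(queryStart + 1).split('&')) {
    const eq = pair.indexOf('=');
    if (eq > 0) fields[pair.slice(0, eq)] = decodeURIComponent(pair.slice(eq + 1));
  }
  const owner = fields['owner'];
  const repo = fields['repo'];
  if (!owner || !repo) throw new Error('repoUrl needs owner and repo');
  return { host: repoUrl.slice(0, queryStart), owner, repo };
}

const repoUrl = buildRepoUrl('github.com', 'myorg', 'checkout-service');
console.log(repoUrl);               // github.com?owner=myorg&repo=checkout-service
console.log(parseRepoUrl(repoUrl)); // { host: 'github.com', owner: 'myorg', repo: 'checkout-service' }
```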

Template Skeleton Structure

templates/nodejs-service/skeleton/
├── catalog-info.yaml
├── package.json
├── src/
│   └── index.js
├── Dockerfile
├── kubernetes/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ingress.yaml
├── .github/
│   └── workflows/
│       └── ci.yaml
└── README.md

Catalog Info Template

# templates/nodejs-service/skeleton/catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  description: ${{ values.description }}
  annotations:
    github.com/project-slug: myorg/${{ values.name }}
    backstage.io/kubernetes-id: ${{ values.name }}
    argocd/app-name: ${{ values.name }}
  tags:
    - nodejs
    - microservice
  links:
    - url: https://github.com/myorg/${{ values.name }}
      title: Repository
      icon: github
spec:
  type: service
  lifecycle: production
  owner: ${{ values.owner }}
  system: platform
  {%- if values.database != 'none' or values.cache %}
  dependsOn:
    {%- if values.database != 'none' %}
    - resource:${{ values.database }}
    {%- endif %}
    {%- if values.cache %}
    - resource:redis
    {%- endif %}
  {%- endif %}
  providesApis:
    - ${{ values.name }}-api
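The conditional blocks in the skeleton decide which resources end up under `dependsOn`. The same logic, written as a plain function for illustration (the `SkeletonValues` type and `dependsOnFor` are hypothetical names):

```typescript
interface SkeletonValues {
  database: 'postgres' | 'mysql' | 'mongodb' | 'none';
  cache: boolean;
}

// Mirrors the template: one entry for the database unless it is 'none',
// plus redis when the cache option is enabled.
function dependsOnFor(values: SkeletonValues): string[] {
  const refs: string[] = [];
  if (values.database !== 'none') refs.push(`resource:${values.database}`);
  if (values.cache) refs.push('resource:redis');
  return refs;
}

console.log(dependsOnFor({ database: 'postgres', cache: true })); // [ 'resource:postgres', 'resource:redis' ]
console.log(dependsOnFor({ database: 'none', cache: false }));    // []
```

The second call is the case to watch: with no database and no cache the list is empty, so the rendered descriptor should not end up with a bare `dependsOn:` key.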

Kubernetes Plugin Integration

Installing Kubernetes Plugin

# Install plugin
yarn add --cwd packages/app @backstage/plugin-kubernetes

# Install backend plugin
yarn add --cwd packages/backend @backstage/plugin-kubernetes-backend

Backend Configuration

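// packages/backend/src/plugins/kubernetes.ts
// Legacy backend wiring: the returned router still has to be mounted in packages/backend/src/index.ts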
import { KubernetesBuilder } from '@backstage/plugin-kubernetes-backend';
import { Router } from 'express';
import { PluginEnvironment } from '../types';
import { CatalogClient } from '@backstage/catalog-client';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const catalogApi = new CatalogClient({ discoveryApi: env.discovery });

  const { router } = await KubernetesBuilder.createBuilder({
    logger: env.logger,
    config: env.config,
    catalogApi,
    permissions: env.permissions,
  }).build();

  return router;
}

Frontend Integration

// packages/app/src/components/catalog/EntityPage.tsx
import { EntityLayout } from '@backstage/plugin-catalog';
import { EntityKubernetesContent } from '@backstage/plugin-kubernetes';

const serviceEntityPage = (
  <EntityLayout>
    <EntityLayout.Route path="/kubernetes" title="Kubernetes">
      <EntityKubernetesContent refreshIntervalMs={30000} />
    </EntityLayout.Route>
  </EntityLayout>
);
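The Kubernetes tab only shows workloads it can match to the entity. By default that match is a label selector derived from the `backstage.io/kubernetes-id` annotation, so the Deployment and its pods need the same label. A rough sketch of that lookup, with `labelSelectorFor` as a hypothetical helper rather than the plugin's own code:

```typescript
interface EntityLike {
  metadata: { name: string; annotations?: Record<string, string> };
}

// An explicit label-selector annotation wins; otherwise fall back to the
// kubernetes-id annotation. Returns undefined when neither is present.
function labelSelectorFor(entity: EntityLike): string | undefined {
  const annotations = entity.metadata.annotations ?? {};
  const custom = annotations['backstage.io/kubernetes-label-selector'];
  if (custom) return custom;
  const id = annotations['backstage.io/kubernetes-id'];
  return id ? `backstage.io/kubernetes-id=${id}` : undefined;
}

const checkoutEntity: EntityLike = {
  metadata: {
    name: 'checkout-service',
    annotations: { 'backstage.io/kubernetes-id': 'checkout-service' },
  },
};

console.log(labelSelectorFor(checkoutEntity)); // backstage.io/kubernetes-id=checkout-service
```

In practice this means the skeleton's kubernetes/deployment.yaml should set the `backstage.io/kubernetes-id` label to the service name.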

Self-Service Database Provisioning

Database Operator Template

# templates/database/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: postgresql-database
  title: PostgreSQL Database
  description: Provision a new PostgreSQL database
spec:
  owner: platform-team
  type: resource

  parameters:
    - title: Database Configuration
      required:
        - name
        - owner
      properties:
        name:
          title: Database Name
          type: string
          pattern: '^[a-z0-9-]+$'
        owner:
          title: Owner
          type: string
          ui:field: OwnerPicker
        size:
          title: Storage Size
          type: string
          enum:
            - 10Gi
            - 50Gi
            - 100Gi
          default: 10Gi
        backup:
          title: Enable Backups
          type: boolean
          default: true

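  steps:
    # kubernetes:apply is not a built-in scaffolder action; it assumes a custom
    # or third-party scaffolder module that provides it
    - id: create-postgres
      name: Create PostgreSQL Instance
      action: kubernetes:apply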
      input:
        namespaced: true
        namespace: databases
        manifest: |
          apiVersion: postgresql.cnpg.io/v1
          kind: Cluster
          metadata:
            name: ${{ parameters.name }}
          spec:
            instances: 3
            storage:
              size: ${{ parameters.size }}
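            # To verify against your operator version: CloudNativePG has no
            # backup.enabled switch; backups need a backup target configured here
            backup:
              enabled: ${{ parameters.backup }}
              retentionPolicy: "30d"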

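    # generatePassword() below is not a built-in template function; it assumes a
    # custom filter or action supplies the value. CloudNativePG also creates its
    # own <cluster-name>-app credentials Secret, which may make this step unnecessary.
    - id: create-secret
      name: Create Database Credentials Secret
      action: kubernetes:apply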
      input:
        manifest: |
          apiVersion: v1
          kind: Secret
          metadata:
            name: ${{ parameters.name }}-credentials
            namespace: databases
          stringData:
            database: ${{ parameters.name }}
            username: app
            password: ${{ generatePassword() }}

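    # The built-in catalog:register action takes catalogInfoUrl (or repoContentsUrl
    # plus catalogInfoPath), so the descriptor has to be committed to a repository
    # first; catalogInfoContent assumes a custom action
    - id: register
      name: Register Resource
      action: catalog:register
      input:
        catalogInfoContent: |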
          apiVersion: backstage.io/v1alpha1
          kind: Resource
          metadata:
            name: ${{ parameters.name }}
            description: PostgreSQL database
            annotations:
              backstage.io/kubernetes-id: ${{ parameters.name }}
          spec:
            type: database
            owner: ${{ parameters.owner }}
            system: platform
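The template's name pattern allows values such as `-db-`, which Kubernetes rejects as an object name. The sketch below builds the Cluster manifest from the form values with a stricter check; `buildClusterManifest` is a hypothetical helper, and the `databases` namespace and three instances are taken from the template above.

```typescript
type StorageSize = '10Gi' | '50Gi' | '100Gi';

interface DatabaseRequest {
  name: string;
  size: StorageSize;
}

// DNS-1123 label: lowercase alphanumerics and '-', starting and ending
// with an alphanumeric, at most 63 characters.
const DNS_LABEL = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/;

function buildClusterManifest(req: DatabaseRequest) {
  if (req.name.length > 63 || !DNS_LABEL.test(req.name)) {
    throw new Error(`invalid database name: ${req.name}`);
  }
  return {
    apiVersion: 'postgresql.cnpg.io/v1',
    kind: 'Cluster',
    metadata: {
      name: req.name,
      namespace: 'databases',
      // Lets the Backstage Kubernetes plugin match the resource to its entity.
      labels: { 'backstage.io/kubernetes-id': req.name },
    },
    spec: {
      instances: 3,
      storage: { size: req.size },
    },
  };
}

const ordersDb = buildClusterManifest({ name: 'orders-db', size: '10Gi' });
console.log(ordersDb.metadata.name, ordersDb.spec.storage.size); // orders-db 10Gi
```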

CI/CD Integration with ArgoCD

ArgoCD Plugin for Backstage

# Install ArgoCD plugin
yarn add --cwd packages/app @roadiehq/backstage-plugin-argo-cd
yarn add --cwd packages/backend @roadiehq/backstage-plugin-argo-cd-backend

ArgoCD Backend Setup

// packages/backend/src/plugins/argocd.ts
import { createRouter } from '@roadiehq/backstage-plugin-argo-cd-backend';
import { Router } from 'express';
import { PluginEnvironment } from '../types';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  return await createRouter({
    logger: env.logger,
    config: env.config,
  });
}

Configuration

# app-config.yaml
argocd:
  baseUrl: https://argocd.mycompany.com
  username: ${ARGOCD_USERNAME}
  password: ${ARGOCD_PASSWORD}
  appLocatorMethods:
    - type: 'config'
      instances:
        - name: production
          url: https://argocd.mycompany.com
          token: ${ARGOCD_TOKEN}
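Whatever creates the Argo CD application for a new service ends up producing an `Application` resource. A minimal sketch of that resource built from the template inputs follows. `buildArgoApplication` is a hypothetical helper, and the `argocd` namespace, `default` project, `main` revision, and automated sync policy are assumptions to adapt.

```typescript
interface AppRequest {
  name: string;    // service name, also used as the target namespace
  repoUrl: string; // Git repository that holds the manifests
  path: string;    // directory inside the repository
}

function buildArgoApplication(req: AppRequest) {
  return {
    apiVersion: 'argoproj.io/v1alpha1',
    kind: 'Application',
    metadata: { name: req.name, namespace: 'argocd' },
    spec: {
      project: 'default',
      source: { repoURL: req.repoUrl, path: req.path, targetRevision: 'main' },
      destination: { server: 'https://kubernetes.default.svc', namespace: req.name },
      syncPolicy: {
        automated: { prune: true, selfHeal: true },
        syncOptions: ['CreateNamespace=true'],
      },
    },
  };
}

const checkoutApp = buildArgoApplication({
  name: 'checkout-service',
  repoUrl: 'https://github.com/myorg/checkout-service',
  path: 'kubernetes/',
});
console.log(checkoutApp.spec.destination.namespace); // checkout-service
```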

Cost Visibility with Kubecost

Kubecost Integration

# kubecost-values.yaml
global:
  prometheus:
    enabled: false
    fqdn: http://prometheus-server.prometheus.svc:80

kubecostProductConfigs:
  clusterName: production

ingress:
  enabled: true
  hosts:
    - cost.mycompany.com

# Install Kubecost
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --create-namespace \
  -f kubecost-values.yaml

Kubecost Plugin for Backstage

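// Custom plugin for cost visibility
import React from 'react';
import { InfoCard } from '@backstage/core-components';
import { useEntity } from '@backstage/plugin-catalog-react';

export const CostWidget = () => {
  const { entity } = useEntity();
  const namespace = entity.metadata.annotations?.['backstage.io/kubernetes-namespace'];

  // Without the annotation there is no namespace to filter on
  if (!namespace) {
    return <InfoCard title="Monthly Cost">No Kubernetes namespace annotation found.</InfoCard>;
  }

  return (
    <InfoCard title="Monthly Cost">
      <iframe
        title="Kubecost allocation"
        src={`https://cost.mycompany.com/allocation?namespace=${encodeURIComponent(namespace)}`}
        width="100%"
        height="400px"
      />
    </InfoCard>
  );
};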

Developer Portal Features

Service Catalog

# catalog-info.yaml - System
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: ecommerce
  description: E-commerce platform
spec:
  owner: platform-team
---
# catalog-info.yaml - Component
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-service
  description: Handles checkout process
  annotations:
    github.com/project-slug: myorg/checkout-service
    backstage.io/kubernetes-id: checkout-service
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: production
  owner: checkout-team
  system: ecommerce
  dependsOn:
    - resource:postgres-checkout
    - resource:redis-cache
  providesApis:
    - checkout-api
---
# catalog-info.yaml - API
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: checkout-api
  description: Checkout API
spec:
  type: openapi
  lifecycle: production
  owner: checkout-team
  system: ecommerce
  definition:
    $text: https://github.com/myorg/checkout-service/blob/main/openapi.yaml
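Relations such as `dependsOn` and `providesApis` use entity references of the form `[kind:][namespace/]name`, with the kind taken from context and the namespace defaulting to `default` when omitted. A small parsing sketch, written for illustration and not the Backstage implementation:

```typescript
interface EntityRef {
  kind: string;
  namespace: string;
  name: string;
}

// Parses "[kind:][namespace/]name"; `defaultKind` applies when no kind is given.
function parseEntityRef(ref: string, defaultKind: string): EntityRef {
  let rest = ref;
  let kind = defaultKind;
  const colon = rest.indexOf(':');
  if (colon !== -1) {
    kind = rest.slice(0, colon);
    rest = rest.slice(colon + 1);
  }
  let namespace = 'default';
  const slash = rest.indexOf('/');
  if (slash !== -1) {
    namespace = rest.slice(0, slash);
    rest = rest.slice(slash + 1);
  }
  if (!kind || !namespace || !rest) throw new Error(`invalid entity ref: ${ref}`);
  return { kind, namespace, name: rest };
}

console.log(parseEntityRef('resource:postgres-checkout', 'component'));
// { kind: 'resource', namespace: 'default', name: 'postgres-checkout' }
console.log(parseEntityRef('checkout-api', 'api'));
// { kind: 'api', namespace: 'default', name: 'checkout-api' }
```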

TechDocs Integration

# Install TechDocs
yarn add --cwd packages/app @backstage/plugin-techdocs
yarn add --cwd packages/backend @backstage/plugin-techdocs-backend

# mkdocs.yml
site_name: 'Checkout Service Documentation'
nav:
  - Home: index.md
  - Architecture: architecture.md
  - API Reference: api.md
  - Runbooks: runbooks.md

plugins:
  - techdocs-core

Platform Metrics and Monitoring

Golden Signals Dashboard

# grafana-dashboard-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: platform-dashboard
  namespace: monitoring
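# The "Lead Time for Changes" and "Mean Time to Recovery" queries in this dashboard are
# rough proxies (sync duration and pod age); verify the metric names against your Argo CD
# version, and use commit and incident timestamps for true values.
data: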
  platform-metrics.json: |
    {
      "dashboard": {
        "title": "Platform Health",
        "panels": [
          {
            "title": "Deployment Frequency",
            "targets": [{
              "expr": "sum(increase(argocd_app_sync_total[1d]))"
            }]
          },
          {
            "title": "Lead Time for Changes",
            "targets": [{
              "expr": "argocd_app_sync_duration_seconds"
            }]
          },
          {
            "title": "Mean Time to Recovery",
            "targets": [{
              "expr": "avg(time() - kube_pod_created)"
            }]
          },
          {
            "title": "Change Failure Rate",
            "targets": [{
              "expr": "sum(increase(argocd_app_sync_total{phase=~\"Failed|Error\"}[1d])) / sum(increase(argocd_app_sync_total[1d]))"
            }]
          }
        ]
      }
    }

DORA Metrics Tracking

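// Custom backend plugin for DORA metrics.
// ArgoClient, GitHubClient, and IncidentClient are placeholder interfaces;
// back them with whatever API clients the platform already has.
interface Sync { revision: string; finishedAt: Date; status: string }
interface Commit { sha: string; committedAt: Date }
interface Incident { openedAt: Date; resolvedAt: Date }
interface ArgoClient { getSyncs(days: number): Promise<Sync[]> }
interface GitHubClient { getCommits(days: number): Promise<Commit[]> }
interface IncidentClient { getResolved(days: number): Promise<Incident[]> }

const HOUR_MS = 60 * 60 * 1000;
const mean = (xs: number[]): number =>
  xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;

export class DORAMetricsCollector {
  constructor(
    private readonly argocd: ArgoClient,
    private readonly github: GitHubClient,
    private readonly incidents: IncidentClient,
  ) {}

  // Syncs per day over the window
  async getDeploymentFrequency(days: number): Promise<number> {
    const syncs = await this.argocd.getSyncs(days);
    return syncs.length / days;
  }

  // Mean hours from a commit to the sync that deployed it
  async getLeadTimeForChanges(days: number): Promise<number> {
    const commits = await this.github.getCommits(days);
    const syncs = await this.argocd.getSyncs(days);
    const committedAt = new Map(commits.map(c => [c.sha, c.committedAt.getTime()] as const));
    return mean(
      syncs
        .filter(s => committedAt.has(s.revision))
        .map(s => (s.finishedAt.getTime() - committedAt.get(s.revision)!) / HOUR_MS),
    );
  }

  // Mean hours from incident open to resolution
  async getMTTR(days: number): Promise<number> {
    const resolved = await this.incidents.getResolved(days);
    return mean(resolved.map(i => (i.resolvedAt.getTime() - i.openedAt.getTime()) / HOUR_MS));
  }

  // Percentage of syncs that failed; 0 when there were none
  async getChangeFailureRate(days: number): Promise<number> {
    const syncs = await this.argocd.getSyncs(days);
    if (syncs.length === 0) return 0;
    return (syncs.filter(s => s.status === 'Failed').length / syncs.length) * 100;
  }
}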
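To make the arithmetic behind two of these metrics concrete, here is a worked example on in-memory data. The `SyncRecord` type and both functions are hypothetical stand-ins for the collector's methods, and the numbers are made up for illustration.

```typescript
interface SyncRecord {
  status: 'Succeeded' | 'Failed';
}

// Syncs per day over a window of `days` days.
function deploymentFrequency(syncs: SyncRecord[], days: number): number {
  if (days <= 0) throw new Error('days must be positive');
  return syncs.length / days;
}

// Percentage of syncs that failed; 0 when there were no syncs.
function changeFailureRate(syncs: SyncRecord[]): number {
  if (syncs.length === 0) return 0;
  const failed = syncs.filter(s => s.status === 'Failed').length;
  return (failed / syncs.length) * 100;
}

// Illustrative week: 28 syncs, 7 of which failed.
const weekOfSyncs: SyncRecord[] = [];
for (let i = 0; i < 28; i++) {
  weekOfSyncs.push({ status: i < 7 ? 'Failed' : 'Succeeded' });
}

console.log(deploymentFrequency(weekOfSyncs, 7)); // 4
console.log(changeFailureRate(weekOfSyncs));      // 25
```

Guarding the empty case matters: a new service with no deployments yet would otherwise report `NaN` for its failure rate.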

Security and Compliance

Policy Enforcement with OPA

# opa-policy.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: platform-policies
  namespace: opa
data:
  policy.rego: |
    package platform.admission

    import future.keywords.contains
    import future.keywords.if

    deny contains msg if {
      input.request.kind.kind == "Deployment"
      not input.request.object.spec.template.spec.securityContext
      msg := "Deployments must define securityContext"
    }

    deny contains msg if {
      input.request.kind.kind == "Deployment"
      container := input.request.object.spec.template.spec.containers[_]
      not container.resources.limits
      msg := sprintf("Container %v must define resource limits", [container.name])
    }

    deny contains msg if {
      input.request.kind.kind == "Deployment"
      container := input.request.object.spec.template.spec.containers[_]
      container.image
      not startswith(container.image, "myregistry.com/")
      msg := sprintf("Container %v uses unauthorized registry", [container.name])
    }
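Policies like these are easier to roll out when developers can see the expected violations before a deploy is rejected. The sketch below mirrors the three rules in TypeScript for use in a template's unit tests. It is an illustrative re-implementation, not a replacement for OPA, and `policyViolations` is a hypothetical name.

```typescript
interface ContainerSpec {
  name: string;
  image?: string;
  resources?: { limits?: Record<string, string> };
}

interface WorkloadManifest {
  kind: string;
  spec: { template: { spec: { securityContext?: object; containers: ContainerSpec[] } } };
}

const ALLOWED_REGISTRY = 'myregistry.com/';

// Returns the same messages the Rego rules would produce for a Deployment.
function policyViolations(obj: WorkloadManifest): string[] {
  if (obj.kind !== 'Deployment') return [];
  const messages: string[] = [];
  const pod = obj.spec.template.spec;
  if (!pod.securityContext) messages.push('Deployments must define securityContext');
  for (const c of pod.containers) {
    if (!c.resources || !c.resources.limits) {
      messages.push(`Container ${c.name} must define resource limits`);
    }
    if (c.image && c.image.indexOf(ALLOWED_REGISTRY) !== 0) {
      messages.push(`Container ${c.name} uses unauthorized registry`);
    }
  }
  return messages;
}

const nonCompliant: WorkloadManifest = {
  kind: 'Deployment',
  spec: { template: { spec: { containers: [{ name: 'web', image: 'docker.io/nginx:1.27' }] } } },
};

console.log(policyViolations(nonCompliant).length); // 3
```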

Automated Security Scanning

# .github/workflows/security-scan.yaml
name: Security Scan
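on: [push]

permissions:
  contents: read
  security-events: write  # required to upload SARIF results

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Assumes the image for this commit has already been built and pushed
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'myregistry.com/${{ github.repository }}:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload to Security Tab
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

      # Assumes conftest is installed on the runner and that opa-policies/
      # holds policies written for raw manifests rather than admission requests
      - name: Policy Check
        run: |
          conftest test kubernetes/ --policy opa-policies/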

Best Practices for IDP Success

1. Start with User Research

Developer Survey Questions:

1. What are your biggest pain points in deploying applications?
2. How much time do you spend on infrastructure tasks?
3. What would make your development workflow better?
4. What documentation is missing or unclear?
5. What repetitive tasks would you like automated?

2. Establish Golden Paths

Golden Path Characteristics:
✅ Opinionated but flexible
✅ Well-documented
✅ Automated end-to-end
✅ Secure by default
✅ Observable out-of-the-box
✅ Cost-optimized

3. Measure Platform Adoption

# Platform metrics to track
metrics:
  adoption:
    - services_using_templates
    - self_service_provisioning_rate
    - documentation_usage

  efficiency:
    - time_to_first_deployment
    - deployment_frequency
    - pr_to_production_time

  satisfaction:
    - nps_score
    - support_ticket_volume
    - developer_survey_results
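Of the satisfaction metrics, `nps_score` has a fixed formula: the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 to 6). A minimal sketch, with made-up survey ratings:

```typescript
// Net Promoter Score from 0-10 ratings, rounded to a whole number.
function netPromoterScore(ratings: number[]): number {
  if (ratings.length === 0) throw new Error('no ratings');
  let promoters = 0;
  let detractors = 0;
  for (const rating of ratings) {
    if (rating >= 9) promoters++;
    else if (rating <= 6) detractors++;
  }
  return Math.round(((promoters - detractors) * 100) / ratings.length);
}

// Illustrative survey: 6 promoters, 2 passives, 2 detractors.
const surveyRatings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 10];
console.log(netPromoterScore(surveyRatings)); // 40
```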

4. Treat Platform as a Product

  • Assign a product manager
  • Regular user feedback sessions
  • Roadmap planning
  • Feature prioritization
  • Marketing and communication
  • Training and onboarding

Common Pitfalls to Avoid

❌ Building Everything In-House

Problem: Reinventing the wheel
Solution: Leverage existing tools (Backstage, ArgoCD, etc.)

❌ Too Much Abstraction

Problem: Hiding too much complexity
Solution: Provide escape hatches for advanced users

❌ Lack of Documentation

Problem: Low adoption due to unclear usage
Solution: Comprehensive, up-to-date documentation

❌ Ignoring Developer Feedback

Problem: Building features nobody wants
Solution: Regular feedback loops and user research

❌ No Metrics

Problem: Can’t prove platform value
Solution: Track DORA metrics, adoption, and satisfaction

Measuring Platform Success

Key Performance Indicators

Metric                 | Target       | Measurement
-----------------------|--------------|-------------------------
Time to First Deploy   | < 1 day      | Onboarding to production
Deployment Frequency   | Multiple/day | GitOps sync rate
MTTR                   | < 1 hour     | Incident to resolution
Platform Adoption      | > 80%        | Services on platform
Developer Satisfaction | NPS > 50     | Quarterly surveys
Self-Service Rate      | > 90%        | Automated vs manual
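The targets in the table can be checked mechanically once the measurements exist. A small sketch follows, assuming a hypothetical `PlatformKpis` shape and reading "Multiple/day" as more than one deployment per day.

```typescript
interface PlatformKpis {
  hoursToFirstDeploy: number;
  deploysPerDay: number;
  mttrHours: number;
  adoptionPct: number;
  nps: number;
  selfServicePct: number;
}

// Returns the names of the KPIs that miss their target.
function unmetTargets(kpis: PlatformKpis): string[] {
  const checks: Array<[string, boolean]> = [
    ['Time to First Deploy', kpis.hoursToFirstDeploy < 24],
    ['Deployment Frequency', kpis.deploysPerDay > 1],
    ['MTTR', kpis.mttrHours < 1],
    ['Platform Adoption', kpis.adoptionPct > 80],
    ['Developer Satisfaction', kpis.nps > 50],
    ['Self-Service Rate', kpis.selfServicePct > 90],
  ];
  return checks.filter(check => !check[1]).map(check => check[0]);
}

// Illustrative quarter: everything on target except MTTR and adoption.
const lastQuarter: PlatformKpis = {
  hoursToFirstDeploy: 6,
  deploysPerDay: 4,
  mttrHours: 2.5,
  adoptionPct: 72,
  nps: 55,
  selfServicePct: 93,
};

console.log(unmetTargets(lastQuarter)); // [ 'MTTR', 'Platform Adoption' ]
```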

Key Takeaways

  • Platform Engineering treats internal platforms as products
  • Backstage provides a foundation for developer portals
  • Golden Paths standardize common workflows
  • Self-Service reduces toil and increases velocity
  • Templates enable consistent, secure deployments
  • Observability is built in from day one
  • Metrics prove platform value and guide improvements
  • Developer Experience is the primary focus


Building an Internal Developer Platform is a journey, not a destination. Start small, focus on developer needs, measure success, and iterate continuously. The investment in platform engineering pays dividends through increased developer productivity, reduced cognitive load, and faster time to market.