Platform Engineering Team Structure

Build a world-class platform engineering team to deliver CDEs, standardized developer environments, and infrastructure automation. This guide covers everything from role definitions to hiring frameworks.

Why Platform Engineering Teams?

The business case for dedicated platform engineering teams managing CDEs and developer infrastructure

Value Proposition

  • Developer Productivity

    90% faster onboarding, zero environment drift, one-click workspace provisioning

  • Security & Compliance

    Code never leaves VPC, HITRUST/SOC2 alignment, audit logging built-in

  • Cost Control

    Auto-stop idle workspaces, right-sized resources, eliminate over-provisioned laptops

  • Standardization

    Eliminate "works on my machine" with Infrastructure as Code templates

Business Impact

Developer Time Saved 40%

Less time fighting local environment issues

Onboarding Speed 90%

Minutes vs days for new hire setup

Infrastructure Cost Reduction 35%

Auto-stop policies and right-sizing

Security Incident Reduction 80%

Code stays in VPC, no laptop exfiltration

Team Structure Models

Three proven organizational models for platform engineering teams

Centralized

Single platform team owns all CDE infrastructure, templates, and tooling

CEO
 └── VP Engineering
      └── Platform Engineering (8-12)
           ├── Platform Lead
           ├── Platform Engineers (4-6)
           ├── DevEx Engineer (1-2)
           └── SRE (2-3)
Pros: Clear ownership and accountability; consistent standards across all teams
Cons: Can become a bottleneck at scale

Best For

Mid-sized companies (100-500 devs), regulated industries needing strict control

Embedded

RECOMMENDED

Platform engineers embedded in product teams, backed by a central center of excellence (CoE)

Platform CoE (4-6)
 ├── Platform Lead
 └── Core Platform Team

Product Teams
 ├── Team A (+ 1 Platform Eng)
 ├── Team B (+ 1 Platform Eng)
 └── Team C (+ 1 Platform Eng)
Pros: Team-specific customization; faster iteration, fewer bottlenecks
Cons: Risk of template fragmentation

Best For

Large enterprises (500+ devs), fast-moving product orgs

Hybrid

Central team owns infrastructure, product teams self-serve with templates

Platform Team (6-8)
 ├── Owns: Infrastructure
 ├── Owns: Base Templates
 └── Owns: Self-Service Portal

Product Teams
 ├── Customize Templates
 └── Self-Service Provisioning
Pros: Team autonomy with guardrails; platform team focuses on core infra
Cons: Requires mature developer culture

Best For

Tech-forward companies, DevOps-mature organizations

Role Definitions

Clear responsibilities, required skills, and job descriptions for each platform engineering role

Platform Engineering Lead

IC or Manager depending on team size - Strategic owner of developer experience

Level: Senior/Staff

Key Responsibilities

  • Define platform engineering roadmap and vision
  • Own CDE tool selection (Coder, Gitpod, Codespaces)
  • Establish SLOs for platform availability and performance
  • Partner with security on compliance (HITRUST, SOC2)
  • Manage team hiring, growth, and career development
  • Communicate platform value to leadership (ROI metrics)

Required Skills

  • Deep Kubernetes and Terraform expertise
  • Experience with CDE platforms (Coder, Gitpod, etc)
  • AWS/Azure/GCP architecture design
  • Developer experience (DevEx) product mindset
  • Technical leadership and mentoring
  • Stakeholder management and communication

Typical Background

8+ years in DevOps/SRE/Platform Engineering. Has built or scaled developer infrastructure at 100+ developer companies. Deep experience with IaC, container orchestration, and developer tooling.

Platform Engineer

Builds and maintains CDE infrastructure and developer tooling

Level: Mid-Senior

Key Responsibilities

  • Build Terraform templates for workspace provisioning
  • Manage Kubernetes clusters for CDE infrastructure
  • Create DevContainer configurations for major stacks
  • Automate workspace lifecycle (auto-stop, backups)
  • Monitor platform health (Prometheus, Grafana)
  • Support developer onboarding and troubleshooting

Required Skills

  • Terraform and Infrastructure as Code
  • Kubernetes (deployments, services, ingress)
  • Docker and containerization
  • CI/CD pipelines (GitHub Actions, GitLab CI)
  • Scripting (Python, Bash, Go)
  • Networking and security fundamentals

Typical Background

4-7 years in DevOps or SRE. Experience managing cloud infrastructure and container platforms. Comfortable writing code and building automation.

Developer Experience Engineer

Product manager for internal developer platform

Level: Mid-Senior

Key Responsibilities

  • Gather developer feedback and pain points
  • Design self-service portal and documentation
  • Create onboarding guides and video tutorials
  • Measure developer satisfaction (NPS, surveys)
  • Run office hours and training sessions
  • Prioritize platform features based on developer needs

Required Skills

  • Strong communication and empathy
  • Technical writing and documentation
  • Product management fundamentals
  • Data analysis (usage metrics, adoption rates)
  • Software development background
  • User research and feedback synthesis

Typical Background

Former software engineer or technical writer with passion for developer tools. 3-5 years experience. Hybrid tech/product skillset.

SRE / Operations Engineer

Ensures platform reliability, uptime, and incident response

Level: Mid-Senior

Key Responsibilities

  • Define and monitor SLOs/SLIs for platform
  • Incident response and postmortem leadership
  • Implement observability (logs, metrics, traces)
  • Capacity planning and resource optimization
  • Disaster recovery and backup strategies
  • On-call rotation and escalation procedures

Required Skills

  • Observability tools (Prometheus, Grafana, Datadog)
  • Kubernetes troubleshooting and debugging
  • Incident management and RCA
  • Performance optimization and tuning
  • Automation and scripting
  • High-pressure problem solving

Typical Background

5+ years in SRE or operations. Experience maintaining high-availability systems. Strong troubleshooting and debugging skills.

Security Engineer (Platform)

Embeds security into CDE infrastructure and templates

Level: Senior

Key Responsibilities

  • Implement IAM and RBAC for workspace access
  • Network security and VPC design
  • Secrets management (HashiCorp Vault, AWS Secrets Manager)
  • Container security scanning and hardening
  • Compliance audit preparation (HITRUST, SOC2)
  • Security training for developers

Required Skills

  • Cloud security (AWS IAM, Microsoft Entra ID (formerly Azure AD), GCP IAM)
  • Kubernetes security best practices
  • Zero-trust architecture
  • Compliance frameworks (HITRUST, SOC2, FedRAMP)
  • Security tooling (Falco, Trivy, OPA)
  • Threat modeling and risk assessment

Typical Background

6+ years in security engineering. Cloud infrastructure security experience. Often embedded from central security team.

Team Sizing by Organization Size

How many platform engineers do you need? Guidance based on developer count

Startup

Developer Count: 10-50
Platform Engineers: 1-2
DevEx Engineer: 0-1*
SRE: Shared
Security: Shared

Strategy: Start with 1 platform engineer who wears multiple hats.

* DevEx is part-time or shared responsibility

Mid-Market

MOST COMMON
Developer Count: 50-500
Platform Lead: 1
Platform Engineers: 3-6
DevEx Engineer: 1-2
SRE: 1-2
Security: 0.5-1

Strategy: Dedicated team with clear specializations. Security is embedded part-time or full-time.

Enterprise

Developer Count: 500+
Platform Lead: 1 (Manager)
Platform Engineers: 8-15
DevEx Engineers: 2-3
SRE: 3-5
Security: 2-3

Strategy: Large centralized team OR embedded model with CoE. Often multi-region support.

Rule of Thumb: Platform Engineer to Developer Ratio

1:25 (Early Stage): lots of setup work
1:50 (Mature Platform): automated workflows
1:100 (Self-Service): highly automated
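
As a rough illustration, the ratios above can be turned into a headcount estimate. This is a minimal sketch: the stage keys and the function name `estimate_platform_engineers` are hypothetical, and rounding up with a floor of one engineer is an assumption, not part of the rule of thumb itself.

```python
import math

# Developers supported per platform engineer at each maturity stage,
# taken from the rule-of-thumb ratios above. The stage keys are
# illustrative names, not an established taxonomy.
DEVS_PER_PLATFORM_ENGINEER = {
    "early_stage": 25,
    "mature_platform": 50,
    "self_service": 100,
}


def estimate_platform_engineers(developer_count: int, stage: str) -> int:
    """Return a rough headcount: developers / ratio, rounded up, minimum 1."""
    if developer_count < 0:
        raise ValueError("developer_count must be non-negative")
    ratio = DEVS_PER_PLATFORM_ENGINEER[stage]
    # Assumption: always staff at least one engineer, and round partial
    # headcount up rather than down.
    return max(1, math.ceil(developer_count / ratio))


if __name__ == "__main__":
    for stage in DEVS_PER_PLATFORM_ENGINEER:
        print(stage, estimate_platform_engineers(200, stage))
```

Treat the output as a starting point for discussion: it counts platform engineers only and ignores the DevEx, SRE, and security roles listed in the sizing tiers.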

Responsibilities Matrix

Who owns what? Clear ownership mapping for platform engineering teams

Responsibility Area Lead Engineer DevEx SRE Security
CDE Tool Selection -
Infrastructure (K8s, Cloud) -
Terraform Templates - -
DevContainer Configs - - -
Monitoring & Observability - - -
Incident Response - -
Documentation - - -
Developer Support - - -
Security Policies (IAM, RBAC) - - -
Compliance Audits - -
Cost Optimization - -
Owner: Primary responsibility
Collaborator: Shared ownership
Reviewer: Reviews/approves work

Hiring Checklist & Interview Guide

Comprehensive guide to evaluate platform engineering candidates effectively

Recommended Interview Process

1. Resume Screen (15 min)
2. Phone Screen (30 min)
3. Take-Home (4-6 hrs)
4. Technical Deep-Dive (90 min)
5. Final Panel (60 min)

Resume Screening Criteria

Must-Have Keywords

  • Kubernetes / K8s
  • Terraform / IaC
  • Docker / Containers
  • CI/CD pipelines
  • Cloud (AWS/Azure/GCP)

Nice-to-Have

  • Platform Engineering
  • Developer Experience (DevEx)
  • Internal Developer Platform
  • Coder / Gitpod / Codespaces
  • DevContainers

Red Flags

  • Only on-prem experience
  • No automation/scripting
  • Job-hopping every 6 months
  • Buzzword-heavy, no depth
  • No team collaboration mentions

Phone Screen Questions (30 min)

Background & Motivation (10 min)

  • "Tell me about your current role and what you're looking for in your next position."
  • "What draws you to platform engineering specifically?"
  • "What's the scale of infrastructure you've managed? (users, clusters, requests/sec)"

Technical Baseline (15 min)

  • "Explain Kubernetes at a high level to a non-technical stakeholder."
  • "What's your experience with infrastructure as code? Which tools?"
  • "Tell me about a production incident you handled. What was your role?"

Phone Screen Pass/Fail Criteria

Pass if: Clear communication, relevant experience, genuine interest, can explain technical concepts
Reject if: Can't explain basic K8s concepts, no cloud experience, poor communication, compensation misalignment

Technical Deep-Dive Questions

Infrastructure as Code (15 min)

  • "Walk me through how you'd provision a Kubernetes-based CDE using Terraform"
  • "How would you manage state across multiple environments?"
  • "Explain Terraform modules vs workspaces - when would you use each?"
  • "How do you handle secrets in Terraform? What about drift detection?"
  • "Describe your strategy for Terraform code review and testing"

Kubernetes Deep-Dive (20 min)

  • "Design a namespace strategy for multi-tenant CDE workspaces"
  • "How would you implement resource quotas and limit ranges for dev teams?"
  • "Explain pod security standards and admission controllers"
  • "A pod is stuck in Pending. Walk me through your debugging process."
  • "How would you implement persistent storage for developer workspaces?"
  • "Explain network policies and how you'd isolate workspaces"

Developer Experience (15 min)

  • "A developer says workspaces are slow. Walk me through your debugging."
  • "How would you measure developer satisfaction with the platform?"
  • "Design an onboarding experience for new developers"
  • "How do you balance platform stability vs. developer feature requests?"
  • "What metrics would you track to prove platform ROI to leadership?"

Security & Compliance (10 min)

  • "How would you implement SSO/OIDC for CDE authentication?"
  • "Explain your approach to secrets management in cloud environments"
  • "How do you ensure compliance with SOC2/HITRUST in a CDE?"
  • "Walk me through container image security best practices"

Behavioral & Situational Questions

Stakeholder Management

  • "Leadership wants ROI metrics for the platform. What do you track and present?"
  • "How do you prioritize features when 5 teams want different things?"
  • "Tell me about a time you had to say 'no' to a senior engineer or manager"
  • "How do you communicate platform changes to 100+ developers?"
  • "Describe a time you had to get buy-in for a major infrastructure change"

Problem Solving & Incidents

  • "Describe the most complex infrastructure problem you've solved"
  • "How do you approach debugging distributed systems?"
  • "Walk me through a production incident you led to resolution"
  • "Tell me about a time when you had to make a quick decision with incomplete information"
  • "Describe a project that failed. What did you learn?"

Team Collaboration & Leadership

  • "How do you work with security teams on compliance requirements?"
  • "Describe your code review philosophy"
  • "How do you mentor junior engineers or share knowledge?"
  • "Tell me about working with a difficult colleague. How did you handle it?"
  • "How do you stay current with rapidly evolving cloud technologies?"

Culture & Values

  • "What does 'developer experience' mean to you?"
  • "How do you balance moving fast vs. doing things right?"
  • "What's your approach to documentation?"
  • "Where do you see the future of platform engineering in 5 years?"

Take-Home Assessment (4-6 hours)

Scenario: Design a CDE Platform for 200 Developers

Your company has 200 developers across 15 teams and wants to migrate from local development to cloud development environments. Design the infrastructure and rollout plan.

Part 1: Infrastructure (Required)

Write Terraform to provision a Kubernetes cluster with auto-scaling, node pools for different workload types, and proper networking

Part 2: DevContainer (Required)

Create a devcontainer.json for a Node.js + PostgreSQL + Redis stack with proper extensions and settings

Part 3: Observability (Required)

Design a monitoring strategy: metrics to collect, alerting rules, Grafana dashboard mockup

Part 4: Rollout Plan (Required)

Write a 2-page implementation plan with phases, risks, and success criteria

Bonus Points (Optional)
  • CI/CD pipeline for workspace templates
  • Cost estimation and optimization strategy
  • Security architecture diagram
  • Disaster recovery plan
Code quality & documentation
Time: 4-6 hours
Due: 5 business days
Follow-up: 30-min review

Interview Scoring Rubric

Competency | Weight | 5 - Expert | 3 - Proficient | 1 - Developing
Kubernetes | 25% | Designs complex multi-tenant architectures | Solid operational knowledge | Basic understanding only
Infrastructure as Code | 20% | Creates reusable modules, manages state at scale | Writes clean Terraform | Modifies existing code
Developer Experience | 20% | Drives adoption, measures satisfaction | Understands developer needs | Infrastructure-focused only
Problem Solving | 15% | Systematic approach, leads incidents | Good debugging skills | Needs guidance
Communication | 10% | Explains complex topics clearly to any audience | Clear technical communication | Struggles to simplify
Culture Fit | 10% | Aligns with values, growth mindset | Good team player | Concerns about fit
4.0+: Strong Hire
3.0-3.9: Hire (with coaching plan)
< 3.0: No Hire
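
The weighted score behind these bands can be computed as in the sketch below. The weights and thresholds come from the rubric above; the function names and dictionary keys are hypothetical, and treating every score from 3.0 up to (but not including) 4.0 as "Hire" is an assumption that closes the gap between the 3.9 and 4.0 bands.

```python
# Competency weights from the scoring rubric (they sum to 100%).
WEIGHTS = {
    "kubernetes": 0.25,
    "infrastructure_as_code": 0.20,
    "developer_experience": 0.20,
    "problem_solving": 0.15,
    "communication": 0.10,
    "culture_fit": 0.10,
}


def weighted_score(ratings: dict) -> float:
    """Combine per-competency ratings (1-5) into one weighted score."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name in WEIGHTS:
        if not 1 <= ratings[name] <= 5:
            raise ValueError(f"rating for {name} must be between 1 and 5")
    total = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    # Round to two decimals to avoid float noise near band boundaries.
    return round(total, 2)


def recommendation(score: float) -> str:
    """Map a weighted score onto the hiring bands from the rubric."""
    if score >= 4.0:
        return "Strong Hire"
    if score >= 3.0:  # assumption: covers 3.0 up to just below 4.0
        return "Hire (with coaching plan)"
    return "No Hire"


if __name__ == "__main__":
    candidate = {
        "kubernetes": 5,
        "infrastructure_as_code": 4,
        "developer_experience": 3,
        "problem_solving": 4,
        "communication": 3,
        "culture_fit": 4,
    }
    score = weighted_score(candidate)
    print(score, recommendation(score))
```

Intermediate ratings of 2 and 4 are allowed here even though the rubric only describes levels 1, 3, and 5; interviewers can use them for candidates who fall between two descriptions.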

Reference Check Questions

Performance & Technical Skills

  • "On a scale of 1-10, how would you rate [name]'s technical skills? What makes you say that?"
  • "What was their most significant technical contribution?"
  • "How did they handle production incidents or high-pressure situations?"
  • "What areas would you suggest they develop further?"

Teamwork & Culture

  • "How would you describe their working style?"
  • "How did they collaborate with other teams (security, product, etc.)?"
  • "Would you hire them again? Why or why not?"
  • "Is there anything I should know that I haven't asked about?"

Success Metrics & KPIs

How to measure platform engineering team performance

Developer Productivity

  • Onboarding Time: Hours to first commit (target: <2 hours)
  • Workspace Uptime: % workspaces running successfully (target: >99%)
  • Daily Active Users: % developers using CDEs daily (target: >80%)

Platform Reliability

  • Platform SLO: Uptime target (target: 99.9%)
  • MTTR: Mean time to recovery (target: <30 minutes)
  • Incident Frequency: Major incidents per month (target: <2)
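
To make the reliability targets concrete, the sketch below converts an uptime SLO into a downtime budget and averages incident durations into MTTR. The function names are hypothetical, and the 30-day window in the usage example is an assumption; use whatever window your SLO is actually defined over.

```python
from datetime import timedelta
from typing import List


def allowed_downtime(slo_percent: float, window: timedelta) -> timedelta:
    """Return the downtime budget implied by an uptime SLO over a window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    return window * (1 - slo_percent / 100)


def mean_time_to_recovery(durations: List[timedelta]) -> timedelta:
    """Average the time from incident start to recovery across incidents."""
    if not durations:
        raise ValueError("need at least one incident duration")
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    # Assumed 30-day measurement window.
    budget = allowed_downtime(99.9, timedelta(days=30))
    print("Downtime budget:", budget)

    incidents = [timedelta(minutes=20), timedelta(minutes=40), timedelta(minutes=15)]
    print("MTTR:", mean_time_to_recovery(incidents))
```

Under that assumption, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, so a single incident resolved at the 30-minute MTTR target consumes most of the budget.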

Developer Satisfaction

  • NPS Score: Net Promoter Score (target: >50)
  • Support Ticket Volume: Tickets per user per month (target: <0.5)
  • Adoption Rate: New feature usage within 30 days (target: >60%)
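
For reference, NPS is the percentage of promoters (ratings of 9-10) minus the percentage of detractors (ratings of 0-6) on a 0-10 "how likely are you to recommend" question. A minimal sketch, with a hypothetical function name and made-up survey responses:

```python
from typing import List


def net_promoter_score(responses: List[int]) -> float:
    """Compute NPS from 0-10 survey ratings: % promoters minus % detractors."""
    if not responses:
        raise ValueError("need at least one response")
    if any(not 0 <= r <= 10 for r in responses):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    # Passives (7-8) count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(responses)


if __name__ == "__main__":
    # Illustrative responses only, not real survey data.
    survey = [10, 9, 9, 8, 7, 6, 10, 9, 3, 10]
    print(net_promoter_score(survey))
```

The result ranges from -100 to 100, so a target above 50 means promoters must outnumber detractors by more than half of all respondents.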

Cost Efficiency

  • Cost per Developer: Monthly cloud spend per dev (benchmark: $50-200)
  • Idle Workspace Rate: % of idle workspaces auto-stopped (target: >70%)
  • Resource Utilization: CPU/memory efficiency (target: >60%)

Security & Compliance

  • Vulnerability Remediation: Time to patch CVEs (target: <7 days)
  • Compliance Audit Pass Rate: % of audits passed (target: 100%)
  • Policy Violations: Security policy breaches (target: 0)

Team Velocity

  • Feature Delivery: New features shipped per quarter (benchmark: 3-5)
  • Template Coverage: % stacks with templates (target: >90%)
  • Documentation Coverage: % features documented (target: 100%)

Common Anti-Patterns to Avoid

Learn from others' mistakes - what NOT to do when building platform engineering teams

Building Without User Research

Platform team builds what they think developers need without talking to actual developers.

Instead:

Run weekly office hours, quarterly developer surveys, and embed with product teams. Treat developers as customers.

No Clear Ownership Model

Platform team, SRE, and security all touch CDEs with no clear responsibility boundaries.

Instead:

Create a RACI matrix (see Responsibilities section above). Document who owns infrastructure, who owns templates, who owns support.

Over-Engineering from Day 1

Spending 6 months building a perfect self-service portal before anyone can use a CDE.

Instead:

Start with manual provisioning for pilot users. Automate based on actual pain points. Ship value early and iterate.

Ignoring Cost from the Start

No auto-stop policies, no resource quotas. Cloud bill explodes and leadership loses trust.

Instead:

Implement auto-stop (2-4 hours), resource quotas, and cost dashboards from day 1. Track cost-per-developer monthly.
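
The auto-stop rule can be sketched as a simple idle check. This is an illustration only: the `Workspace` class and `workspaces_to_stop` function are hypothetical, and CDE platforms typically expose their own auto-stop settings that should be preferred over custom logic. The default timeout of 2 hours is the lower end of the range suggested above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Workspace:
    """Hypothetical view of a workspace; real platforms expose their own model."""
    name: str
    last_activity: datetime
    running: bool = True


def workspaces_to_stop(
    workspaces: List[Workspace],
    now: datetime,
    idle_timeout: timedelta = timedelta(hours=2),
) -> List[str]:
    """Return names of running workspaces idle for at least idle_timeout."""
    return [
        ws.name
        for ws in workspaces
        if ws.running and now - ws.last_activity >= idle_timeout
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 18, 0, tzinfo=timezone.utc)
    fleet = [
        Workspace("api-dev", now - timedelta(minutes=30)),
        Workspace("ml-gpu", now - timedelta(hours=4)),
        Workspace("docs", now - timedelta(hours=6), running=False),
    ]
    print(workspaces_to_stop(fleet, now))
```

Passing `now` explicitly keeps the check deterministic and easy to test; a scheduled job would supply the current UTC time on each run.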

Treating Security as an Afterthought

Building the platform first, then trying to bolt on IAM, RBAC, and audit logging later.

Instead:

Embed a security engineer from the start. Design IAM, secrets management, and audit logging into the architecture.

Zero Documentation Strategy

"We'll document it later." Six months in, only tribal knowledge exists. Developer frustration soars.

Instead:

Hire a DevEx engineer who owns docs. Create getting-started guides, video walkthroughs, and troubleshooting FAQs from day 1.

One-Size-Fits-All Templates

Forcing data scientists who need GPUs and backend engineers into the same workspace template.

Instead:

Create stack-specific templates: Node.js, Python ML, Java microservices, etc. Let teams customize within guardrails.

No Feedback Loop

Platform team ships features into the void with no metrics on adoption, usage, or satisfaction.

Instead:

Track usage metrics, run quarterly NPS surveys, analyze support tickets, and hold developer forums for direct feedback.
