Pilot Program Design
Structure your cloud development environment (CDE) pilot for success with team selection criteria, success metrics, a 90-day evaluation scorecard, and a go/no-go decision framework.
Pilot Team Selection Criteria
Choose teams that will set your pilot up for success
Ideal Pilot Team Characteristics
- **Enthusiastic team lead**: Manager willing to champion the change and handle resistance
- **Modern tech stack**: Teams using containers, VS Code, or standard tooling (not legacy)
- **Team size 5-15 developers**: Large enough for valid data, small enough to manage closely
- **Stable roadmap**: Not in the middle of a critical deadline or major refactoring
- **Recent or planned hires**: Can demonstrate onboarding improvements immediately
- **AI agent readiness**: Teams already using AI coding assistants or planning to adopt AI agent workflows
Avoid for Initial Pilot
- **Teams under deadline pressure**: Any friction will be blamed on the CDE rather than given a fair evaluation
- **Specialized hardware needs**: GPU workloads, embedded development, iOS builds (defer these past the first pilot)
- **Known skeptics/blockers**: Don't start with teams vocally opposed; win them over later with proven success
- **Legacy monolith teams**: Complex local dependencies increase pilot complexity
- **High-latency regions**: Teams far from your cloud region will have a poor experience
- **Uncontrolled AI agent usage**: Teams running AI agents without guardrails add unpredictable cost and security risk to the pilot
Team Selection Scorecard
| Criteria | Weight | Team A | Team B | Team C |
|---|---|---|---|---|
| Manager enthusiasm (1-5) | 3x | | | |
| Tech stack compatibility (1-5) | 2x | | | |
| Roadmap stability (1-5) | 2x | | | |
| Team size (5-15 = 5, else lower) | 1x | | | |
| Onboarding needs (1-5) | 2x | | | |
| AI agent readiness (1-5) | 2x | | | |
| **Weighted Total** (max 60) | | | | |
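
To keep the tally consistent across many candidate teams, a few lines of script can compute the weighted totals. This is a minimal sketch in Python mirroring the table's weights; the team name and raw 1-5 scores are illustrative inputs, not data from any platform.

```python
# Minimal sketch: weighted pilot-team scoring, mirroring the scorecard above.
# Raw scores (1-5 per criterion) and the sample team are illustrative.

WEIGHTS = {
    "manager_enthusiasm": 3,
    "tech_stack_compatibility": 2,
    "roadmap_stability": 2,
    "team_size": 1,
    "onboarding_needs": 2,
    "ai_agent_readiness": 2,
}

def weighted_total(scores: dict[str, int]) -> int:
    """Sum of score * weight for each criterion (max 60 with all 5s)."""
    return sum(WEIGHTS[name] * score for name, score in scores.items())

team_a = {
    "manager_enthusiasm": 5,
    "tech_stack_compatibility": 4,
    "roadmap_stability": 3,
    "team_size": 5,
    "onboarding_needs": 4,
    "ai_agent_readiness": 3,
}
print(weighted_total(team_a))  # 48 out of a possible 60
```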
Pilot Success Metrics
Define what success looks like before you start
Productivity
- Onboarding time < 4 hours
- Workspace startup < 5 min
- Environment issues/week < 2
Experience
- Developer NPS > +30
- Would recommend > 70%
- Satisfaction score > 4.0/5
Reliability
- Platform uptime > 99.5%
- P95 latency < 100ms
- Support tickets < 5/week
Adoption
- Daily active users > 80%
- Local dev usage < 10%
- Devs reverting to local < 5%
AI Agents
- Agent task success > 75%
- Agent cost/task < budget
- Sandbox escapes 0
Cost Control
- Cost per dev/month < target
- LLM cost attribution 100%
- Idle workspace waste < 10%
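
Once these targets are agreed, it helps to automate the weekly check. The sketch below compares a metrics snapshot against a subset of the targets above; the metric names and sample values are illustrative, and in practice the inputs would come from your survey tool and platform telemetry.

```python
# Minimal sketch: compare pilot metrics against the targets above.
# Metric names and sample week-6 values are illustrative placeholders.

TARGETS = {
    "onboarding_hours": ("<", 4),
    "workspace_startup_min": ("<", 5),
    "developer_nps": (">", 30),
    "platform_uptime_pct": (">", 99.5),
    "daily_active_pct": (">", 80),
    "agent_task_success_pct": (">", 75),
    "sandbox_escapes": ("==", 0),
}

def evaluate(metrics: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per metric against its target."""
    ops = {"<": lambda a, b: a < b, ">": lambda a, b: a > b, "==": lambda a, b: a == b}
    return {name: ops[op](metrics[name], target) for name, (op, target) in TARGETS.items()}

week6 = {"onboarding_hours": 3.5, "workspace_startup_min": 4.2, "developer_nps": 35,
         "platform_uptime_pct": 99.7, "daily_active_pct": 82,
         "agent_task_success_pct": 78, "sandbox_escapes": 0}
for name, passed in evaluate(week6).items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```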
90-Day Evaluation Scorecard
Weekly checkpoints for pilot evaluation
| Phase | Week | Milestone | Success Criteria | Status |
|---|---|---|---|---|
| Setup | 1 | Infrastructure deployed | Platform accessible, SSO working | - |
| Setup | 2 | Training complete | 100% of pilot team attended | - |
| Setup | 2 | AI agent sandbox configured | Agent workspaces isolated, cost limits set | - |
| Active Pilot | 3 | First sprint on CDE | No blockers, sprint completed | - |
| Active Pilot | 4 | First pulse survey | Satisfaction > 3.5/5 | - |
| Active Pilot | 5-6 | Steady state | DAU > 70%, < 3 support tickets | - |
| Active Pilot | 6 | AI agent workflow validation | Agent tasks completing in sandbox, no escapes | - |
| Active Pilot | 7-8 | New hire onboarding | Onboarding < 4 hours | - |
| Active Pilot | 9-10 | Edge case testing | Complex workflows and AI agent edge cases validated | - |
| Active Pilot | 11 | Final survey | NPS > +30, recommend > 70% | - |
| Decision | 12 | Data analysis | All metrics compiled, AI agent cost review | - |
| Decision | 13 | Go/No-Go decision | Executive presentation | - |
AI Agent Pilot Considerations
In 2026, any CDE pilot should account for AI agent workloads alongside human developers
Why AI agents change CDE pilots
AI coding agents like Claude Code, GitHub Copilot, and Cursor autonomously spin up workspaces, generate code, run tests, and submit pull requests. CDE platforms such as Coder, Ona (formerly Gitpod), and GitHub Codespaces now serve both human developers and autonomous agents. Your pilot must evaluate how well the platform handles both workload types, including sandboxing, cost attribution, and lifecycle management for unattended agent sessions.
Sandbox Isolation
AI agents must run in isolated workspaces with no access to production data or other developer environments. Validate that your CDE enforces strict network and filesystem boundaries for agent sessions; a quick validation sketch follows the list below.
- Network egress restrictions per workspace
- Filesystem isolation between agent and human workspaces
- Automatic workspace termination on timeout
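
One lightweight way to validate the egress rules is a probe script run inside an agent workspace that confirms connections to off-limits endpoints fail. This is a minimal sketch; the blocked hosts are illustrative placeholders to replace with the internal endpoints your policy should cover.

```python
# Minimal sketch: run inside an agent workspace to confirm that network
# egress is blocked. Hosts and ports below are illustrative; substitute
# the internal endpoints your isolation policy is supposed to block.
import socket

BLOCKED = [("prod-db.internal", 5432), ("169.254.169.254", 80)]  # e.g. prod DB, cloud metadata

def egress_blocked(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection attempt fails (isolation holding)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # connection succeeded: isolation is NOT enforced
    except OSError:
        return True

for host, port in BLOCKED:
    status = "blocked (OK)" if egress_blocked(host, port) else "REACHABLE (policy gap)"
    print(f"{host}:{port} -> {status}")
```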
Cost Attribution
AI agents can run unattended workloads around the clock, making cost tracking critical. Your pilot needs to separate agent compute and LLM API costs from human developer workspace costs, as sketched after the list below.
- Per-agent and per-developer cost tagging
- LLM token usage tracked per workspace
- Budget alerts before runaway spend
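
A minimal sketch of the attribution logic, assuming a billing export with per-workspace owner tags and a per-agent weekly budget (both the record format and the budget figure are illustrative):

```python
# Minimal sketch: attribute workspace costs by owner tag and flag budget
# overruns. The record format and thresholds are assumptions; real data
# would come from your platform's billing export.
from collections import defaultdict

usage = [  # (workspace_id, owner_type, owner, compute_usd, llm_usd)
    ("ws-101", "human", "alice", 12.40, 0.00),
    ("ws-102", "agent", "claude-code-ci", 8.75, 21.30),
    ("ws-103", "agent", "claude-code-ci", 9.10, 18.90),
]

AGENT_WEEKLY_BUDGET_USD = 50.0  # illustrative per-agent budget

totals: dict[str, float] = defaultdict(float)
for _, owner_type, owner, compute, llm in usage:
    totals[f"{owner_type}:{owner}"] += compute + llm

for owner, spend in sorted(totals.items()):
    over = owner.startswith("agent:") and spend > AGENT_WEEKLY_BUDGET_USD
    print(f"{owner}: ${spend:.2f}" + ("  ALERT: over budget" if over else ""))
```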
Agent Lifecycle
Unlike human developers who close their laptops, AI agents can run indefinitely. Your pilot should define how agent workspaces are created, monitored, and automatically cleaned up; see the reaper sketch after this list.
- Maximum session duration limits
- Auto-shutdown on idle or task completion
- Workspace cleanup and artifact retention policies
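
The lifecycle rules can be enforced by a simple reaper job. The sketch below is a hedged outline: the `Workspace` fields, duration limits, and `stop_workspace()` call are placeholders, since a real implementation would page workspaces from your platform's API.

```python
# Minimal sketch: a reaper loop that stops agent workspaces past their
# maximum session duration or idle limit. Fields, limits, and the
# stop_workspace() call are placeholders for your platform's API.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_SESSION = timedelta(hours=4)   # illustrative hard cap
MAX_IDLE = timedelta(minutes=30)   # illustrative idle cutoff

@dataclass
class Workspace:
    id: str
    started_at: datetime
    last_activity_at: datetime

def should_stop(ws: Workspace, now: datetime) -> str | None:
    """Return a stop reason, or None if the workspace may keep running."""
    if now - ws.started_at > MAX_SESSION:
        return "max session duration exceeded"
    if now - ws.last_activity_at > MAX_IDLE:
        return "idle timeout"
    return None

def stop_workspace(ws_id: str) -> None:
    print(f"stopping {ws_id}")  # placeholder for the real platform API call

now = datetime.now(timezone.utc)
fleet = [Workspace("ws-agent-1", now - timedelta(hours=5), now - timedelta(minutes=2))]
for ws in fleet:
    reason = should_stop(ws, now)
    if reason:
        stop_workspace(ws.id)
        print(f"  reason: {reason}")
```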
Agent Observability
You need visibility into what AI agents are doing inside CDE workspaces. Evaluate whether the platform provides audit logs, action traces, and output review for autonomous agent sessions; a log-scanning sketch follows the list.
- Full audit trail of agent actions
- Human-in-the-loop review gates
- Anomaly detection for unexpected behavior
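
Even a simple allowlist scan over the audit trail catches gross anomalies. The event shape and expected-action set below are assumptions; adapt them to whatever audit format your CDE actually emits.

```python
# Minimal sketch: scan an agent audit log for actions outside an expected
# set. Event fields and the action taxonomy are illustrative assumptions.
EXPECTED_ACTIONS = {"clone_repo", "run_tests", "open_pr", "read_file", "write_file"}

audit_log = [
    {"workspace": "ws-agent-1", "action": "clone_repo", "target": "org/service-a"},
    {"workspace": "ws-agent-1", "action": "run_tests", "target": "org/service-a"},
    {"workspace": "ws-agent-1", "action": "open_ssh_tunnel", "target": "10.0.4.7"},
]

anomalies = [e for e in audit_log if e["action"] not in EXPECTED_ACTIONS]
for event in anomalies:
    print(f"ANOMALY in {event['workspace']}: {event['action']} -> {event['target']}")
```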
Platform Compatibility
Not all CDE platforms handle AI agent workloads equally. During your pilot, evaluate how well your chosen platform (Coder, Ona, Codespaces, DevPod, Daytona) supports headless agent sessions; a sketch of a headless API call follows the list.
- Headless workspace creation via API
- Template support for agent-specific images
- Resource quotas separate from human workspaces
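
A headless-creation smoke test is a good first compatibility check. The endpoint, payload shape, and auth header below are hypothetical stand-ins; each platform exposes its own REST contract, so consult its API documentation for the real fields.

```python
# Minimal sketch: headless workspace creation against a hypothetical REST
# endpoint. The URL, payload, and token header are placeholders; Coder,
# Ona, Codespaces, etc. each define their own real API contract.
import json
import urllib.request

API_BASE = "https://cde.example.com/api/v1"  # hypothetical base URL
TOKEN = "..."                                # platform API token

payload = {
    "template": "agent-sandbox",  # agent-specific image template
    "ttl_hours": 4,               # hard session cap
    "tags": {"owner_type": "agent", "owner": "claude-code-ci"},
}

req = urllib.request.Request(
    f"{API_BASE}/workspaces",
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["id"])  # newly created workspace ID
```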
Governance Policies
Define clear guardrails for AI agent behavior during the pilot. Establish which repos agents can access, what actions they can take autonomously, and what requires human approval; a policy-gate sketch follows the list.
- Repository access allowlists for agents
- PR merge requires human approval
- No direct production deployment by agents
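
These guardrails can be encoded as a pre-action policy gate. The repo names and action taxonomy below are illustrative; the rules mirror the three bullets above.

```python
# Minimal sketch: a pre-action policy gate enforcing the guardrails above.
# Repo names and the action taxonomy are illustrative assumptions.
AGENT_REPO_ALLOWLIST = {"org/service-a", "org/docs"}
HUMAN_APPROVAL_REQUIRED = {"merge_pr"}

def allowed(action: str, repo: str, human_approved: bool = False) -> bool:
    if repo not in AGENT_REPO_ALLOWLIST:
        return False            # repo not on the agent allowlist
    if action == "deploy":
        return False            # agents never deploy to production
    if action in HUMAN_APPROVAL_REQUIRED:
        return human_approved   # e.g. PR merge needs a human sign-off
    return True

print(allowed("open_pr", "org/service-a"))         # True
print(allowed("merge_pr", "org/service-a"))        # False until approved
print(allowed("merge_pr", "org/service-a", True))  # True with human approval
print(allowed("open_pr", "org/payments-legacy"))   # False: not allowlisted
```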
AI Agent Pilot Readiness Checklist
Before launch
- Agent workspace templates created and tested
- Network isolation rules configured
- Cost budget and alerts set for agent workloads
- Maximum session duration defined
- Audit logging enabled for agent workspaces
During pilot
- Monitor agent compute costs weekly
- Review agent output quality and accuracy
- Track sandbox escape attempts (should be zero)
- Measure developer trust in agent-generated code
- Validate idle workspace auto-shutdown is working
Go/No-Go Decision Framework
Objective criteria for the expansion decision
GO
All must be true
- Developer satisfaction > 4.0/5
- Platform uptime > 99.5%
- No critical blockers unresolved
- DAU > 80% of pilot team
- Onboarding < 4 hours achieved
- Manager recommends expansion
- AI agent costs within budget
- Zero sandbox escape incidents
CONDITIONAL
Extend pilot 30 days
- Satisfaction 3.5-4.0 (trending up)
- Uptime 99-99.5% (fixable issues)
- 1-2 blockers with clear fix path
- DAU 60-80% (adoption growing)
- Mixed manager feedback
- Agent costs over budget but trending down
NO-GO
Any one triggers stop
- Developer satisfaction < 3.5/5
- Platform uptime < 99%
- Security incident occurred
- DAU < 60% (low adoption)
- More than 3 unfixed critical blockers
- Manager recommends rollback
- AI agent sandbox breach detected
- Uncontrolled agent cost overruns
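
To keep the week-13 decision objective, the rules above can be encoded directly. The sketch below applies the NO-GO triggers first, then the GO conditions, and otherwise falls back to CONDITIONAL; the input values are illustrative, and in practice they come from the compiled week-12 data.

```python
# Minimal sketch: the go/no-go rules above as code. Any NO-GO trigger
# stops expansion; all GO conditions must hold; otherwise extend 30 days.

def decide(m: dict) -> str:
    no_go = (
        m["satisfaction"] < 3.5 or m["uptime_pct"] < 99.0
        or m["security_incident"] or m["dau_pct"] < 60
        or m["critical_blockers"] > 3 or m["manager_rollback"]
        or m["sandbox_breach"] or m["uncontrolled_agent_costs"]
    )
    if no_go:
        return "NO-GO"
    go = (
        m["satisfaction"] > 4.0 and m["uptime_pct"] > 99.5
        and m["critical_blockers"] == 0 and m["dau_pct"] > 80
        and m["onboarding_hours"] < 4 and m["manager_recommends"]
        and m["agent_costs_within_budget"] and not m["sandbox_breach"]
    )
    return "GO" if go else "CONDITIONAL: extend pilot 30 days"

week12 = {"satisfaction": 4.2, "uptime_pct": 99.7, "security_incident": False,
          "dau_pct": 85, "critical_blockers": 0, "manager_rollback": False,
          "sandbox_breach": False, "uncontrolled_agent_costs": False,
          "onboarding_hours": 3.5, "manager_recommends": True,
          "agent_costs_within_budget": True}
print(decide(week12))  # GO
```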