CDE Capacity Planning
Right-size your Cloud Development Environment infrastructure for human developers, AI agents, and GPU-accelerated workloads in 2026.
Quick Sizing Calculator
Estimate your infrastructure requirements
Your Requirements
Autonomous coding agents (Claude Code, Copilot agents, Devin, etc.)
Estimated Requirements
m7i.2xlarge (8 vCPU, 32GB)
Add 30-40% headroom for AI agent bursts and auto-scaling
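As a rough sketch of how that headroom figure translates into node counts (the `estimate_nodes` helper and the 8-vCPU node size are illustrative assumptions, not part of any platform API):

```python
import math

def estimate_nodes(total_vcpu: int, vcpu_per_node: int = 8,
                   headroom: float = 0.35) -> int:
    """Node count for a given vCPU demand plus 30-40% headroom
    (0.35 used here) for AI agent bursts and auto-scaling."""
    return math.ceil(total_vcpu * (1 + headroom) / vcpu_per_node)

# 320 vCPU of demand on m7i.2xlarge (8 vCPU) nodes:
print(estimate_nodes(320))  # 54 nodes with 35% headroom
```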
Workload Profiles
Resource recommendations by development type for 2026
Light
Scripts, docs, web frontend
Examples: Python scripts, React, Vue.js, documentation
Medium
Full-stack, microservices
Examples: Node.js, Java Spring, Go services, containers
Heavy
Compiled languages, builds
Examples: C++, Rust, Scala, monorepo builds
AI-Assisted Dev
Copilot, Claude Code, Cursor
Examples: Inline AI completion, code generation, agent-driven refactoring
Autonomous Agent
Headless AI agent workspaces
Examples: Devin, SWE-agent, Claude Code headless, OpenHands
GPU
LLM inference, ML training
Examples: PyTorch, TensorFlow, local LLM inference, CUDA, fine-tuning
AI Agent Capacity Planning
Size infrastructure for LLM inference, autonomous agents, and GPU-accelerated development
Why AI Changes Capacity Planning
In 2026, AI coding agents are no longer optional add-ons - they are primary consumers of CDE infrastructure. Platforms like Coder, Ona (formerly Gitpod), and GitHub Codespaces now provision dedicated agent workspaces alongside human developer environments. Each autonomous agent session consumes CPU, memory, and storage just like a human developer workspace, but with different usage patterns: higher burst CPU during code generation, sustained memory for context windows, and rapid I/O for file operations.
API-Based Agents
Claude Code, Copilot, Cursor - calls remote LLM APIs
CPU spikes during tool execution (builds, tests, linting). Memory needed for workspace tooling and file indexing.
Local LLM Inference
Self-hosted models for air-gapped or low-latency needs
A single L4 (24GB VRAM) can serve a quantized 7B-13B parameter model. Larger models need A100/H100 or multi-GPU setups.
Multi-Agent Orchestration
Multiple agents per task - planning, coding, testing, review
Multiply single-agent resources by pipeline concurrency. Use ephemeral workspaces to limit blast radius.
Agent Capacity Formula
Plan for peak concurrent agents, not total registered agents
API-Based Agent Sizing
Total vCPU = (peak_agents x 6) + (developers x workspace_cpu)
Total RAM = (peak_agents x 12 GB) + (developers x workspace_ram)

Example: 50 devs + 20 agents = (20 x 6) + (50 x 4) = 320 vCPU
GPU Inference Sizing
GPU nodes = ceil(concurrent_inference_requests / requests_per_gpu)
VRAM = model_params x 2 bytes (FP16) or x 1 byte (INT8)

Example: 13B model at INT8 = ~13 GB VRAM - fits on a single L4 (24 GB)
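The two sizing formulas above can be sketched in Python; the per-developer workspace defaults of 4 vCPU / 8 GB are illustrative assumptions, everything else follows the formulas directly:

```python
import math

def api_agent_sizing(peak_agents: int, developers: int,
                     workspace_cpu: int = 4, workspace_ram_gb: int = 8):
    """API-based agent sizing: 6 vCPU / 12 GB per concurrent agent,
    plus the human developer workspaces."""
    vcpu = peak_agents * 6 + developers * workspace_cpu
    ram_gb = peak_agents * 12 + developers * workspace_ram_gb
    return vcpu, ram_gb

def gpu_nodes(concurrent_requests: int, requests_per_gpu: int) -> int:
    """GPU inference sizing: round up to whole nodes."""
    return math.ceil(concurrent_requests / requests_per_gpu)

def vram_gb(model_params_billion: float, bytes_per_param: int = 2) -> float:
    """Weights-only VRAM: 2 bytes/param at FP16, 1 byte/param at INT8."""
    return model_params_billion * bytes_per_param

print(api_agent_sizing(20, 50))  # (320, 640) - matches the example above
print(gpu_nodes(25, 4))          # 7 GPU nodes
print(vram_gb(13, 1))            # 13 GB for a 13B model at INT8
```

Note the VRAM figure covers model weights only; KV cache and activation memory add on top, which is why a 13 GB model still wants the headroom of a 24 GB L4.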
Autonomous Development Scaling
Plan for the shift from human-only to human-plus-agent development teams
Scaling Stages
Copilot-Augmented (1:0 ratio)
Developers use inline AI completions. Add 2-4 GB RAM per workspace for language server and AI indexing overhead. No dedicated agent workspaces needed.
Agent-Assisted (1:1 ratio)
Each developer triggers one agent at a time. Plan for peak_developers x 1.0 additional workspaces. Agents share the same node pools but need isolated workspaces.
Multi-Agent (1:3 ratio)
Developers orchestrate multiple concurrent agents for coding, testing, and review. Plan for peak_developers x 3.0 additional workspaces with rapid spin-up.
Autonomous Fleet (1:10+ ratio)
Agents work independently on backlogs 24/7. Infrastructure must handle overnight surges. Use dedicated agent node pools with aggressive auto-scaling and ephemeral workspaces.
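The four stage ratios can be folded into a quick sizing helper. This is a sketch: the `agent_workspaces` name is hypothetical, and the autonomous stage uses the 1:10 lower bound from the stage description.

```python
def agent_workspaces(peak_developers: int, stage: str) -> int:
    """Additional agent workspaces to plan for at each scaling stage
    (ratios from the stages above; 'autonomous' is 1:10+, the lower
    bound is used here)."""
    ratios = {
        "copilot": 0.0,      # inline completions only, no agent workspaces
        "assisted": 1.0,     # one agent per developer at peak
        "multi": 3.0,        # concurrent coding/testing/review agents
        "autonomous": 10.0,  # 24/7 backlog fleet, lower bound
    }
    return int(peak_developers * ratios[stage])

print(agent_workspaces(50, "multi"))  # 150 additional workspaces
```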
24/7 Agent Scheduling
Unlike human developers, autonomous agents can run around the clock. Your scheduled scaling rules need a separate policy for agent workloads that does not scale down overnight. Budget for sustained off-hours compute when agents process backlogs autonomously.
Agent Isolation and Limits
Each agent workspace should have strict resource limits (CPU, memory, storage quotas) and network policies. Use ephemeral workspaces with automatic cleanup. Ona and Coder both support workspace templates with hard resource caps to prevent runaway agents from consuming the cluster.
Burst Capacity Planning
Agent workloads are bursty: a single PR review might spawn 5 agent workspaces for 10 minutes, then release them. Use Karpenter or cluster autoscaler with aggressive scale-up (under 60 seconds) and moderate scale-down (5-10 minutes) to handle this pattern cost-effectively.
Node Pool Strategy
Configure node pools for different workload types
| Node Pool | Instance Type | Min/Max | Workloads | Cost/hr |
|---|---|---|---|---|
| control-plane | m7i.xlarge | 3 / 3 | Coder, DB, monitoring | $0.202 x 3 |
| workspace-standard | m7i.2xlarge | 2 / 20 | Light/Medium workspaces | $0.403 |
| workspace-compute | c7i.4xlarge | 0 / 10 | Heavy build workloads | $0.714 |
| workspace-agent | m7i.2xlarge | 0 / 50 | AI agent workspaces | $0.403 |
| workspace-gpu | g6.2xlarge | 0 / 10 | LLM inference, ML/AI | $0.978 |
| workspace-spot | m7i.2xlarge (spot) | 0 / 30 | Cost-optimized + agent overflow | ~$0.121 (70% off) |
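A back-of-envelope monthly cost for the always-on baseline in this table: the 3 control-plane nodes plus the workspace-standard minimum of 2 nodes. The 730 hours/month figure is an assumption; auto-scaled pools add usage-dependent cost on top.

```python
HOURS_PER_MONTH = 730  # assumption: average month

# Always-on baseline from the node pool table:
# 3 x m7i.xlarge ($0.202/hr) + 2 x m7i.2xlarge ($0.403/hr)
baseline_per_hour = 3 * 0.202 + 2 * 0.403
print(round(baseline_per_hour * HOURS_PER_MONTH, 2))  # ~1030.76 per month
```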
EKS Node Group Configuration
# Terraform - EKS Managed Node Groups (2026)
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "cde-cluster"
  cluster_version = "1.31"

  eks_managed_node_groups = {
    # Control plane nodes - always on
    control-plane = {
      name           = "control-plane"
      instance_types = ["m7i.xlarge"]
      min_size       = 3
      max_size       = 3
      desired_size   = 3
      labels = {
        role = "control-plane"
      }
      taints = [{
        key    = "CriticalAddonsOnly"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # Standard workspace nodes - auto-scaling
    workspace-standard = {
      name           = "workspace-standard"
      instance_types = ["m7i.2xlarge"]
      min_size       = 2
      max_size       = 20
      desired_size   = 5
      labels = {
        role     = "workspace"
        workload = "standard"
      }
    }

    # AI agent workspace nodes - high scale limit
    workspace-agent = {
      name           = "workspace-agent"
      instance_types = ["m7i.2xlarge"]
      min_size       = 0
      max_size       = 50
      desired_size   = 0
      labels = {
        role     = "workspace"
        workload = "agent"
      }
      taints = [{
        key    = "agent-only"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # Spot instances for cost savings + agent overflow
    workspace-spot = {
      name           = "workspace-spot"
      instance_types = ["m7i.2xlarge", "m7a.2xlarge", "m6i.2xlarge"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 30
      desired_size   = 0
      labels = {
        role     = "workspace"
        workload = "spot"
      }
      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # GPU nodes for LLM inference and ML workloads
    workspace-gpu = {
      name           = "workspace-gpu"
      instance_types = ["g6.2xlarge"]
      min_size       = 0
      max_size       = 10
      desired_size   = 0
      ami_type       = "AL2023_x86_64_GPU"
      labels = {
        role             = "workspace"
        workload         = "gpu"
        "nvidia.com/gpu" = "true"
      }
      taints = [{
        key    = "nvidia.com/gpu"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}

Auto-Scaling Configuration
Scale infrastructure based on demand
Cluster Autoscaler
Node-level scaling
Scale-up triggers
Pending pods that cannot be scheduled
Scale-down triggers
Nodes underutilized for 10+ minutes
Recommended settings
scale-down-delay: 10m, scale-down-utilization: 0.5
--scale-down-enabled=true
--scale-down-delay-after-add=10m
--scale-down-utilization-threshold=0.5
--skip-nodes-with-local-storage=false
--expander=least-waste

Karpenter (Recommended)
Fast, flexible provisioning
Faster scaling
Provisions nodes in seconds vs minutes
Right-sized instances
Picks optimal instance type per workload
Spot integration
Automatic spot instance fallback
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: workspace
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m7i.2xlarge", "m7a.2xlarge"]
  limits:
    cpu: 1000
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

Scheduled Scaling
Pre-scale before peak hours, manage agent workloads around the clock
Morning Ramp (7-9 AM)
Scale developer nodes to 60% of expected peak 30 min before start of business
Peak Hours (9 AM - 5 PM)
Autoscaler handles demand for both human and agent workspaces
Dev Off-Hours (6 PM - 7 AM)
Scale developer nodes to minimum, auto-stop idle workspaces after 2 hours
Agent Overnight (24/7)
Agent node pools remain active for autonomous backlog processing - use spot instances to reduce overnight costs
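The schedule above can be sketched as a desired-size function for the developer node pool. This is a sketch only: the 2-node minimums and the 6 PM cutoff are assumptions, and agent pools are deliberately excluded because they follow their own 24/7 policy.

```python
def desired_standard_nodes(hour: int, weekday: bool, peak: int = 20) -> int:
    """Desired node count for the developer pool by hour of day:
    60% of peak during the morning ramp, peak ceiling during business
    hours, minimum overnight and on weekends."""
    if not weekday:
        return 2                 # weekend minimum (assumption)
    if 7 <= hour < 9:
        return int(peak * 0.6)   # morning ramp, 30+ min before start
    if 9 <= hour < 18:
        return peak              # peak-hours ceiling; autoscaler fills in
    return 2                     # off-hours minimum (assumption)

print(desired_standard_nodes(8, True))   # 12 - morning ramp
print(desired_standard_nodes(22, True))  # 2  - off-hours minimum
```

In practice this function would drive a scheduled job that updates the node group's desired size, while the agent pool keeps a separate always-on policy backed by spot capacity.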
Storage Planning
Persistent volume and storage class recommendations
High-Performance SSD
For build-heavy workloads
AWS: io2, Azure: Premium SSD v2, GCP: pd-extreme
Balanced SSD
Best value for most workloads
AWS: gp3, Azure: Premium SSD, GCP: pd-ssd
Cold Storage
For stopped workspace snapshots
AWS: S3 IA, Azure: Cool Blob, GCP: Nearline
Cost Optimization Strategies
Auto-Stop Idle Workspaces
Stop workspaces after 2-4 hours of inactivity
Spot Instances
Use spot for non-critical workspaces
Reserved Capacity
Commit to 1-3 year reserved instances for baseline
Right-Size Templates
Match resources to actual workload needs
Prebuilds
Build images ahead of time, not on-demand
Storage Cleanup
Delete orphaned volumes, archive inactive workspaces
Ephemeral Agent Workspaces
Destroy agent workspaces on task completion, use spot for overflow
Shared LLM Inference
Pool GPU resources across teams with vLLM or TGI instead of per-workspace GPUs
