CDE Capacity Planning
Right-size your Cloud Development Environment infrastructure for human developers, AI agents, and GPU-accelerated workloads in 2026.
Quick Sizing Calculator
Estimate your infrastructure requirements
Your Requirements
Autonomous coding agents (Claude Code, Copilot agents, Devin, etc.)
Estimated Requirements
m7i.2xlarge (8 vCPU, 32GB)
Add 30-40% headroom for AI agent bursts and auto-scaling
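As a rough sketch of how that headroom figure translates into node counts (the `estimate_nodes` helper and the 8-vCPU node size are illustrative assumptions, not part of any platform API):

```python
import math

def estimate_nodes(total_vcpu: int, vcpu_per_node: int = 8,
                   headroom: float = 0.35) -> int:
    """Node count for a given vCPU demand plus 30-40% headroom
    (0.35 used here) for AI agent bursts and auto-scaling."""
    return math.ceil(total_vcpu * (1 + headroom) / vcpu_per_node)

# 320 vCPU of demand on m7i.2xlarge (8 vCPU) nodes:
print(estimate_nodes(320))  # 54 nodes with 35% headroom
```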
Workload Profiles
Resource recommendations by development type for 2026
Light
Scripts, docs, web frontend
Examples: Python scripts, React, Vue.js, documentation
Medium
Full-stack, microservices
Examples: Node.js, Java Spring, Go services, containers
Heavy
Compiled languages, builds
Examples: C++, Rust, Scala, monorepo builds
AI-Assisted Dev
Copilot, Claude Code, Cursor
Examples: Inline AI completion, code generation, agent-driven refactoring
Autonomous Agent
Headless AI agent workspaces
Examples: Devin, SWE-agent, Claude Code headless, OpenHands
GPU
LLM inference, ML training
Examples: PyTorch, TensorFlow, local LLM inference, CUDA, fine-tuning
AI Agent Capacity Planning
Size infrastructure for LLM inference, autonomous agents, and GPU-accelerated development
Why AI Changes Capacity Planning
In 2026, AI coding agents are no longer optional add-ons - they are primary consumers of CDE infrastructure. Platforms like Coder, Ona (formerly Gitpod), and GitHub Codespaces now provision dedicated agent workspaces alongside human developer environments. Each autonomous agent session consumes CPU, memory, and storage just like a human developer workspace, but with different usage patterns: higher burst CPU during code generation, sustained memory for context windows, and rapid I/O for file operations.
API-Based Agents
Claude Code, Copilot, Cursor - calls remote LLM APIs
CPU spikes during tool execution (builds, tests, linting). Memory needed for workspace tooling and file indexing.
Local LLM Inference
Self-hosted models for air-gapped or low-latency needs
A single L4 (24GB VRAM) can serve a quantized 7B-13B parameter model. Larger models need A100/H100 or multi-GPU setups.
Multi-Agent Orchestration
Multiple agents per task - planning, coding, testing, review
Multiply single-agent resources by pipeline concurrency. Use ephemeral workspaces to limit blast radius.
Agent Capacity Formula
Plan for peak concurrent agents, not total registered agents
API-Based Agent Sizing
Total vCPU = (peak_agents x 6) + (developers x workspace_cpu)
Total RAM = (peak_agents x 12 GB) + (developers x workspace_ram)

Example: 50 devs + 20 agents = (20 x 6) + (50 x 4) = 320 vCPU
GPU Inference Sizing
GPU nodes = ceil(concurrent_inference_requests / requests_per_gpu)
VRAM = model_params x 2 bytes (FP16) or x 1 byte (INT8)

Example: 13B model at INT8 = ~13 GB VRAM - fits on a single L4 (24 GB)
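The two sizing formulas above can be sketched in Python; the per-developer workspace defaults of 4 vCPU / 8 GB are illustrative assumptions, everything else follows the formulas directly:

```python
import math

def api_agent_sizing(peak_agents: int, developers: int,
                     workspace_cpu: int = 4, workspace_ram_gb: int = 8):
    """API-based agent sizing: 6 vCPU / 12 GB per concurrent agent,
    plus the human developer workspaces."""
    vcpu = peak_agents * 6 + developers * workspace_cpu
    ram_gb = peak_agents * 12 + developers * workspace_ram_gb
    return vcpu, ram_gb

def gpu_nodes(concurrent_requests: int, requests_per_gpu: int) -> int:
    """GPU inference sizing: round up to whole nodes."""
    return math.ceil(concurrent_requests / requests_per_gpu)

def vram_gb(model_params_billion: float, bytes_per_param: int = 2) -> float:
    """Weights-only VRAM: 2 bytes/param at FP16, 1 byte/param at INT8."""
    return model_params_billion * bytes_per_param

print(api_agent_sizing(20, 50))  # (320, 640) - matches the example above
print(gpu_nodes(25, 4))          # 7 GPU nodes
print(vram_gb(13, 1))            # 13 GB for a 13B model at INT8
```

Note the VRAM figure covers model weights only; KV cache and activation memory add on top, which is why a 13 GB model still wants the headroom of a 24 GB L4.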
Autonomous Development Scaling
Plan for the shift from human-only to human-plus-agent development teams
Scaling Stages
Copilot-Augmented (1:0 ratio)
Developers use inline AI completions. Add 2-4 GB RAM per workspace for language server and AI indexing overhead. No dedicated agent workspaces needed.
Agent-Assisted (1:1 ratio)
Each developer triggers one agent at a time. Plan for peak_developers x 1.0 additional workspaces. Agents share the same node pools but need isolated workspaces.
Multi-Agent (1:3 ratio)
Developers orchestrate multiple concurrent agents for coding, testing, and review. Plan for peak_developers x 3.0 additional workspaces with rapid spin-up.
Autonomous Fleet (1:10+ ratio)
Agents work independently on backlogs 24/7. Infrastructure must handle overnight surges. Use dedicated agent node pools with aggressive auto-scaling and ephemeral workspaces.
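The four stage ratios can be folded into a quick sizing helper. This is a sketch: the `agent_workspaces` name is hypothetical, and the autonomous stage uses the 1:10 lower bound from the stage description.

```python
def agent_workspaces(peak_developers: int, stage: str) -> int:
    """Additional agent workspaces to plan for at each scaling stage
    (ratios from the stages above; 'autonomous' is 1:10+, the lower
    bound is used here)."""
    ratios = {
        "copilot": 0.0,      # inline completions only, no agent workspaces
        "assisted": 1.0,     # one agent per developer at peak
        "multi": 3.0,        # concurrent coding/testing/review agents
        "autonomous": 10.0,  # 24/7 backlog fleet, lower bound
    }
    return int(peak_developers * ratios[stage])

print(agent_workspaces(50, "multi"))  # 150 additional workspaces
```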
24/7 Agent Scheduling
Unlike human developers, autonomous agents can run around the clock. Your scheduled scaling rules need a separate policy for agent workloads that does not scale down overnight. Budget for sustained off-hours compute when agents process backlogs autonomously.
Agent Isolation and Limits
Each agent workspace should have strict resource limits (CPU, memory, storage quotas) and network policies. Use ephemeral workspaces with automatic cleanup. Ona and Coder both support workspace templates with hard resource caps to prevent runaway agents from consuming the cluster.
Burst Capacity Planning
Agent workloads are bursty: a single PR review might spawn 5 agent workspaces for 10 minutes, then release them. Use Karpenter or cluster autoscaler with aggressive scale-up (under 60 seconds) and moderate scale-down (5-10 minutes) to handle this pattern cost-effectively.
Node Pool Strategy
Configure node pools for different workload types
| Node Pool | Instance Type | Min/Max | Workloads | Cost/hr |
|---|---|---|---|---|
| control-plane | m7i.xlarge | 3 / 3 | Coder, DB, monitoring | $0.202 x 3 |
| workspace-standard | m7i.2xlarge | 2 / 20 | Light/Medium workspaces | $0.403 |
| workspace-compute | c7i.4xlarge | 0 / 10 | Heavy build workloads | $0.714 |
| workspace-agent | m7i.2xlarge | 0 / 50 | AI agent workspaces | $0.403 |
| workspace-gpu | g6.2xlarge | 0 / 10 | LLM inference, ML/AI | $0.978 |
| workspace-spot | m7i.2xlarge (spot) | 0 / 30 | Cost-optimized + agent overflow | ~$0.121 (70% off) |
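A back-of-envelope monthly cost for the always-on baseline in this table: the 3 control-plane nodes plus the workspace-standard minimum of 2 nodes. The 730 hours/month figure is an assumption; auto-scaled pools add usage-dependent cost on top.

```python
HOURS_PER_MONTH = 730  # assumption: average month

# Always-on baseline from the node pool table:
# 3 x m7i.xlarge ($0.202/hr) + 2 x m7i.2xlarge ($0.403/hr)
baseline_per_hour = 3 * 0.202 + 2 * 0.403
print(round(baseline_per_hour * HOURS_PER_MONTH, 2))  # ~1030.76 per month
```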
EKS Node Group Configuration
# Terraform - EKS Managed Node Groups (2026)
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "cde-cluster"
  cluster_version = "1.31"

  eks_managed_node_groups = {
    # Control plane nodes - always on
    control-plane = {
      name           = "control-plane"
      instance_types = ["m7i.xlarge"]
      min_size       = 3
      max_size       = 3
      desired_size   = 3
      labels = {
        role = "control-plane"
      }
      taints = [{
        key    = "CriticalAddonsOnly"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # Standard workspace nodes - auto-scaling
    workspace-standard = {
      name           = "workspace-standard"
      instance_types = ["m7i.2xlarge"]
      min_size       = 2
      max_size       = 20
      desired_size   = 5
      labels = {
        role     = "workspace"
        workload = "standard"
      }
    }

    # AI agent workspace nodes - high scale limit
    workspace-agent = {
      name           = "workspace-agent"
      instance_types = ["m7i.2xlarge"]
      min_size       = 0
      max_size       = 50
      desired_size   = 0
      labels = {
        role     = "workspace"
        workload = "agent"
      }
      taints = [{
        key    = "agent-only"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # Spot instances for cost savings + agent overflow
    workspace-spot = {
      name           = "workspace-spot"
      instance_types = ["m7i.2xlarge", "m7a.2xlarge", "m6i.2xlarge"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 30
      desired_size   = 0
      labels = {
        role     = "workspace"
        workload = "spot"
      }
      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # GPU nodes for LLM inference and ML workloads
    workspace-gpu = {
      name           = "workspace-gpu"
      instance_types = ["g6.2xlarge"]
      min_size       = 0
      max_size       = 10
      desired_size   = 0
      ami_type       = "AL2023_x86_64_GPU"
      labels = {
        role             = "workspace"
        workload         = "gpu"
        "nvidia.com/gpu" = "true"
      }
      taints = [{
        key    = "nvidia.com/gpu"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}

Auto-Scaling Configuration
Scale infrastructure based on demand
Cluster Autoscaler
Node-level scaling
Scale-up triggers
Pending pods that cannot be scheduled
Scale-down triggers
Nodes underutilized for 10+ minutes
Recommended settings
scale-down-delay: 10m, scale-down-utilization: 0.5
--scale-down-enabled=true
--scale-down-delay-after-add=10m
--scale-down-utilization-threshold=0.5
--skip-nodes-with-local-storage=false
--expander=least-waste

Karpenter (Recommended)
Fast, flexible provisioning
Faster scaling
Provisions nodes in seconds vs minutes
Right-sized instances
Picks optimal instance type per workload
Spot integration
Automatic spot instance fallback
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: workspace
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m7i.2xlarge", "m7a.2xlarge"]
  limits:
    cpu: 1000
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

Scheduled Scaling
Pre-scale before peak hours, manage agent workloads around the clock
Morning Ramp (7-9 AM)
Scale developer nodes to 60% of expected peak 30 min before start of business
Peak Hours (9 AM - 5 PM)
Autoscaler handles demand for both human and agent workspaces
Dev Off-Hours (6 PM - 7 AM)
Scale developer nodes to minimum, auto-stop idle workspaces after 2 hours
Agent Overnight (24/7)
Agent node pools remain active for autonomous backlog processing - use spot instances to reduce overnight costs
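The schedule above can be sketched as a desired-size function for the developer node pool. This is a sketch only: the 2-node minimums and the 6 PM cutoff are assumptions, and agent pools are deliberately excluded because they follow their own 24/7 policy.

```python
def desired_standard_nodes(hour: int, weekday: bool, peak: int = 20) -> int:
    """Desired node count for the developer pool by hour of day:
    60% of peak during the morning ramp, peak ceiling during business
    hours, minimum overnight and on weekends."""
    if not weekday:
        return 2                 # weekend minimum (assumption)
    if 7 <= hour < 9:
        return int(peak * 0.6)   # morning ramp, 30+ min before start
    if 9 <= hour < 18:
        return peak              # peak-hours ceiling; autoscaler fills in
    return 2                     # off-hours minimum (assumption)

print(desired_standard_nodes(8, True))   # 12 - morning ramp
print(desired_standard_nodes(22, True))  # 2  - off-hours minimum
```

In practice this function would drive a scheduled job that updates the node group's desired size, while the agent pool keeps a separate always-on policy backed by spot capacity.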
Storage Planning
Persistent volume and storage class recommendations
High-Performance SSD
For build-heavy workloads
AWS: io2, Azure: Premium SSD v2, GCP: pd-extreme
Balanced SSD
Best value for most workloads
AWS: gp3, Azure: Premium SSD, GCP: pd-ssd
Cold Storage
For stopped workspace snapshots
AWS: S3 IA, Azure: Cool Blob, GCP: Nearline
Cost Optimization Strategies
Auto-Stop Idle Workspaces
Stop workspaces after 2-4 hours of inactivity
Spot Instances
Use spot for non-critical workspaces
Reserved Capacity
Commit to 1-3 year reserved instances for baseline
Right-Size Templates
Match resources to actual workload needs
Prebuilds
Build images ahead of time, not on-demand
Storage Cleanup
Delete orphaned volumes, archive inactive workspaces
Ephemeral Agent Workspaces
Destroy agent workspaces on task completion, use spot for overflow
Shared LLM Inference
Pool GPU resources across teams with vLLM or TGI instead of per-workspace GPUs
