
CDE Capacity Planning

Right-size your Cloud Development Environment infrastructure for human developers, AI agents, and GPU-accelerated workloads in 2026.

Typical workspace: 4-8 vCPU
RAM per workspace: 8-16 GB
Storage per workspace: 30-100 GB
AI agent GPU needs: 8-24 GB VRAM
Target utilization: 60-70%

Quick Sizing Calculator

Estimate your infrastructure requirements

Your Requirements

Share of workspaces running autonomous coding agents (Claude Code, Copilot agents, Devin, etc.)

Estimated Requirements

Concurrent Workspaces: 20
Agent Workspaces: 10
Total vCPU Required: 80
Total RAM: 160 GB
Total Storage: 1.5 TB
Recommended Nodes: 5

Recommended instance: m7i.2xlarge (8 vCPU, 32 GB)

Add 30-40% headroom for AI agent bursts and auto-scaling
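The calculator's arithmetic can be sketched in Python. The per-workspace defaults below follow the "Medium" profile; the 75 GB storage figure and the 2x CPU overcommit (common for bursty dev workloads, where requested vCPU rarely runs flat out) are illustrative assumptions, not fixed recommendations.

```python
import math

def size_cluster(concurrent_workspaces: int,
                 ws_vcpu: int = 4, ws_ram_gb: int = 8, ws_storage_gb: int = 75,
                 node_vcpu: int = 8, node_ram_gb: int = 32,
                 cpu_overcommit: float = 2.0) -> dict:
    """Rough sizing: multiply per-workspace requests by concurrency, then
    pick node count from whichever of CPU (with overcommit) or RAM binds first."""
    vcpu = concurrent_workspaces * ws_vcpu
    ram_gb = concurrent_workspaces * ws_ram_gb
    storage_tb = concurrent_workspaces * ws_storage_gb / 1000
    nodes = max(math.ceil(vcpu / (node_vcpu * cpu_overcommit)),
                math.ceil(ram_gb / node_ram_gb))
    return {"vcpu": vcpu, "ram_gb": ram_gb, "storage_tb": storage_tb, "nodes": nodes}

# 20 concurrent workspaces on m7i.2xlarge-class nodes (8 vCPU, 32 GB)
print(size_cluster(20))  # {'vcpu': 80, 'ram_gb': 160, 'storage_tb': 1.5, 'nodes': 5}
```

Multiply the node count by 1.3-1.4 to capture the 30-40% headroom recommended for agent bursts and auto-scaling.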

Workload Profiles

Resource recommendations by development type for 2026

Light

Scripts, docs, web frontend

CPU: 2 vCPU
Memory: 4 GB
Storage: 20 GB

Examples: Python scripts, React, Vue.js, documentation

RECOMMENDED

Medium

Full-stack, microservices

CPU: 4 vCPU
Memory: 8 GB
Storage: 30 GB

Examples: Node.js, Java Spring, Go services, containers

Heavy

Compiled languages, builds

CPU: 8 vCPU
Memory: 16 GB
Storage: 50 GB

Examples: C++, Rust, Scala, monorepo builds

NEW FOR 2026

AI-Assisted Dev

Copilot, Claude Code, Cursor

CPU: 8 vCPU
Memory: 16 GB
Storage: 50 GB

Examples: Inline AI completion, code generation, agent-driven refactoring

NEW FOR 2026

Autonomous Agent

Headless AI agent workspaces

CPU: 4-8 vCPU
Memory: 8-16 GB
Storage: 30 GB
Lifecycle: Ephemeral

Examples: Devin, SWE-agent, Claude Code headless, OpenHands

GPU

LLM inference, ML training

CPU: 8 vCPU
Memory: 32-64 GB
GPU: 1x L4/A10G/H100
Storage: 100 GB

Examples: PyTorch, TensorFlow, local LLM inference, CUDA, fine-tuning

AI Agent Capacity Planning

Size infrastructure for LLM inference, autonomous agents, and GPU-accelerated development

Why AI Changes Capacity Planning

In 2026, AI coding agents are no longer optional add-ons; they are primary consumers of CDE infrastructure. Platforms like Coder, Ona (formerly Gitpod), and GitHub Codespaces now provision dedicated agent workspaces alongside human developer environments. Each autonomous agent session consumes CPU, memory, and storage just like a human developer workspace, but with different usage patterns: higher burst CPU during code generation, sustained memory for context windows, and rapid I/O for file operations.

MOST COMMON

API-Based Agents

Claude Code, Copilot, Cursor - calls remote LLM APIs

CPU per agent: 4-8 vCPU
Memory per agent: 8-16 GB
GPU: None required
Network: Low-latency egress

CPU spikes during tool execution (builds, tests, linting). Memory needed for workspace tooling and file indexing.

GPU REQUIRED

Local LLM Inference

Self-hosted models for air-gapped or low-latency needs

CPU per instance: 8-16 vCPU
Memory per instance: 32-64 GB
GPU VRAM: 24-80 GB
Recommended GPU: L4 / A10G / H100

A single L4 (24GB VRAM) can serve a quantized 7B-13B parameter model. Larger models need A100/H100 or multi-GPU setups.

ADVANCED

Multi-Agent Orchestration

Multiple agents per task - planning, coding, testing, review

CPU per pipeline: 16-32 vCPU
Memory per pipeline: 32-64 GB
Concurrent agents: 3-6 per pipeline
Isolation: MicroVM per agent

Multiply single-agent resources by pipeline concurrency. Use ephemeral workspaces to limit blast radius.

Agent Capacity Formula

Plan for peak concurrent agents, not total registered agents

API-Based Agent Sizing

Total vCPU = (peak_agents x 6) + (developers x workspace_cpu)
Total RAM = (peak_agents x 12 GB) + (developers x workspace_ram)

Example: 50 devs + 20 agents = (20 x 6) + (50 x 4) = 320 vCPU
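The API-based agent formula above can be sketched directly; the 6 vCPU / 12 GB per-agent figures come from the formula, while the 4 vCPU / 8 GB developer-workspace defaults are assumptions matching the "Medium" profile.

```python
def agent_fleet_size(peak_agents: int, developers: int,
                     agent_vcpu: int = 6, agent_ram_gb: int = 12,
                     ws_vcpu: int = 4, ws_ram_gb: int = 8) -> tuple[int, int]:
    """Size for peak concurrent agents, not total registered agents."""
    total_vcpu = peak_agents * agent_vcpu + developers * ws_vcpu
    total_ram_gb = peak_agents * agent_ram_gb + developers * ws_ram_gb
    return total_vcpu, total_ram_gb

# 50 developers + 20 peak concurrent agents, as in the example
print(agent_fleet_size(peak_agents=20, developers=50))  # (320, 640)
```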

GPU Inference Sizing

GPU nodes = ceil(concurrent_inference_requests / requests_per_gpu)
VRAM = model_params x 2 bytes (FP16) or x 1 byte (INT8)

Example: a 13B model at INT8 needs ~13 GB VRAM, which fits on a single L4 (24 GB)

Autonomous Development Scaling

Plan for the shift from human-only to human-plus-agent development teams

Scaling Stages

1

Copilot-Augmented (1:0 ratio)

Developers use inline AI completions. Add 2-4 GB RAM per workspace for language server and AI indexing overhead. No dedicated agent workspaces needed.

2

Agent-Assisted (1:1 ratio)

Each developer triggers one agent at a time. Plan for peak_developers x 1.0 additional workspaces. Agents share the same node pools but need isolated workspaces.

3

Multi-Agent (1:3 ratio)

Developers orchestrate multiple concurrent agents for coding, testing, and review. Plan for peak_developers x 3.0 additional workspaces with rapid spin-up.

4

Autonomous Fleet (1:10+ ratio)

Agents work independently on backlogs 24/7. Infrastructure must handle overnight surges. Use dedicated agent node pools with aggressive auto-scaling and ephemeral workspaces.
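The four stages above reduce to an agent-to-developer ratio, so the extra workspace count is a one-line calculation; a minimal sketch:

```python
import math

# Agent:developer ratios taken from the scaling stages above
STAGE_RATIOS = {
    "copilot-augmented": 0.0,  # inline completions only, no agent workspaces
    "agent-assisted": 1.0,     # one agent per developer at peak
    "multi-agent": 3.0,        # concurrent coding/testing/review agents
    "autonomous-fleet": 10.0,  # 24/7 backlog processing
}

def extra_agent_workspaces(peak_developers: int, stage: str) -> int:
    """Additional agent workspaces to provision beyond human workspaces."""
    return math.ceil(peak_developers * STAGE_RATIOS[stage])

print(extra_agent_workspaces(50, "multi-agent"))  # 150
```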

24/7 Agent Scheduling

Unlike human developers, autonomous agents can run around the clock. Your scheduled scaling rules need a separate policy for agent workloads that does not scale down overnight. Budget for sustained off-hours compute when agents process backlogs autonomously.

Agent Isolation and Limits

Each agent workspace should have strict resource limits (CPU, memory, storage quotas) and network policies. Use ephemeral workspaces with automatic cleanup. Ona and Coder both support workspace templates with hard resource caps to prevent runaway agents from consuming the cluster.

Burst Capacity Planning

Agent workloads are bursty: a single PR review might spawn 5 agent workspaces for 10 minutes, then release them. Use Karpenter or cluster autoscaler with aggressive scale-up (under 60 seconds) and moderate scale-down (5-10 minutes) to handle this pattern cost-effectively.
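To see why this pattern is cheap when scaling reacts quickly, a rough burst-cost estimate (the node price matches the m7i.2xlarge rate from the node pool table; the four-workspaces-per-node packing density is an assumption):

```python
import math

def burst_cost(workspaces: int, minutes: float,
               node_cost_per_hr: float = 0.403,  # m7i.2xlarge on-demand
               workspaces_per_node: int = 4) -> float:
    """Cost of a short agent burst, assuming nodes are released promptly.
    Add your scale-down delay to `minutes` for a more realistic figure."""
    nodes = math.ceil(workspaces / workspaces_per_node)
    return nodes * node_cost_per_hr * minutes / 60

# 5 agent workspaces for 10 minutes on a single PR review
print(round(burst_cost(5, 10), 3))  # 0.134
```

At roughly 13 cents per review burst, the cost driver is not the burst itself but how long idle nodes linger afterwards, which is why a moderate 5-10 minute scale-down works well.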

Node Pool Strategy

Configure node pools for different workload types

Node Pool          | Instance Type      | Min/Max | Workloads                       | Cost/hr
control-plane      | m7i.xlarge         | 3 / 3   | Coder, DB, monitoring           | $0.202 x 3
workspace-standard | m7i.2xlarge        | 2 / 20  | Light/Medium workspaces         | $0.403
workspace-compute  | c7i.4xlarge        | 0 / 10  | Heavy build workloads           | $0.714
workspace-agent    | m7i.2xlarge        | 0 / 50  | AI agent workspaces             | $0.403
workspace-gpu      | g6.2xlarge         | 0 / 10  | LLM inference, ML/AI            | $0.978
workspace-spot     | m7i.2xlarge (spot) | 0 / 30  | Cost-optimized + agent overflow | ~$0.121 (70% off)

EKS Node Group Configuration

# Terraform - EKS Managed Node Groups (2026)

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "cde-cluster"
  cluster_version = "1.31"

  eks_managed_node_groups = {
    # Control plane nodes - always on
    control-plane = {
      name           = "control-plane"
      instance_types = ["m7i.xlarge"]
      min_size       = 3
      max_size       = 3
      desired_size   = 3

      labels = {
        role = "control-plane"
      }

      taints = [{
        key    = "CriticalAddonsOnly"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # Standard workspace nodes - auto-scaling
    workspace-standard = {
      name           = "workspace-standard"
      instance_types = ["m7i.2xlarge"]
      min_size       = 2
      max_size       = 20
      desired_size   = 5

      labels = {
        role     = "workspace"
        workload = "standard"
      }
    }

    # AI agent workspace nodes - high scale limit
    workspace-agent = {
      name           = "workspace-agent"
      instance_types = ["m7i.2xlarge"]
      min_size       = 0
      max_size       = 50
      desired_size   = 0

      labels = {
        role     = "workspace"
        workload = "agent"
      }

      taints = [{
        key    = "agent-only"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # Spot instances for cost savings + agent overflow
    workspace-spot = {
      name           = "workspace-spot"
      instance_types = ["m7i.2xlarge", "m7a.2xlarge", "m6i.2xlarge"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 30
      desired_size   = 0

      labels = {
        role     = "workspace"
        workload = "spot"
      }

      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # GPU nodes for LLM inference and ML workloads
    workspace-gpu = {
      name           = "workspace-gpu"
      instance_types = ["g6.2xlarge"]
      min_size       = 0
      max_size       = 10
      desired_size   = 0

      ami_type = "AL2023_x86_64_GPU"

      labels = {
        role     = "workspace"
        workload = "gpu"
        "nvidia.com/gpu" = "true"
      }

      taints = [{
        key    = "nvidia.com/gpu"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}

Auto-Scaling Configuration

Scale infrastructure based on demand

Cluster Autoscaler

Node-level scaling

Scale-up triggers

Pending pods that cannot be scheduled

Scale-down triggers

Nodes underutilized for 10+ minutes

Recommended settings

scale-down-delay: 10m, scale-down-utilization: 0.5

--scale-down-enabled=true
--scale-down-delay-after-add=10m
--scale-down-utilization-threshold=0.5
--skip-nodes-with-local-storage=false
--expander=least-waste

Karpenter (Recommended)

Fast, flexible provisioning

Faster scaling

Provisions nodes in seconds vs minutes

Right-sized instances

Picks optimal instance type per workload

Spot integration

Automatic spot instance fallback

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: workspace
spec:
  template:
    spec:
      # Required in karpenter.sh/v1: reference to a separately defined EC2NodeClass
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m7i.2xlarge", "m7a.2xlarge"]
  limits:
    cpu: 1000
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

Scheduled Scaling

Pre-scale before peak hours, manage agent workloads around the clock

Morning Ramp (7-9 AM)

Scale developer nodes to 60% of expected peak 30 min before start of business

Peak Hours (9 AM - 5 PM)

Autoscaler handles demand for both human and agent workspaces

Dev Off-Hours (6 PM - 7 AM)

Scale developer nodes to minimum, auto-stop idle workspaces after 2 hours

Agent Overnight (24/7)

Agent node pools remain active for autonomous backlog processing; use spot instances to reduce overnight costs

Storage Planning

Persistent volume and storage class recommendations

PREMIUM

High-Performance SSD

For build-heavy workloads

IOPS: 16,000+
Throughput: 1,000 MB/s
Cost: $0.125/GB/mo

AWS: io2, Azure: Premium SSD v2, GCP: pd-extreme

RECOMMENDED

Balanced SSD

Best value for most workloads

IOPS: 3,000
Throughput: 125 MB/s
Cost: $0.08/GB/mo

AWS: gp3, Azure: Premium SSD, GCP: pd-ssd

ARCHIVE

Cold Storage

For stopped workspace snapshots

Access: Infrequent
Retrieval: Minutes
Cost: $0.01/GB/mo

AWS: S3 IA, Azure: Cool Blob, GCP: Nearline
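Storage tier choice compounds across the fleet, so it is worth pricing out; a minimal comparison using the per-GB rates above (the 50-workspace fleet size is an illustrative assumption):

```python
def monthly_storage_cost(workspaces: int, gb_each: int, price_per_gb: float) -> float:
    """Monthly persistent-volume spend for a fleet of workspaces."""
    return workspaces * gb_each * price_per_gb

# 50 workspaces with 30 GB each on balanced SSD (gp3-class, $0.08/GB/mo)
print(monthly_storage_cost(50, 30, 0.08))   # 120.0
# Same footprint on high-performance SSD (io2-class, $0.125/GB/mo)
print(monthly_storage_cost(50, 30, 0.125))  # 187.5
```

The ~$67/mo delta suggests reserving the premium tier for the build-heavy workspaces that actually need the IOPS.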

Cost Optimization Strategies

Auto-Stop Idle Workspaces

Stop workspaces after 2-4 hours of inactivity

40-60% savings

Spot Instances

Use spot for non-critical workspaces

60-70% savings

Reserved Capacity

Commit to 1-3 year reserved instances for baseline

30-40% savings

Right-Size Templates

Match resources to actual workload needs

20-30% savings

Prebuilds

Build images ahead of time, not on-demand

Faster + Cheaper

Storage Cleanup

Delete orphaned volumes, archive inactive workspaces

10-20% savings

Ephemeral Agent Workspaces

Destroy agent workspaces on task completion, use spot for overflow

50-70% savings

Shared LLM Inference

Pool GPU resources across teams with vLLM or TGI instead of per-workspace GPUs

60-80% GPU savings
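As a back-of-the-envelope check on the auto-stop figure above, compare active hours to wall-clock hours (the 45 active hours per week is an assumed developer usage pattern; real savings land in the stated 40-60% range once warm pools, prebuilds, and imperfect idle detection are factored in):

```python
def autostop_savings(active_hours_per_week: float = 45.0) -> float:
    """Upper-bound fraction of compute-hours saved by stopping idle workspaces.
    Stopped workspaces still pay for storage, which is ignored here."""
    return 1 - active_hours_per_week / 168.0

print(round(autostop_savings(), 2))  # 0.73
```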