
CDE Capacity Planning

Right-size your Cloud Development Environment infrastructure for performance, cost optimization, and growth.

Typical workspace: 2-4 vCPU
RAM per workspace: 4-8 GB
Storage per workspace: 20-50 GB
Target utilization: 60-70%

Quick Sizing Calculator

Estimate your infrastructure requirements

Your Requirements

Example inputs: 50 developers at 40% peak concurrency (concurrency is adjustable from 10% to 100%)

Estimated Requirements

Concurrent Workspaces: 20
Total vCPU Required: 80
Total RAM: 160 GB
Total Storage: 1.5 TB
Recommended Nodes: 5

Recommended instance: m5.2xlarge (8 vCPU, 32 GB)

Add 20-30% headroom for auto-scaling and burst capacity
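The calculator's arithmetic can be sketched in a few lines. All inputs below are illustrative assumptions: 50 developers at 40% peak concurrency, the Medium profile for CPU/RAM, 50 GB of disk per workspace with 50% storage headroom, m5.2xlarge nodes, and a 2:1 CPU overcommit (dev workspaces idle often), which is what reconciles 80 requested vCPUs with five 8-vCPU nodes.

```python
import math

def size_cluster(developers=50, concurrency=0.40,
                 ws_cpu=4, ws_mem_gb=8, ws_disk_gb=50,
                 node_cpu=8, node_mem_gb=32,
                 cpu_overcommit=2.0, storage_headroom=1.5):
    # Peak simultaneous workspaces
    concurrent = math.ceil(developers * concurrency)
    # Aggregate workspace requests
    total_cpu = concurrent * ws_cpu
    total_mem_gb = concurrent * ws_mem_gb
    storage_tb = concurrent * ws_disk_gb * storage_headroom / 1000
    # Node count is the binding constraint of CPU (with overcommit) vs memory
    nodes = max(math.ceil(total_cpu / (node_cpu * cpu_overcommit)),
                math.ceil(total_mem_gb / node_mem_gb))
    return concurrent, total_cpu, total_mem_gb, storage_tb, nodes
```

With these defaults the sketch reproduces the figures above: 20 concurrent workspaces, 80 vCPU, 160 GB RAM, 1.5 TB storage, 5 nodes. Drop the overcommit assumption and CPU becomes the binding constraint (10 nodes).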

Workload Profiles

Resource recommendations by development type

Light

Scripts, docs, web frontend

CPU 2 vCPU
Memory 4 GB
Storage 20 GB

Examples: Python scripts, React, Vue.js, documentation

RECOMMENDED

Medium

Full-stack, microservices

CPU 4 vCPU
Memory 8 GB
Storage 30 GB

Examples: Node.js, Java Spring, Go services, containers

Heavy

Compiled languages, builds

CPU 8 vCPU
Memory 16 GB
Storage 50 GB

Examples: C++, Rust, Scala, monorepo builds

GPU

ML/AI, data science

CPU 4 vCPU
Memory 16-32 GB
GPU 1x T4/A10G
Storage 100 GB

Examples: PyTorch, TensorFlow, CUDA development
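The profiles above translate naturally into workspace templates. A hypothetical mapping to Kubernetes-style resource requests might look like this; the dictionary names and the limit policy (limits equal to requests) are assumptions for illustration, not any specific CDE product's API.

```python
PROFILES = {
    "light":  {"cpu": 2, "memory_gb": 4,  "storage_gb": 20},
    "medium": {"cpu": 4, "memory_gb": 8,  "storage_gb": 30},
    "heavy":  {"cpu": 8, "memory_gb": 16, "storage_gb": 50},
    "gpu":    {"cpu": 4, "memory_gb": 32, "storage_gb": 100, "gpus": 1},
}

def pod_resources(profile: str) -> dict:
    """Build a Kubernetes-style resources stanza for a workspace pod."""
    p = PROFILES[profile]
    res = {
        "requests": {"cpu": str(p["cpu"]), "memory": f"{p['memory_gb']}Gi"},
        "limits":   {"cpu": str(p["cpu"]), "memory": f"{p['memory_gb']}Gi"},
    }
    # GPUs are always expressed as limits in Kubernetes
    if p.get("gpus"):
        res["limits"]["nvidia.com/gpu"] = str(p["gpus"])
    return res
```

Setting limits equal to requests gives workspaces the Guaranteed QoS class, which keeps noisy neighbors from starving each other at the cost of less bin-packing flexibility.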

Node Pool Strategy

Configure node pools for different workload types

| Node Pool | Instance Type | Min/Max | Workloads | Cost/hr |
|---|---|---|---|---|
| control-plane | m5.xlarge | 3 / 3 | Coder, DB, monitoring | $0.192 x 3 |
| workspace-standard | m5.2xlarge | 2 / 20 | Light/Medium workspaces | $0.384 |
| workspace-compute | c5.4xlarge | 0 / 10 | Heavy build workloads | $0.680 |
| workspace-gpu | g4dn.xlarge | 0 / 5 | ML/AI workloads | $0.526 |
| workspace-spot | m5.2xlarge (spot) | 0 / 30 | Cost-optimized workspaces | ~$0.115 (70% off) |

EKS Node Group Configuration

# Terraform - EKS Managed Node Groups

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "cde-cluster"
  cluster_version = "1.28"

  eks_managed_node_groups = {
    # Platform services nodes (Coder, DB, monitoring) - always on
    control-plane = {
      name           = "control-plane"
      instance_types = ["m5.xlarge"]
      min_size       = 3
      max_size       = 3
      desired_size   = 3

      labels = {
        role = "control-plane"
      }

      taints = [{
        key    = "CriticalAddonsOnly"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # Standard workspace nodes - auto-scaling
    workspace-standard = {
      name           = "workspace-standard"
      instance_types = ["m5.2xlarge"]
      min_size       = 2
      max_size       = 20
      desired_size   = 5

      labels = {
        role     = "workspace"
        workload = "standard"
      }
    }

    # Spot instances for cost savings
    workspace-spot = {
      name           = "workspace-spot"
      instance_types = ["m5.2xlarge", "m5a.2xlarge", "m4.2xlarge"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 30
      desired_size   = 0

      labels = {
        role     = "workspace"
        workload = "spot"
      }

      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # GPU nodes for ML workloads
    workspace-gpu = {
      name           = "workspace-gpu"
      instance_types = ["g4dn.xlarge"]
      min_size       = 0
      max_size       = 5
      desired_size   = 0

      ami_type = "AL2_x86_64_GPU"

      labels = {
        role     = "workspace"
        workload = "gpu"
        "nvidia.com/gpu" = "true"
      }

      taints = [{
        key    = "nvidia.com/gpu"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}

Auto-Scaling Configuration

Scale infrastructure based on demand

Cluster Autoscaler

Node-level scaling

Scale-up triggers

Pending pods that cannot be scheduled

Scale-down triggers

Nodes underutilized for 10+ minutes

Recommended settings

scale-down-delay: 10m, scale-down-utilization: 0.5

--scale-down-enabled=true
--scale-down-delay-after-add=10m
--scale-down-utilization-threshold=0.5
--skip-nodes-with-local-storage=false
--expander=least-waste

Karpenter (Recommended)

Fast, flexible provisioning

Faster scaling

Provisions nodes in seconds vs minutes

Right-sized instances

Picks optimal instance type per workload

Spot integration

Automatic spot instance fallback

# v1alpha5 Provisioner applies to Karpenter < v0.32; newer releases use karpenter.sh/v1 NodePool
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: workspace
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  limits:
    resources:
      cpu: 1000
      memory: 2000Gi
  ttlSecondsAfterEmpty: 30

Scheduled Scaling

Pre-scale before peak hours, scale down after hours

Morning Ramp (7-9 AM)

Scale to 60% of expected peak 30 min before start of business

Peak Hours (9 AM - 5 PM)

Autoscaler handles demand, maintain headroom for burst

Off-Hours (6 PM - 7 AM)

Scale to minimum, auto-stop idle workspaces after 2 hours
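The schedule above can be sketched as a function that returns the node-pool minimum for a given hour; the autoscaler still scales above this floor on demand. The peak size and off-hours floor below are illustrative assumptions, not recommendations.

```python
def scheduled_min_nodes(hour: int, peak_nodes: int = 10, floor: int = 2) -> int:
    """Minimum node count for a workspace pool at a given local hour (0-23)."""
    if 7 <= hour < 9:
        # Morning ramp: pre-warm to 60% of expected peak before start of business
        return max(floor, round(0.6 * peak_nodes))
    if 9 <= hour < 18:
        # Business hours: hold full headroom, let the autoscaler handle bursts
        return peak_nodes
    # Off-hours: minimum footprint; idle workspaces auto-stop separately
    return floor
```

In practice this logic would run as a cron-driven job (or scheduled-scaling feature of your autoscaler) that patches the node group's `min_size`.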

Storage Planning

Persistent volume and storage class recommendations

PREMIUM

High-Performance SSD

For build-heavy workloads

IOPS 16,000+
Throughput 1,000 MB/s
Cost $0.125/GB/mo

AWS: io2, Azure: Premium SSD v2, GCP: pd-extreme

RECOMMENDED

Balanced SSD

Best value for most workloads

IOPS 3,000
Throughput 125 MB/s
Cost $0.08/GB/mo

AWS: gp3, Azure: Premium SSD, GCP: pd-ssd

ARCHIVE

Cold Storage

For stopped workspace snapshots

Access Infrequent
Retrieval Minutes
Cost $0.01/GB/mo

AWS: S3 IA, Azure: Cool Blob, GCP: Nearline
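A quick way to compare tiers is per-volume monthly cost using the per-GB rates quoted above. The rates are the approximate figures from these cards and vary by region and provider.

```python
# Approximate $/GB/month from the tier cards above (region-dependent)
RATES_PER_GB_MONTH = {"premium": 0.125, "balanced": 0.08, "archive": 0.01}

def monthly_storage_cost(gb: float, tier: str) -> float:
    """Estimated monthly cost in USD for one volume on the given tier."""
    return round(gb * RATES_PER_GB_MONTH[tier], 2)
```

For a typical 50 GB workspace volume, balanced SSD runs about $4/month, while archiving the same snapshot to cold storage costs about $0.50/month, which is why moving stopped workspaces to the archive tier pays off quickly.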

Cost Optimization Strategies

Auto-Stop Idle Workspaces

Stop workspaces after 2-4 hours of inactivity

40-60% savings

Spot Instances

Use spot for non-critical workspaces

60-70% savings

Reserved Capacity

Commit to 1-3 year reserved instances for baseline

30-40% savings

Right-Size Templates

Match resources to actual workload needs

20-30% savings

Prebuilds

Build images ahead of time, not on-demand

Faster + Cheaper

Storage Cleanup

Delete orphaned volumes, archive inactive workspaces

10-20% savings
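Note that the percentages above apply to different slices of spend and compound multiplicatively rather than adding up. A minimal sketch, using hypothetical numbers (a $10,000/month baseline, 50% from auto-stop, then a 30% effective spot reduction on what remains, since only some workspaces are spot-eligible):

```python
def optimized_cost(baseline: float, reductions: list[float]) -> float:
    """Apply a sequence of fractional savings multiplicatively."""
    cost = baseline
    for r in reductions:
        cost *= (1 - r)  # each saving applies to the already-reduced cost
    return cost
```

`optimized_cost(10000, [0.5, 0.3])` comes to about $3,500/month: a 65% total reduction, not the 80% you would get by naively adding 50% and 30%.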