# CDE Capacity Planning

Right-size your Cloud Development Environment infrastructure for performance, cost optimization, and growth.
## Quick Sizing Calculator

Estimate your infrastructure requirements from developer count, concurrency, and per-workspace resources.
Example estimate: m5.2xlarge nodes (8 vCPU, 32 GB) for a typical mid-sized team.
Add 20-30% headroom for auto-scaling and burst capacity
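The estimate above reduces to a short calculation: size for concurrent workspaces, not total developers, then add headroom. This is a minimal sketch; the team size, concurrency ratio, and per-workspace requests below are hypothetical inputs, not measurements:

```python
import math

def nodes_needed(devs, concurrency, ws_vcpu, ws_mem_gib,
                 node_vcpu, node_mem_gib, headroom=0.25):
    """Estimate node count from concurrent workspace demand plus headroom."""
    active = math.ceil(devs * concurrency)  # workspaces running at once
    cpu_nodes = math.ceil(active * ws_vcpu * (1 + headroom) / node_vcpu)
    mem_nodes = math.ceil(active * ws_mem_gib * (1 + headroom) / node_mem_gib)
    return max(cpu_nodes, mem_nodes)        # bound by the tighter resource

# Hypothetical team: 100 devs, 60% concurrent, 2 vCPU / 4 GiB per workspace,
# scheduled onto m5.2xlarge nodes (8 vCPU, 32 GiB), with 25% headroom.
print(nodes_needed(100, 0.6, 2, 4, 8, 32))  # → 19
```

Note that the answer is CPU-bound here (19 nodes for CPU vs. 10 for memory); memory-hungry profiles flip that, which is why the heavy and GPU pools below use different instance families.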
## Workload Profiles

Resource recommendations by development type:

| Profile | Typical workloads | Examples |
|---|---|---|
| Light | Scripts, docs, web frontend | Python scripts, React, Vue.js, documentation |
| Medium | Full-stack, microservices | Node.js, Java Spring, Go services, containers |
| Heavy | Compiled languages, builds | C++, Rust, Scala, monorepo builds |
| GPU | ML/AI, data science | PyTorch, TensorFlow, CUDA development |
## Node Pool Strategy

Configure node pools for different workload types:
| Node Pool | Instance Type | Min/Max | Workloads | Cost/hr |
|---|---|---|---|---|
| control-plane | m5.xlarge | 3 / 3 | Coder, DB, monitoring | $0.192 x 3 |
| workspace-standard | m5.2xlarge | 2 / 20 | Light/Medium workspaces | $0.384 |
| workspace-compute | c5.4xlarge | 0 / 10 | Heavy build workloads | $0.680 |
| workspace-gpu | g4dn.xlarge | 0 / 5 | ML/AI workloads | $0.526 |
| workspace-spot | m5.2xlarge (spot) | 0 / 30 | Cost-optimized workspaces | ~$0.115 (70% off) |
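The table's figures can be sanity-checked with simple arithmetic: the spot row is the m5.2xlarge on-demand rate at a ~70% discount (actual spot prices fluctuate by AZ and time), and the always-on control-plane pool sets the monthly cost floor:

```python
on_demand = 0.384          # m5.2xlarge on-demand $/hr, from the table above
spot_discount = 0.70       # typical spot savings; real spot prices vary
spot = on_demand * (1 - spot_discount)
print(round(spot, 3))      # → 0.115

baseline = 3 * 0.192       # three always-on m5.xlarge control-plane nodes
print(round(baseline * 730))  # → 420  (approx. monthly $ at 730 hrs/month)
```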
### EKS Node Group Configuration

```hcl
# Terraform - EKS Managed Node Groups
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "cde-cluster"
  cluster_version = "1.28"

  eks_managed_node_groups = {
    # Control plane nodes - always on
    control-plane = {
      name           = "control-plane"
      instance_types = ["m5.xlarge"]
      min_size       = 3
      max_size       = 3
      desired_size   = 3

      labels = {
        role = "control-plane"
      }

      taints = [{
        key    = "CriticalAddonsOnly"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # Standard workspace nodes - auto-scaling
    workspace-standard = {
      name           = "workspace-standard"
      instance_types = ["m5.2xlarge"]
      min_size       = 2
      max_size       = 20
      desired_size   = 5

      labels = {
        role     = "workspace"
        workload = "standard"
      }
    }

    # Spot instances for cost savings
    workspace-spot = {
      name           = "workspace-spot"
      instance_types = ["m5.2xlarge", "m5a.2xlarge", "m4.2xlarge"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 30
      desired_size   = 0

      labels = {
        role     = "workspace"
        workload = "spot"
      }

      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }

    # GPU nodes for ML workloads
    workspace-gpu = {
      name           = "workspace-gpu"
      instance_types = ["g4dn.xlarge"]
      min_size       = 0
      max_size       = 5
      desired_size   = 0
      ami_type       = "AL2_x86_64_GPU"

      labels = {
        role             = "workspace"
        workload         = "gpu"
        "nvidia.com/gpu" = "true"
      }

      taints = [{
        key    = "nvidia.com/gpu"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}
```
## Auto-Scaling Configuration

Scale infrastructure based on demand.
### Cluster Autoscaler

Node-level scaling.

- **Scale-up triggers:** pending pods that cannot be scheduled
- **Scale-down triggers:** nodes underutilized for 10+ minutes
- **Recommended settings:** `scale-down-delay: 10m`, `scale-down-utilization: 0.5`

```
--scale-down-enabled=true
--scale-down-delay-after-add=10m
--scale-down-utilization-threshold=0.5
--skip-nodes-with-local-storage=false
--expander=least-waste
```
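The `--scale-down-utilization-threshold` flag compares a node's summed pod requests against its allocatable capacity; a node becomes a scale-down candidate only when its utilization (the higher of CPU and memory) stays under the threshold. A rough sketch of that check, with hypothetical node numbers:

```python
def scale_down_candidate(req_cpu, req_mem_gib, alloc_cpu, alloc_mem_gib,
                         threshold=0.5):
    """True when both CPU and memory requests fall below the threshold
    fraction of allocatable capacity (i.e. max utilization < threshold)."""
    utilization = max(req_cpu / alloc_cpu, req_mem_gib / alloc_mem_gib)
    return utilization < threshold

# m5.2xlarge-class node (8 vCPU, 32 GiB allocatable), 2 vCPU / 10 GiB requested:
print(scale_down_candidate(2, 10, 8, 32))  # → True
# Same node at 6 vCPU requested (75% CPU utilization):
print(scale_down_candidate(6, 10, 8, 32))  # → False
```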
### Karpenter (Recommended)

Fast, flexible provisioning.

- **Faster scaling:** provisions nodes in seconds vs. minutes
- **Right-sized instances:** picks the optimal instance type per workload
- **Spot integration:** automatic spot instance fallback

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: workspace
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  limits:
    resources:
      cpu: 1000
      memory: 2000Gi
  ttlSecondsAfterEmpty: 30
```
### Scheduled Scaling

Pre-scale before peak hours, scale down after hours.

- **Morning ramp (7-9 AM):** scale to 60% of expected peak 30 minutes before start of business
- **Peak hours (9 AM - 5 PM):** let the autoscaler handle demand; maintain headroom for bursts
- **Off-hours (6 PM - 7 AM):** scale to minimum; auto-stop idle workspaces after 2 hours
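One way to implement the schedule above is a cron-driven job that adjusts the workspace pool's `min_size` by hour of day, leaving the autoscaler to add burst capacity on top. This sketch encodes the windows above; the peak estimate and off-hours floor are illustrative assumptions:

```python
import math

def scheduled_min_size(hour, expected_peak, off_hours_min=2):
    """Return the workspace node-group min_size for a local hour (0-23)."""
    if 7 <= hour < 18:                       # ramp + business hours:
        return math.ceil(0.6 * expected_peak)  # hold the floor at 60% of peak
    return off_hours_min                     # off-hours: scale to minimum

# Expected peak of 20 nodes: floor at 6 AM, 8 AM, noon, and 10 PM.
print([scheduled_min_size(h, 20) for h in (6, 8, 12, 22)])  # → [2, 12, 12, 2]
```

The returned value would be applied via your provisioning tool (e.g. updating the Terraform `min_size` shown earlier, or the underlying auto-scaling group's scheduled actions).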
## Storage Planning

Persistent volume and storage class recommendations:

| Tier | Use case | AWS | Azure | GCP |
|---|---|---|---|---|
| High-performance SSD | Build-heavy workloads | io2 | Premium SSD v2 | pd-extreme |
| Balanced SSD | Best value for most workloads | gp3 | Premium SSD | pd-ssd |
| Cold storage | Stopped workspace snapshots | S3 IA | Cool Blob | Nearline |
## Cost Optimization Strategies

- **Auto-stop idle workspaces:** stop workspaces after 2-4 hours of inactivity
- **Spot instances:** use spot capacity for non-critical workspaces
- **Reserved capacity:** commit to 1- to 3-year reserved instances for baseline load
- **Right-size templates:** match template resources to actual workload needs
- **Prebuilds:** build workspace images ahead of time, not on demand
- **Storage cleanup:** delete orphaned volumes, archive inactive workspaces
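Auto-stop is usually the largest lever of these: a workspace billed only for working hours costs a fraction of one left running 24/7. The comparison below uses the m5.2xlarge rate from the node pool table; the 8-hours-a-day, 5-days-a-week usage pattern is an illustrative assumption:

```python
def auto_stop_savings(active_hours_per_week, rate_per_hour=0.384):
    """Weekly cost of an auto-stopped workspace vs. one running 24/7,
    plus the fractional savings."""
    always_on = 24 * 7 * rate_per_hour
    auto_stop = active_hours_per_week * rate_per_hour
    return always_on, auto_stop, 1 - auto_stop / always_on

always_on, auto_stop, saved = auto_stop_savings(5 * 8)  # 8 h/day, 5 days/week
print(round(saved, 2))  # → 0.76
```

In other words, auto-stop alone recovers roughly three quarters of per-workspace compute spend under this usage pattern, before spot or reserved-instance discounts are applied.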