A data analytics company eliminated $28K/month in waste across Spark, DBT, and Snowflake workloads without performance regression.
Migrated Spark to EMR Serverless, right-sized DBT resources, downgraded Snowflake tier, migrated to ARM64, replaced Cluster Autoscaler with Karpenter, and removed unused sandbox environment.
Baseline cost was $65K/month ($30K AWS + $35K Snowflake), with Spark workloads idle 67% of the time and DBT pods provisioned at 4x actual usage.
Reduced spend to $37K/month, saving $28K/month ($336K/year). SLOs unchanged post-migration. EC2 node count reduced from 35 to 6 per environment.
Data analytics company running multi-environment workloads: Spark data processing, DBT transformation pipelines, and Snowflake data warehousing across production, staging, pre-production, and sandbox environments.
Architecture: EKS on EC2 for containerized workloads, MWAA orchestrating data pipelines via Kubernetes operators.
Infrastructure consuming $65K/month: $30K AWS (EC2, EKS, MWAA) and $35K Snowflake. Waste from idle compute (Spark jobs idle 67% of time), over-provisioned resources (DBT pods 4x actual usage), and inefficient scheduling (pre-prod running 24/7).
Problem: Business Critical tier at $4.90/credit. Workload characteristics aligned with Enterprise tier capabilities.
Decision: Migrated to Enterprise tier ($3.70/credit). Same compute performance and HA guarantees.
Result: ~$8K/month savings (~$100K/year). 25% reduction in credit costs. No performance regression observed.
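The tier-downgrade saving follows directly from the credit-rate delta. A minimal sanity check, assuming monthly credit consumption stays constant (the $35K spend and per-credit rates are from the case study; constant consumption is an assumption):

```python
# Snowflake tier downgrade: same credits consumed, lower $/credit.
OLD_RATE = 4.90         # $/credit, Business Critical tier
NEW_RATE = 3.70         # $/credit, Enterprise tier
MONTHLY_SPEND = 35_000  # $/month on Snowflake before the change

credits_per_month = MONTHLY_SPEND / OLD_RATE   # ~7,143 credits/month
new_spend = credits_per_month * NEW_RATE       # ~$26,429/month
savings = MONTHLY_SPEND - new_spend            # ~$8,571/month
rate_cut = 1 - NEW_RATE / OLD_RATE             # ~24.5% lower credit price

print(f"~${savings:,.0f}/month saved, {rate_cut:.0%} lower credit price")
```

The computed ~$8.6K/month is consistent with the reported ~$8K/month figure.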
Problem: Jobs ran every 30 minutes with 10-minute runtime, leaving EC2 idle 67% of the time.
Decision: Migrated from EMR on EKS (virtual clusters) to EMR Serverless. Pay-per-execution model.
Result: ~$7K/month savings. Eliminated idle EC2 capacity. Job performance unchanged.
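The 67% idle figure is simple duty-cycle math: a 10-minute job on a 30-minute schedule leaves always-on capacity unused for the remaining 20 minutes of each interval.

```python
# Idle-capacity math behind the EMR Serverless move.
INTERVAL_MIN = 30  # job scheduled every 30 minutes
RUNTIME_MIN = 10   # actual runtime per invocation

idle_fraction = (INTERVAL_MIN - RUNTIME_MIN) / INTERVAL_MIN
print(f"idle {idle_fraction:.0%} of the time")  # idle 67% of the time
```

With pay-per-execution pricing, that idle fraction stops being billed at all.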
Problem: Pods were configured with 4 vCPU/4GB RAM against actual usage of 1 vCPU/2GB RAM. 35 nodes ran per environment where 6 sufficed.
Decision: Reduced resource requests to 1 vCPU/2GB RAM. Minimum node count 8 → 2.
Result: ~$4.5K/month savings. EC2 nodes per environment: 35 → 6 (83% reduction). Baseline cost floor reduced.
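The right-sizing numbers above can be checked in two lines: the request cut per pod, and the resulting node-count reduction (both figures as reported):

```python
# Arithmetic behind the DBT right-sizing.
request_cpu_cut = 1 - 1 / 4            # requests: 4 vCPU -> 1 vCPU, 75% less CPU reserved per pod
before_nodes, after_nodes = 35, 6      # nodes per environment, before/after

node_reduction = 1 - after_nodes / before_nodes
print(f"{node_reduction:.0%} fewer nodes per environment")  # 83% fewer
```

Dropping the minimum node count from 8 to 2 is what lowers the cost floor: even at zero load, only 2 nodes bill instead of 8.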
Problem: x86 instances (m5.2xlarge) more expensive than ARM64 for equivalent capacity.
Decision: Migrated to m6g.2xlarge (Graviton2). Updated Docker images to ARM64.
Result: ~$2.5K/month savings. ~20% cost reduction on remaining EC2 footprint.
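The ~20% figure matches the on-demand price gap between the two instance types. The hourly rates below are approximate us-east-1 list prices used for illustration (an assumption; check current AWS pricing before relying on them):

```python
# Illustrative per-node-hour comparison, x86 vs Graviton2.
M5_2XLARGE = 0.384   # $/hr, m5.2xlarge (x86) -- approximate list price
M6G_2XLARGE = 0.308  # $/hr, m6g.2xlarge (ARM64) -- approximate list price

saving = 1 - M6G_2XLARGE / M5_2XLARGE
print(f"~{saving:.0%} cheaper per node-hour")  # ~20% cheaper
```

Because both types offer 8 vCPU/32GB, the swap is capacity-neutral; the main migration cost is rebuilding container images for ARM64.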
Problem: Cluster Autoscaler's slow scale-down and inefficient bin-packing kept underutilized nodes active.
Decision: Replaced Cluster Autoscaler with Karpenter. Zero-downtime gradual rollout.
Result: ~$1.5K/month savings. Improved bin-packing and faster scale-down. Planning a similar migration? See our Cluster Autoscaler to Karpenter Migration Guide 2026.
Problem: Pre-prod running 24/7, needed only weekdays 9am-5pm. MWAA mw1.medium across 5 environments.
Decision: Scheduled jobs weekdays only. Resized MWAA mw1.medium → mw1.small.
Result: ~$2K/month savings. 40% reduction in pre-prod costs. Full functionality maintained during business hours.
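The scheduling change alone has a large theoretical ceiling: a weekday 9am-5pm window is 40 hours out of a 168-hour week. The realized 40% cost cut is smaller than the hours cut, plausibly because only hour-billed components benefit from the window:

```python
# Scheduled-hours math for the weekdays-only pre-prod window.
HOURS_ON = 8 * 5      # 9am-5pm, Mon-Fri: 40 hours/week
HOURS_IN_WEEK = 24 * 7  # 168 hours/week always-on

hours_cut = 1 - HOURS_ON / HOURS_IN_WEEK
print(f"~{hours_cut:.0%} fewer scheduled hours per week")  # ~76% fewer
```

The MWAA downsizing (mw1.medium to mw1.small) stacks on top of this, since the environment bills hourly whenever it exists.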
Problem: Unused sandbox: MWAA medium, EKS 5×m5.2xlarge, OpenSearch 4 large nodes, MySQL db.m5.2xlarge.
Decision: Removed sandbox environment. Verified no dependencies.
Result: ~$2.5K/month savings. 100% cost elimination for zero-value environment.
DBT → Fargate migration is estimated to unlock ~$5K/month of additional savings, bringing post-roadmap potential to ~$33K/month (~$396K/year), a 51% reduction from the $65K/month baseline.
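The roadmap totals reconcile with the per-item figures reported above:

```python
# Arithmetic behind the roadmap totals (all values in $K/month, from the write-up).
realized = 8 + 7 + 4.5 + 2.5 + 1.5 + 2 + 2.5  # seven shipped optimizations = 28
planned = 5                                    # estimated DBT -> Fargate saving

total = realized + planned        # 33 ($33K/month)
annual = total * 12               # 396 ($396K/year)
share_of_baseline = total / 65    # ~51% of the $65K/month baseline

print(f"${total:.0f}K/month, ${annual:.0f}K/year, {share_of_baseline:.0%} of baseline")
```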
Get a FinOps-grade cost reduction plan. We'll identify where your AWS bill is hiding waste and quantify exact savings opportunities.