When a SaaS startup came to us complaining about 45-minute CI/CD pipeline runs blocking deployments, we knew we had a classic optimization problem. The team was running a Node.js application with 2,000+ tests, heavy Docker builds, and a monolithic pipeline that ran everything on every commit. Developers were frustrated, deployments were delayed, and velocity was suffering. This is a common challenge we see in our infrastructure audits.
After a comprehensive audit and systematic optimization, we reduced their pipeline time from 45 minutes to just 6 minutes - an 87% improvement. More importantly, we eliminated flaky tests, reduced CI costs by 60%, and enabled the team to ship code 7x faster. Similar improvements are detailed in our scaling case study and CI/CD security guide.
1. Identifying the Bottlenecks
Before optimizing anything, we needed to understand where time was actually being spent. The team assumed tests were the problem, but our profiling revealed a different story.
Time-Boxed Profiling of Each Pipeline Stage
We instrumented their GitHub Actions workflow to measure execution time for each stage:
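GitHub Actions shows per-step durations in its UI, but to aggregate numbers across many runs we also wanted them in the logs. A minimal sketch of this kind of instrumentation (step names are illustrative, not the team's exact workflow):
# Sketch: bracket a stage with timestamps so its duration appears in the logs
- name: Mark build start
  run: echo "STAGE_START=$(date +%s)" >> "$GITHUB_ENV"
- name: Build Docker image
  run: docker build -t app:ci .
- name: Report build duration
  if: always()
  run: echo "Docker build took $(( $(date +%s) - STAGE_START ))s"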

# Before optimization - Stage breakdown
Install Dependencies: 8 minutes (18%)
Build Docker Image: 12 minutes (27%)
Run Unit Tests: 15 minutes (33%)
Run Integration Tests: 8 minutes (18%)
Deploy to Staging: 2 minutes (4%)
Total: 45 minutes
The results were surprising: Docker builds and test execution were the main culprits, not dependency installation as they suspected. For teams using GitHub Actions or other CI/CD platforms, proper job orchestration is critical for performance.
Measuring Queue Times vs. Execution Times
We discovered that runners were spending significant time waiting in queues. On average:
- Queue time: 3-5 minutes per job (shared runners)
- Actual execution: 40 minutes
- Total wall-clock time: 45 minutes
This queue time was invisible to developers but added up to 10-15% overhead. Moving to self-hosted runners eliminated this entirely.
Detecting Flaky Tests, Redundant Jobs, Slow Containers, and Heavy Dependencies
Our analysis uncovered several hidden issues:
- Flaky tests: 12 tests failing randomly 15-20% of the time, causing re-runs
- Redundant jobs: Running linting, type-checking, and tests separately when they could be parallelized
- Slow containers: Using `node:16` base image (800MB) instead of `node:16-alpine` (120MB)
- Heavy dependencies: Installing all dev dependencies including unused packages (2,400 packages total)
2. Reducing Build & Test Times
Implementing Incremental Builds / Remote Caching
For Node.js applications, we implemented several caching strategies:
npm Cache Restoration
# GitHub Actions example
- name: Cache node modules
  uses: actions/cache@v3
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
This reduced dependency installation from 8 minutes to 30 seconds on cache hits (95% of builds).
Docker Layer Caching
We implemented Docker BuildKit cache mounts and layer caching:
# Dockerfile optimization
FROM node:16-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production && \
    npm cache clean --force

FROM node:16-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:16-alpine
WORKDIR /app
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
Layer caching reduced Docker build time from 12 minutes to 2-3 minutes when only application code changed.
Parallelizing Test Execution
Jest supports parallel execution out of the box, but the team wasn't leveraging it effectively. We configured:
// jest.config.js
module.exports = {
  maxWorkers: '50%', // Use half of available CPUs
  testTimeout: 10000,
  // Split tests by type
  projects: [
    {
      displayName: 'unit',
      testMatch: ['<rootDir>/src/**/*.test.js'],
    },
    {
      displayName: 'integration',
      testMatch: ['<rootDir>/tests/**/*.test.js'],
    },
  ],
};
We also split the test suite across multiple jobs using Jest's `--shard` flag together with `--testPathPattern`:
# Run tests in parallel across 4 jobs
- name: Run unit tests (shard 1/4)
  run: npm test -- --testPathPattern="src/.*" --shard=1/4
- name: Run unit tests (shard 2/4)
  run: npm test -- --testPathPattern="src/.*" --shard=2/4
This reduced test execution from 15 minutes to 4 minutes by running 4 test jobs in parallel.
Replacing Slow Test Frameworks or Optimizing Test Setup/Teardown
The team was using a heavy E2E testing framework (Cypress) for integration tests that required spinning up a full browser. We:
- Replaced browser-based tests with API-level integration tests using Supertest (10x faster)
- Moved critical E2E tests to a separate nightly job
- Optimized database setup/teardown by using transactions instead of full migrations
Using Container Pre-warming or Optimized Base Images
We switched from `node:16` (800MB) to `node:16-alpine` (120MB), reducing:
- Image pull time: 45 seconds → 8 seconds
- Build context size: 1.2GB → 180MB
- Overall build time: 12 minutes → 8 minutes
For self-hosted runners, we pre-warmed containers with base images, eliminating pull time entirely.
3. Dependency Optimization
Caching Package Managers Effectively (npm)
We implemented a multi-layer caching strategy:
- npm cache: Cached `~/.npm` directory based on `package-lock.json` hash
- node_modules cache: Cached `node_modules` directory for faster restores
- Docker layer cache: Cached npm install layer in Docker builds
# Multi-layer npm caching
- name: Get npm cache directory
  id: npm-cache
  run: echo "dir=$(npm config get cache)" >> $GITHUB_OUTPUT
- name: Cache npm
  uses: actions/cache@v3
  with:
    path: ${{ steps.npm-cache.outputs.dir }}
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
Eliminating Unused Dependencies
We audited the `package.json` and found:
- 127 unused dependencies (installed but never imported)
- 45 duplicate packages (different versions of the same library)
- 23 deprecated packages with security vulnerabilities
Using `depcheck` and `npm-check`, we removed 195 unnecessary packages, reducing:
- Installation time: 8 minutes → 5 minutes
- Docker image size: 1.2GB → 850MB
- Security surface: 23 fewer vulnerable packages
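To keep unused packages from creeping back in after a cleanup like this, a lightweight guard step can run in CI. A sketch (the ignore list is illustrative, for tooling that is required but never imported):
# Sketch: fail fast if unused dependencies reappear
- name: Check for unused dependencies
  run: npx depcheck --ignores="typescript,jest"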
Pinning Versions to Avoid Repeated Resolution Delays
The team was using version ranges (`^1.2.3`) which caused npm to resolve versions on every install. We:
- Locked all versions in `package.json` to exact versions
- Used `package-lock.json` consistently (already in place)
- Enabled `npm ci` instead of `npm install` for deterministic installs
This eliminated version resolution time (30-60 seconds) and ensured consistent builds across environments.
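In the workflow this is a one-line change; the extra flags below are optional speed-ups we are sketching here, not necessarily what the team ran:
# Deterministic install from the lockfile; skip audit/funding output to save a few seconds
- name: Install dependencies
  run: npm ci --prefer-offline --no-audit --no-fund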
Introducing Artifact Repositories for Reusability
We set up a private npm registry (using GitHub Packages) to:
- Cache internal packages locally
- Share built artifacts across pipelines
- Reduce external npm registry calls
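Pointing CI at GitHub Packages is mostly an authentication concern, which `actions/setup-node` handles; a sketch (the `@your-org` scope is a placeholder):
# Sketch: resolve @your-org packages from GitHub Packages instead of the public registry
- uses: actions/setup-node@v3
  with:
    node-version: '18'
    registry-url: 'https://npm.pkg.github.com'
    scope: '@your-org'
- run: npm ci
  env:
    NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}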
For Docker images, we used GitHub Container Registry with layer caching, reducing image push/pull times by 40%.
4. Eliminating Flaky & Redundant Tests
Logging and Tracking Failure Patterns
We implemented test result tracking to identify flaky tests:
// test-reporter.js - custom Jest reporter that logs every failure for later analysis
const fs = require('fs');

class FailureLogReporter {
  onTestResult(test, testResult) {
    for (const result of testResult.testResults) {
      if (result.status === 'failed') {
        const testData = {
          name: result.fullName,
          file: testResult.testFilePath,
          timestamp: new Date().toISOString(),
          error: result.failureMessages.join('\n'),
        };
        // Log to file for analysis (one JSON object per line)
        fs.appendFileSync('test-failures.jsonl', JSON.stringify(testData) + '\n');
      }
    }
  }
}

module.exports = FailureLogReporter;
// Registered in jest.config.js via: reporters: ['default', '<rootDir>/test-reporter.js']
Over 2 weeks, we identified 12 consistently flaky tests. Common causes:
- Race conditions in async tests (5 tests)
- Time-dependent assertions without mocking (3 tests)
- Shared test state between tests (2 tests)
- External API dependencies (2 tests)
Separate Critical Tests from Long-Running Optional Ones
We reorganized the test suite into three tiers:
- Fast unit tests: Run on every commit (500 tests, 2 minutes)
- Integration tests: Run on PRs and main branch (800 tests, 5 minutes)
- E2E tests: Run nightly or on release tags (700 tests, 20 minutes)
This meant developers got feedback in 2 minutes instead of 23 minutes for most changes.
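The nightly/release tier is simply a separate workflow with its own triggers; a sketch (`ci-e2e.yml` and the `test:e2e` script are hypothetical names):
# ci-e2e.yml - heavy E2E tier, decoupled from every-commit feedback
name: E2E
on:
  schedule:
    - cron: '0 3 * * *'   # nightly run
  push:
    tags:
      - 'v*'              # and on release tags
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm run test:e2e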
Delete or Refactor Duplicate Test Coverage
We found significant test duplication:
- 45 tests covering the same functionality with different names
- 23 tests that were superseded by newer, better tests
- 12 tests testing implementation details instead of behavior
We removed 80 redundant tests, reducing test execution time by 3 minutes without losing coverage.
Run Heavy Integration Tests Only When Relevant Files Change
We implemented path-based test execution using `jest-changed-files`:
# Only run integration tests if API or database code changed
- name: Check changed files
  id: changed-files
  uses: tj-actions/changed-files@v35
  with:
    files: |
      src/api/**
      src/database/**
      tests/integration/**
- name: Run integration tests
  if: steps.changed-files.outputs.any_changed == 'true'
  run: npm run test:integration
This reduced integration test runs by 70%, only executing when API or database code actually changed.
5. Intelligent Job Triggering
Moving Away from "Run Everything on Every Commit"
The biggest win came from conditional job execution. We implemented path-based triggers:
# .github/workflows/ci.yml
name: CI
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      docs: ${{ steps.filter.outputs.docs }}
      frontend: ${{ steps.filter.outputs.frontend }}
      backend: ${{ steps.filter.outputs.backend }}
    steps:
      - uses: actions/checkout@v3
      - uses: dorny/paths-filter@v2
        id: filter
        with:
          filters: |
            docs:
              - 'docs/**'
              - '*.md'
            frontend:
              - 'frontend/**'
            backend:
              - 'src/**'
              - 'server/**'
  test-backend:
    needs: changes
    if: needs.changes.outputs.backend == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Run backend tests
        run: npm test
Adding Path-Based Triggers or Conditional Workflows
We created separate workflow files for different change types:
- ci-docs.yml: Only runs on documentation changes (linting, spell-check)
- ci-frontend.yml: Runs frontend tests and builds
- ci-backend.yml: Runs backend tests and API tests
- ci-full.yml: Runs everything (only on main branch merges)
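Each file keys its triggers to the relevant paths, so unrelated changes never start the workflow at all. A sketch of the docs-only variant (the `lint:docs` script is a placeholder):
# ci-docs.yml - only triggered by documentation changes
name: Docs CI
on:
  pull_request:
    paths:
      - 'docs/**'
      - '**/*.md'
jobs:
  lint-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm run lint:docs   # hypothetical docs lint/spell-check script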
Skipping Jobs for Docs-Only or Non-Critical Changes
For documentation-only changes, we skip all tests:
- name: Skip CI for docs
  if: >-
    contains(github.event.head_commit.message, '[skip ci]') ||
    steps.changed-files.outputs.docs == 'true'
  run: |
    echo "Skipping CI for documentation changes"
    exit 0
This reduced CI runs by 25% (many PRs were just README or comment updates).
Using Commit Message Tags like [skip ci]
We implemented commit message parsing to skip CI:
- [skip ci]: Skip all CI jobs
- [skip tests]: Skip test jobs, run only linting
- [ci fast]: Run only fast unit tests
This gave developers control over CI execution for non-critical changes.
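GitHub Actions honours `[skip ci]` in commit messages out of the box; custom tags like `[skip tests]` can be wired with an expression check at the job level. A sketch (the job layout is illustrative):
# Sketch: skip the test job when the commit message carries [skip tests]
test:
  if: ${{ !contains(github.event.head_commit.message, '[skip tests]') }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - run: npm test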
6. Pipeline Architecture Redesign
Breaking a Monolithic Pipeline into Smaller, Modular Workflows
The original pipeline was a single 45-minute job. We split it into:
Before: Single Job (45 min)
Install → Build → Test → Deploy
After: Parallel Jobs (6 min total)
Job 1: Lint (1 min) | Job 2: Unit Tests (2 min) | Job 3: Build (3 min)
↓
Job 4: Integration Tests (4 min) | Job 5: Deploy (2 min)
Introducing Fan-Out/Fan-In Stages
We implemented a fan-out pattern for tests:
jobs:
  test-matrix:
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    runs-on: ubuntu-latest
    steps:
      - name: Run test shard
        run: npm test -- --shard=${{ matrix.shard }}/4
  test-aggregate:
    needs: test-matrix
    runs-on: ubuntu-latest
    steps:
      - name: Aggregate test results
        run: npm run test:coverage:merge
This allowed 4 test jobs to run in parallel, reducing test time from 15 minutes to 4 minutes.
Shifting More Logic from CI to Local Pre-commit Hooks
We moved fast checks to pre-commit hooks using Husky:
#!/usr/bin/env sh
# .husky/pre-commit
. "$(dirname -- "$0")/_/husky.sh"
# Fast checks that run locally
npm run lint:staged
npm run type-check
npm run test:unit:changed
This caught 80% of issues before they reached CI, reducing CI failures and re-runs.
Adopting a Matrix Build Strategy
For testing across Node.js versions, we used matrix builds:
strategy:
  matrix:
    node-version: [16.x, 18.x, 20.x]
    os: [ubuntu-latest, windows-latest]
runs-on: ${{ matrix.os }}
steps:
  - uses: actions/setup-node@v3
    with:
      node-version: ${{ matrix.node-version }}
This ran tests in parallel across 6 combinations (3 Node versions × 2 OS), completing in the time of a single test run.
7. Introducing Caching & Artifacts
Layer Caching for Docker Builds
We implemented Docker BuildKit cache mounts:
# Build command with BuildKit layer caching (the type= cache flags require buildx)
docker buildx build \
  --cache-from type=registry,ref=ghcr.io/org/app:latest \
  --cache-from type=local,src=/tmp/.buildx-cache \
  --cache-to type=local,dest=/tmp/.buildx-cache \
  -t app:latest .
This cached Docker layers between builds, reducing build time from 12 minutes to 2-3 minutes on cache hits.
Sharing Build Artifacts Across Jobs Instead of Re-building
We used GitHub Actions artifacts to share build outputs:
# Build job
- name: Build application
  run: npm run build
- name: Upload build artifacts
  uses: actions/upload-artifact@v3
  with:
    name: dist
    path: dist/

# Test job (uses built artifacts)
- name: Download build artifacts
  uses: actions/download-artifact@v3
  with:
    name: dist
- name: Run tests against built artifacts
  run: npm test
This eliminated duplicate builds and ensured tests ran against the exact code that would be deployed.
Caching Test Databases or Pre-computed Assets
For integration tests requiring databases, we:
- Used Docker Compose to spin up test databases (PostgreSQL, Redis)
- Cached database initialization scripts
- Used test fixtures instead of seeding fresh data each time
This reduced database setup time from 2 minutes to 10 seconds.
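A minimal sketch of the Compose file for the test databases (the filename, image tags, and the in-memory tmpfs trick are our illustration, not necessarily the team's exact setup):
# docker-compose.test.yml - throwaway databases for integration tests
version: "3.8"
services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: test
      POSTGRES_DB: app_test
    ports:
      - "5432:5432"
    tmpfs:
      - /var/lib/postgresql/data   # keep data in memory so each run starts clean and fast
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"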
Using Persistent Runners for Warm Caches
We migrated from GitHub-hosted runners to self-hosted runners with:
- Persistent Docker layer cache
- Pre-installed Node.js and common dependencies
- Warm npm cache
- No queue time
This provided consistent performance and eliminated the 3-5 minute queue times.
8. Hardware & Runner Improvements
Moving to More Powerful or Dedicated Runners
We replaced GitHub's standard runners (2 vCPUs, 7GB RAM) with self-hosted runners:
- CPU: 8 vCPUs (4x improvement)
- RAM: 16GB (2.3x improvement)
- Storage: NVMe SSD (10x faster I/O)
This reduced test execution time by 40% due to faster CPU and I/O.
Switching from Shared SaaS Runners to Self-Hosted
Benefits of self-hosted runners:
- No queue time: Immediate job execution
- Persistent caches: Docker layers, npm cache survive between runs
- Custom configuration: Pre-installed tools, optimized for our stack
- Cost savings: $0.008/minute vs. $0.08/minute for GitHub-hosted
We set up runners using GitHub Actions Runner on AWS EC2 instances with auto-scaling.
Leveraging Autoscaling Runners for Parallel Workloads
We implemented autoscaling using actions-runner-controller on Kubernetes:
# runner-deployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: github-runner
spec:
  template:
    spec:
      repository: org/repo
---
# Autoscaling lives in a separate HorizontalRunnerAutoscaler resource
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: github-runner-autoscaler
spec:
  scaleTargetRef:
    name: github-runner
  minReplicas: 1
  maxReplicas: 10
  scaleUpTriggers:
    - githubEvent:
        workflowJob: {}
      amount: 1
      duration: "1m"
This automatically scaled runners based on queue depth, handling 10 parallel jobs during peak times.
9. Measuring Improvements & Feedback Loop
Setting Up Dashboards for Build Time, Test Flakiness, Queue Depth
We created a CI/CD metrics dashboard using:
- GitHub Actions API: Track build times, success rates
- Prometheus: Export CI metrics
- Grafana: Visualize trends and alerts
Key metrics tracked:
- Average build time (target: < 10 minutes)
- Test flakiness rate (target: < 1%)
- Queue depth and wait times
- Cache hit rates (target: > 80%)
- Cost per build
Tracking Performance Before/After Changes
We maintained a performance log:

Performance Improvement Timeline
- Baseline: 45 minutes (100%)
- After caching: 32 minutes (71%) - 29% improvement
- After parallelization: 18 minutes (40%) - 60% improvement
- After path-based triggers: 12 minutes (27%) - 73% improvement
- After self-hosted runners: 6 minutes (13%) - 87% improvement
Adding Alerts When Pipelines Degrade Again
We set up alerts for:
- Build time exceeding 10 minutes (p95)
- Test flakiness rate above 2%
- Cache hit rate below 70%
- Queue wait time above 2 minutes
These alerts helped catch regressions immediately, preventing the pipeline from slowly degrading over time.
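As a concrete example, the build-time alert can be expressed as a standard Prometheus rule; a sketch, assuming a `ci_build_duration_seconds` histogram is exported (the metric name and threshold are illustrative):
# prometheus-alerts.yml (sketch)
groups:
  - name: ci-pipeline
    rules:
      - alert: CIPipelineSlow
        expr: histogram_quantile(0.95, sum(rate(ci_build_duration_seconds_bucket[6h])) by (le)) > 600
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "p95 CI build time above 10 minutes"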
Regular Reviews to Remove Accumulated "CI Debt"
We instituted monthly CI/CD reviews to:
- Remove unused jobs or workflows
- Update dependencies and base images
- Review and optimize slow tests
- Clean up old artifacts and caches
- Review CI costs and identify optimization opportunities
This prevented the accumulation of "CI debt" that slows pipelines over time.
10. Cultural & Workflow Improvements
Encouraging Smaller Pull Requests
Large PRs mean longer CI runs and a heavier review burden. We:
- Set PR size limits (max 400 lines changed)
- Encouraged feature flags for incremental delivery
- Provided templates for breaking large changes into smaller PRs
Smaller PRs meant faster CI feedback (2-4 minutes vs. 6+ minutes) and faster code reviews.
Enforcing Code-Review SLAs to Reduce Queueing
We implemented review SLAs:
- First review within 4 hours during business hours
- Auto-assign reviewers based on file paths
- Reminder bots for stale PRs
This reduced PR queue time and prevented CI resources from being tied up by unreviewed PRs.
Educating the Team on Writing Efficient Tests
We conducted workshops on:
- Writing fast, isolated unit tests
- Avoiding slow I/O operations in tests
- Using mocks and stubs effectively
- Test organization and naming conventions
This cultural change led to developers writing faster tests from the start, preventing future CI slowdowns.
Documenting Best Practices for Long-Term Consistency
We created comprehensive documentation:
- CI/CD Playbook: How to add new jobs, configure caching, etc.
- Test Guidelines: When to write unit vs. integration vs. E2E tests
- Performance Budgets: Maximum allowed times for each job type
- Onboarding Guide: How new developers can contribute without breaking CI
This ensured the improvements were sustainable and new team members followed best practices.
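One simple way to back a performance budget with tooling is GitHub Actions' `timeout-minutes`, which fails any job that exceeds its ceiling; a sketch (the budget values are illustrative):
# Sketch: hard ceilings that fail a job when its budget is blown
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v3
      - run: npm test
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v3
      - run: npm run build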
Key Takeaways
Optimizing CI/CD isn't about a single silver bullet - it's about systematic improvements across multiple dimensions:
- Measure first: You can't optimize what you don't measure. Profile every stage.
- Cache aggressively: npm, Docker layers, and build artifacts are your friends.
- Parallelize everything: Tests, builds, and jobs should run concurrently.
- Be selective: Don't run everything on every commit. Use path-based triggers.
- Invest in infrastructure: Self-hosted runners with proper hardware pay for themselves.
- Eliminate waste: Remove flaky tests, unused dependencies, and redundant jobs.
- Monitor continuously: Set up dashboards and alerts to catch regressions.
- Build culture: Educate the team and document best practices.
Conclusion
What started as a 45-minute pipeline blocking deployments became a 6-minute pipeline that enables rapid iteration. The transformation required systematic optimization across profiling, caching, parallelization, dependency management, test optimization, intelligent triggering, architecture redesign, and cultural improvements. For more cost optimization strategies, see our case studies on reducing infrastructure spend.
For Node.js applications specifically, the biggest wins came from:
- npm caching and dependency optimization (saved 5 minutes)
- Docker layer caching and optimized base images (saved 9 minutes)
- Test parallelization and path-based execution (saved 11 minutes)
- Self-hosted runners and hardware improvements (saved 4 minutes)
- Intelligent job triggering (saved 10 minutes by skipping unnecessary runs)
The result? A team that ships code 7x faster, spends 60% less on CI, and has zero flaky tests blocking deployments. This is the power of systematic CI/CD optimization.