Optimization recipe: my reference

Not for students. This is the step-by-step path from the cold baseline to a properly-optimized pipeline. Each step is a real, measurable change with predicted impact. I use this to drive the live work in Workshop 3 (we do these steps together, on screen, with students mirroring in their forks), and to know exactly what we should land on as a group.

Read the number story first. A cold pipeline run is ~13 min. That is deliberately, uncomfortably slow. Almost all of it is one stage: the serial real-HTTP test:api suite, which is ~95% of the run (~12 min). Everything else is small: install ~7–90s (warm vs cold cache), lint ~2s, typecheck ~2–3s, test:web ~2–10s, build:api/build:web ~1s each, deploy ~18s.

The cost center is not npm install. It is the apps/api/tests/integration/* suite (billing.rollup, email.campaigns, webhooks.fanout, search.reindex, notifications.blast) plus the original billing rollup test in apps/api/tests/reports.test.ts, all running under maxWorkers: 1 and hitting the mock HTTP server over real localhost TCP.
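A quick way to sanity-check the number story is to put the stage estimates in one place and compute each stage's share. The sketch below uses the figures quoted above; picking the upper bound of each range and the warm-cache install time is an assumption made for illustration, not a measurement.

```typescript
// Stage durations in seconds, taken from the estimates in these notes.
// Assumption: upper bound of each quoted range, warm-cache install.
const stageSeconds: Record<string, number> = {
  install: 7,
  lint: 2,
  typecheck: 3,
  "test:api": 12 * 60,
  "test:web": 10,
  "build:api": 1,
  "build:web": 1,
  deploy: 18,
};

// Return each stage's share of the total run, largest first.
function stageShares(
  stages: Record<string, number>
): Array<{ stage: string; share: number }> {
  const total = Object.values(stages).reduce((sum, s) => sum + s, 0);
  return Object.entries(stages)
    .map(([stage, seconds]) => ({ stage, share: seconds / total }))
    .sort((a, b) => b.share - a.share);
}

const shares = stageShares(stageSeconds);
for (const { stage, share } of shares) {
  console.log(`${stage}: ${(share * 100).toFixed(1)}%`);
}
```

Whatever exact figures you plug in, test:api sorts to the top by a wide margin, which is the point to make before touching any other stage.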

This is a two-stage story. Workshop 3's own steps (caching, parallel jobs + matrix/sharding, incremental tsc, disabling always-on coverage, faster deploy) trim the overhead and parallelize the suite, bringing the pipeline to roughly 5–7 min. They cannot collapse the serial real-HTTP test:api: the work is still real network round-trips. The order-of-magnitude collapse (to ~30s–2 min) happens in Workshop 5, when students mock the SDK clients so the integration suite stops making real HTTP calls. Keep that framing everywhere: W3 gets you to ~5–7 min; W5 is what makes test:api cheap.

Each step below is independent. They compose, and students can apply them one at a time.

Step 1: Add npm dependency caching in CI

File: .github/workflows/ci.yml

Change: Add cache: npm to actions/setup-node, and switch npm install → npm ci.

yaml

- uses: actions/setup-node@v4
  with:
    node-version: 20
    cache: npm   # <- this line

yaml

- name: Install dependencies
  run: npm ci --no-audit --no-fund   # <- was `npm install`

Expected impact: install drops from ~90s cold to ~7–10s on warm runs. That saves ~80s of install: real, but a rounding error against a ~12-min test:api. Frame it honestly: caching is correct hygiene, not the lever that fixes this pipeline.

Why it works: setup-node restores ~/.npm from the previous successful run and keys the cache off the committed root package-lock.json. npm ci does a clean, lockfile-exact install (deletes node_modules first) and is the right command for CI: deterministic and cache-friendly. The first run after a lockfile change still pays full cost; subsequent runs are nearly free.

Step 2: Enable incremental TypeScript builds

File: apps/api/tsconfig.json and apps/web/tsconfig.json

Change: Set incremental: true and write build info somewhere stable.

json

{
  "compilerOptions": {
    "incremental": true,
    "tsBuildInfoFile": "./.tsbuildinfo"
  }
}

Expected impact: the api build (tsc) and both typechecks (tsc --noEmit, api and web) drop ~50–70% on subsequent runs. These stages are only ~1–3s each to begin with, so the absolute saving is a couple of seconds. Note the web build is vite build, not tsc: incremental TypeScript does not speed it up.

Why it works: tsc writes a .tsbuildinfo file with the project state. Subsequent runs only recheck files that changed. vite build uses esbuild/Rollup and ignores .tsbuildinfo, so the web build is unaffected.
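To make "only recheck files that changed" concrete, here is a deliberately simplified model. The fingerprint map and the function name are hypothetical; the real .tsbuildinfo format is richer, and tsc also rechecks files that depend on a changed file, which this sketch ignores.

```typescript
// Hypothetical, simplified model of incremental checking: compare a content
// fingerprint per file against the one recorded by the previous run.
// Assumption: dependency tracking is ignored; real tsc handles it.
type Fingerprints = Record<string, string>;

function filesToRecheck(previous: Fingerprints, current: Fingerprints): string[] {
  return Object.keys(current)
    .filter((file) => previous[file] !== current[file])
    .sort();
}

const lastRun: Fingerprints = {
  "src/app.ts": "a1",
  "src/db.ts": "b1",
  "src/routes.ts": "c1",
};
const thisRun: Fingerprints = {
  "src/app.ts": "a1",
  "src/db.ts": "b2", // edited since the last run
  "src/routes.ts": "c1",
  "src/new.ts": "d1", // added since the last run
};

// With build info present, only the edited and the new file are rechecked.
console.log(filesToRecheck(lastRun, thisRun));
// With no build info (first run, or a cache miss), every file is rechecked.
console.log(filesToRecheck({}, thisRun).length);
```

The second call is the case to keep in mind for CI: without a restored .tsbuildinfo, an incremental build behaves like a full one.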

Pair with: in CI, cache **/.tsbuildinfo:

yaml

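- uses: actions/cache@v4
  with:
    path: '**/.tsbuildinfo'
    key: tsbuildinfo-${{ hashFiles('**/*.ts') }}
    # the exact key misses whenever any .ts file changes; fall back to the newest build info
    restore-keys: tsbuildinfo-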

Step 3: Stop running coverage on every test invocation

Files: apps/api/jest.config.js, apps/web/vite.config.ts, and apps/web/package.json.

Change: Disable always-on coverage in both apps. For web this takes two edits; the config flag is not enough on its own.

apps/api/jest.config.js, default collectCoverage to false:

module.exports = {
  // ...
  collectCoverage: false,   // <- was true
};

apps/web/vite.config.ts, set coverage.enabled: false:

coverage: {
  enabled: false,   // <- was true
},

apps/web/package.json, remove --coverage from the test script:
```json
"test": "vitest run"   // <- was "vitest run --coverage"
```
This third edit is required. The --coverage CLI flag re-enables coverage even when coverage.enabled is false in the config: CLI wins over config. Disable it in both places or you have changed nothing for web.
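The precedence rule can be stated as a one-line function. This is only a model of the behavior described above, with a hypothetical function name; it is not Vitest's actual option-resolution code.

```typescript
// Simplified model: an explicit CLI flag overrides the config value;
// the config value applies only when no flag is passed.
function coverageEnabled(configEnabled: boolean, cliFlag?: boolean): boolean {
  return cliFlag ?? configEnabled;
}

// "vitest run --coverage" with coverage.enabled: false in the config
console.log(coverageEnabled(false, true));
// "vitest run" with coverage.enabled: false in the config
console.log(coverageEnabled(false));
// "vitest run" with coverage.enabled: true in the config
console.log(coverageEnabled(true));
```

Only the middle case actually turns coverage off, which is why both the config and the script need editing.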

Add a separate npm run test:coverage script for the nightly job.

Expected impact: test:api drops ~30–40% of its instrumentation overhead (real but small against a multi-minute suite of network round-trips, where coverage is not the bottleneck); test:web drops a few seconds. Saves seconds, not minutes.

Why it works: Istanbul / v8 coverage rewrites or traces every line of source to track execution. That overhead matters for CPU-bound tests; it barely registers against the api integration suite, where the time is real HTTP latency. PR builds care about pass/fail, not coverage; measure coverage nightly.

Step 4: Parallelize Jest and Vitest workers

Files: apps/api/jest.config.js and apps/web/vite.config.ts

Change (api): Set maxWorkers: '50%' (or remove the line entirely; outside watch mode Jest defaults to the number of available cores minus one).

module.exports = {
  // ...
  maxWorkers: '50%',  // <- was 1
};

Change (web): The vite config advertises two serial knobs, singleFork: true and isolate: true. Turn off both, or web stays single-process:

poolOptions: {
  forks: {
    singleFork: false,  // <- was true
  },
},
isolate: false,         // <- was true

Expected impact: test:api drops substantially as the integration files fan out across workers. This is the biggest single W3 lever on the dominant stage, but it is bounded by core count and the real HTTP latency, so it shortens test:api rather than collapsing it. test:web drops a few seconds.

Why it works: Jest and Vitest can run test files in parallel processes. Test files here are mostly independent: the in-memory DB is reset per-file via beforeEach. The mock HTTP server handles concurrent requests fine. isolate: false also drops the ~50–100ms per-file jsdom rebuild on web.

Catch: This is where students often hit a real-world gotcha: tests that appeared independent suddenly fail because they actually share global state. Bumping workers will surface the documented flaky test(s). Good. That's the lesson.
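To see why more workers shorten test:api without collapsing it, a rough scheduling model helps. The sketch below assigns files greedily to the least-loaded worker; Jest's real scheduler differs, and the six per-file durations are made-up placeholders that merely sum to ~12 min.

```typescript
// Estimate wall-clock time for test files spread across a fixed number of workers.
// Greedy model: longest files first, each assigned to the least-loaded worker.
// Assumption: this approximates, but is not, Jest's actual scheduling.
function estimateWallClock(fileSeconds: number[], workers: number): number {
  const loads: number[] = new Array(Math.max(1, workers)).fill(0);
  const sorted = [...fileSeconds].sort((a, b) => b - a);
  for (const seconds of sorted) {
    const idx = loads.indexOf(Math.min(...loads));
    loads[idx] = (loads[idx] ?? 0) + seconds;
  }
  return Math.max(...loads);
}

// Placeholder durations (seconds) for six test files, 720s in total.
const files = [200, 150, 130, 100, 80, 60];

for (const workers of [1, 2, 4, 8]) {
  console.log(`${workers} worker(s): ${estimateWallClock(files, workers)}s`);
}
```

In this model the estimate stops improving once the longest single file becomes the floor, which matches the framing that Step 4 is bounded by core count and by how the work is split across files.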

Step 5: Mock the external SDK clients (the Workshop 5 payoff)

This is the order-of-magnitude step, and it belongs to Workshop 5. Steps 1–4 + 6–8 get the pipeline to ~5–7 min; this is the one that takes test:api from ~12 min to ~30s–2 min. Don't fold it into W3.

Files: apps/api/src/clients/__mocks__/*

Change: Add Jest manual mocks for all six HTTP clients (billing, email, webhooks, search, notifications, and audit) so unit tests stop hitting localhost. A focused set of integration tests can stay real to validate the wire contract.

// apps/api/src/clients/__mocks__/billing.client.ts
export class BillingClient {
  createCustomer = jest.fn().mockResolvedValue({ id: 'cus_test', email: '', created: 0 });
  charge = jest.fn().mockResolvedValue({ id: 'ch_test', amount: 0, currency: 'usd', status: 'succeeded' });
  getCustomer = jest.fn().mockResolvedValue({ id: 'cus_test', email: '', created: 0 });
}

Then in unit tests: jest.mock('../../src/clients/billing.client'); (and the equivalents for the other five clients).

Expected impact: the entire cost center collapses. The apps/api/tests/integration/* suite (billing.rollup, email.campaigns, webhooks.fanout, search.reindex, notifications.blast) plus the original billing rollup in apps/api/tests/reports.test.ts are what make test:api ~12 min. They run hundreds of real HTTP round-trips against the mock server under maxWorkers: 1. With manual mocks, no network is touched and test:api drops to ~30s–2 min. This is the big collapse.

Why it works: the time in these tests is genuinely network latency: real localhost TCP, no fake sleeps. Caching, sharding, and parallel workers (W3) only divide or hide that latency; they don't remove it. Mocking the clients removes it. Keep a small, deliberately-real integration subset as the one place wire behavior is validated.
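The same idea can be shown without Jest at all. The sketch below swaps jest.mock for plain parameter injection so it runs standalone; the BillingLike interface, the charge signature, and chargeAll are hypothetical stand-ins, not the repo's real types.

```typescript
// Hypothetical shapes for illustration; the real client types live in the repo.
interface Charge {
  id: string;
  amount: number;
  currency: string;
  status: string;
}
interface BillingLike {
  charge(customerId: string, amount: number): Promise<Charge>;
}

// A stand-in that resolves immediately and records calls instead of opening a socket.
class StubBillingClient implements BillingLike {
  calls: Array<{ customerId: string; amount: number }> = [];

  charge(customerId: string, amount: number): Promise<Charge> {
    this.calls.push({ customerId, amount });
    return Promise.resolve({ id: "ch_test", amount, currency: "usd", status: "succeeded" });
  }
}

// Code under test takes the client as a parameter, so a test can pass the stub.
async function chargeAll(
  client: BillingLike,
  customerIds: string[],
  amount: number
): Promise<number> {
  const results = await Promise.all(customerIds.map((id) => client.charge(id, amount)));
  return results.filter((r) => r.status === "succeeded").length;
}

const stub = new StubBillingClient();
const pending = chargeAll(stub, ["cus_a", "cus_b", "cus_c"], 500);
pending.then((count) => console.log(`succeeded: ${count}, calls recorded: ${stub.calls.length}`));
```

No request leaves the process, so the cost per call is a resolved promise rather than a TCP round-trip. The Jest manual mocks above achieve the same effect without changing how the code under test obtains its client.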

Step 6: Split CI into parallel jobs (api + web)

File: .github/workflows/ci.yml

Change: Use a matrix strategy to run apps/api and apps/web as separate jobs in parallel.

yaml

jobs:
  test:
    strategy:
      matrix:
        app: [api, web]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test --workspace=apps/${{ matrix.app }}

Expected impact: wall-clock pipeline time = max(api, web) instead of api + web. Saves ~30s.

Why it works: GitHub Actions runs matrix jobs concurrently. Two 2-minute jobs in parallel finish in 2 minutes, not 4.
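The max-versus-sum arithmetic is worth writing down once, because it also shows what parallel jobs do not buy you. The function names are illustrative and the unbalanced durations are placeholders.

```typescript
// Wall-clock time for jobs run one after another.
function serialSeconds(jobSeconds: number[]): number {
  return jobSeconds.reduce((sum, s) => sum + s, 0);
}

// Wall-clock time for jobs run side by side: the slowest job sets the total.
function parallelSeconds(jobSeconds: number[]): number {
  return jobSeconds.length === 0 ? 0 : Math.max(...jobSeconds);
}

// The balanced example from the text: two 2-minute jobs.
const twoJobs = [120, 120];
console.log(serialSeconds(twoJobs), parallelSeconds(twoJobs));

// An unbalanced pair (placeholder values): the saving is only the short job.
const unbalanced = [720, 10];
console.log(serialSeconds(unbalanced) - parallelSeconds(unbalanced));
```

When one job dwarfs the other, the saving equals the duration of the shorter job, so this step helps wall-clock time only as much as the smaller side weighs.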

Step 7: Shard the api test suite

File: .github/workflows/ci.yml

Change: Shard the api jest suite only; that's the long pole (the tests/integration/* files). Use 2 shards to match starter/ci-optimized-reference.yml (shard: [1, 2], and pass --shard=N/2 where N is the matrix shard value). Leave the web vitest suite as a single job (it's ~2–10s; sharding it would cost two cold installs to save seconds, see the Catch below). So you end up with a sharded test-api job plus a single test-web job, both running in parallel.

yaml

jobs:
  test-api:                       # the long pole, shard it
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci --no-audit --no-fund
      - run: npm test --workspace=apps/api -- --shard=${{ matrix.shard }}/2

  test-web:                       # small suite, one job, no shard
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci --no-audit --no-fund
      - run: npm test --workspace=apps/web

Expected impact: the api test wall-clock drops toward ~(original / 2) per shard. Combined with Step 4 this shortens the dominant stage but, again, does not collapse it: each shard still runs real HTTP. The collapse is Step 5.

Why it works: each shard runs ~half of the test files, and the shards run in parallel on separate runners. Combined with Step 4 (parallel workers per shard) the suite scales across both runners and cores.

Catch: Don't shard below ~30 seconds per shard: overhead from setup-node, npm ci, and runner spin-up dominates. (Two shards is plenty here; more shards just multiply that fixed overhead.)
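The fixed-overhead argument can be made numeric with a toy model. Both inputs below (240s of test time, 60s of per-shard setup) are placeholders chosen for round numbers, and the model assumes test time divides evenly across shards.

```typescript
// Per-shard wall-clock under a simple model: every shard pays the same fixed
// setup cost (checkout, setup-node, npm ci, runner spin-up), and the test
// time divides evenly. Assumption: both inputs are illustrative placeholders.
function shardWallClock(testSeconds: number, shards: number, overheadSeconds: number): number {
  return overheadSeconds + testSeconds / shards;
}

// Total runner time consumed across all shards.
function runnerSeconds(testSeconds: number, shards: number, overheadSeconds: number): number {
  return shards * shardWallClock(testSeconds, shards, overheadSeconds);
}

const testSeconds = 240;
const overheadSeconds = 60;

for (const shards of [1, 2, 4, 8]) {
  const wall = shardWallClock(testSeconds, shards, overheadSeconds);
  const billed = runnerSeconds(testSeconds, shards, overheadSeconds);
  console.log(`${shards} shard(s): wall-clock ${wall}s, runner time ${billed}s`);
}
```

Under these assumptions wall-clock time flattens toward the fixed overhead while total runner time keeps growing, which is the trade-off behind stopping at two shards.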

Step 8: Parallel and faster deploy script

File: scripts/deploy.sh

Change: Replace the per-file copy loop with rsync -a, drop the sleep 0.3 between files.

bash

rsync -a "${SRC_DIRS[@]}" "$DEST"

Expected impact: deploy drops from ~18s to ~2s. Saves ~16s.

Why it works: rsync -a copies the whole tree in a single process invocation, with no per-file process spawn and no sleep. The original script's per-file cp + sleep is artificially serial.

Composition table

Here's the two-stage story, applying steps cumulatively. Baseline is ~13 min cold, dominated by the serial real-HTTP test:api (~12 min). The W3 steps (1–4, 6–8) trim overhead and parallelize the suite; they bottom out at ~5–7 min because the network round-trips are still real. Step 5 (Workshop 5) is the only step that removes the round-trips, and it does the order-of-magnitude work.

Steps applied	Stage	Expected total
0 (baseline)	(none)	~13 min
1 (cache + `npm ci`)	W3	~12 min (install shrinks; `test:api` untouched)
1+2 (+ incremental tsc)	W3	~12 min (seconds off build/typecheck)
1+2+3 (+ no coverage)	W3	~11–12 min
1+2+3+4 (+ parallel workers)	W3	~7–9 min (`test:api` fans out across workers)
1+2+3+4+6 (+ matrix api/web parallel)	W3	~6–8 min
1+2+3+4+6+7 (+ shard /2)	W3	~5–7 min ← floor of W3 alone
+ 8 (faster deploy)	W3	~5–7 min (deploy ~18s → ~2s)
+ 5 (mock the SDK clients)	W5	~30 s – 2 min ← the collapse

The W3 steps are real and worth doing, but they cannot get past ~5–7 min: test:api is still making real HTTP calls, just in parallel. The order-of-magnitude collapse is Step 5 in Workshop 5: mocking all six SDK clients so the tests/integration/* suite (and the reports.test.ts rollup) stops hitting the network at all. That's where ~12 min of test:api becomes ~30s–2 min.

Save Step 5 for the Workshop 5 demo: watching a ~12-min stage collapse to seconds is the audible-reaction moment, and it lands harder when students have already squeezed everything they can out of W3.

What students should NOT do

Disabling tests to make CI faster. (Obvious. Worth saying.)
Reducing test coverage thresholds.
Splitting "slow" tests into smaller tests without actually fixing the underlying cause.
Hardcoding "skip slow tests in CI" without a follow-up plan. The slow tests catch real bugs.

If a student hits these temptations, that's a Workshop 7 conversation: engineering standards prevent these shortcuts at the org level.
