Workshop 5: Live demo script

Share-screen guide. This is the most demo-heavy session. We use AI well together: I drive on screen, students follow along in their own clones in small chunks. Total demo time: ~40 minutes across blocks 3–5. We record results in the W5 logbook section together as we go. Anything beyond the core tasks is optional polish.

Reference branch (deterministic answer key)

Branch w5-ai (built on w4-devex) holds a known-good result for every step, so even when the room's AI output differs you have a green target to converge on. On this branch the full api suite (231 tests) runs in ~3s, and the web suite (77 tests) is green and deterministic. Use `git show <sha>` to inspect a single step, or `git diff w4-devex..w5-ai` to see all of them at once.

| Step | Commit | What |
| --- | --- | --- |
| 1 | `c908c0f` | refactor `reports.ts` into pure helpers (`reports.csv.ts`) |
| 2 | `5c63678` | `auth.service.test.ts` (was zero coverage) |
| 3 | `0a9600f` | mock the SDK transport, `test:api` from ~12min to ~3s |
| 4 | `2aeb253` | virtualize `TaskList` |
| 5 | `5f391d7` | deterministic `CommentList` via injected `now` |
| 6 | `98cfead` | `Login` test: `await waitFor` |
| 6 (close) | `e1236d3` | restore vitest `isolate:true` + fix the 5th flaky test (Dashboard) so the suite is green |

Two places the reference deviates from the narration below, both deliberate:

- Step 3 (mocks): the branch mocks at the transport seam with a single in-process fetch stub (apps/api/tests/setup/mockFetch.ts) that mirrors scripts/mock-server.js route-for-route, instead of six per-client mocks. The effect is the same (no real HTTP, same response shapes, all assertions hold) with far less to maintain. Live, you can still inject per-client mocks as the script describes; the branch is the robust end state.
- Step 4 (TaskList): the script says "don't change the test", but a windowed list cannot render 500 cards in jsdom. The branch follows the test file's own stated plan: a small-window render assertion plus a unit test of visibleRange. Same coverage, ~30ms instead of ~2s.
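
To make "mocking at the transport seam" concrete, here is a minimal sketch of a route-table fetch stub. It is not the contents of mockFetch.ts: the route paths, response bodies, host, and the names `stubRoutes`, `resolveStubRoute`, and `stubFetch` are all assumptions for illustration, and the response type is a cut-down stand-in for the real fetch `Response`.

```typescript
// Cut-down response shape for the sketch; not the real fetch Response.
interface StubResponse {
  status: number;
  json(): Promise<unknown>;
}

type RouteHandler = () => unknown;

// Hypothetical routes. A real table would mirror scripts/mock-server.js route-for-route.
const stubRoutes = new Map<string, RouteHandler>([
  ["GET /billing/invoices", () => [{ customerId: "acme", amountCents: 1200 }]],
  ["POST /email/campaigns", () => ({ queued: true })],
]);

// Pure lookup: strip origin and query string, then match on "METHOD /path".
function resolveStubRoute(method: string, url: string): { status: number; body: unknown } {
  const path = url.replace(/^[a-z]+:\/\/[^/]+/i, "").replace(/\?.*$/, "");
  const handler = stubRoutes.get(`${method.toUpperCase()} ${path}`);
  return handler
    ? { status: 200, body: handler() }
    : { status: 404, body: { error: `no stub route for ${method} ${path}` } };
}

// Drop-in for the transport: same call shape as fetch(url, init), but fully in-process.
async function stubFetch(url: string, init: { method?: string } = {}): Promise<StubResponse> {
  const { status, body } = resolveStubRoute(init.method ?? "GET", url);
  return { status, json: async () => body };
}

const stubHit = resolveStubRoute("get", "http://localhost:4010/billing/invoices?page=1");
const stubMiss = resolveStubRoute("GET", "http://localhost:4010/unknown");
```

Because every client goes through the same transport, one table like this covers all six clients, which is the maintenance argument for the branch's approach.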

Setup before the demo

AI tool of choice open in the side panel (Claude / ChatGPT / Cursor).
Editor + terminal split.
Prompt templates open: workshops/05-ai/starter/refactor-prompt.md and test-gen-prompt.md.
workshops/05-ai/starter/active-comprehension-prompt.md open, and the comprehension-preserving prompt already set as your system prompt for the demo session so students see it shaping the answers.
Repo state: clean, all tests passing (or the flaky ones passing on this run).

Step 0: Set the comprehension-preserving system prompt (~2 min)

Open starter/active-comprehension-prompt.md. Read the prompt out loud.
"Before any task, I set this as my system prompt. It tells the AI to explain its reasoning, state assumptions and tradeoffs, offer an alternative, ask me clarifying questions, and quiz me at the end. The point is to stop the AI from thinking for me."
Name the risk plainly:
"The danger with AI isn't that it's wrong, we catch wrong with tests. It's that it's right and you accept code you never understood. That's cognitive offloading. You can't debug or extend what you don't understand. This prompt, plus proving comprehension after each task, is how we avoid it."
Leave it on for every step below. Call it out, live, whenever the AI asks you a question or surfaces an assumption: that is the prompt working.

Step 1: Refactor the god function in `reports.ts` (~10 min)

Open apps/api/src/routes/reports.ts. Scroll. Show its length.
"~130 lines. One function. We're going to ask AI to split it apart, and then verify it didn't break anything."
Open starter/refactor-prompt.md. Read it out loud, slowly.
"Notice what we're doing here. We're giving the AI the file, the conventions, AND a constraint: don't change behavior. That last constraint is what makes the output safe to apply."
Paste the prompt template into your AI tool. Paste the full contents of reports.ts. Run it.
While it generates, talk through what we're hoping for:
- Smaller functions for each concern (CSV building, date formatting, timezone handling)
- Behavior preserved (tests still pass)
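
As a rough picture of the target shape, the sketch below shows what small pure helpers could look like. It is not the content of reports.csv.ts: the `ReportRow` fields, the column names, and the helper names (`escapeCsvField`, `formatDateUtc`, `buildReportCsv`) are assumptions, and formatting dates in UTC is just one possible way to handle the timezone concern.

```typescript
// Hypothetical row shape; the real rows in reports.ts may look different.
interface ReportRow {
  id: number;
  title: string;
  completedAt: Date;
}

// Quote a field only when it contains a comma, double quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Format as YYYY-MM-DD in UTC so the output does not depend on the machine's timezone.
function formatDateUtc(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Assemble the CSV from the two helpers above; no I/O, so it is easy to unit test.
function buildReportCsv(rows: ReportRow[]): string {
  const header = "id,title,completed_at";
  const lines = rows.map((row) =>
    [String(row.id), escapeCsvField(row.title), formatDateUtc(row.completedAt)].join(","),
  );
  return [header, ...lines].join("\n");
}

const sampleCsv = buildReportCsv([
  { id: 1, title: 'Fix "login", again', completedAt: new Date("2024-03-05T23:30:00Z") },
]);
```

Each helper has one job and takes everything it needs as arguments, which is what makes the "name it from memory, then change it by hand" comprehension check practical.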

Apply the output. Run the existing tests:

```bash
cd apps/api && npx jest --testPathPattern=reports
```

If tests pass: high-five the chat.
If tests fail: paste the failure back to the AI: "This test failed after your refactor: [paste]. Fix the refactor." Iterate.
Key teaching moment: show students that the AI's first answer often isn't right. The skill is iterating on the prompt.
Prove comprehension, live. Close the AI panel. From memory, name each helper the AI extracted and what it produces. Then change one helper by hand (rename it, tweak a behavior) with no AI and re-run the tests.
"If I can't name what it did and change it myself, I didn't understand it, I just accepted it. That's the move I want you copying: explain it back, or modify it without AI."

Step 2: Generate tests for `auth.service.ts` (~7 min)

Open apps/api/src/services/auth.service.ts. Show that there are zero tests for it.
Open starter/test-gen-prompt.md. Walk through what the prompt does.
Paste into AI. Paste the source file. Paste apps/api/tests/projects.test.ts as the style guide.
Apply the output to apps/api/tests/auth.service.test.ts.

Run the tests:

```bash
cd apps/api && npx jest --testPathPattern=auth.service
```

Expect failures. AI typically gets one or two things wrong:
- Imports something that doesn't exist (@orbittasks/api instead of relative paths).
- Asserts on JWT payload structure directly when it should use verifyToken.
- Misses an edge case.
Walk through fixing one failure live. Push the rest back to the AI with the failure messages.
Key teaching moment: the AI is a junior pair programmer. Treat its output like a PR you have to merge.
Prove comprehension, live. Pick one passing test, change its assertion to something you know is wrong, and predict the failure out loud before running it. Run it. If the failure matches your prediction, you understood the test. If it surprises you, you didn't, and that's the gap blind acceptance would have shipped.

Step 3: Mock the SDK clients to collapse the slow API tests (~7 min)

This is the highest-value win of the session. Spend the time here.

Run the full API suite and time it:
```bash
cd apps/api && npm run test:api
```
It takes ~12 minutes. Show students why: the integration tests make real HTTP calls through the SDK clients.
Point at the cost center: these tests, plus reports.test.ts, all drive real client calls:
- tests/integration/billing.rollup.test.ts
- tests/integration/email.campaigns.test.ts
- tests/integration/webhooks.fanout.test.ts
- tests/integration/search.reindex.test.ts
- tests/integration/notifications.blast.test.ts
- tests/reports.test.ts
Show that the services are already constructor-injectable, e.g.:
"BillingService takes client: BillingClient = new BillingClient(). The seam is already there. We just inject a mock instead of the real client."
Ask the AI to inject mocked clients for the six clients (billing, email, search, webhooks, notifications, audit) so the tests stop making real HTTP:
"These integration tests make real HTTP through the SDK clients. The services accept an injectable client in their constructor (e.g. BillingService(client: BillingClient = new BillingClient())). Generate mocked clients for billing, email, search, webhooks, notifications, and audit, and inject them in the tests so no real network calls happen. Keep the assertions the same."
Apply the mocks. Re-run and time it:
```bash
cd apps/api && npm run test:api
```
Expected drop: from ~12 minutes to a few seconds (the reference branch runs the full api suite in ~3s). This is the payoff Workshop 3 set up: the seam was added then, and AI wires up the mocks now.
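
The constructor-injection pattern from this step can be sketched in a self-contained way. Only the constructor signature (`client: BillingClient = new BillingClient()`) comes from the codebase; the `Invoice` shape, the `listInvoices` and `totalByCustomer` methods, and `FakeBillingClient` are hypothetical names for illustration, and the fake is hand-written rather than produced by a mocking library.

```typescript
// Hypothetical data shape and client method; the real BillingClient surface may differ.
interface Invoice {
  customerId: string;
  amountCents: number;
}

class BillingClient {
  async listInvoices(): Promise<Invoice[]> {
    // The real client would make an HTTP call here; the sketch refuses instead.
    throw new Error("real network call attempted");
  }
}

class BillingService {
  private readonly client: BillingClient;

  // Default to the real client in production; tests pass a fake through the same seam.
  constructor(client: BillingClient = new BillingClient()) {
    this.client = client;
  }

  async totalByCustomer(): Promise<Record<string, number>> {
    const invoices = await this.client.listInvoices();
    const totals: Record<string, number> = {};
    for (const invoice of invoices) {
      totals[invoice.customerId] = (totals[invoice.customerId] ?? 0) + invoice.amountCents;
    }
    return totals;
  }
}

// Hand-written fake: same public shape as BillingClient, canned data, no network.
class FakeBillingClient {
  calls = 0;
  readonly invoices: Invoice[];

  constructor(invoices: Invoice[]) {
    this.invoices = invoices;
  }

  async listInvoices(): Promise<Invoice[]> {
    this.calls += 1;
    return this.invoices;
  }
}

const fakeBilling = new FakeBillingClient([
  { customerId: "acme", amountCents: 1200 },
  { customerId: "acme", amountCents: 1800 },
  { customerId: "globex", amountCents: 500 },
]);
const billingTotals = new BillingService(fakeBilling).totalByCustomer();
```

The assertions in the existing tests stay the same; only the object handed to the constructor changes, which is why the suite gets faster without losing coverage of the service logic.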

Step 4: Refactor the slow FE list test (~5 min)

Open apps/web/tests/components/TaskList.test.tsx. Scroll to the "renders a long list of tasks" test.

Show it rendering 500 items. Run the test, time it:

```bash
cd apps/web && npx vitest run TaskList --no-coverage
```

Ask the AI to virtualize the TaskList component (don't change the test; change the component to be efficient with N items):
"Here's the TaskList component. Add window-based virtualization so it can render thousands of items efficiently. Use only React built-ins, no new dependencies."
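Apply the change to TaskList.tsx. Re-run the test. Expected drop: from ~2s to under 200ms. If the 500-item render assertion now fails, that is expected: a windowed list only mounts the rows in view, so it cannot render 500 cards in jsdom. In that case, despite the "don't change the test" instruction, follow the reference branch: a small-window render assertion plus a unit test of visibleRange (~30ms).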
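
The core of window-based virtualization is a small pure calculation, which is why it can be unit tested without rendering anything. The sketch below assumes fixed-height rows; only the name `visibleRange` comes from the reference branch notes, while the parameter names, the half-open range convention, and the default overscan of 2 are assumptions.

```typescript
// Half-open range [start, end) of row indexes to mount.
interface VisibleRange {
  start: number;
  end: number;
}

// Hypothetical parameter names for the sketch.
interface WindowParams {
  scrollTop: number; // pixels scrolled from the top of the list
  viewportHeight: number; // visible height of the scroll container, in pixels
  rowHeight: number; // fixed height of one row, in pixels
  itemCount: number; // total number of items in the list
  overscan?: number; // extra rows mounted above and below the viewport
}

function visibleRange({
  scrollTop,
  viewportHeight,
  rowHeight,
  itemCount,
  overscan = 2,
}: WindowParams): VisibleRange {
  const firstInView = Math.floor(scrollTop / rowHeight);
  const lastInView = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    // Clamp so the range never leaves [0, itemCount].
    start: Math.max(0, firstInView - overscan),
    end: Math.min(itemCount, lastInView + overscan),
  };
}

const rangeAtTop = visibleRange({ scrollTop: 0, viewportHeight: 400, rowHeight: 40, itemCount: 500 });
const rangeAtBottom = visibleRange({ scrollTop: 19600, viewportHeight: 400, rowHeight: 40, itemCount: 500 });
```

With a 400px viewport and 40px rows, the component mounts about a dozen rows instead of 500, regardless of how long the list is.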

Step 5: Fix the flaky `CommentList` test (~5 min)

Open apps/web/tests/components/CommentList.test.tsx. Run the "renders a relative timestamp" test a few times. Show that it occasionally fails depending on time of day.
Ask the AI to fix it. The preferred fix is injecting a fixed now: formatRelativeDays(input, now = new Date()) already takes a now parameter, so the test can pass a fixed date instead of relying on the real clock:
"This test depends on the real clock via formatRelativeDays. The function already takes a now parameter (formatRelativeDays(input, now = new Date())). Pass a fixed now (or thread it through the component prop) so the assertion is deterministic."
If injecting now isn't reachable from the test, the alternative is faking the clock:
"Alternatively, replace the real clock with vi.useFakeTimers() so the assertion is deterministic."

Apply the fix. Re-run the test 10 times in a loop:

```bash
for i in {1..10}; do npx vitest run CommentList --no-coverage --silent; done
```

Every run should pass. Teaching moment: the AI sometimes gives you a real, durable fix on the first try. Sometimes it doesn't. Both happen.
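
The preferred fix, passing a fixed now, can be sketched as follows. Only the signature `formatRelativeDays(input, now = new Date())` comes from the codebase; the function body, the `Date` input type, and the exact output strings here are assumptions.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed implementation: whole days elapsed between input and now.
function formatRelativeDays(input: Date, now: Date = new Date()): string {
  const days = Math.floor((now.getTime() - input.getTime()) / MS_PER_DAY);
  if (days <= 0) return "today";
  if (days === 1) return "1 day ago";
  return `${days} days ago`;
}

// In the test, both dates are pinned, so the result cannot drift with the wall clock.
const fixedNow = new Date("2024-01-10T12:00:00Z");
const commentCreatedAt = new Date("2024-01-07T12:00:00Z");
const relativeLabel = formatRelativeDays(commentCreatedAt, fixedNow); // "3 days ago"
```

Injecting now keeps production callers unchanged (they still get the default) while giving the test full control, without the global side effects of fake timers.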

Step 6: Fix the flaky `Login` test (~5 min)

Open apps/web/tests/pages/Login.test.tsx. Scroll to the "Signing in…" test.
Run it a few times. Sometimes it passes, sometimes it doesn't.
Ask the AI:
"This test fires a click and synchronously asserts on the resulting state. React batches state updates so the rerender is queued. Fix the test using await waitFor or findByText so it waits for the rerender."
Apply the fix. Run repeatedly. All pass.
Capture for Workshop 7: the pattern "synchronous assertion on async state update" is something to ban via a lint rule. We'll do that in Workshop 7.
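
The "synchronous assertion on async state update" pattern can be shown with a toy model that involves no React or Testing Library at all. `ToyButton` and `waitForCondition` are hypothetical names; `waitForCondition` only imitates the retry idea behind waitFor (re-check until the condition holds) and is not the library's implementation.

```typescript
// Toy model: a click schedules the state change instead of applying it immediately.
class ToyButton {
  label = "Sign in";

  click(): void {
    // The update lands on a later microtask, like a queued rerender.
    Promise.resolve().then(() => {
      this.label = "Signing in…";
    });
  }
}

// Re-run the check on successive microtask ticks until it passes
// or the tick budget runs out.
async function waitForCondition(check: () => boolean, maxTicks = 50): Promise<void> {
  for (let tick = 0; tick < maxTicks; tick++) {
    if (check()) return;
    await Promise.resolve();
  }
  throw new Error("condition not met within tick budget");
}

const toyButton = new ToyButton();
toyButton.click();

// Flaky pattern: reading right after the click still sees the old state.
const labelRightAfterClick = toyButton.label; // "Sign in"

// Robust pattern: wait until the expected state is observable, then read it.
const labelAfterWaiting = waitForCondition(() => toyButton.label === "Signing in…").then(
  () => toyButton.label,
);
```

The first read is the bug a lint rule could flag; the second is the shape the fixed test takes with await waitFor or findByText.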

After the live demo

"We just used AI on six real problems in this codebase together, with a system prompt that kept us in the loop, and we proved we understood the output every time. With your own comprehension prompt set (handout Part 0), we keep going together on the next task, following along in your own clones. The handout has the rubric and the comprehension checks; we record the results in the W5 logbook section as we go. 10 minutes."

That's the handoff to the next block. We keep working together rather than splitting off into solo work. Any task past the core ones is optional polish.

What students will not realize without me saying it

The prompts are first-class code. Save them. Version them. Reuse them. That includes the comprehension-preserving system prompt.
The AI is wrong sometimes in ways that compile but lie. Always run the tests.
The bigger risk is the code that's right and you didn't understand. That's cognitive offloading, and it's invisible because nothing breaks today. The tell: could you re-derive or explain it without the AI? If not, you offloaded comprehension. Explain it back or modify it without AI to close the gap.
Failure modes are predictable once you've used AI for a week or two. The categories in starter/evaluation-rubric.md will become second nature.
The team policy you write in Workshop 7 codifies all of the above. Today is the input data.
