Workshop 5: Live demo script
Share-screen guide. This is the most demo-heavy session. We use AI well together: I drive on screen, students follow along in their own clones in small chunks. Total demo time: ~40 minutes across blocks 3–5. We record results in the W5 logbook section together as we go. Anything beyond the core tasks is optional polish.
Setup before the demo
- AI tool of choice open in the side panel (Claude / ChatGPT / Cursor).
- Editor + terminal split.
- Prompt templates open: `workshops/05-ai/starter/refactor-prompt.md` and `test-gen-prompt.md`.
- `workshops/05-ai/starter/active-comprehension-prompt.md` open, and the comprehension-preserving prompt already set as your system prompt for the demo session so students see it shaping the answers.
- Repo state: clean, all tests passing (or the flaky ones passing on this run).
Step 0: Set the comprehension-preserving system prompt (~2 min)
Open `starter/active-comprehension-prompt.md`. Read the prompt out loud.
"Before any task, I set this as my system prompt. It tells the AI to explain its reasoning, state assumptions and tradeoffs, offer an alternative, ask me clarifying questions, and quiz me at the end. The point is to stop the AI from thinking for me."
Name the risk plainly:
"The danger with AI isn't that it's wrong; we catch wrong with tests. It's that it's right and you accept code you never understood. That's cognitive offloading. You can't debug or extend what you don't understand. This prompt, plus proving comprehension after each task, is how we avoid it."
Leave it on for every step below. When the AI asks you a question or surfaces an assumption, call it out live: that is the prompt working.
Step 1: Refactor the god function in reports.ts (~10 min)
Open `apps/api/src/routes/reports.ts`. Scroll. Show its length.
"~130 lines. One function. We're going to ask AI to split it apart, and then verify it didn't break anything."
Open `starter/refactor-prompt.md`. Read it out loud, slowly.
"Notice what we're doing here. We're giving the AI the file, the conventions, AND a constraint: don't change behavior. That last constraint is what makes the output safe to apply."
Paste the prompt template into your AI tool. Paste the full contents of `reports.ts`. Run it.
While it generates, talk through what we're hoping for:
- Smaller functions for each concern (CSV building, date formatting, timezone handling)
- Behavior preserved (tests still pass)
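For orientation, here is a rough sketch of the shape we are hoping to get back: one small helper per concern, composed by a thin builder. Everything in it is an assumption for illustration (the `ReportRow` shape, the helper names, the column set); the real `reports.ts` will differ.

```typescript
// Hypothetical row shape; the real report rows in reports.ts will differ.
interface ReportRow {
  title: string;
  assignee: string;
  completedAt: Date;
}

// Timezone handling + date formatting: render a Date as YYYY-MM-DD
// in the given IANA time zone.
function formatDateInZone(date: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(date);
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// CSV building, part 1: quote a field only when it contains a comma,
// a double quote, or a newline; double any embedded quotes.
function escapeCsvField(value: string): string {
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// CSV building, part 2: compose the helpers into the final document.
function buildCsv(rows: ReportRow[], timeZone: string): string {
  const header = "title,assignee,completedAt";
  const lines = rows.map((row) =>
    [
      escapeCsvField(row.title),
      escapeCsvField(row.assignee),
      formatDateInZone(row.completedAt, timeZone),
    ].join(",")
  );
  return [header, ...lines].join("\n");
}
```

Each helper can be named, explained, and changed on its own, which is what the comprehension check at the end of this step relies on.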
Apply the output. Run the existing tests:
```bash
cd apps/api && npx jest --testPathPattern=reports
```
If tests pass: high-five the chat.
If tests fail: paste the failure back to the AI: "This test failed after your refactor: [paste]. Fix the refactor." Iterate.
Key teaching moment: show students that the AI's first answer often isn't right. The skill is iterating on the prompt.
Prove comprehension, live. Close the AI panel. From memory, name each helper the AI extracted and what it produces. Then change one helper by hand (rename it, tweak a behavior) with no AI and re-run the tests.
"If I can't name what it did and change it myself, I didn't understand it, I just accepted it. That's the move I want you copying: explain it back, or modify it without AI."
Step 2: Generate tests for auth.service.ts (~7 min)
Open `apps/api/src/services/auth.service.ts`. Show that there are zero tests for it.
Open `starter/test-gen-prompt.md`. Walk through what the prompt does.
Paste into AI. Paste the source file. Paste `apps/api/tests/projects.test.ts` as the style guide.
Apply the output to `apps/api/tests/auth.service.test.ts`.
Run the tests:
```bash
cd apps/api && npx jest --testPathPattern=auth.service
```
Expect failures. AI typically gets one or two things wrong:
- Imports something that doesn't exist (`@orbittasks/api` instead of relative paths).
- Asserts on JWT payload structure directly when it should use `verifyToken`.
- Misses an edge case.
Walk through fixing one failure live. Push the rest back to the AI with the failure messages.
Key teaching moment: the AI is a junior pair programmer. Treat its output like a PR you have to review before it merges.
Prove comprehension, live. Pick one passing test, change its assertion to something you know is wrong, and predict the failure out loud before running it. Run it. If the failure matches your prediction, you understood the test. If it surprises you, you didn't, and that's the gap blind acceptance would have shipped.
Step 3: Mock the SDK clients to collapse the slow API tests (~7 min)
This is the highest-value win of the session. Spend the time here.
Run the full API suite and time it:
```bash
cd apps/api && npm run test:api
```
It takes ~12 minutes. Show students why: the integration tests make real HTTP calls through the SDK clients.
Point at the cost center: these tests, plus `reports.test.ts`, all drive real client calls:
- `tests/integration/billing.rollup.test.ts`
- `tests/integration/email.campaigns.test.ts`
- `tests/integration/webhooks.fanout.test.ts`
- `tests/integration/search.reindex.test.ts`
- `tests/integration/notifications.blast.test.ts`
- `tests/reports.test.ts`
Show that the services are already constructor-injectable, e.g.:
"`BillingService` takes `client: BillingClient = new BillingClient()`. The seam is already there. We just inject a mock instead of the real client."
Ask the AI to inject mocked clients for the six clients (billing, email, search, webhooks, notifications, audit) so the tests stop making real HTTP calls:
"These integration tests make real HTTP through the SDK clients. The services accept an injectable client in their constructor (e.g. `BillingService(client: BillingClient = new BillingClient())`). Generate mocked clients for billing, email, search, webhooks, notifications, and audit, and inject them in the tests so no real network calls happen. Keep the assertions the same."
Apply the mocks. Re-run and time it:
```bash
cd apps/api && npm run test:api
```
Expected drop: from ~12 minutes to seconds. This is the payoff Workshop 3 set up: the seam was added then, AI wires up the mocks now.
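To make the seam concrete, here is a minimal sketch of constructor injection with a hand-rolled mock. Only the `client: BillingClient = new BillingClient()` default comes from the codebase as described above; the `Invoice` shape, the `listInvoices` and `rollup` methods, and `MockBillingClient` are hypothetical stand-ins, and the mocks the AI generates may well use the test framework's own mocking helpers instead.

```typescript
// Hypothetical data shape, for illustration only.
interface Invoice {
  id: string;
  amountCents: number;
}

class BillingClient {
  // Stand-in for the real SDK method, which would make an HTTP request.
  async listInvoices(accountId: string): Promise<Invoice[]> {
    throw new Error(`real HTTP call for ${accountId} is not available in this sketch`);
  }
}

class BillingService {
  private readonly client: BillingClient;

  // The seam: callers may pass a client; production code gets the default.
  constructor(client: BillingClient = new BillingClient()) {
    this.client = client;
  }

  // Hypothetical method: sums invoice amounts for one account.
  async rollup(accountId: string): Promise<number> {
    const invoices = await this.client.listInvoices(accountId);
    return invoices.reduce((sum, invoice) => sum + invoice.amountCents, 0);
  }
}

// Hand-rolled mock: same method shape, canned data, no network.
// It also records calls so a test can assert on how it was used.
class MockBillingClient extends BillingClient {
  readonly calls: string[] = [];
  private readonly invoices: Invoice[];

  constructor(invoices: Invoice[]) {
    super();
    this.invoices = invoices;
  }

  async listInvoices(accountId: string): Promise<Invoice[]> {
    this.calls.push(accountId);
    return this.invoices;
  }
}

// In a test: inject the mock instead of relying on the default client.
const mockBilling = new MockBillingClient([
  { id: "inv_1", amountCents: 1200 },
  { id: "inv_2", amountCents: 800 },
]);
const billingService = new BillingService(mockBilling);
```

The assertions in the existing tests stay the same; only the client handed to the service changes, which is why the runtime drops without touching what is being verified.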
Step 4: Refactor the slow FE list test (~5 min)
Open `apps/web/tests/components/TaskList.test.tsx`. Scroll to the "renders a long list of tasks" test.
Show it rendering 500 items. Run the test, time it:
```bash
cd apps/web && npx vitest run TaskList --no-coverage
```
Ask the AI to virtualize the `TaskList` component (don't change the test; change the component to be efficient with N items):
"Here's the TaskList component. Add window-based virtualization so it can render thousands of items efficiently. Use only React built-ins, no new dependencies."
Apply the change to `TaskList.tsx`. Re-run the test. Expected drop: from ~2s to under 200ms.
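The core of window-based virtualization is a small piece of arithmetic: given the scroll position, work out which slice of rows is on screen and render only that slice. The sketch below shows that calculation on its own, assuming fixed-height rows; `computeVisibleRange` and its field names are hypothetical, not what the AI will necessarily produce.

```typescript
// Hypothetical input shape; assumes every row has the same fixed height.
interface WindowInput {
  scrollTop: number;      // pixels scrolled from the top of the list
  viewportHeight: number; // visible height of the scroll container, in pixels
  rowHeight: number;      // fixed height of each row, in pixels
  itemCount: number;      // total number of items in the list
  overscan: number;       // extra rows rendered above and below the viewport
}

interface WindowRange {
  start: number;       // index of the first row to render (inclusive)
  end: number;         // index just past the last row to render (exclusive)
  offsetY: number;     // top offset that stands in for the skipped rows
  totalHeight: number; // full scroll height, so the scrollbar stays accurate
}

function computeVisibleRange(input: WindowInput): WindowRange {
  const { scrollTop, viewportHeight, rowHeight, itemCount, overscan } = input;
  // First row that is at least partly visible.
  const firstVisible = Math.floor(scrollTop / rowHeight);
  // Exclusive index just past the last row that is at least partly visible.
  const endVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  const start = Math.max(0, firstVisible - overscan);
  const end = Math.min(itemCount, endVisible + overscan);
  return {
    start,
    end,
    offsetY: start * rowHeight,
    totalHeight: itemCount * rowHeight,
  };
}
```

In the component, `scrollTop` would typically live in `useState` and be updated from the container's `onScroll` handler; the render then maps over `tasks.slice(start, end)` inside a wrapper of height `totalHeight`, shifted down by `offsetY`.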
Step 5: Fix the flaky CommentList test (~5 min)
Open `apps/web/tests/components/CommentList.test.tsx`. Run the "renders a relative timestamp" test a few times. Show that it occasionally fails depending on time of day.
Ask the AI to fix it. The preferred fix is injecting a fixed `now`: `formatRelativeDays(input, now = new Date())` already takes a `now` param, so the test can pass a fixed date instead of relying on the real clock:
"This test depends on the real clock via formatRelativeDays. The function already takes a `now` parameter (`formatRelativeDays(input, now = new Date())`). Pass a fixed `now` (or thread it through the component prop) so the assertion is deterministic."
If injecting `now` isn't reachable from the test, the alternative is faking the clock:
"Alternatively, replace the real clock with vi.useFakeTimers() so the assertion is deterministic."
Apply the fix. Re-run the test 10 times in a loop:
```bash
for i in {1..10}; do npx vitest run CommentList --no-coverage --silent; done
```
Every run passes. Teaching moment: the AI sometimes gives you a real, durable fix on the first try. Sometimes it doesn't. Both happen.
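A minimal sketch of why injecting `now` makes the assertion deterministic. The signature `formatRelativeDays(input, now = new Date())` is the one quoted above; the function body and the output strings here are assumptions for illustration, and the real implementation may round or word things differently.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical body; only the signature is taken from the codebase.
function formatRelativeDays(input: Date, now: Date = new Date()): string {
  const days = Math.floor((now.getTime() - input.getTime()) / MS_PER_DAY);
  if (days <= 0) return "today";
  return days === 1 ? "1 day ago" : `${days} days ago`;
}

// Flaky style: the result depends on when the test happens to run,
// because `now` silently defaults to the real clock.
const dependsOnWallClock = formatRelativeDays(new Date("2024-06-12T09:00:00Z"));

// Deterministic style: both dates are pinned, so the answer never changes.
const fixedNow = new Date("2024-06-15T12:00:00Z");
const createdAt = new Date("2024-06-12T09:00:00Z");
const pinned = formatRelativeDays(createdAt, fixedNow); // "3 days ago"
```

Passing `now` explicitly keeps the fix inside the test's inputs, which is why it is preferred over faking the global clock when the parameter is reachable.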
Step 6: Fix the flaky Login test (~5 min)
Open `apps/web/tests/pages/Login.test.tsx`. Scroll to the "Signing in…" test.
Run it a few times. Sometimes it passes, sometimes it doesn't.
Ask the AI:
"This test fires a click and synchronously asserts on the resulting state. React batches state updates so the rerender is queued. Fix the test using `await waitFor` or `findByText` so it waits for the rerender."
Apply the fix. Run repeatedly. All pass.
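To show why waiting fixes the flake, here is a toy model with no React or Testing Library in it. The "component" applies its update on a later tick, so a read taken right after the click sees the old value, while a retrying assertion sees the new one. `waitForSketch` is a hypothetical helper written in the spirit of `waitFor`, not its actual implementation.

```typescript
// Toy stand-in for a component whose state update is queued, not immediate.
let buttonLabel = "Sign in";

function clickSignIn(): void {
  setTimeout(() => {
    buttonLabel = "Signing in…";
  }, 0);
}

function expectLabel(expected: string): void {
  if (buttonLabel !== expected) {
    throw new Error(`expected "${expected}", got "${buttonLabel}"`);
  }
}

// Retry the assertion until it passes or the timeout elapses,
// rethrowing the last failure once the deadline is reached.
async function waitForSketch(
  assertion: () => void,
  timeoutMs: number = 1000,
  intervalMs: number = 10
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      assertion();
      return;
    } catch (err) {
      if (Date.now() >= deadline) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

clickSignIn();

// A synchronous read here still sees the old label: the update is queued.
const labelRightAfterClick = buttonLabel;

// Waiting lets the queued update land before the assertion is judged.
const labelSettled = waitForSketch(() => expectLabel("Signing in…"));
```

In the real test the same idea is expressed with Testing Library's async utilities, for example awaiting `screen.findByText(...)` or wrapping the assertion in `await waitFor(...)`.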
Capture for Workshop 7: the pattern "synchronous assertion on async state update" is something to ban via a lint rule. We'll do that in Workshop 7.
After the live demo
"We just used AI on six real problems in this codebase together, with a system prompt that kept us in the loop, and we proved we understood the output every time. With your own comprehension prompt set (handout Part 0), we keep going together on the next task, following along in your own clones. The handout has the rubric and the comprehension checks; we record the results in the W5 logbook section as we go. 10 minutes."
That's the handoff to the next block. We keep working together rather than splitting off into solo work. Any task past the core ones is optional polish.
What students will not realize without me saying it
- The prompts are first-class code. Save them. Version them. Reuse them. That includes the comprehension-preserving system prompt.
- The AI is wrong sometimes in ways that compile but lie. Always run the tests.
- The bigger risk is the code that's right and you didn't understand. That's cognitive offloading, and it's invisible because nothing breaks today. The tell: could you re-derive or explain it without the AI? If not, you offloaded comprehension. Explain it back or modify it without AI to close the gap.
- Failure modes are predictable once you've used AI for a week or two. The categories in `starter/evaluation-rubric.md` will become second nature.
- The team policy you write in Workshop 7 codifies all of the above. Today is the input data.