Workshop 2: Slide Content

Deck title: Profiling, Bottleneck Analysis & Root Cause Diagnosis Duration: 60 minutes · 23 slides OAF layouts used: lavender title / dark agenda / white headline+body / dark headline+body / 2-col / divider / Big Stat closer


Slide 1: Title [lavender title slide]

Title: Finding the real problem. Eyebrow: Session 2 of 8 · Profiling & Root Cause Diagnosis


Slide 2: Recap [white, headline + body]

Title: Last week you measured. This week you diagnose. Subtitle: Where we are in the arc Body: Session 1 produced a baseline: numbers and observations from a real run of the OrbitTasks pipeline. Today we take that data and figure out why it looks the way it does. By the end you'll have a ranked list of bottlenecks and the underlying causes for each.


Slide 3: Agenda [dark blue, agenda layout]

Title: Agenda Bullets (5):

  • 10 min: Profiling techniques: how to read what your machine is telling you
  • 20 min: Apply profiling to your baseline data from Session 1
  • 10 min: Root cause analysis: the 5 Whys and the fishbone
  • 10 min: Apply root cause analysis to the OrbitTasks pipeline
  • 10 min: Prioritize the bottlenecks and discuss how senior engineers do this work

Slide 4: Section divider [dark blue]

Big text: Profiling, / not guessing.


Slide 5: What is profiling [white, headline + body]

Title: What is profiling? Subtitle: The honest definition Body: Profiling is the practice of measuring where a program actually spends its time, rather than guessing. The opposite of profiling is "I bet the database is slow" with no data. Profiling produces numbers like "step X took 14 seconds, step Y took 0.2 seconds." Once you have those numbers, the conversation changes from opinion to evidence.


Slide 6: Two kinds [white, 2-col]

Title: Two flavors of profiling. / Both are useful. Left subhead: Instrumentation Left body: You put explicit timing around blocks of code. Cheap, simple, surgical, but you only see what you remember to measure. The bash script you ran last week (scripts/measure.sh) is pure instrumentation. Right subhead: Sampling Right body: A profiler interrupts the running program periodically and records what's on the stack. Heavier and noisier, but you see things you'd never have thought to measure. Tools: Node's --inspect (with the DevTools CPU profiler) and perf on Linux. For a crude wall-clock number, time is enough, but it measures the whole command rather than sampling it.
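
A minimal sketch of the instrumentation flavor, in TypeScript. The timed helper and the TimingRecord shape are hypothetical names made up for illustration; they are not part of scripts/measure.sh or the OrbitTasks repo. It uses Date.now(), so resolution is whole milliseconds, which is usually enough for pipeline-scale steps.

```typescript
// Hypothetical helper: wrap a block of code and record how long it took.
interface TimingRecord {
  label: string;
  millis: number;
}

const timings: TimingRecord[] = [];

function timed<T>(label: string, work: () => T): T {
  const start = Date.now();
  try {
    return work();
  } finally {
    // Recorded even if `work` throws, so failing steps still get a timing.
    timings.push({ label, millis: Date.now() - start });
  }
}

// Usage: only the blocks you remember to wrap show up in `timings`.
const total = timed("sum-to-one-million", () => {
  let sum = 0;
  for (let i = 1; i <= 1000000; i++) sum += i;
  return sum;
});
```

This also shows the limitation named on the slide: anything not wrapped in timed is invisible, which is the gap a sampling profiler fills.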


Slide 7: What we use today [white, title + body + bullets]

Title: Today's tools / are deliberately simple. Body: You don't need a flame graph generator to make real progress. The most useful profiling tool is the one you actually run. We'll use four: Bullets (4):

  • baseline.log: the timing output from last week's npm run ci run.
  • The raw test output: Jest and Vitest can both print per-test durations (for example, with verbose output).
  • time and date +%s: wrap any command to get a wall-clock measurement.
  • Your own brain: the patterns matter as much as the numbers.

Slide 8: Reading a build log [dark, title + body]

Title: How to read a build log / line by line. Subtitle: What jumps out Body: Walk through a real baseline.log together: identify which stages are observably long, which tests dominate the test stage, and any non-deterministic outcomes. The point isn't memorizing format; it's learning to scan for outliers and to be suspicious of any single number that's much larger than its neighbors.
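
The "much larger than its neighbors" check can be sketched as a comparison against the median. This is one possible heuristic, not the workshop's prescribed method. The Measurement shape, the factor of 5, and every number except the ~12 minutes (720 s) for test:api are assumptions for illustration; the stage names other than test:api are hypothetical, and nothing here is taken from a real baseline.log.

```typescript
// Hypothetical shape: one entry per stage or test, already read off the log by hand.
interface Measurement {
  label: string;
  seconds: number;
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  if (sorted.length === 0) return 0;
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] ?? 0;
  const lower = sorted[mid - 1] ?? upper;
  return sorted.length % 2 === 0 ? (lower + upper) / 2 : upper;
}

// Flag anything more than `factor` times the median duration.
function flagOutliers(items: Measurement[], factor = 5): Measurement[] {
  const typical = median(items.map((m) => m.seconds));
  return items.filter((m) => m.seconds > factor * typical);
}

// Illustrative numbers only.
const sample: Measurement[] = [
  { label: "lint", seconds: 5 },
  { label: "typecheck", seconds: 8 },
  { label: "test:web", seconds: 10 },
  { label: "test:api", seconds: 720 },
  { label: "build", seconds: 15 },
];

const suspicious = flagOutliers(sample).map((m) => m.label);
```

The median is used rather than the mean because a single huge value drags the mean up and can hide itself.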


Slide 9: Section divider [dark blue]

Big text: Apply it / to your data.


Slide 10: Workshop: find the slowest [white, headline + body]

Title: Now together. Open your baseline.log. Subtitle: Workshop, 15 minutes, we do this together Body: We rank the stages together while you fill in the W2 section of your logbook. Rank the pipeline stages from fastest to slowest, then circle the single biggest contributor. Here, test:api is ~95% of total runtime, so that's our first target for the rest of the session.
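
The ranking exercise amounts to a sort plus a share-of-total calculation. A rough sketch follows; StageTiming, rankStages, and the stage names other than test:api are hypothetical, and the durations are placeholders chosen so that test:api lands near the ~95% figure quoted above. Substitute the numbers from your own baseline.log.

```typescript
interface StageTiming {
  stage: string;
  seconds: number;
}

interface RankedStage extends StageTiming {
  sharePercent: number; // share of total runtime, rounded to one decimal place
}

function rankStages(stages: StageTiming[]): RankedStage[] {
  const totalSeconds = stages.reduce((sum, s) => sum + s.seconds, 0);
  return [...stages]
    .sort((a, b) => a.seconds - b.seconds) // fastest first, slowest last
    .map((s) => ({
      ...s,
      sharePercent:
        totalSeconds === 0 ? 0 : Math.round((s.seconds / totalSeconds) * 1000) / 10,
    }));
}

// Placeholder durations, not measured values.
const ranked = rankStages([
  { stage: "test:api", seconds: 720 },
  { stage: "lint", seconds: 5 },
  { stage: "build", seconds: 15 },
  { stage: "typecheck", seconds: 8 },
  { stage: "test:web", seconds: 10 },
]);

// The last entry is the single biggest contributor: the one to circle.
const biggest = ranked[ranked.length - 1];
```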


Slide 11: Hot spots vs cold spots [white, headline + body]

Title: Hot spots and the long tail Subtitle: A pattern you'll see for the rest of your career Body: Most pipelines are dominated by a small number of slow steps: usually one or two operations take more time than everything else combined. That's the "hot spot." But there's also a "long tail": dozens of small operations that each look cheap but add up. Today, focus on hot spots. We'll handle the long tail starting in Session 3.


Slide 12: The long tail pattern [dark, headline + body]

Title: Why the long tail is dangerous Subtitle: It hides in plain sight Body: A single 30-second slow test is obviously the problem. Ten tests at 3 seconds each look fine individually, but together they're the same problem in disguise. Senior engineers learn to be just as suspicious of "lots of small things" as they are of "one big thing." Today, note both.
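
The arithmetic behind this slide can be sketched by splitting test durations at a threshold and totalling each side. The TestDuration shape, the function name, the test names, and the 10-second threshold are all assumptions for illustration; the durations are the slide's own example (one 30-second test, ten 3-second tests).

```typescript
interface TestDuration {
  testName: string;
  seconds: number;
}

function splitHotSpotsAndTail(tests: TestDuration[], hotSpotSeconds: number) {
  const hotSpots = tests.filter((t) => t.seconds >= hotSpotSeconds);
  const tail = tests.filter((t) => t.seconds < hotSpotSeconds);
  const sumSeconds = (xs: TestDuration[]) => xs.reduce((acc, t) => acc + t.seconds, 0);
  return {
    hotSpots,
    tail,
    hotSpotTotal: sumSeconds(hotSpots),
    tailTotal: sumSeconds(tail),
  };
}

// One obviously slow test next to ten that each look fine on their own.
const exampleTests: TestDuration[] = [
  { testName: "slow-test", seconds: 30 },
  ...Array.from({ length: 10 }, (_, i) => ({
    testName: `small-test-${i + 1}`,
    seconds: 3,
  })),
];

const summary = splitHotSpotsAndTail(exampleTests, 10);
// summary.hotSpotTotal and summary.tailTotal are both 30 seconds.
```

Looking only at individual durations would surface one test here; looking at the totals shows the tail costs exactly as much.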


Slide 13: Section divider [dark blue]

Big text: Symptoms / vs causes.


Slide 14: Symptoms vs causes [white, headline + body]

Title: "The test is slow" is a symptom. Subtitle: The root cause lives further down Body: When test:api takes ~12 minutes, "the tests are slow" is true but not useful. Why does it take ~12 minutes? Because a suite of integration tests each make many real HTTP round-trips. Why real HTTP? Because there's no mock layer. Why is there no mock layer? Because there was no test-infra convention. Now you have something actionable. That's root cause analysis.


Slide 15: The 5 Whys [white, title + body + bullets]

Title: The 5 Whys: / the simplest tool that works. Body: Originated at Toyota as part of its production system, and still widely used in engineering orgs today. The technique is exactly what it says: ask "why?" roughly five times, in sequence, with each question aimed at the previous answer. By the fourth or fifth "why?" you've usually moved from a symptom to something you can actually change. Bullets (4):

  • Why is the pipeline slow? → Because test:api takes ~12 minutes.
  • Why does test:api take ~12 minutes? → Because a suite of integration tests each make many real HTTP round-trips.
  • Why real HTTP? → Because there's no mock layer for the external service clients.
  • Why is there no mock layer? → Because there was no test-infra convention. ← root cause
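
If you want to keep these chains in a machine-readable logbook, one possible representation is an ordered list of question/answer pairs where the last answer is treated as the root cause. The WhyStep shape and both helper functions are hypothetical; the questions and answers are the four from the bullets above.

```typescript
interface WhyStep {
  question: string;
  answer: string;
}

const chain: WhyStep[] = [
  { question: "Why is the pipeline slow?", answer: "test:api takes ~12 minutes." },
  {
    question: "Why does test:api take ~12 minutes?",
    answer: "A suite of integration tests each make many real HTTP round-trips.",
  },
  {
    question: "Why real HTTP?",
    answer: "There's no mock layer for the external service clients.",
  },
  { question: "Why is there no mock layer?", answer: "There was no test-infra convention." },
];

// By convention in this sketch, the final answer in the chain is the root cause.
function rootCause(steps: WhyStep[]): string | undefined {
  return steps[steps.length - 1]?.answer;
}

// One printable line per step, with the last one marked.
function formatChain(steps: WhyStep[]): string[] {
  return steps.map((step, index) => {
    const marker = index === steps.length - 1 ? "  <- root cause" : "";
    return `${index + 1}. ${step.question} -> ${step.answer}${marker}`;
  });
}

const lines = formatChain(chain);
```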

Slide 16: Worked example [dark, headline + body]

Title: Worked example, together Subtitle: 5 Whys on a real OrbitTasks bottleneck Body: As a group, we apply the 5 Whys to apps/api/tests/reports.test.ts and the tests/integration/ suite, specifically the many real HTTP round-trips each test makes. We don't move on until everyone's traced it back to a root cause that's about process or convention, not about the line of code itself. Process root causes are what you actually fix.
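
To make "no mock layer" concrete, here is a sketch of what such a layer could look like: the code under test depends on an interface, and tests pass in an in-memory fake instead of a client that makes real HTTP calls. Everything named here (ReportsClient, fetchTaskCount, buildSummary, FakeReportsClient) is hypothetical and is not the actual contents of apps/api/tests/reports.test.ts. The convention (every external client gets an interface and a fake) is the process fix; the code only illustrates it.

```typescript
// Hypothetical boundary: everything the code under test needs from the external service.
interface ReportsClient {
  fetchTaskCount(projectId: string): Promise<number>;
}

// Code under test depends on the interface, not on a concrete HTTP client.
async function buildSummary(client: ReportsClient, projectIds: string[]): Promise<string> {
  let taskTotal = 0;
  for (const id of projectIds) {
    // With a real client this is one network round-trip per project.
    taskTotal += await client.fetchTaskCount(id);
  }
  return `${projectIds.length} projects, ${taskTotal} tasks`;
}

// In-memory fake: no network, so calls resolve without leaving the process.
class FakeReportsClient implements ReportsClient {
  calls = 0;
  private readonly counts: Record<string, number>;

  constructor(counts: Record<string, number>) {
    this.counts = counts;
  }

  async fetchTaskCount(projectId: string): Promise<number> {
    this.calls += 1;
    return this.counts[projectId] ?? 0;
  }
}

// Usage in a test: same logic exercised, zero real HTTP.
const fake = new FakeReportsClient({ alpha: 3, beta: 4 });
const summaryPromise = buildSummary(fake, ["alpha", "beta"]);
```

The fake also counts its calls, which gives you a cheap way to see how many round-trips a test would have made against the real service.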


Slide 17: Fishbone diagrams [white, 2-col]

Title: When 5 Whys isn't enough, / draw a fishbone. Left subhead: What it is Left body: A diagram with one branch per category of cause. Categories typically include People, Process, Tools, Environment, and Materials. Each branch holds the contributing factors you can think of. The picture forces you to look beyond the most obvious answer. Right subhead: When to use it Right body: Use it when a 5 Whys gives you one answer but you suspect there are several contributing causes. Flaky tests are a classic fit: the failure is rarely just one thing. We'll draw one together for the flaky test.


Slide 18: Section divider [dark blue]

Big text: Apply RCA / to your repo.


Slide 19: Workshop: 5-Why your top 3 [white, headline + body]

Title: Pick your top three. Subtitle: Workshop, 10 minutes, we do this together Body: Together we take the three biggest bottlenecks you identified in the profiling workshop earlier this session and run a 5 Whys on each, recording them in your logbook as we go. You should land on a process root cause for each: something about how the team works, not just about the code. If you only get to "because someone wrote it that way," push one more level.


Slide 20: Prioritization [white, title + body + bullets]

Title: Not every bottleneck / is worth fixing. Body: Place each bottleneck on a 2×2 of impact and effort. High-impact / low-effort wins are obvious. The interesting question is the rest: high-impact / high-effort projects need executive buy-in; low-impact / low-effort fixes are good warm-up work; low-impact / high-effort almost always loses to something else. We rank your three together now and record them in your logbook. Bullets (4):

  • High impact, low effort: fix it this sprint, no discussion needed.
  • High impact, high effort: write a proposal. Get the team aligned.
  • Low impact, low effort: good for new hires; nice low-stakes wins.
  • Low impact, high effort: kill it. Politely.
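
The four quadrants above map directly onto a small lookup. A sketch, assuming impact and effort have already been judged as "high" or "low"; the Bottleneck shape, the function names, and the placeholder entries are hypothetical, and the recommendation strings paraphrase the bullets.

```typescript
type Level = "high" | "low";

interface Bottleneck {
  title: string;
  impact: Level;
  effort: Level;
}

function recommend(b: Bottleneck): string {
  if (b.impact === "high" && b.effort === "low") return "fix it this sprint";
  if (b.impact === "high" && b.effort === "high") return "write a proposal";
  if (b.impact === "low" && b.effort === "low") return "low-stakes win, good for new hires";
  return "kill it, politely";
}

// Lower number means work on it sooner; the order follows the bullets above.
function priority(b: Bottleneck): number {
  if (b.impact === "high") return b.effort === "low" ? 0 : 1;
  return b.effort === "low" ? 2 : 3;
}

function rankBottlenecks(items: Bottleneck[]): Bottleneck[] {
  return [...items].sort((a, b) => priority(a) - priority(b));
}

// Placeholder entries; replace with the three from your logbook.
const candidates: Bottleneck[] = [
  { title: "bottleneck A", impact: "low", effort: "high" },
  { title: "bottleneck B", impact: "high", effort: "high" },
  { title: "bottleneck C", impact: "high", effort: "low" },
];

const plan = rankBottlenecks(candidates).map((b) => `${b.title}: ${recommend(b)}`);
```

The judgment is all in assigning the two levels; once those are written down, the ranking itself is mechanical.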

Slide 21: RCA at senior levels [dark, headline + body]

Title: Why senior engineers care about this so much Subtitle: Career relevance Body: As you move from junior to senior, the work shifts from "fix things" to "diagnose what to fix." Most staff engineer interviews include some version of "tell me about a time you traced a hard bug to its source." The technique is the same one you used today. The depth comes from practice.


Slide 22: Session 3 preview [white, title + body + bullets]

Title: Session 3: CI/CD pipeline / design and configuration. Body: You'll take the bottlenecks you identified today and start fixing them, for real, using GitHub Actions. The next session is hands-on heavy. Bullets (3):

  • Understand how a modern CI pipeline is structured.
  • Write your own .github/workflows/ci.yml for OrbitTasks.
  • Apply caching and parallelization, then measure the improvement.

Slide 23: Questions [dark blue, Big Stat layout]

Big text: Questions?