
Workshop 6: Slide Content

Deck title: Experimentation Design & Measuring Engineering Impact Duration: 60 minutes · ~22 slides


Slide 1: Title [lavender title slide]

Title: Prove the win. Eyebrow: Session 6 of 8 · Experimentation & DORA Metrics


Slide 2: Recap [white, headline + body]

Title: You've improved a lot of things. Now make the case. Subtitle: Where we are in the arc Body: Faster CI. Better onboarding. AI-assisted refactors. All real, all measurable. But "I made the pipeline faster" isn't the same thing as "this saves the team N engineer-hours per week." Today we turn those improvements into evidence a leadership team would act on. We do it together, live in this session, and you record it all in your logbook as we go.


Slide 3: Agenda [dark blue, agenda layout]

Title: Agenda Bullets (6):

  • 10 min: Hypothesis-driven experimentation and the DORA metrics
  • 5 min: Revisit your Session 1 baseline measurements
  • 15 min: Define hypotheses and select success metrics
  • 15 min: Compare baseline versus improved measurements
  • 10 min: Design a phased rollout strategy
  • 5 min: Peer review and group feedback

Slide 4: Section divider [dark blue]

Big text: Experiments, / not just changes.


Slide 5: Why hypotheses matter [white, headline + body]

Title: "I think this will help" is not enough Subtitle: The mindset shift Body: Senior engineers don't propose changes. They propose hypotheses. The difference: a hypothesis says "if we do X, we expect Y to improve by Z, and we'll know in N days." That framing forces you to define success up front, and it gives you something to falsify when reality disagrees with you.


Slide 6: The format [white, title + body + bullets]

Title: Three sentences. / Always the same shape. Body: A good hypothesis has three parts. Get used to writing them in this order; every PM, EM, and staff engineer recognizes the format. Bullets (3):

  • "If we [the change]…"
  • "We expect [the metric] to [improve by amount] within [time window]…"
  • "Because [the underlying reason we believe this will work]."
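
One way to keep every hypothesis in the same shape is to treat the three sentences as fields of a record. The sketch below is illustrative only: the `Hypothesis` class and its field names are assumptions, and the sample values are paraphrased from the OrbitTasks example in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One hypothesis in the three-sentence shape (hypothetical helper)."""
    change: str    # "If we [the change]"
    metric: str    # the metric we expect to move
    expected: str  # the improvement we expect
    window: str    # the time window in which we will know
    reason: str    # "Because [the underlying reason]"

    def render(self) -> str:
        """Join the parts in the fixed order: change, expectation, reason."""
        return (
            f"If we {self.change}, "
            f"we expect {self.metric} to {self.expected} within {self.window}, "
            f"because {self.reason}."
        )


# Sample values paraphrased from the OrbitTasks example in this deck.
h = Hypothesis(
    change="shard the test suite across CI workers and mock the SDK clients",
    metric="total pipeline duration",
    expected="drop from ~13 minutes to under 4 minutes",
    window="2 weeks",
    reason="the test:api integration suite is ~95% of pipeline time and runs serially",
)
print(h.render())
```

If any field is hard to fill in, that is usually the part of the hypothesis that still needs thought.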

Slide 7: Worked hypothesis [dark, headline + body]

Title: Example, from your own work Subtitle: A real OrbitTasks hypothesis Body: "If we shard the test suite across CI workers and mock the SDK clients, we expect the total pipeline duration to drop from ~13 minutes to under 4 minutes within 2 weeks, because the test:api integration suite is ~95% of pipeline time and runs serially (maxWorkers: 1) against a mock server, so sharding plus mocking the clients eliminates most of that time." That's a hypothesis you can prove or disprove.
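
A quick sanity check on the target, as a rough sketch. It uses only the slide's numbers (~13 minutes total, ~95% in test:api, target under 4 minutes) and assumes ideal scaling: the suite splits evenly across shards with no per-shard startup cost. The effect of mocking the SDK clients is not modeled, so this is a sharding-only estimate, and the function names are hypothetical.

```python
import math

BASELINE_MIN = 13.0  # total pipeline duration from the slide (~13 minutes)
SUITE_SHARE = 0.95   # share of pipeline time spent in test:api (~95%)
TARGET_MIN = 4.0     # target from the hypothesis (under 4 minutes)


def projected_duration(shards: int) -> float:
    """Projected pipeline minutes if only test:api is split evenly across shards.

    Assumes ideal scaling and no per-shard overhead.
    """
    suite = BASELINE_MIN * SUITE_SHARE
    rest = BASELINE_MIN - suite  # the part sharding does not touch
    return rest + suite / shards


def min_shards_for_target() -> int:
    """Smallest shard count whose projected duration reaches the target."""
    suite = BASELINE_MIN * SUITE_SHARE
    rest = BASELINE_MIN - suite
    return math.ceil(suite / (TARGET_MIN - rest))


for n in (1, 2, 3, 4):
    print(f"{n} shard(s): {projected_duration(n):.2f} min")
print("minimum shards:", min_shards_for_target())
```

Under these assumptions three shards are not enough and four are, which is the kind of prediction the two-week measurement can confirm or falsify.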


Slide 8: Section divider [dark blue]

Big text: The DORA / four.


Slide 9: DORA in 90 seconds [white, headline + body]

Title: The four numbers leadership actually wants Subtitle: From the State of DevOps research Body: DORA stands for DevOps Research and Assessment. Their research identified four metrics that consistently predict software delivery performance. Most engineering leaders know them today. If you can speak to all four in an interview, you sound like a senior engineer regardless of your years of experience.


Slide 10: The four [white, title + body + bullets]

Title: The four / DORA metrics Body: Two are about speed; two are about stability. Healthy teams improve all four. Unhealthy teams trade one for another. Bullets (4):

  • Deployment frequency: how often does code reach production?
  • Lead time for changes: how long from commit to production?
  • Change failure rate: what % of deployments cause an incident?
  • Mean time to recovery: when something breaks, how long to fix?
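
If you keep even a simple record per deployment, all four numbers fall out of the same list. The sketch below is a minimal illustration, not a standard tool: the `Deployment` record, its field names, and the sample data are all hypothetical, and it assumes at least one deployment in the period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    caused_incident: bool = False           # did this deploy cause an incident?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Compute the four metrics from a non-empty list of deployments."""
    failures = [d for d in deploys if d.caused_incident]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_week": len(deploys) / (period_days / 7),
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        # None means "unknown": no recovery times were recorded.
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Hypothetical two-week sample: four deploys, one of which caused an incident.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=4)),
    Deployment(
        t0 + timedelta(days=4),
        t0 + timedelta(days=4, hours=3),
        caused_incident=True,
        restored_at=t0 + timedelta(days=4, hours=4),
    ),
    Deployment(t0 + timedelta(days=8), t0 + timedelta(days=8, hours=1)),
]
metrics = dora_metrics(sample, period_days=14)
print(metrics)
```

Lead time is reported as a median here so one slow change does not dominate the number.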

Slide 11: How OrbitTasks maps [dark, headline + body]

Title: How OrbitTasks maps to DORA today Subtitle: Honest numbers Body: Deployment frequency: ~2-3 per week. Lead time: on the order of hours after a PR is approved, dominated by the ~13-minute CI (almost all of it the test:api integration suite). Change failure rate: ~10% earlier in the term, from the five flaky tests producing false greens, which you fixed in Session 5. Mean time to recovery: unknown (no incident tracking). Three of the four are within your control to improve. The fourth is where you'd push leadership for visibility.


Slide 12: Section divider [dark blue]

Big text: Revisit / your baselines.


Slide 13: Pull out your data [white, headline + body]

Title: Open your baseline.log and Session 1 numbers Subtitle: Workshop, 5 minutes, together Body: We lay out everything you have: Session 1's baseline numbers, the post-CI improvement numbers from Session 3, and the onboarding time delta from Session 4. We build the side-by-side "before" vs "after" table together. Record it in your logbook as we go.


Slide 14: Workshop: hypothesis writing [white, headline + body]

Title: Workshop, 15 minutes, together Subtitle: We write three hypotheses live Body: Pick the three improvements with the biggest measured delta. For each, we write the three-sentence hypothesis together. Then note what metric you used to prove it, and what would have changed your mind. Record this in your logbook as we go.


Slide 15: Section divider [dark blue]

Big text: Compare / numbers.


Slide 16: Workshop: before/after [white, headline + body]

Title: Workshop, 15 minutes, together Subtitle: We quantify your wins live Body: For each hypothesis, we work through the before/after numbers together. Compute the % change and the absolute change, and record them in your logbook as we go. Absolute change matters more: "30% faster" is meaningless without knowing the baseline. "Saved 4 minutes per PR × 50 PRs per week = 200 minutes, or about 3.3 engineer-hours per week" is the version that makes a CFO listen.
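
The arithmetic can be sketched in a few lines. The 4 minutes saved per PR and 50 PRs per week come from the slide; the 13-to-9-minute durations are illustrative values chosen to produce that 4-minute saving, and the `impact` helper is hypothetical. It assumes one pipeline run per PR and that the saved minutes are time an engineer would otherwise spend waiting.

```python
def impact(before_min: float, after_min: float, prs_per_week: int) -> dict:
    """Return the absolute change, the % change, and weekly engineer-hours saved."""
    saved = before_min - after_min
    return {
        "absolute_change_min": saved,
        "percent_change": -100.0 * saved / before_min,  # negative means faster
        "engineer_hours_per_week": saved * prs_per_week / 60,
    }


# Illustrative durations; 4 minutes per PR and 50 PRs per week are from the slide.
result = impact(before_min=13.0, after_min=9.0, prs_per_week=50)
print(f"Absolute change: {result['absolute_change_min']:.0f} min per PR")
print(f"Percent change:  {result['percent_change']:.1f}%")
print(f"Weekly saving:   {result['engineer_hours_per_week']:.1f} engineer-hours")
```

The same 4-minute saving would be an 80% change on a 5-minute pipeline and a 4% change on a 100-minute one, which is why the absolute number and the baseline belong next to the percentage.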


Slide 17: Common mistakes [dark, title + body + bullets]

Title: Common mistakes / when reporting impact. Body: These are the patterns your reader may not be able to name but will distrust. Bullets (4):

  • Reporting % change without absolute numbers: sounds bigger than it is.
  • Comparing apples to oranges: Session 1 cold install vs Session 3 cached install.
  • Cherry-picking the best run: the median matters; the worst case matters more.
  • Ignoring confounds: did anything else change in the same window?
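
To avoid cherry-picking, report the best, median, and worst run side by side, along with how many runs you measured. A minimal sketch follows; the run durations are made-up sample data, not OrbitTasks measurements, and `summarize` is a hypothetical helper.

```python
from statistics import median


def summarize(runs_min: list[float]) -> dict:
    """Summarize pipeline durations (minutes) without hiding the slow runs."""
    return {
        "runs": len(runs_min),
        "best": min(runs_min),
        "median": median(runs_min),
        "worst": max(runs_min),
    }


# Made-up sample data: five runs before the change and five after.
before = summarize([12.8, 13.1, 13.0, 14.2, 12.9])
after = summarize([3.6, 3.9, 3.7, 6.5, 3.8])

for label, stats in (("before", before), ("after", after)):
    print(label, stats)
```

In this sample the best "after" run is 3.6 minutes, but the worst is 6.5. Quoting only the 3.6 would be exactly the cherry-pick the slide warns about.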

Slide 18: Rollout strategies [white, 2-col]

Title: Plan the rollout / before the rollout. Left subhead: Phased Left body: Apply the change to one repo or one team first. Measure for a week. If the metrics improve, expand. If they don't, you've contained the blast radius. Almost always the right answer. Right subhead: Big bang Right body: Apply the change to everything at once. Only acceptable when the change is reversible in seconds and the test environment exactly mirrors production. Rarer than you'd think.


Slide 19: Workshop: rollout plan [white, headline + body]

Title: Workshop, 10 minutes, together Subtitle: We draft your rollout live Body: For each hypothesis, we write a one-paragraph rollout together: who runs the experiment first, what they'll measure, for how long, and what threshold makes you ship versus roll back. Record this in your logbook as we go.
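
Writing the threshold down before the experiment starts keeps the ship or roll back call honest. The sketch below captures the four parts of the paragraph as a record; the `RolloutPlan` class, its field names, and the sample values are all hypothetical, and it assumes a metric where lower is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutPlan:
    pilot_group: str         # who runs the experiment first
    metric: str              # what they will measure
    days: int                # for how long
    ship_at_or_below: float  # threshold agreed before the experiment starts

    def decide(self, observed: float) -> str:
        """Return "ship" if the observed value meets the threshold, else "roll back"."""
        return "ship" if observed <= self.ship_at_or_below else "roll back"


# Hypothetical plan for a pipeline-duration hypothesis.
plan = RolloutPlan(
    pilot_group="one repo",
    metric="median pipeline duration (minutes)",
    days=7,
    ship_at_or_below=4.0,
)
print(plan.decide(3.8))
print(plan.decide(4.7))
```

Because the threshold is fixed up front, a disappointing result cannot be quietly reinterpreted as a success after the fact.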


Slide 20: Communicating to leadership [dark, headline + body]

Title: Translating engineering wins to business outcomes Subtitle: The part senior engineers practice for years Body: Leadership doesn't care about pipeline duration. They care about "engineering capacity unlocked," "time to first PR for new hires," "cost per deploy." Same numbers, different framing. The skill is doing the translation without losing the underlying truth. Practice it on your hypotheses now.


Slide 21: Peer review [white, headline + body]

Title: Workshop, 5 minutes, together Subtitle: Trade with a partner Body: Swap logbooks with the person next to you. Read their hypotheses, before/after numbers, and rollout plans. Find one thing that's strong; find one thing you'd push on. Share the feedback verbally. Senior engineers do this informally all the time; it's a muscle.


Slide 22: Session 7 preview [white, title + body + bullets]

Title: Session 7: Engineering standards / and avoiding AI tech debt. Body: You've measured the wins. Now we make sure they stick, and we set guardrails for the AI tools you started using in Session 5. Bullets (3):

  • Write a CI/CD governance doc, an AI usage policy, and coding standards.
  • Implement at least one enforcement mechanism: pre-commit hook, lint rule, or PR template.
  • Debate developer freedom versus engineering standards. The line moves.

Slide 23: Questions [dark blue, Big Stat layout]

Big text: Questions?