Why I optimise for the unremarkable workflow, not the wow demo
A hundred engineers using AI by default beats one jaw-dropping demo. Production compounds on the thousandth unremarkable run, so measure throughput, not wow.
A hundred engineers reaching for AI by default beats one jaw-dropping demo, every time. A pilot is judged on a single impressive run. Production is judged on the thousandth unremarkable one. So I optimise for the boring, everyday workflow that compounds across a team, not the spike that lights up a steering committee and then goes nowhere. The wow demo is a moment. The unremarkable workflow is a habit, and habits are what move delivery numbers.
Why does the wow demo mislead?
A demo is a curated event. Someone picks the input, polishes the prompt, and runs it until it shines. That tells you the ceiling of what is possible in ideal conditions, which is genuinely useful to know but almost useless for forecasting daily value.
Daily value comes from the floor, not the ceiling: what happens when a tired engineer on a Friday reaches for the tool on a messy, real ticket. If that interaction is smooth enough to repeat without thinking, it compounds. If it only works under demo conditions, it gets abandoned the first busy week, which is every week. The same mechanism explains why AI pilots stall before production: the skills that win a demo are exactly the ones that do not generalise.
What should you measure instead?
Measure the thing that accumulates. The honest signals are unglamorous:
- Throughput. How much work the team actually ships per cycle, not how fast one task went once.
- Cycle time. How long a unit of work takes from start to done, averaged across real tickets, including the ugly ones.
- Default usage. What fraction of eligible work touches the tool without anyone being reminded.
Notice what is missing: the wow factor. “How impressed was the room” is a vanity metric. It feels like progress and predicts nothing about month three.
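To make those three signals concrete, here is a minimal sketch of how they might be computed for one cycle. It assumes each ticket can be exported as a record with a start date, an optional finish date, and two flags. The `Ticket` fields and the `team_metrics` function are hypothetical names for illustration, not any tracker's real schema.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical record shape; map these onto whatever your tracker exports.
    started: date
    finished: Optional[date]  # None while the ticket is still open
    eligible: bool            # could the AI tool plausibly have helped?
    used_tool: bool           # did the engineer reach for it unprompted?


def team_metrics(tickets: list[Ticket]) -> dict[str, float]:
    """Summarise one cycle: throughput, mean cycle time, default usage."""
    done = [t for t in tickets if t.finished is not None]
    eligible = [t for t in tickets if t.eligible]

    # Throughput: finished tickets in the cycle, ugly ones included.
    throughput = float(len(done))

    # Cycle time: mean days from start to done across finished tickets.
    cycle_time = float(mean((t.finished - t.started).days for t in done)) if done else 0.0

    # Default usage: share of eligible tickets where the tool was used.
    usage = sum(t.used_tool for t in eligible) / len(eligible) if eligible else 0.0

    return {
        "throughput": throughput,
        "cycle_time_days": cycle_time,
        "default_usage": usage,
    }
```

Nothing in the record captures how impressive any single run was, which is the point: every field is something that accumulates across real tickets.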
How do you make the unremarkable workflow win?
You design for repeatability, not applause. Put the tool where the work already lives so reaching for it costs nothing. Lower the activation energy until the AI path is the path of least resistance on a normal day. Then track the boring metrics over weeks and let the curve, not the highlight reel, tell you whether it is working.
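One way to let the curve speak is to fit a straight line through a weekly metric and look at its slope. The sketch below is an assumption about how you might do that, using an ordinary least-squares slope over week indices. `weekly_trend` is a hypothetical helper, and a linear fit is only one of several reasonable choices.

```python
def weekly_trend(values: list[float]) -> float:
    """Least-squares slope of a metric per week; positive means it is rising."""
    n = len(values)
    if n < 2:
        return 0.0  # a single week is a highlight, not a curve

    # Weeks are indexed 0..n-1, so their mean is (n - 1) / 2.
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n

    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den
```

Fed a weekly default-usage series, a steady climb gives a positive slope, while a demo-week spike followed by a collapse gives a negative one, however good the first week looked.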
This is the heart of how I run team enablement: build the default behaviour, then measure the compounding. It is also what 700 trained engineers taught me about adoption. The teams that win are not the ones with the best demo. They are the ones where using AI stopped being remarkable at all.