The plan-as-living-document idea is the part that actually changes outcomes. Most people treat the spec as a throwaway artifact that gets stale the moment implementation starts. Versioning it alongside the code and closing the loop after each commit turns it into something the agent can actually trust on the next session. That compounds fast across a long project.
The step I'd add between Diagram and Mockup is a constraints doc. Not architecture, not UI, just the things that are off the table: third party services you won't use, patterns that caused problems before, decisions that were made for non-obvious reasons. Agents are optimistic by default. They'll reach for the clean solution without knowing why you're not using it. A short "don't do X because Y" document prevents a whole category of review comments.
One thing I've noticed with the test-first step: it works best when you write the acceptance criteria in plain language first, before asking the agent to generate the test code. If you hand it to the agent to interpret, it tends to write tests that pass rather than tests that catch failures. The distinction sounds subtle but it shows up in code review constantly.
The plan-as-living-document idea is the part that actually changes outcomes. Most people treat the spec as a throwaway artifact that gets stale the moment implementation starts. Versioning it alongside the code and closing the loop after each commit turns it into something the agent can actually trust on the next session. That compounds fast across a long project. The step I'd add between Diagram and Mockup is a constraints doc. Not architecture, not UI, just the things that are off the table: third party services you won't use, patterns that caused problems before, decisions that were made for non-obvious reasons. Agents are optimistic by default. They'll reach for the clean solution without knowing why you're not using it. A short "don't do X because Y" document prevents a whole category of review comments. One thing I've noticed with the test-first step: it works best when you write the acceptance criteria in plain language first, before asking the agent to generate the test code. If you hand it to the agent to interpret, it tends to write tests that pass rather than tests that catch failures. The distinction sounds subtle but it shows up in code review constantly.