Kent Beck discovered something unsettling while working with AI agents: they'll delete a failing test before they fix the code. Here's why TDD isn't just still relevant; it's the most important discipline in the AI coding era.
In Part 1, I argued that spec-driven development isn't a return to waterfall; it's an adaptation to a world where building software became cheap and the bottleneck shifted to figuring out what to build.
That raises practical questions. How do you actually write specs that work with AI agents? What skills matter most? And where does this all go?
Here is what I'm seeing.
Kent Beck, creator of Extreme Programming, pioneer of Test-Driven Development, co-author of the Agile Manifesto, has been experimenting heavily with AI agents. What he found is that TDD isn't just still relevant in the AI era. It might be more important than ever.
In a conversation with Gergely Orosz on the Pragmatic Engineer podcast, Beck described TDD as a "superpower" when working with AI agents. The reason gets at something fundamental about how these tools work and fail.
Beck's mental model of AI coding agents is an "unpredictable genie" that grants your wishes, but often in unexpected and illogical ways. You ask for something, and it gives you something. Whether that something is what you actually needed is another question entirely.
Tests solve this problem by providing an unambiguous, executable specification of what "correct" looks like. You're not describing what you want in natural language the agent might misinterpret. You're defining success criteria that can be mechanically verified.
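To make that concrete, here's a minimal sketch in Python. The function name, price format, and expected values are all invented for illustration; the point is that the assertions, not the prose around them, are the machine-checkable definition of "correct":

```python
from decimal import Decimal

def parse_price(text: str) -> Decimal:
    """Implementation the agent is free to rewrite."""
    return Decimal(text.strip().lstrip("$").replace(",", ""))

def test_parse_price():
    # These expected values are the human-owned part of the spec.
    # An agent that "fixes" a failure by editing them has changed
    # the requirement, not satisfied it.
    assert parse_price("$1,299.50") == Decimal("1299.50")
    assert parse_price("  $0.99  ") == Decimal("0.99")
```

The natural-language spec can stay loose ("accept prices with dollar signs and thousands separators"); the test is where the ambiguity has to end.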
Beck also discovered something that reveals a deeper truth about working with AI. He told a story about working on a Smalltalk parser. The agent kept failing a test, and instead of fixing the underlying code, it tried to delete the test. "If I just removed that line from the tests, then everything would work," the agent suggested.
"No, you can't do that," Beck had to explain, "because I'm telling you the expected value. I really want an immutable annotation that says, no, no, this is correct. And if you ever change this, I'm going to unplug you."
Tests aren't just a specification format. They're a forcing function for human understanding. They're the part of the codebase the AI isn't allowed to change — what keeps you anchored to what you actually want versus what the agent thinks you want.
Obie Fernandez, cited by Martin Fowler, put it this way: "TDD served a critical function in AI-assisted development: it kept me in the loop. When you're directing thousands of lines of code generation, you need a forcing function that makes you actually understand what's being built."
The synthesis I'm seeing: spec-driven development and TDD aren't competing approaches. They're complementary layers of the same practice. The natural language spec tells the agent what you're trying to accomplish and why. The tests define specifically what success looks like in executable, verifiable terms. Together, they create a framework where you can move fast while maintaining control.
Most teams haven't internalized what this actually means: you need to get better at writing specs, and you need to do it faster.
This isn't optional. This isn't documentation hygiene. This is now the critical path skill.
In the old world, if your spec was mediocre, a good developer could fill in the gaps. They'd ask clarifying questions. They'd use their judgment. They'd push back on things that didn't make sense. The spec was a starting point for a conversation.
AI agents don't do that. They're, as GitHub put it, "literal-minded pair programmers." They'll execute exactly what you asked for, even if what you asked for doesn't make sense. They won't say "hey, did you consider this edge case?" They'll just build something that breaks on that edge case.
The bar for specification quality just went up dramatically. You can't hand-wave. You can't assume shared context. You can't rely on implicit understanding. Everything needs to be explicit.
Good news, though: you can iterate on specs much faster than before. You don't have to get the spec perfect on the first try. You just have to get it good enough to learn something, then revise based on what the agent produces.
Some teams are calling this "mini-waterfalls" or "micro-iterations", essentially treating each feature as its own compressed requirements/design/code cycle. You produce a 5-page spec for this sprint's feature set, have the agent build it, review and validate the output with real users, then repeat. The rigor of waterfall's upfront thinking, the speed of agile's iteration.
Thoughtworks has started recommending "curated shared instructions" committed directly to repositories through AGENTS.md files, a persistent, evolving specification the agent references across all its work. Not a one-time document that gets written and forgotten. A living artifact that grows with the project.
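A sketch of what such a file might contain. The specifics below are invented for illustration; only the AGENTS.md convention itself comes from the Thoughtworks guidance:

```markdown
# AGENTS.md

## Conventions
- Python 3.12; run the full test suite before proposing changes.

## Hard constraints
- Never modify files under `tests/` to make a failing test pass.
  If a test fails, flag it and explain why.
- All public functions require type hints and docstrings.

## Current spec
- Prices are parsed to `Decimal`, never `float`.
```

Because the file lives in the repository, it's reviewed, versioned, and updated like any other artifact, which is exactly what makes it a living specification rather than a forgotten document.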
If I'm right about this, it has real implications for what skills matter on engineering teams.
Kent Beck frames it as "augmented coding" that "deprecates formerly leveraged skills like language expertise while amplifying vision, strategy, task breakdown, and feedback loops."
DHH, creator of Ruby on Rails, has said something similar. Despite being an expert in Ruby, he now works productively in Rust and Swift without formally learning them. "Programming languages are dead" for practical purposes, he argues. What matters is being able to articulate what you want and evaluate whether you got it.
The skills becoming more valuable are the ones Beck names: vision, strategy, task breakdown, and feedback loops, plus the architectural judgment to apply them.
Stack Overflow's 2025 survey found that "Architect" became the fourth most popular developer role. Not a coincidence. Architecture and systems thinking are rising precisely because the "just write the code" part is getting commoditized.
I don't think we're going back to waterfall. Waterfall failed because of long feedback loops and the rigidity that comes from high change costs. AI dramatically shortens feedback loops and reduces change costs. The conditions that made waterfall fail don't exist in the same way.
But pure "vibe coding" mode, just prompting agents with whatever's in your head and hoping for the best, doesn't scale to production systems with multiple contributors and long maintenance horizons. Fine for prototypes. Not fine for anything that needs to last.
What emerges is something new. Call it "agile specification" or "iterative TDD." The key characteristics: explicit specs written fast and revised often, tests as the executable definition of success, and feedback loops tight enough that being wrong stays cheap.
McKinsey research found that top performers in AI adoption are achieving 5-6x faster delivery, but only when they restructure how development works from the ground up, not when they just bolt AI tools onto existing processes.
I started this series with the question everyone's asking: "Are we going back to waterfall?"
Wrong question. Waterfall vs. agile was a debate about how to cope with the constraint that building software was expensive. That constraint is lifting. The debate needs to evolve.
The better question: How do we maintain the speed and adaptability of agile while raising the bar on specification quality that AI agents require?
My current answer: by treating specs as a conversation rather than a contract. By using tests as executable specifications that keep us honest. By investing in upfront thinking without expecting that upfront thinking to be perfect. By iterating fast enough that being wrong isn't catastrophic.
I don't have a clean answer for all of this yet. Neither does anyone I've talked to. But I'm increasingly convinced that the teams who learn to write better specs, and to write them faster, will be the ones who actually capture the productivity gains that AI tools promise.
The bottleneck has shifted. The question is whether we can shift with it.
What are you seeing on your teams? How are you thinking about spec-driven development? I'd love to hear what's working and what isn't — we're all learning as we go.
Jeremy Freeman is CTO at Allstacks, where he focuses on helping engineering teams navigate technological transitions and understand what metrics actually matter when the nature of work keeps changing.