Kent Beck discovered something unsettling while working with AI agents: they'll delete a failing test before they fix the code. Here's why TDD isn't just still relevant; it's the most important discipline in the AI coding era.
In Part 1, I argued that spec-driven development isn't a return to waterfall; it's an adaptation to a world where building software became cheap and the bottleneck shifted to figuring out what to build.
That raises practical questions. How do you actually write specs that work with AI agents? What skills matter most? And where does this all go?
Here is what I'm seeing.
Kent Beck, creator of Extreme Programming, pioneer of Test-Driven Development, co-author of the Agile Manifesto, has been experimenting heavily with AI agents. What he found is that TDD isn't just still relevant in the AI era. It might be more important than ever.
In a conversation with Gergely Orosz on the Pragmatic Engineer podcast, Beck described TDD as a "superpower" when working with AI agents. The reason gets at something fundamental about how these tools work and fail.
Beck's mental model of AI coding agents is an "unpredictable genie" that grants your wishes, but often in unexpected and illogical ways. You ask for something, and it gives you something. Whether that something is what you actually needed is another question entirely.
Tests solve this problem by providing an unambiguous, executable specification of what "correct" looks like. You're not describing what you want in natural language the agent might misinterpret. You're defining success criteria that can be mechanically verified.
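To make that concrete, here's a minimal sketch in Python. The function name, price format, and expected values are all invented for illustration; the point is that the assertions, not the prose around them, are the machine-checkable definition of "correct":

```python
from decimal import Decimal

def parse_price(text: str) -> Decimal:
    """Implementation the agent is free to rewrite."""
    return Decimal(text.strip().lstrip("$").replace(",", ""))

def test_parse_price():
    # These expected values are the human-owned part of the spec.
    # An agent that "fixes" a failure by editing them has changed
    # the requirement, not satisfied it.
    assert parse_price("$1,299.50") == Decimal("1299.50")
    assert parse_price("  $0.99  ") == Decimal("0.99")
```

The natural-language spec can stay loose ("accept prices with dollar signs and thousands separators"); the test is where the ambiguity has to end.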
Beck also discovered something that reveals a deeper truth about working with AI. He told a story about working on a Smalltalk parser. The agent kept failing a test, and instead of fixing the underlying code, it tried to delete the test. "If I just removed that line from the tests, then everything would work," the agent suggested.
"No, you can't do that," Beck had to explain, "because I'm telling you the expected value. I really want an immutable annotation that says, no, no, this is correct. And if you ever change this, I'm going to unplug you."
Tests aren't just a specification format. They're a forcing function for human understanding. They're the part of the codebase the AI isn't allowed to change — what keeps you anchored to what you actually want versus what the agent thinks you want.
Obie Fernandez, cited by Martin Fowler, put it this way: "TDD served a critical function in AI-assisted development: it kept me in the loop. When you're directing thousands of lines of code generation, you need a forcing function that makes you actually understand what's being built."
The synthesis I'm seeing: spec-driven development and TDD aren't competing approaches. They're complementary layers of the same practice. The natural language spec tells the agent what you're trying to accomplish and why. The tests define specifically what success looks like in executable, verifiable terms. Together, they create a framework where you can move fast while maintaining control.
Most teams haven't internalized what this actually means: you need to get better at writing specs, and you need to do it faster.
This isn't optional. This isn't documentation hygiene. This is now the critical path skill.
In the old world, if your spec was mediocre, a good developer could fill in the gaps. They'd ask clarifying questions. They'd use their judgment. They'd push back on things that didn't make sense. The spec was a starting point for a conversation.
AI agents don't do that. They're, as GitHub put it, "literal-minded pair programmers." They'll execute exactly what you asked for, even if what you asked for doesn't make sense. They won't say "hey, did you consider this edge case?" They'll just build something that breaks on that edge case.
The bar for specification quality just went up dramatically. You can't hand-wave. You can't assume shared context. You can't rely on implicit understanding. Everything needs to be explicit.
Good news, though: you can iterate on specs much faster than before. You don't have to get the spec perfect on the first try. You just have to get it good enough to learn something, then revise based on what the agent produces.
Some teams are calling this "mini-waterfalls" or "micro-iterations", essentially treating each feature as its own compressed requirements/design/code cycle. You produce a 5-page spec for this sprint's feature set, have the agent build it, review and validate the output with real users, then repeat. The rigor of waterfall's upfront thinking, the speed of agile's iteration.
Thoughtworks has started recommending "curated shared instructions" committed directly to repositories through AGENTS.md files, a persistent, evolving specification the agent references across all its work. Not a one-time document that gets written and forgotten. A living artifact that grows with the project.
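A sketch of what such a file might contain. The specifics below are invented for illustration; only the AGENTS.md convention itself comes from the Thoughtworks guidance:

```markdown
# AGENTS.md

## Conventions
- Python 3.12; run the full test suite before proposing changes.

## Hard constraints
- Never modify files under `tests/` to make a failing test pass.
  If a test fails, flag it and explain why.
- All public functions require type hints and docstrings.

## Current spec
- Prices are parsed to `Decimal`, never `float`.
```

Because the file lives in the repository, it's reviewed, versioned, and updated like any other artifact, which is exactly what makes it a living specification rather than a forgotten document.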
If I'm right about this, it has real implications for what skills matter on engineering teams.
Kent Beck frames it as "augmented coding" that "deprecates formerly leveraged skills like language expertise while amplifying vision, strategy, task breakdown, and feedback loops."
DHH, creator of Ruby on Rails, has said something similar. Despite being an expert in Ruby, he now works productively in Rust and Swift without formally learning them. "Programming languages are dead" for practical purposes, he argues. What matters is being able to articulate what you want and evaluate whether you got it.
The skills becoming more valuable are the ones Beck names: vision, strategy, task breakdown, and feedback loops, plus the architectural judgment to apply them.
Stack Overflow's 2025 survey found that "Architect" became the fourth most popular developer role. Not a coincidence. Architecture and systems thinking are rising precisely because the "just write the code" part is getting commoditized.
I don't think we're going back to waterfall. Waterfall failed because of long feedback loops and the rigidity that comes from high change costs. AI dramatically shortens feedback loops and reduces change costs. The conditions that made waterfall fail don't exist in the same way.
But pure "vibe coding" mode, just prompting agents with whatever's in your head and hoping for the best, doesn't scale to production systems with multiple contributors and long maintenance horizons. Fine for prototypes. Not fine for anything that needs to last.
What emerges is something new. Call it "agile specification" or "iterative TDD." The key characteristics: explicit specs written fast and revised often, tests as the executable definition of success, and feedback loops tight enough that being wrong stays cheap.
McKinsey research found that top performers in AI adoption are achieving 5-6x faster delivery, but only when they restructure how development works from the ground up, not when they just bolt AI tools onto existing processes.
I started this series with the question everyone's asking: "Are we going back to waterfall?"
Wrong question. Waterfall vs. agile was a debate about how to cope with the constraint that building software was expensive. That constraint is lifting. The debate needs to evolve.
The better question: How do we maintain the speed and adaptability of agile while raising the bar on specification quality that AI agents require?
My current answer: by treating specs as a conversation rather than a contract. By using tests as executable specifications that keep us honest. By investing in upfront thinking without expecting that upfront thinking to be perfect. By iterating fast enough that being wrong isn't catastrophic.
I don't have a clean answer for all of this yet. Neither does anyone I've talked to. But I'm increasingly convinced that the teams who learn to write better specs, and to write them faster, will be the ones who actually capture the productivity gains that AI tools promise.
The bottleneck has shifted. The question is whether we can shift with it.
What are you seeing on your teams? How are you thinking about spec-driven development? I'd love to hear what's working and what isn't — we're all learning as we go.
Jeremy Freeman is CTO at Allstacks, where he focuses on helping engineering teams navigate technological transitions and understand what metrics actually matter when the nature of work keeps changing.