What Is Spec-Driven Development? A Guide for Product and Engineering Teams

Spec-driven development means writing precise specifications before AI agents write code. See the workflow, the tools, and where teams fail.

Jeremy Freeman

Co-Founder & CTO

Date

June 18, 2026

The shift that made specs a control surface again

For two decades, "documentation first" was an aspiration most engineering teams ignored. Code was the fast path. Specs were slow. The cost of writing detailed requirements rarely justified the time, because a developer reading vague tickets could still infer intent from context, conversations, and iteration.

AI changed that calculus. An agent reading a vague ticket typically does not ask a clarifying question. It is built to produce output, so it guesses, commits, and generates plausible-looking code that solves the wrong problem. Teams running AI agents on incomplete specs have seen the same pattern repeat: faster output, more rework, and a growing pile of pull requests that look correct but do not match what the business asked for.

Requirements, in the traditional sense, are statements of business intent: what the user needs, what the feature must do. A modern spec for an AI agent is broader. It merges business requirements with the technical details needed to realize the feature in your existing codebase: data contracts, integration boundaries, error handling, architectural constraints. When an agent reads a modern spec, it has what it needs to produce a working implementation without having to guess at gaps. When it reads a traditional requirements document, it fabricates the technical specifics.

Specifications have therefore become the control surface for AI-assisted development. The quality of a spec now determines whether AI accelerates delivery or accelerates rework. This is why teams that were content with two-sentence Jira tickets in 2023 are writing structured, multi-section specifications in 2026.

What spec-driven development actually is

A specification in this context is a structured artifact written in natural language that describes the external behavior of a software system. It defines what the system does, not how it does it. It specifies inputs and outputs, preconditions and postconditions, invariants, edge cases, integration contracts, and the sequence of state transitions the system goes through.

The spec is one of two inputs that determine whether AI-assisted development produces good output. The other is the AI harness: the agent configuration, tool integrations, rules files, and connected system access that define how the agent reads and acts on the spec. A strong spec in a weak harness produces inconsistent output. A weak spec in a strong harness produces confident but wrong output. Both matter. This article focuses on the spec side; harness design is adjacent and equally important.

Spec-driven development treats the specification as the source of truth. Code derives from the spec. Tests derive from the spec. Documentation derives from the spec. When the system needs to change, the spec changes first, and the code follows.

This may sound familiar to teams with long memories of waterfall. In the waterfall model, requirements and specifications were typically one giant product definition, written before development began, and that definition often went stale within weeks once development started. Modern specs are different in structure and intent. They are scoped to features rather than systems, updated as the feature evolves, and written to answer the questions an AI agent will ask, not to satisfy a sign-off process. A ready spec for a single feature is closer to a well-defined sprint goal than to a waterfall requirements document. Iteration speed stays high because the feedback loop runs in minutes, with AI agents regenerating implementations from updated specs rather than waiting for the next planning cycle.

Spec-driven development is also different from vibe-coding, the practice Andrej Karpathy described in February 2025 of describing a goal to an AI assistant and accepting whatever code comes back. Vibe-coding works for throwaway prototypes and demos. It produces inconsistent results when teams build systems that other humans, other agents, or external dependencies have to reason about over months and years. The difference shows up most sharply at the senior engineering level. A senior engineer working in a spec-driven workflow does not describe features. They specify architecture: data contracts, integration boundaries, error behaviors, and concurrency assumptions. That level of specificity is what gives an agent enough context to produce production-ready code, or something really close to it, rather than a plausible-looking first draft that needs three rounds of rework.

The three levels of spec-driven development

Birgitta Böckeler, a Distinguished Engineer at Thoughtworks, published a taxonomy of three levels in the Exploring Gen AI series on martinfowler.com that has become the working vocabulary in the field. Each level represents a deeper commitment to making the specification a first-class artifact.

Spec-first. The team writes a thorough specification before any code gets generated. AI agents or developers use that spec to produce the initial implementation. Once the code ships, the spec is no longer maintained. This is the lowest-commitment level, and it captures most of the rework-reduction benefit on a per-feature basis. Most teams adopting spec-driven development start here.

Spec-anchored. The team writes the spec first, and keeps the spec maintained alongside the code. When requirements change, the spec is the first thing that gets updated. AI agents reading the codebase later are pointed at the spec, not just the code, to understand intent. This level supports long-term evolution and onboarding. Teams running production AI agents tend to settle at this level for systems with more than one engineering team touching the code.

Spec-as-source. The spec becomes the primary artifact that humans edit. Code is generated and regenerated from the spec, and humans rarely touch generated code directly. This is the most ambitious level. A small but growing number of teams operate here for self-contained services, internal tools, and greenfield projects. It is not yet practical for most production systems with significant legacy code.

The same levels map onto how engineering teams handle tickets. Most teams operate at the ticket equivalent of spec-first: the ticket captures intent, gets executed, and is never updated once the work closes. Spec-anchored at the ticket level would mean the ticket stays accurate through scope changes and lives as a reference artifact alongside the code it produced. Ticket-as-source, where tickets are the primary artifact that humans maintain and implementations regenerate from, is not common practice. The aspiration holds at both the feature spec and the ticket level; most teams are early in both.

The level a team operates at should match the system it is building and the risk profile of the work. A throwaway internal tool can stay at spec-first. A revenue-bearing API needs spec-anchored at minimum. A regulated workflow in a compliance-bound vertical may justify the investment to reach spec-as-source.

Anatomy of a spec that AI agents can build from

A spec is more useful in proportion to how directly it answers the questions an AI agent has to answer when generating code. The following six elements appear in nearly every successful spec-driven workflow.

Goal statement. One paragraph that describes what the user can do once this feature ships, and why that matters to the business. This anchors the spec to outcome rather than implementation.
Functional requirements. A list of behaviors the system must support, written in declarative form. Each requirement specifies an input shape, an expected output, and the conditions under which the behavior fires.
Non-functional requirements. Performance ceilings, availability targets, security constraints, accessibility expectations. These are the constraints AI agents miss most often when they are not made explicit.
Integration contracts. Every external dependency the system reads from or writes to, with the schema, the error modes, and the retry behavior. AI agents will fabricate API shapes if the contract is not specified.
Edge cases and failure modes. What happens when input is malformed, when the upstream service times out, when the user cancels mid-operation, when concurrent writes arrive. Edge cases are where AI-generated code most often produces plausible-looking but wrong implementations.
Acceptance criteria. Concrete, testable statements of what done looks like. These map directly to the test cases an agent will generate or a reviewer will check against.

A spec that contains all six elements is what some teams call a "ready" spec. A spec missing two or more is what they call a "draft." The distinction matters because ready specs produce code that survives review. Draft specs produce code that needs rework.
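The six elements and the ready/draft distinction can be sketched as a small data structure. This is a minimal illustration, not a standard schema: the class and field names are assumptions, and the "needs-review" label for a spec missing exactly one element is invented here, because the text only defines "ready" (all six present) and "draft" (two or more missing).

```python
from dataclasses import dataclass, field


@dataclass
class IntegrationContract:
    """One external dependency: schema, error modes, and retry behavior."""
    dependency: str
    schema: dict[str, str]      # field name -> type, in the team's own notation
    error_modes: list[str]
    retry_behavior: str


@dataclass
class Spec:
    """The six elements; an empty value means the element is missing."""
    goal: str = ""
    functional_requirements: list[str] = field(default_factory=list)
    non_functional_requirements: list[str] = field(default_factory=list)
    integration_contracts: list[IntegrationContract] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Names of the elements that are still empty."""
        return [name for name, value in vars(self).items() if not value]

    def status(self) -> str:
        missing = len(self.missing_elements())
        if missing == 0:
            return "ready"
        if missing >= 2:
            return "draft"
        # Exactly one missing element is not classified in the article;
        # this label is an assumption made for the sketch.
        return "needs-review"


spec = Spec(
    goal="Users can export invoices as CSV.",
    functional_requirements=["Given an account ID, return all invoices as CSV rows."],
)
print(spec.status())            # draft
print(spec.missing_elements())
```

A presence check like this only proves that each section exists. Whether the content of a section is good enough still needs a reviewer or a scored rubric.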

The spec-driven workflow, step by step

A mature spec-driven development cycle has five stages. Most teams run them with a mix of human authors, AI authoring assistants, and AI coding agents.

Draft. A product manager or engineer writes the first version of the spec, often in collaboration with an AI authoring assistant that suggests acceptance criteria and edge cases based on similar past work.
Validate readiness. Before the spec moves to development, it gets evaluated against a readiness checklist. Are all six elements present? Are integration contracts specified? Have edge cases been enumerated? Teams running mature spec-driven workflows now use scored readiness gates to automate this check, because human reviewers miss the same categories of gaps repeatedly.
Implement. AI coding agents or developers build against the validated spec. Because the spec is precise, the implementation work shifts from discovery to translation. GitHub's controlled trial of Copilot found that AI-assisted developers completed the study's coding task 55% faster than the unassisted control group (arXiv:2302.06590, 95% CI 21-89%). The spec is what keeps that raw speed from producing rework, because it defines what done means before the agent starts.
Review. Code review compares the implementation back to the spec. Tests are generated from the acceptance criteria. Anything that does not trace back to the spec gets flagged.
Evolve. When requirements change, the spec gets updated first. The implementation regenerates or gets modified to match. The spec stays the source of truth across the lifetime of the feature.

The discipline of spec-driven development is mostly in stages two and five. Drafting a spec is the easy part. Keeping it ready and keeping it current are where teams either succeed or relapse into ticket-driven work.
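The five stages and the readiness gate can be sketched as a small state machine. This is one possible reading under stated assumptions: the allowed back-edges (a failed validation returning to Draft, a failed review returning to Implement, Evolve re-entering at validation) and the 0.8 threshold are illustrative choices, not something the workflow prescribes.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    VALIDATE = "validate readiness"
    IMPLEMENT = "implement"
    REVIEW = "review"
    EVOLVE = "evolve"


# Allowed moves between stages. The back-edges are assumptions for this sketch.
TRANSITIONS = {
    Stage.DRAFT: {Stage.VALIDATE},
    Stage.VALIDATE: {Stage.IMPLEMENT, Stage.DRAFT},
    Stage.IMPLEMENT: {Stage.REVIEW},
    Stage.REVIEW: {Stage.EVOLVE, Stage.IMPLEMENT},
    Stage.EVOLVE: {Stage.VALIDATE},   # an updated spec is re-validated before rebuilding
}


def advance(current: Stage, target: Stage,
            readiness_score: float = 0.0, threshold: float = 0.8) -> Stage:
    """Move to the next stage, enforcing the readiness gate before Implement."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    gated = current is Stage.VALIDATE and target is Stage.IMPLEMENT
    if gated and readiness_score < threshold:
        raise ValueError(
            f"readiness {readiness_score:.2f} is below the gate ({threshold:.2f})"
        )
    return target


stage = advance(Stage.DRAFT, Stage.VALIDATE)
stage = advance(stage, Stage.IMPLEMENT, readiness_score=0.9)
print(stage.value)  # implement
```

The point of encoding the gate is that skipping it becomes an explicit error rather than a quiet habit, which is where the discipline in stages two and five tends to erode.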

Where teams trip up

Even a correctly implemented practice carries structural risks that the category has not fully solved. The failure modes below are different: they are adoption problems, patterns that come from how teams implement the practice, not from the practice itself.

The spec stops at goals. Teams write the goal statement and the functional requirements, then declare the spec complete. They skip the integration contracts and edge cases, which is where AI agents make the most expensive mistakes.

The spec lives in a different system from the work. Specs in a Notion doc that nobody links from the Jira ticket get ignored. The spec has to live where the implementation work lives, or the implementation work will diverge from it.

The spec is treated as a one-time artifact. Teams write the spec, ship the feature, and never update the spec when requirements change. Six months later, the spec is fiction. The team is back to ticket-driven work without realizing it.

Readiness is a vibe check, not a measurement. Senior engineers can spot a thin spec, but they do so inconsistently and only when they have time. Without a scored readiness rubric, the same kinds of gaps slip through repeatedly.

The team optimizes for spec volume, not spec quality. A team writing 30 thin specs per sprint produces more rework than a team writing 8 ready specs per sprint. Volume without quality multiplies the AI rework tax rather than reducing it.

Avoiding these failure modes is the difference between a team that gets the productivity gains of spec-driven development and a team that adds documentation overhead without the upside.

What the practice changes about team composition

Spec-driven development shifts which roles do the most valuable work, and engineering leaders adopting the practice should plan for those shifts before they show up in performance reviews.

Product managers will move faster and manage more information upstream. The PM is now writing acceptance criteria that AI agents will build against, which is closer to engineering than the PM role has traditionally been. PMs working in spec-driven teams need stronger fluency in API contracts, error modes, and edge cases. The spec is also now the shared artifact between product and engineering, instead of a requirements document handed over the fence before engineering writes a separate technical design. One document, co-owned, is the working model. Teams that try to maintain separate product requirements and engineering technical specs produce gaps in exactly the places that matter most for agents.

Senior, staff, and principal engineers shift from coding to defining architecture and patterns. The engineers who used to spend their time writing the hardest parts of the code and mentoring the juniors on the team now spend it on the architecture and patterns that shape specs and guide their review. That is the highest-leverage work in the cycle: a staff engineer who can articulate what a spec must capture, and so catch a bad spec before the agent runs, is worth twenty PR reviews after the fact. Mentorship shifts accordingly.

Mid-level engineers spend more time on translation, less on discovery. When the spec is precise, the implementation work is closer to translation than design. This narrows the value spread between mid-level and senior engineers on pure coding work. Hiring plans should account for that narrowing, because the IC pipeline that worked in 2023 assumes a discovery-heavy implementation phase that the practice removes.

Quality engineering moves earlier, and test-driven development comes back. Tests derived from acceptance criteria are written before code exists, which means QA engineers are reviewing specs, not just shipped builds. This is also where TDD re-emerges as a relevant practice in AI-assisted teams. A spec paired with a test suite gives an agent a concrete and verifiable definition of done: implement the code to make these tests pass. That specificity reduces interpretation drift more reliably than natural language specs alone, because agents can misinterpret ambiguous language. However, it’s not foolproof: agents can optimize for passing tests while missing broader behavior the spec intended. Strong engineering review or a well-configured agent harness is what catches the difference.
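As a rough illustration of tests derived from acceptance criteria, the sketch below pairs three criteria from a hypothetical discount feature with one test each. The feature, the criteria, the function name apply_discount, and the code SAVE10 are all invented for the example.

```python
# Acceptance criteria from a hypothetical spec, written before any code exists:
#   AC-1: A cart total of 100.00 with code "SAVE10" is charged 90.00.
#   AC-2: An unrecognized code raises ValueError.
#   AC-3: A discounted total is never negative.

from decimal import Decimal

DISCOUNTS = {"SAVE10": Decimal("0.10")}  # hypothetical discount table


def apply_discount(total: Decimal, code: str) -> Decimal:
    """Return the total after applying a known percentage discount."""
    if code not in DISCOUNTS:
        raise ValueError(f"unknown discount code: {code}")
    discounted = total * (1 - DISCOUNTS[code])
    return max(discounted, Decimal("0")).quantize(Decimal("0.01"))


def test_ac1_valid_code_reduces_total():
    assert apply_discount(Decimal("100.00"), "SAVE10") == Decimal("90.00")


def test_ac2_unknown_code_is_rejected():
    try:
        apply_discount(Decimal("100.00"), "BOGUS")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unknown code")


def test_ac3_total_never_negative():
    assert apply_discount(Decimal("0.00"), "SAVE10") >= 0
```

Note the gap this leaves open: an implementation that returns 90.00 for every valid code would also pass all three tests. That is the kind of test-passing but intent-missing output that review or the harness has to catch.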

The hiring plan implication: the staff+ engineer job description, the PM job description, and the QA job description all need updating before the team scales the practice past the first squad.

The spec-driven development tool landscape

The tooling category has expanded fast since 2024. The tools cluster into three functional layers, and most teams adopting spec-driven development end up using one tool from each.

Spec authoring. Many individuals and teams start with a frontier LLM connected to MCP servers and a framework, with a context library that includes agents.md and rules files. That approach can certainly get you started with specs that sound great, but it does not scale across teams.

Tools that help write the specification itself, often with AI assistance, are also on the rise. GitHub Spec Kit, Kiro, BMAD-METHOD, and OpenSpec all sit in this layer. They generate the structured artifact that becomes the spec, each with different workflow assumptions: Spec Kit is a CLI toolkit designed for teams already using GitHub Copilot; Kiro is an agentic IDE with a three-phase spec-to-code workflow; BMAD-METHOD orchestrates a full set of agent personas across the SDLC rather than focusing on the spec stage alone.

Spec readiness and stress-testing. Tools that evaluate whether a spec is ready before engineering commits to building. This layer is newer than authoring. Several authoring tools include lightweight readiness checks of their own, which is part of why a dedicated layer has emerged: the authoring tool optimizes for getting the spec written, and a separate evaluator can hold the spec to a higher bar without slowing the drafting loop.

Spec governance. Tools that track how specs are performing across an organization: which teams produce ready specs consistently, which features ship without rework, which patterns of spec writing correlate with downstream outcomes. This layer is the least developed of the three. Most teams doing governance work are doing it manually: tracking rework rates against spec quality observations in spreadsheets or engineering metrics dashboards. As adoption of spec-driven development matures, governance tooling will likely consolidate around SDLC intelligence platforms that already track delivery data, since spec quality signals are a subset of delivery outcome signals.

The three layers complement each other. Authoring tools produce the spec. Readiness tools stress-test it. Governance tools measure whether the practice is paying off.

Allstacks Product Studio combines each of these to help ideate, define, refine, and share context-aware specs in spec-driven development workflows.

When spec-driven development is the wrong tool

Spec-driven development produces real gains for the feature work described throughout this article. It also adds friction that does not pay off in every category of work. Four cases where the spec is the wrong tool:

Research and prototyping. When the goal is to learn whether an idea is worth building, the spec is a tax on learning. Throwaway prototypes and timeboxed spikes belong outside the spec-driven workflow.

Exploratory UI work. Visual and interaction design that is still being iterated through Figma or live coding does not benefit from a structured spec until the design has settled. Locking acceptance criteria around an unsettled design produces a spec the team rewrites three times before anything ships.

ML model training and data pipelines. The behavior the spec is trying to lock down is statistical, not deterministic. Spec-driven workflows assume a system whose correct behavior can be enumerated. ML pipelines need evaluation harnesses and dataset governance more than they need behavior specs.

Production hotfixes. A bug that is paging the on-call rotation does not get a spec. It gets a triage note and a postmortem.

The framing that holds across these cases: spec-driven development pays off when the cost of writing the spec is lower than the cost of the rework the spec prevents. When the rework cost is already low (prototypes), already inevitable (exploratory work), or already governed by other artifacts (ML, hotfixes), the spec adds drag without removing risk.

How to start spec-driven development on a real team

The first move that produces visible results inside a quarter is to apply spec-driven development to one team and one type of work. Pick a single squad working on greenfield features rather than legacy maintenance. Greenfield work has fewer integration constraints and lets the team practice the six-element rubric without fighting an existing codebase.

Stand up a readiness checklist that lives inside the ticket workflow, not outside it. The checklist should be answerable in two minutes. If it takes longer, the team will skip it. Score each spec before it moves to development. Track the readiness score against rework rates over the next eight weeks.
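One lightweight way to track readiness against rework is a plain correlation over the specs shipped in the period. The sketch below assumes each spec gets a readiness score between 0 and 1 and a count of rework hours logged after review; both measures and every number in the sample are placeholders for illustration, not data from any team.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only, not measurements.
# One tuple per shipped spec: (readiness score, rework hours after review).
tracked = [(0.9, 2), (0.5, 10), (0.7, 6), (0.3, 14), (1.0, 1)]

scores = [score for score, _ in tracked]
rework = [hours for _, hours in tracked]
r = pearson(scores, rework)
print(f"readiness vs rework: r = {r:.2f}")
```

A clearly negative r suggests that higher readiness scores go with less rework. With only eight weeks of specs from one squad, treat it as a rough signal for the retrospective rather than proof.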

Then expand. Once one team is producing ready specs consistently, the practice transfers to other teams faster than the first adoption took. The hardest part is building the muscle of stopping at the readiness gate. Once that muscle exists, the tools and the workflow can be extended without rebuilding from scratch.

For teams that want to produce a first ready spec before investing in tooling, this approach works:

Take an AI assistant (Claude, ChatGPT, or similar) and give it access to your internal systems: codebase, past tickets, architecture docs, API documentation. The richer the context, the better the spec output.
Give it a structure to fill: the six elements above, in order. The structure enforces completeness, and the AI fills in the content from context.
Use this prompt as a starting point: “You are an expert product manager and software architect working on [product]. You are developing a feature that [description]. Research this feature and produce a spec covering goal, functional requirements, non-functional requirements, integration contracts, edge cases, and acceptance criteria. Reference prior art in our codebase and systems wherever it exists. If no prior art exists, identify and cite state-of-the-art approaches. Do not make assumptions. For any assertion you cannot support with evidence, ask rather than guess.”

That combination of AI assistant, internal context access, the six-element structure, and an evidence-first prompt produces a first spec that is reviewable in the same session.
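For teams that want to reuse the starting prompt rather than retype it, it can be kept as a template with the two bracketed slots filled in per feature. This is a minimal sketch: the function name build_spec_prompt and the sample values are hypothetical, and sending the result to an assistant is left out because it depends on the tool in use.

```python
SPEC_PROMPT = (
    "You are an expert product manager and software architect working on {product}. "
    "You are developing a feature that {description}. "
    "Research this feature and produce a spec covering goal, functional requirements, "
    "non-functional requirements, integration contracts, edge cases, and acceptance "
    "criteria. Reference prior art in our codebase and systems wherever it exists. "
    "If no prior art exists, identify and cite state-of-the-art approaches. "
    "Do not make assumptions. For any assertion you cannot support with evidence, "
    "ask rather than guess."
)


def build_spec_prompt(product: str, description: str) -> str:
    """Fill the two slots of the starting prompt; reject empty values."""
    product = product.strip()
    description = description.strip()
    if not product or not description:
        raise ValueError("both product and description are required")
    return SPEC_PROMPT.format(product=product, description=description)


# Hypothetical product and feature, for illustration only.
prompt = build_spec_prompt(
    product="Acme Billing",
    description="lets customers export invoices as CSV",
)
print(prompt)
```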
