Spec-driven development means writing precise specifications before AI agents write code. See the workflow, the tools, and where teams fail.
For two decades, "documentation first" was an aspiration most engineering teams ignored. Code was the fast path. Specs were slow. The cost of writing detailed requirements rarely justified the time, because a developer reading vague tickets could still infer intent from context, conversations, and iteration.
AI forced a shift in that flow. An agent reading a vague ticket does not ask a clarifying question. It is built to produce output, so it guesses, commits, and generates plausible-looking code that solves the wrong problem. Teams running AI agents on incomplete specs have seen the same pattern repeat: faster output, more rework, and a growing pile of pull requests that look correct but do not match what the business asked for.
Requirements, in the traditional sense, are statements of business intent: what the user needs, what the feature must do. A modern spec for an AI agent is broader. It merges business requirements with the technical details needed to realize the feature in your existing codebase: data contracts, integration boundaries, error handling, architectural constraints. When an agent reads a modern spec, it has what it needs to produce a working implementation without having to guess at gaps. When it reads a traditional requirements document, it fabricates the technical specifics.
Specifications have therefore become the control surface for AI-assisted development. The quality of a spec now determines whether AI accelerates delivery or accelerates rework. This is why teams that were content with two-sentence Jira tickets in 2023 are writing structured, multi-section specifications in 2026.
A specification in this context is a structured artifact written in natural language that describes the external behavior of a software system. It defines what the system does, not how it does it. It specifies inputs and outputs, preconditions and postconditions, invariants, edge cases, integration contracts, and the sequence of state transitions the system goes through.
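The spec itself stays in natural language, but it can help to see what the same categories look like once they are made checkable. The following is a minimal sketch under stated assumptions: the `BehaviorSpec` class, its field names, and the `withdraw` example are all hypothetical illustrations, not a standard format or any tool's schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical structure: field names mirror the terms in the paragraph above.
@dataclass
class BehaviorSpec:
    name: str
    preconditions: List[Callable[[Dict], bool]] = field(default_factory=list)
    postconditions: List[Callable[[Dict, Dict], bool]] = field(default_factory=list)
    edge_cases: List[str] = field(default_factory=list)

    def accepts(self, inputs: Dict) -> bool:
        """True when every precondition holds for the given inputs."""
        return all(check(inputs) for check in self.preconditions)

    def satisfied_by(self, inputs: Dict, outputs: Dict) -> bool:
        """True when every postcondition holds for an input/output pair."""
        return all(check(inputs, outputs) for check in self.postconditions)


# Illustrative spec for an imaginary "withdraw" operation.
withdraw_spec = BehaviorSpec(
    name="withdraw",
    preconditions=[
        lambda i: i["amount"] > 0,
        lambda i: i["amount"] <= i["balance"],
    ],
    postconditions=[
        lambda i, o: o["balance"] == i["balance"] - i["amount"],
        lambda i, o: o["balance"] >= 0,  # invariant: balance is never negative
    ],
    edge_cases=["amount equals balance", "amount is zero"],
)
```

Note that the sketch says nothing about how `withdraw` is implemented. It only states which inputs are valid and what must be true of the result, which is the "what, not how" boundary described above.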
The spec is one of two inputs that determine whether AI-assisted development produces good output. The other is the AI harness: the agent configuration, tool integrations, rules files, and connected system access that defines how the agent reads and acts on the spec. A strong spec in a weak harness produces inconsistent output. A weak spec in a strong harness produces confident but wrong output. Both matter. This article focuses on the spec side; harness design is adjacent and equally important.
Spec-driven development treats the specification as the source of truth. Code derives from the spec. Tests derive from the spec. Documentation derives from the spec. When the system needs to change, the spec changes first, and the code follows.
This may sound familiar to teams with long memories of waterfall. In the waterfall model, requirements and specifications were typically one giant product definition, written before development began, that often went stale within weeks. Modern specs are different in structure and intent. They are scoped to features rather than systems, updated as the feature evolves, and written to answer the questions an AI agent will ask, not to satisfy a sign-off process. A ready spec for a single feature is closer to a well-defined sprint goal than to a waterfall requirements document. Iteration speed stays high because the feedback loop runs in minutes, with AI agents regenerating implementations from updated specs rather than waiting for the next planning cycle.
Spec-driven development is also different from vibe-coding, the practice Andrej Karpathy described in February 2025 of describing a goal to an AI assistant and accepting whatever code comes back. Vibe-coding works for throwaway prototypes and demos. It produces inconsistent results when teams build systems that other humans, other agents, or external dependencies have to reason about over months and years. The difference shows up most sharply at the senior engineering level. A senior engineer working in a spec-driven workflow does not describe features. They specify architecture: data contracts, integration boundaries, error behaviors, and concurrency assumptions. That level of specificity is what gives an agent enough context to produce production-ready code, or something really close to it, rather than a plausible-looking first draft that needs three rounds of rework.
Birgitta Böckeler, a Distinguished Engineer at Thoughtworks, published a taxonomy of three levels in the Exploring Gen AI series on martinfowler.com that has become the working vocabulary in the field. Each level represents a deeper commitment to making the specification a first-class artifact.
Spec-first. The team writes a thorough specification before any code gets generated. AI agents or developers use that spec to produce the initial implementation. Once the code ships, the spec is no longer maintained. This is the lowest-commitment level, and it captures most of the rework-reduction benefit on a per-feature basis. Most teams adopting spec-driven development start here.
Spec-anchored. The team writes the spec first, and keeps the spec maintained alongside the code. When requirements change, the spec is the first thing that gets updated. AI agents reading the codebase later are pointed at the spec, not just the code, to understand intent. This level supports long-term evolution and onboarding. Teams running production AI agents tend to settle at this level for systems with more than one engineering team touching the code.
Spec-as-source. The spec becomes the primary artifact that humans edit. Code is generated and regenerated from the spec, and humans rarely touch generated code directly. This is the most ambitious level. A small but growing number of teams operate here for self-contained services, internal tools, and greenfield projects. It is not yet practical for most production systems with significant legacy code.
The same levels map onto how engineering teams handle tickets. Most teams operate at the ticket equivalent of spec-first: the ticket captures intent, gets executed, and is never updated once the work closes. Spec-anchored at the ticket level would mean the ticket stays accurate through scope changes and lives as a reference artifact alongside the code it produced. Ticket-as-source, where tickets are the primary artifact that humans maintain and implementations regenerate from, is not common practice. The aspiration holds at both the feature spec and the ticket level; most teams are early in both.
The level a team operates at should match the system it is building and the risk profile of the work. A throwaway internal tool can stay at spec-first. A revenue-bearing API needs spec-anchored at minimum. A regulated workflow in a compliance-bound vertical may justify the investment to reach spec-as-source.
A spec is more useful in proportion to how directly it answers the questions an AI agent has to answer when generating code. The following six elements appear in nearly every successful spec-driven workflow.
A spec that contains all six elements is what some teams call a "ready" spec. A spec missing two or more is what they call a "draft." The distinction matters because ready specs produce code that survives review. Draft specs produce code that needs rework.
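The ready/draft rule can be expressed as a small check. This is a sketch, not a prescribed rubric: the element names below are placeholders borrowed from terms used elsewhere in this article plus one generic slot, and the `nearly_ready` label for a spec missing exactly one element is an assumption, since the rule above does not name that case.

```python
from typing import Iterable, Set

# Placeholder names only. They are NOT an authoritative list of the six
# elements; swap in whatever your team's rubric actually requires.
REQUIRED_ELEMENTS: Set[str] = {
    "goal",
    "functional_requirements",
    "acceptance_criteria",
    "integration_contracts",
    "edge_cases",
    "element_six",
}


def classify_spec(present: Iterable[str],
                  required: Set[str] = REQUIRED_ELEMENTS) -> str:
    """Classify a spec by how many required elements it is missing."""
    missing = required - set(present)
    if not missing:
        return "ready"
    if len(missing) >= 2:
        return "draft"
    # Exactly one element missing: label chosen for this sketch only.
    return "nearly_ready"
```

For example, `classify_spec({"goal", "functional_requirements"})` returns `"draft"`, which matches the "spec stops at goals" pattern: the goal and requirements are present, but four other elements are not.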
A mature spec-driven development cycle has five stages. Most teams run them with a mix of human authors, AI authoring assistants, and AI coding agents.
The discipline of spec-driven development is mostly in stages two and five. Drafting a spec is the easy part. Keeping it ready and keeping it current are where teams either succeed or relapse into ticket-driven work.
The risks above affect teams even when the practice is implemented correctly. They are structural constraints the category has not fully solved. The failure modes below are adoption problems: patterns that come from how teams implement the practice, not from the practice itself.
The spec stops at goals. Teams write the goal statement and the functional requirements, then declare the spec complete. They skip the integration contracts and edge cases, which is where AI agents make the most expensive mistakes.
The spec lives in a different system from the work. Specs in a Notion doc that nobody links from the Jira ticket get ignored. The spec has to live where the implementation work lives, or the implementation work will diverge from it.
The spec is treated as a one-time artifact. Teams write the spec, ship the feature, and never update the spec when requirements change. Six months later, the spec is fiction. The team is back to ticket-driven work without realizing it.
Readiness is a vibe check, not a measurement. Senior engineers can spot a thin spec, but they do so inconsistently and only when they have time. Without a scored readiness rubric, the same kinds of gaps slip through repeatedly.
The team optimizes for spec volume, not spec quality. A team writing 30 thin specs per sprint produces more rework than a team writing 8 ready specs per sprint. Volume without quality multiplies the AI rework tax rather than reducing it.
Avoiding these failure modes is the difference between a team that gets the productivity gains of spec-driven development and a team that adds documentation overhead without the upside.
Spec-driven development shifts which roles do the most valuable work, and engineering leaders adopting the practice should plan for those shifts before they show up in performance reviews.
Product managers will move faster and manage more information upstream. The PM is now writing acceptance criteria that AI agents will build against, which is closer to engineering than the PM role has traditionally been. PMs working in spec-driven teams need stronger fluency in API contracts, error modes, and edge cases. The spec is also now the shared artifact between product and engineering, instead of a requirements document handed over the fence before engineering writes a separate technical design. One document, co-owned, is the working model. Teams that try to maintain separate product requirements and engineering technical specs produce gaps in exactly the places that matter most for agents.
Senior, staff, and principal engineers shift from coding to defining architecture and patterns. The engineers who used to spend their time writing the hardest parts of the code and mentoring the juniors on the team now spend it on the architecture and patterns that shape specs and guide spec review. That is the highest-leverage work in the cycle: a staff engineer who can articulate what a spec must contain, and can therefore catch a bad spec before the agent runs, is worth twenty PR reviews after the fact. Mentorship shifts accordingly.
Mid-level engineers spend more time on translation, less on discovery. When the spec is precise, the implementation work is closer to translation than design. This narrows the value spread between mid-level and senior engineers on pure coding work. Hiring plans should account for that narrowing, because the IC pipeline that worked in 2023 assumes a discovery-heavy implementation phase that the practice removes.
Quality engineering moves earlier, and test-driven development comes back. Tests derived from acceptance criteria are written before code exists, which means QA engineers are reviewing specs, not just shipped builds. This is also where TDD re-emerges as a relevant practice in AI-assisted teams. A spec paired with a test suite gives an agent a concrete and verifiable definition of done: implement the code to make these tests pass. That specificity reduces interpretation drift more reliably than natural language specs alone, because agents can misinterpret ambiguous language. However, it’s not foolproof: agents can optimize for passing tests while missing broader behavior the spec intended. Strong engineering review or a well-configured agent harness is what catches the difference.
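As an illustration of a spec paired with tests, here is a minimal sketch. The acceptance criterion, the `apply_discount` function, and the threshold values are all invented for this example. In practice the tests would be written first and handed to the agent; the implementation is shown only so the block runs on its own.

```python
# Hypothetical acceptance criterion (not from any real spec):
#   "Orders of 100.00 or more get a 10% discount; smaller orders get none.
#    Negative totals are rejected."
# Amounts are in integer cents to avoid floating-point rounding.


def apply_discount(total_cents: int) -> int:
    """One implementation an agent might produce to satisfy the tests below."""
    if total_cents < 0:
        raise ValueError("total must not be negative")
    if total_cents >= 10_000:
        return total_cents - total_cents // 10
    return total_cents


def test_discount_applies_at_threshold():
    assert apply_discount(10_000) == 9_000


def test_no_discount_below_threshold():
    assert apply_discount(9_999) == 9_999


def test_negative_total_rejected():
    try:
        apply_discount(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative total")
```

The caveat about optimizing for tests is visible even here: an implementation that special-cased `10_000` and returned `9_000` would pass all three tests while failing the criterion for every other order above the threshold. The tests narrow interpretation, but review still has to confirm the general behavior.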
The hiring plan implication: the staff+ engineer job description, the PM job description, and the QA job description all need updating before the team scales the practice past the first squad.
The tooling category has expanded fast since 2024. The tools cluster into three functional layers, and most teams adopting spec-driven development end up using one tool from each.
Spec authoring. Many individuals and teams start with a frontier LLM connected to MCP servers, a framework, and a context library that includes agents.md and rules files. That approach can get you started with specs that sound great, but it does not scale across teams.
Tools that help write the specification itself, often with AI assistance, are also on the rise. GitHub Spec Kit, Kiro, BMAD-METHOD, and OpenSpec all sit in this layer. They generate the structured artifact that becomes the spec, each with different workflow assumptions: Spec Kit is a CLI toolkit designed for teams already using GitHub Copilot; Kiro is an agentic IDE with a three-phase spec-to-code workflow; BMAD-METHOD orchestrates a full set of agent personas across the SDLC rather than focusing on the spec stage alone.
Spec readiness and stress-testing. Tools that evaluate whether a spec is ready before engineering commits to building. This layer is newer than authoring. Several authoring tools include lightweight readiness checks of their own, which is part of why a dedicated layer has emerged: the authoring tool optimizes for getting the spec written, and a separate evaluator can hold the spec to a higher bar without slowing the drafting loop.
Spec governance. Tools that track how specs are performing across an organization: which teams produce ready specs consistently, which features ship without rework, which patterns of spec writing correlate with downstream outcomes. This layer is the least developed of the three. Most teams doing governance work are doing it manually: tracking rework rates against spec quality observations in spreadsheets or engineering metrics dashboards. As adoption of spec-driven development matures, governance tooling will likely consolidate around SDLC intelligence platforms that already track delivery data, since spec quality signals are a subset of delivery outcome signals.
The three layers complement each other. Authoring tools produce the spec. Readiness tools stress-test it. Governance tools measure whether the practice is paying off.
Allstacks Product Studio combines all three layers to help teams ideate, define, refine, and share context-aware specs in spec-driven development workflows.
Spec-driven development produces real gains for the feature work described throughout this article. It also adds friction that does not pay off in every category of work. Four cases where the spec is the wrong tool:
Research and prototyping. When the goal is to learn whether an idea is worth building, the spec is a tax on learning. Throwaway prototypes and timeboxed spikes belong outside the spec-driven workflow.
Exploratory UI work. Visual and interaction design that is still being iterated through Figma or live coding does not benefit from a structured spec until the design has settled. Locking acceptance criteria around an unsettled design produces a spec the team rewrites three times before anything ships.
ML model training and data pipelines. The behavior the spec is trying to lock down is statistical, not deterministic. Spec-driven workflows assume a system whose correct behavior can be enumerated. ML pipelines need evaluation harnesses and dataset governance more than they need behavior specs.
Production hotfixes. A bug that is paging the on-call rotation does not get a spec. It gets a triage note and a postmortem.
The framing that holds across these cases: spec-driven development pays off when the cost of writing the spec is lower than the cost of the rework the spec prevents. When the rework cost is already low (prototypes), already inevitable (exploratory work), or already governed by other artifacts (ML, hotfixes), the spec adds drag without removing risk.
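That break-even framing reduces to a single comparison. The sketch below assumes costs are estimated in hours; the function name, parameters, and all numbers are hypothetical and exist only to show the shape of the reasoning.

```python
def spec_pays_off(spec_cost_hours: float,
                  rework_hours_without_spec: float,
                  rework_hours_with_spec: float) -> bool:
    """True when writing the spec costs less than the rework it prevents."""
    rework_prevented = rework_hours_without_spec - rework_hours_with_spec
    return spec_cost_hours < rework_prevented


# Invented estimates, for illustration only.
feature_work = spec_pays_off(4, rework_hours_without_spec=20, rework_hours_with_spec=5)
prototype = spec_pays_off(4, rework_hours_without_spec=3, rework_hours_with_spec=2)
exploratory_ui = spec_pays_off(4, rework_hours_without_spec=20, rework_hours_with_spec=18)
```

With these numbers, only `feature_work` is `True`. The prototype has too little rework to prevent, and the exploratory UI work keeps most of its rework whether or not a spec exists.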
The first move that produces visible results inside a quarter is to apply spec-driven development to one team and one type of work. Pick a single squad working on greenfield features rather than legacy maintenance. Greenfield work has fewer integration constraints and lets the team practice the six-element rubric without fighting an existing codebase.
Stand up a readiness checklist that lives inside the ticket workflow, not outside it. The checklist should be answerable in two minutes. If it takes longer, the team will skip it. Score each spec before it moves to development. Track the readiness score against rework rates over the next eight weeks.
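Tracking readiness against rework does not need special tooling at first. The following sketch groups rework rates by readiness score using only the Python standard library. The 0 to 6 score scale, the definition of "rework rate" as a fraction, and the sample records are assumptions made up for this example, not measured data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def rework_by_readiness(records: List[Tuple[int, float]]) -> Dict[int, float]:
    """Average rework rate for each readiness score.

    Each record is (readiness_score, rework_rate). Both the score scale
    and how rework rate is measured are left to the team.
    """
    grouped: Dict[int, List[float]] = defaultdict(list)
    for score, rework_rate in records:
        grouped[score].append(rework_rate)
    # Sort by score so the summary reads from least to most ready.
    return {score: mean(rates) for score, rates in sorted(grouped.items())}


# Invented sample: one tuple per shipped spec over the tracking window.
sample = [(6, 0.10), (6, 0.20), (4, 0.40), (3, 0.50), (3, 0.70)]
summary = rework_by_readiness(sample)
```

If the averages in `summary` do not fall as the readiness score rises, that is a signal the checklist is measuring the wrong things, which is worth knowing before the practice expands to more teams.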
Then expand. Once one team is producing ready specs consistently, the practice transfers to other teams faster than the first adoption took. The hardest part is building the muscle of stopping at the readiness gate. Once that muscle exists, the tools and the workflow can be extended without rebuilding from scratch.
For teams that want to produce a first ready spec before investing in tooling, this approach works:
That combination of AI assistant, internal context access, the six-element structure, and an evidence-first prompt produces a first spec that is reviewable in the same session.