The AI productivity gap within engineering teams is widening. Senior engineers compound their advantage while junior developers stall. Here's my perspective as an engineering leader.
The sprint review was a success. Velocity up 22%. Deployment frequency doubled. The CTO looked at the charts and nodded.
What the dashboard didn't show: three junior engineers on that team spent most of the past sprint approving AI-generated pull requests in under four minutes each. Not because the code was clearly good. Because they couldn't tell whether it was bad. They'd been writing code for eight months. The AI had been writing code for eight months too. Nobody measured the difference.
This is the AI productivity gap engineering teams aren't seeing. It is growing. Mark Russinovich and Scott Hanselman didn't write their recent Communications of the ACM paper as a warning. They wrote it as a diagnosis. Agentic AI, they argue, has "fractured the economics of software development": not by making everyone better, but by making the gap between senior and junior engineers measurably wider.
The story being told in board meetings and investor decks is one of uniform uplift: that engineering productivity with AI tools rises across the board, that developers move faster, and that faster developers make teams more productive. The data says something different.
Senior engineers get a compounding advantage. They have the codebase context, the architectural judgment, and the domain knowledge to steer AI output, catch its failures, and integrate its work into systems that will hold up under production load. AI amplifies what they already know.
The numbers reflect it. A July 2025 survey of 791 professional developers by Fastly found that senior engineers with ten or more years of experience are nearly 2.5 times more likely than junior engineers (zero to two years) to have more than half their shipped code be AI-generated: 32 percent versus 13 percent. Senior engineers are also twice as likely to describe AI as making them "a lot faster": 26 percent versus 13 percent.
Early-career developers face something closer to the opposite. McKinsey research found that developers with less than one year of experience actually took 7 to 10 percent longer on coding tasks when using AI tools than without them. Google Cloud's DORA ROI of AI-Assisted Software Development report names the mechanism: what looks like saved time is frequently replaced by a "verification tax." DORA defines it as "the cognitive load required to iterate on prompts and rigorously audit AI-generated code that looks remarkably similar to correct code." The scale of that shift is stark: developers who used to spend roughly 70 percent of their time coding and the rest reviewing now spend closer to 30 percent specifying and 70 percent reviewing what AI produces. For a senior engineer, discharging that tax takes minutes. They've seen enough production failures to pattern-match the risk. For an engineer eight months in, the same audit can take an hour, or gets skipped entirely.
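To see why the same tool can speed one engineer up and slow another down, here is a toy back-of-the-envelope calculation. Every number in it is invented for illustration; it is not drawn from the DORA or McKinsey data, only consistent in direction with it.

```python
# Toy arithmetic for the verification tax. All numbers are invented
# for illustration; only the shape of the comparison matters.
hand_written_minutes = 60            # a task written entirely by hand

ai_generation_minutes = 10           # prompting plus generation
senior_verification_minutes = 15     # pattern-matches the risk quickly
junior_verification_minutes = 55     # audits line by line, or misses issues

senior_total = ai_generation_minutes + senior_verification_minutes
junior_total = ai_generation_minutes + junior_verification_minutes

print(f"senior: {senior_total} min with AI vs {hand_written_minutes} min by hand")
print(f"junior: {junior_total} min with AI vs {hand_written_minutes} min by hand")
# The senior nets a large saving; the junior comes out slightly slower,
# consistent in direction with the McKinsey finding cited above.
```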
The result is a two-tier productivity curve. The top of the curve pulls away. The bottom stalls, or struggles.
Here is the problem: this bifurcation is nearly invisible in standard engineering analytics.
Consider two engineers on the same team. Daniel has been building distributed systems for a decade. He uses an AI coding agent to scaffold a complex event-sourcing migration, reviews every method signature, catches three hallucinated function calls before they reach CI, and ships the feature two days ahead of schedule. He knows which AI outputs to trust and which to interrogate.
Keisha has been an engineer for nine months. She uses the same tools, generates the same volume of code, and passes the same CI checks. But when the AI produces an approach she doesn't recognize, she searches Stack Overflow to confirm it looks reasonable, then merges it if nothing obviously contradicts it. Both engineers have green DORA metrics. Both have strong sprint velocity. The gap between them isn't in the numbers. It's in what each of them will know in three years.
Team-level velocity metrics don't surface this split. If your senior engineers are significantly more productive and your junior engineers are slower, aggregate velocity looks fine, maybe even improved. Deployment frequency doesn't break down by engineer tier. Cycle time metrics aggregate across the team. PR throughput counts commits, not the judgment behind them. The actual AI impact on engineering productivity (who benefits, who doesn't, and by how much) disappears into a single team-level number.
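A small illustration of that aggregation effect, with invented numbers and a per-engineer export format that is assumed rather than taken from any specific tool:

```python
from statistics import mean

# Hypothetical weekly merged-PR counts per engineer, grouped by tier.
# All numbers invented for illustration.
weekly_prs = {
    "senior": {"daniel": [9, 11, 12, 14], "priya": [8, 10, 11, 13]},
    "junior": {"keisha": [6, 5, 5, 4], "tom": [5, 5, 4, 4]},
}
weeks = range(4)

# Team-level view: everything rolled into one number per week.
team_totals = [
    sum(series[w] for tier in weekly_prs.values() for series in tier.values())
    for w in weeks
]
print("team PRs by week:", team_totals)          # trending up: looks healthy

# Tier-level view: the same data, disaggregated.
for tier, engineers in weekly_prs.items():
    tier_means = [mean(series[w] for series in engineers.values()) for w in weeks]
    print(f"{tier} mean PRs by week:", tier_means)
# Seniors pull away while juniors slide; the team total never shows it.
```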
Google Cloud's DORA research provides the data that makes this concrete. Across its study of AI adoption effects on key engineering outcomes, the second-largest measured impact (behind only individual effectiveness) was an increase in software delivery instability. Not throughput. Not quality. Instability. The DORA ROI report explains why: "AI-assisted coding can increase the volume and velocity of code generation, overwhelming existing deployment pipelines and manual review gates." A developer might write code significantly faster, the report notes, but "that code simply piles up in front of manual security reviews or brittle deployment pipelines. Those localized gains are lost to downstream chaos."
Most teams interpret that downstream chaos as a systems problem: flaky CI, overloaded reviewers, a pipeline that can't keep up. That interpretation is correct as far as it goes. What it misses is the talent-layer cause: a bifurcating team where the engineers generating code faster are not the same engineers absorbing the instability tax downstream.
The paper's proposed solution is structural: a "preceptor-based organization" where senior engineers are explicitly paired with early-career developers to direct AI coding agents together. The senior engineer provides steering and context, the AI provides acceleration, and the junior engineer learns by doing alongside both.
It's the right instinct. And it can work.
A senior engineer at a payments company started blocking one hour every Wednesday with a junior on her team, not for code review after the fact, but for what she calls "steering sessions." They open the AI tool together, she describes the problem, and they work through the prompts collaboratively. When the AI generates something plausible but architecturally wrong, she explains why, in real time, in context. Three months in, the junior engineer started catching those same errors independently. The learning loop that AI tooling had nearly eliminated was deliberately rebuilt. The cost was four hours a month of one senior engineer's time.
The problem is that this model requires senior engineers to treat early-career mentorship as a primary organizational investment at exactly the moment when AI is making it easier to route around that investment entirely. The economic incentive runs in the wrong direction.
And even if an organization commits to the preceptor model, it still faces a measurement problem. How do you know which teams are successfully developing junior engineer capability versus which are just generating code? How do you see whether the productivity split is widening or narrowing over time? How do you identify the teams where the junior developer experience has collapsed into rubber-stamping AI output with no learning attached?
You can't answer those questions with deployment frequency, cycle time, or PR throughput. You need visibility into what's actually happening at the behavioral level.
What Russinovich and Hanselman describe as an organizational design problem also has an engineering intelligence dimension. The bifurcation becomes visible and therefore addressable when you can see individual and team-level behavioral patterns that aggregate metrics obscure.
Where is rework concentrated? Which engineers are doing the most post-merge corrections? Where are spec-to-delivery quality gaps forming? Which teams have the widest variance in individual contribution patterns? Those are the signals that tell you whether AI adoption is compounding your senior engineers' capability, stalling your junior engineers' development, or both, before it shows up as a retention problem or a missed release.
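Any of those signals can be approximated from exported delivery data. As a rough sketch, here is what rework concentration might look like, assuming you can tag each post-merge correction with its author; the record shape and field names below are hypothetical, not any platform's actual schema.

```python
from collections import Counter

# Hypothetical export: one record per post-merge correction (revert,
# hotfix, or follow-up fix to a recently merged PR). The field names
# are assumptions, not a specific tool's schema.
post_merge_fixes = [
    {"author": "keisha", "fixes_pr": 1412},
    {"author": "keisha", "fixes_pr": 1398},
    {"author": "tom", "fixes_pr": 1391},
    {"author": "keisha", "fixes_pr": 1387},
    {"author": "daniel", "fixes_pr": 1380},
]

fixes_by_engineer = Counter(rec["author"] for rec in post_merge_fixes)
total = sum(fixes_by_engineer.values())

# Share of all rework carried by each engineer; a heavy skew toward one
# tier is exactly the signal that team-level velocity hides.
for engineer, count in fixes_by_engineer.most_common():
    print(f"{engineer}: {count / total:.0%} of post-merge corrections")
```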
The Allstacks platform operates at that level, surfacing the behavioral signals that reveal where delivery constraints are forming and where the gap between expected and actual contribution patterns is widening. The preceptor model Russinovich and Hanselman propose is the right human response to AI bifurcation. The engineering intelligence layer that makes that model measurable and sustainable is how you close the loop and know whether it's working.
See how engineering teams use Allstacks to track AI's real impact on developer growth →
Doesn't AI level the playing field between junior and senior engineers?
No. The gap is wider than most organizations expect. Research consistently shows that AI productivity gains are strongly stratified by seniority. Senior engineers with deep codebase context absorb AI's benefits directly; they use the tools to accelerate decisions they already know how to make. Early-career developers often absorb the costs instead. Without the contextual knowledge to discharge those costs efficiently, they face the verification tax, the review burden, and the overhead of prompting. McKinsey found that developers with less than one year of experience took 7 to 10 percent longer on coding tasks when using AI tools than without them. The tools don't discriminate. They amplify what's already there.
Which metrics actually reveal the AI productivity gap?
Standard DORA metrics don't. Deployment frequency, cycle time, and change failure rate aggregate across the team and blend two very different performance curves into a single number. The signals that surface the two-tier engineering team AI dynamic are behavioral and individual: rework concentration (which engineers are making the most post-merge corrections?), contribution variance within experience tiers, review queue distribution by engineer (are certain reviewers processing AI-generated code far more slowly than others?), and spec-to-delivery quality gaps. These patterns require individual-level visibility that team-level dashboards are not built to surface.
What is the verification tax?
The verification tax is Google Cloud DORA's term for a hidden cost of AI-assisted coding: the cognitive overhead of auditing AI-generated code that looks correct but may not be. The time saved on writing is frequently offset by the time required to evaluate the output: iterating on prompts, checking for hallucinations, and confirming the generated approach fits the system's architecture and security requirements. DORA identified this as a primary reason AI adoption's productivity gains are smaller than expected, and why they accrue unevenly. Senior engineers discharge the verification tax quickly because they have the pattern recognition to evaluate AI output in seconds. Junior engineers face the same tax without the same equipment.
How do you start measuring the AI productivity gap on your own team?
Start by separating individual contribution data from team-level aggregates. The goal is to identify whether senior and junior engineers are experiencing AI adoption differently, not whether the team as a whole is moving faster. Look for widening variance in PR throughput within experience tiers, rework concentrated in a specific subset of engineers, growing review queue time on AI-generated code, and divergence between velocity metrics and delivery quality. Most engineering analytics platforms don't surface this by default because they roll everything up to the team level. Measuring the AI productivity gap requires breaking contribution patterns down by individual and tier; that granularity is what makes the bifurcation visible before it becomes a retention problem.
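As a concrete starting point, here is a minimal sketch of that tier-level breakdown. The tier labels, the throughput numbers, and the idea of using within-tier spread as the tripwire are all assumptions to adapt to your own data.

```python
from statistics import pstdev

# Hypothetical monthly PR throughput per engineer, tagged by tier.
# Numbers and tier assignments are invented for illustration.
throughput = {
    "junior": {"keisha": [14, 12, 9], "tom": [13, 13, 12], "ana": [15, 10, 7]},
    "senior": {"daniel": [20, 23, 26], "priya": [19, 22, 24]},
}
months = range(3)

for tier, engineers in throughput.items():
    # Spread of individual throughput within the tier, month by month.
    spread = [pstdev(series[m] for series in engineers.values()) for m in months]
    print(f"{tier} within-tier spread by month:", [round(s, 1) for s in spread])
# A junior spread that widens month over month suggests some early-career
# engineers are stalling while others keep pace: the bifurcation, made visible.
```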