Here's what the latest DORA research tells us: individual developer productivity is up. Great news, right?
Except throughput isn't. And code instability? That's actually increasing.
We're equipping developers with Copilot, Claude, Cursor, and a dozen other tools. They're writing code faster than ever. And yet, the actual delivery of value to customers hasn't improved. In some cases, it's gotten worse.
I see this with our customers all the time. Just yesterday, I was talking with my CEO about our own implementation of CodeRabbit for code reviews. The developers love it. QA thinks it's great. But when we look at the metrics—actual quality outcomes—nothing's changed.
So what's happening here?
Chris Condo used a metaphor that I keep coming back to: imagine looking at the back of a Swiss watch, watching all the gears turn. You pick one tiny gear and ask, "Why isn't this one moving faster?"
It can't. It's completely dependent on every other gear around it.
That's what we're doing with AI coding assistants. We're pouring massive investment into making one gear spin faster while ignoring the entire mechanism—the backlog process, the refinement cycles, the deployment pipeline, the feedback loops.
"If you just automate or speed up the things that are already broken, you're just going to create more problems." — Chris Condo
Or as Chris put it another way: you're hitching a tractor to a horse and a donkey and wondering why the plowing isn't any better.
This is where the conversation got interesting. Chris is now at Equal Experts, working directly with engineering teams in the trenches. And they're seeing something different emerge—not "AI-assisted development," but genuinely AI-native software delivery.
The distinction matters:
AI-assisted = Your existing process, now with a chatbot to help developers code faster.
AI-native = The entire software delivery lifecycle reimagined with AI embedded at every step.
Think about what that means in practice:
Backlog curation — AI synthesizing customer feedback, support tickets, and usage data into prioritized work items. What takes weeks of refinement could happen in days.
Adversarial testing — Not happy-path testing that developers default to. Chris made the point that you can ask AI to "take an adversarial approach—test this code like you're trying to hack it." AI can adopt that persona in ways humans rarely sustain.
Production correlation — AI that connects what's in your pipeline with what's happening in production logs. Finding patterns humans would never catch.
Chris shared a story from his Microsoft days that illustrates this perfectly. His team was combing through gigabytes of log files and kept dismissing what looked like random 400 errors—needles in a haystack. Turns out, they had a whole haystack of needles. The errors were happening constantly, triggered by specific conditions they couldn't see. They only discovered it by building custom tools to analyze patterns at scale.
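For a sense of what that custom tooling amounted to, here's a minimal sketch of the aggregation step, written today in Python. The log directory, the one-JSON-object-per-line format, and the "status", "path", and "ua" field names are assumptions for illustration, not the actual schema Chris's team was working with.

```python
# Minimal sketch of the kind of pattern analysis Chris's team had to hand-build.
# Assumes one JSON object per log line with "status", "path", and "ua" fields;
# these names and the ./logs directory are illustrative, not a real schema.
import json
from collections import Counter
from pathlib import Path

def count_400_patterns(log_dir: str, top_n: int = 10) -> list[tuple[tuple[str, str], int]]:
    """Group HTTP 400 responses by (path, user agent) to surface repeating conditions."""
    patterns: Counter = Counter()
    for log_file in Path(log_dir).glob("*.log"):
        with log_file.open() as handle:
            for line in handle:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than crashing mid-scan
                if entry.get("status") == 400:
                    patterns[(entry.get("path", "?"), entry.get("ua", "?"))] += 1
    return patterns.most_common(top_n)

if __name__ == "__main__":
    for (path, ua), count in count_400_patterns("./logs"):
        print(f"{count:>8}  {path}  {ua}")
```

The point isn't this particular script; it's that the "random" errors stop looking random the moment someone aggregates them, which is exactly the kind of correlation AI can now do without anyone hand-building a tool first.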
Today? AI can find those patterns in minutes. But only if you're pointing it at the whole system—not just the coding step.
Here's something Chris said that I think is underappreciated: maybe the chat interface is fundamentally wrong for AI in software delivery.
Someone at QCon recently posted a diagram that captures this perfectly:
"Would you give your drunken best friend access to all of your code and your entire enterprise system? Then why are you doing it with AI?"
We've all had the experience. You ask an AI to write a function. You refine it. Refine it again. By the fifth revision, you realize it didn't actually improve what you had—it just added more stuff. The old code is still there, buried under new code.
The result? Massive check-ins that are hard to review. Bugs slipping through. Engineers frustrated because they're stuck in endless revision loops.
We're trying to make the same process go faster. But maybe we need a fundamentally different process.
The teams getting this right aren't treating AI as a faster typing assistant. They're treating it as a parallel intelligence that can evaluate, correlate, and synthesize across the entire value stream—things that are genuinely hard for humans to do at scale.
Let me tell you about an AI tooling analysis I saw recently that tried to answer the question: "What's the ROI of our AI investment?"
Their approach? Calculate lines of code generated before and after AI adoption. Assign a dollar value to each line. Multiply.
My response: please don't.
Lines of code is not value. Never has been. And optimizing for it is how you end up with bloated codebases, increased instability, and zero improvement in what actually matters.
This is where value stream management finally earns its keep. Because if you're spending real money on AI tools, you need to answer questions like:
Are we delivering more value to customers, or just producing more code?
Has cycle time across the full stream actually improved?
Is quality in production getting better or worse?
That's what measuring AI productivity should look like. Not "how many lines did the robot write?" but "did we deliver more value, faster, with higher quality?"
Here's where this is heading. LinkedIn recently restructured some teams around a radical premise: one product manager, one developer. That's it. The question becomes: how much can you deliver with AI doing the heavy lifting?
"There isn't going to be a fixed agile model anymore. The methodologies that worked for a decade are being disrupted. Every quarter brings new capabilities." — Chris Condo
The playbook is being rewritten in real time. The teams that win will be the ones that stay adaptive—not the ones clinging to processes designed for a pre-AI world.
If you're a VP of Engineering or CTO reading this and thinking, "Okay, but what do I actually do Monday morning?"—here's the path forward:
Stop asking which AI tool is best. Claude vs. Copilot vs. Cursor is a moment-in-time debate that will be obsolete in six months. Instead, ask: how do we re-engineer our organization to utilize AI end-to-end?
Start small but think big. Pick one feature. Map the entire journey from idea to production. Then ask: where could AI be involved at each step? Not just coding—curation, testing, deployment, monitoring, prioritization.
Measure outcomes, not activity. If your AI metrics are "lines of code generated" or "time saved in code review," you're measuring the wrong things. Measure value delivered. Measure cycle time across the full stream. Measure quality in production.
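As a concrete starting point, here's a rough sketch of what outcome-oriented measurement can look like: end-to-end cycle time computed from work items, in Python. The CSV export and the "created_at" / "deployed_at" field names are hypothetical; substitute whatever your tracker or value stream platform actually exposes.

```python
# Rough sketch of outcome-oriented measurement: cycle time from idea to production.
# Assumes a work-item export with "created_at" and "deployed_at" ISO timestamps;
# the field names and the CSV source are illustrative, not a specific tool's schema.
import csv
from datetime import datetime
from statistics import median

def cycle_times_in_days(csv_path: str) -> list[float]:
    """Return end-to-end cycle time (in days) for every item that actually shipped."""
    times = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            if not row.get("deployed_at"):
                continue  # still in flight; it hasn't delivered value yet
            created = datetime.fromisoformat(row["created_at"])
            deployed = datetime.fromisoformat(row["deployed_at"])
            times.append((deployed - created).total_seconds() / 86400)
    return times

if __name__ == "__main__":
    times = cycle_times_in_days("work_items.csv")
    if times:
        print(f"shipped items: {len(times)}")
        print(f"median cycle time: {median(times):.1f} days")
```

Pair that with production quality signals like change failure rate and escaped defects, and you have a before-and-after picture that says far more about your AI investment than any line count ever will.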
Accept that this changes everything. The organizations that figure out AI-native software delivery are going to outpace everyone still trying to optimize individual gears. The question isn't whether AI will transform how we build software. It already is. The question is whether you'll transform with it.
This article is based on my conversation with Chris Condo on the Stacked Sessions podcast. We went deeper on Microsoft war stories, why the chat interface might be a dead end, and what Equal Experts is seeing with teams that are actually making AI-native work.
Jeff Keyes is the Field CTO at Allstacks, where he helps engineering leaders connect software delivery metrics to business outcomes. Want to see how your teams are actually performing—not just how fast they're typing? Talk to us.