Strategy & Thought Leadership

AGENTS.md Files: The Research Says You're Probably Doing Them Wrong

Over 60,000 repos have context files for AI agents. New research says they're often too long, too vague, and actively making your agent worse. The fix isn't deletion—it's discipline.

Gustavo Leindecker
Senior Frontend Engineer @ Allstacks
·
March 2, 2026

You've probably seen it in a few repos lately. Maybe you've added one yourself. A file called AGENTS.md, or CLAUDE.md, or .cursorrules sitting at the root of your codebase, telling your AI coding agent how to behave. What conventions to follow. What to avoid. How to think about the project.

It feels like the right move. Like writing a great onboarding doc for a new developer, except the new developer is an AI that forgets everything the moment the context window closes.

The practice has taken off fast. Over 60,000 open-source projects already include some version of these context files. The format is now backed by OpenAI, Google, Cognition, and others, and is being stewarded by the Linux Foundation's Agentic AI Foundation. Cursor, Claude Code, GitHub Copilot, Codex... every major AI coding tool either recommends or officially supports context files. It's become the default assumption: if you're serious about AI-assisted development, you write the file.

But a paper published on arXiv in February 2026 just threw some cold water on all of it. And if you're an engineering leader who's been betting on context files to improve your team's AI ROI, it's worth slowing down for five minutes.


What Does the Research Say About AGENTS.md Effectiveness?

A recent study from Gloaguen et al. (published on arXiv, February 2026) examined how AI coding agents perform with and without context files. The results surprised the community.

Context files reduced task success rates. Not dramatically in every case, but consistently enough to be a pattern. And they increased AI inference costs by more than 20%. That's real money if you're running agents at scale.

The mechanism makes intuitive sense once you see it: context files bloat the prompt. Every token of instructions you add is a token the model has to process, attend to, and reason about before it even gets to your actual task. And if those instructions are poorly targeted (vague guidance, irrelevant constraints, or requirements that don't apply to the task at hand) you're not just wasting money. You're actively making the model's job harder.

The paper's prescription: human-written context files should be minimal. Surgical. Focused on what the agent genuinely cannot infer from the codebase itself.


Why Do Context Files Go Stale?

The arXiv findings are about context files in general. But there's a compounding problem that the research doesn't fully capture: most context files go stale.

Your codebase is a living thing. Dependencies change. Architecture evolves. The team switches from one testing framework to another. A library gets deprecated. Someone refactors the entire data layer. You hire three new engineers who bring different conventions with them.

Your AGENTS.md file? It got written once, probably during a sprint when someone had good intentions, and hasn't been touched since.

Now imagine your AI agent diligently following instructions that reference a folder structure that no longer exists, conventions you abandoned six months ago, or a database ORM you migrated away from last quarter. It's not just unhelpful—it's actively misleading. The agent is confidently doing the wrong thing because you told it to.

We're still in the early innings of AI-assisted development, and most teams haven't built any process around maintaining these files. They're treated like README files: written once, forgotten forever. But an outdated README just confuses new hires. An outdated AGENTS.md can send your AI agent confidently down the wrong path on every single task.


Should You Delete Your AGENTS.md File?

No. 

The research doesn't say context files are useless. It says they're often too long, too vague, and packed with unnecessary requirements that add noise without adding signal. The problem isn't the concept. It's the execution.

There's a version of AGENTS.md that genuinely helps: a tight, focused document that surfaces non-obvious information the agent can't infer from the code itself. Things like: the custom linting rules that aren't in the config yet, the quirky deploy constraint that lives in someone's head, the one place where the naming convention breaks from the standard for a specific historical reason.

That file is worth having. It's probably 10-20 lines, not 200.
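For concreteness, a minimal file in that spirit might look something like this. Every project detail below is hypothetical, and the point is the shape, not the content: each line is something an agent could not infer by reading the code.

```markdown
# AGENTS.md

- Run `pnpm test:unit` before committing; CI rejects PRs that skip it.
- The deploy pipeline pins Node 20. Do not bump `engines` in package.json.
- Files under `src/legacy-billing/` break our naming convention for
  historical reasons. Leave them as-is.
- Never hand-edit anything in `src/api/types/`; it is generated.
  Regenerate with `pnpm codegen` instead.
```

Notice what's absent: no restating of the framework, no style-guide boilerplate the linter already enforces, no project tour the agent can build by reading the tree.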

What doesn't help (and what the research confirms) is a sprawling document that tries to re-document things the agent can already figure out, adds requirements that don't apply to most tasks, and grows unbounded over time because no one has a clear owner or review process.

The fix is discipline, not deletion. Write less. Be specific. Review it like you review code... because it is code, effectively. And if you haven't looked at yours in three months, look at it now.
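One way to make that review habitual rather than aspirational is to automate the nudge. Here is a minimal sketch in Python of the kind of staleness heuristic a CI job could run. The function name and both thresholds are assumptions, not a standard; tune them to your team's velocity.

```python
from datetime import datetime, timedelta

def context_file_is_stale(last_touched: datetime,
                          commits_since: int,
                          max_age_days: int = 90,
                          max_commits: int = 50) -> bool:
    """Heuristic staleness check for a context file like AGENTS.md.

    Flags the file once the repo has clearly moved on without it:
    it is both older than `max_age_days` AND more than `max_commits`
    commits behind HEAD. Both thresholds are illustrative defaults.
    """
    age = datetime.now() - last_touched
    return age > timedelta(days=max_age_days) and commits_since > max_commits
```

In a real repo you could feed it the date from `git log -1 --format=%cI -- AGENTS.md` and a commit count from `git rev-list --count`, then fail (or just warn) in CI when it returns `True`.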


How Do You Know If Your AI Tooling Is Actually Working?

Here's the thing that should actually bother engineering leaders: most teams have no idea whether their context files are helping or hurting.

You can't feel a 20% increase in inference costs at the task level. You can't see that your AI agent's success rate dropped because your AGENTS.md is full of stale instructions. You won't notice that a refactor last month invalidated half the guidance in the file. These are invisible inefficiencies, and they compound quietly.

This is the broader pattern: teams are adopting AI coding tools and assuming the productivity gains are real, but very few are actually measuring them. Are AI-generated PRs getting merged faster or slower than human-written ones? Are tasks completed with agent assistance taking more or fewer review cycles? Is your AI tooling actually moving the needle on throughput, or is it just creating more noise for reviewers to manage?

Jeremy Freeman, CTO at Allstacks, sees this pattern constantly with engineering teams:

"The problem isn't the tool. It's not knowing whether the tool is actually helping. Teams adopt AI coding assistants and assume the gains are real, but they're not measuring what matters. Are PRs getting merged faster? Is cycle time improving? Is the agent creating more rework? If you can't answer those questions, you're flying blind on one of your biggest engineering investments."

And that blind spot extends well beyond context files to your entire AI investment.

Prominent voices in the dev community are starting to push back on AI tooling hype in general. The question isn't whether these tools are impressive demos. The question is whether they're improving your team's actual delivery metrics. That requires data, not vibes.

Allstacks is built for exactly this kind of visibility: connecting the signals from your engineering tools to the outcomes that actually matter. Delivery velocity, cycle time, PR health, team throughput. If your AI agent context files are helping, you should be able to see it. If they're not, you should know that too.


Best Practices for AGENTS.md Files in 2026

Context files like AGENTS.md went from niche practice to industry standard in less than a year. That's fast, even by tech standards. And like most things that move that fast, the execution got ahead of the understanding.

The research says: keep them minimal. The real world says: keep them current. And basic engineering sense says: measure whether they're actually working.

If you haven't audited your context files recently, do it this week. Cut anything that isn't genuinely non-obvious. Set a calendar reminder to review it every time you do a major refactor. And if you're serious about AI ROI on your team, make sure you have the visibility to actually know whether it's real.

You wrote the file. Now make sure it's helping.


Want to understand whether your AI tooling is actually moving the needle on engineering delivery? Talk to the Allstacks team.

Gustavo Leindecker Pereira is a Senior Frontend Engineer and UX/UI Designer with nearly 20 years of experience building scalable, high-performance web applications. Based in Porto Alegre, Brazil, he combines strong engineering discipline with a design background to create accessible applications, modern frontend architectures, and business-critical solutions used across multiple products and teams. Currently a Senior Frontend Engineer at Allstacks, he specializes in the Vue ecosystem, mentors developers, and collaborates closely with product and design teams to deliver impactful digital experiences.

