
LLMs Come and Go. Your Context Is Forever.

AI agents automated the 5% of development that was writing code. They multiplied the 70% that was understanding code. The fix isn't a better model or a better agent framework. It's structured context.

Your agents shipped more code last quarter than your entire team did the year before.

How much of it does anyone actually understand?

Six months ago, AI was a copilot. It suggested a line. You accepted or rejected. There was friction in that loop, and the friction was a feature: every accepted suggestion required a developer to comprehend it first. To trace the dependency. To check if it fit the pattern three files over. That moment of comprehension built mental models as a byproduct.

Now agents write entire features. They open pull requests. They refactor services you haven't looked at in months. The developer who would have built the mental model while writing the code... never writes the code.

You solved the production bottleneck. Congratulations. You created a comprehension crisis.

Figure: code output grows at machine speed; understanding grows at human speed. The gap widens every sprint, and nobody is tracking it.

Developers already spent 70% of their time understanding existing code. Only 5% was actually writing it. AI agents automated the 5%. They multiplied the 70%.

The evidence was there before agents

A randomized controlled trial by METR, published in 2025, measured what happens when experienced developers use AI tools on real codebases. Not toy benchmarks. Not greenfield tasks. Real repositories with over a million lines, real architectural complexity, real cross-service dependencies.

The developers expected AI to make them 24% faster. The measured result? 19% slower.

Even after seeing the data, the developers in the study still believed AI had helped. They felt faster while going slower. A 43-point perception gap.

Greenfield task: 55% faster with AI on clean, isolated tasks (GitHub / Peng et al. 2023)

vs.

Real codebase: 19% slower on familiar repos with existing architecture (METR 2025, randomized controlled trial)

The explanation was simple. On a clean, isolated task, AI is brilliant. Greenfield JavaScript function? 55% faster. That's real. But hand the same tool a codebase with a million lines, 40 microservices, and five years of architectural decisions baked into every file path... it can't see what matters. It guesses. It generates plausible code.

That was the copilot era. One suggestion at a time. One human evaluating each line. And even with that friction in the loop, the tools made developers slower on real codebases.

Now you're past copilots. Your agents are faster. Your sprint velocity is up. Your throughput numbers look great to the board. Nobody's arguing that. The question is what's accumulating underneath the velocity.

Agents don't suggest lines. They write whole features. They open pull requests across services they've never seen before. The same context blindness that made copilots 19% slower is now operating autonomously, at 10x the volume, with no human comprehension as a safety net.

GitClear analyzed 211 million lines of code and found an 8x increase in code duplication during 2024. The DORA report showed delivery stability dropped 7.2% with increased AI adoption. Stack Overflow's developer survey identified "lack of understanding of the codebase" as the number one challenge with AI tools. 73% of AI-generated completions compile locally but violate patterns elsewhere in the codebase.

The agents aren't broken. They're context-blind. And now they're writing all the code.

Here's what it looks like on a Wednesday morning. Agent A refactors a shared authentication utility in PR #247. Agent B, working two hours later, builds a new feature using the old auth pattern in PR #251. Your senior engineer reviews both. They're individually correct. She approves both. Three weeks later, a production incident traces back to the inconsistency. The postmortem takes two days. Nobody saw it coming because nobody had the full picture.

That's not a hypothetical. That's the shape of every agentic codebase incident we've seen. The code was correct. The context was missing.

Figure: the comprehension crisis, copilot era vs. agentic era. Code grows at machine speed. Understanding grows at human speed.

Before agents, the numbers were already bad.

42% of institutional knowledge lives only in people's heads
$47M lost per year per large company from bad knowledge transfer
23 hrs per developer per month spent just searching for answers
2 yrs average developer tenure before they leave with their context

Remember the 70/5 split. Now look at where that time actually goes:

Where developer time goes:
70% understanding existing code
25% meetings, search, and admin
5% actually editing code

The 5% is what agents automated. The 70% is what they made worse.

Your senior engineers spend 30 to 40% of their time answering the same questions on repeat. They're $200K/year knowledge routers. New hires take 12 to 16 weeks to reach full productivity because nobody wrote down why things work the way they do.

When humans write code, they build mental models as a byproduct. Every function you write, you understand. Every dependency you wire up, you trace. The act of writing is the act of understanding. When agents write code, nobody builds the model. The code exists. The understanding doesn't.

And documentation won't save you. 90% of engineering knowledge is tacit. Never written down. Your agents shipped thousands of lines since the last doc update. Average developer tenure: two years. When they leave, their context leaves with them.

That's not a risk factor. That's a ticking clock.

You've already invested in the process

If you're reading this in 2026, you're not starting from zero. You've picked your models. GPT-4.1, Claude Opus, Gemini, maybe all three depending on the task. You've built harnesses, prompt chains that route tasks through multi-step workflows. You've wired up skills: code search, file editing, test running, deployment. You've built evals to measure agent output quality and catch regressions. You've standardized on an agent framework with orchestration, routing, and fallbacks. You've added guardrails for safety, cost controls, and human-in-the-loop review.

That's millions of dollars in process investment. The machinery around your agents is sophisticated.

But what are you feeding them?

Your agent has access to every tool. It can search code, read files, run tests, open pull requests. What it doesn't have is understanding. It doesn't know that the billing service was split from the payments service eight months ago and the shared types still live in the old package. It doesn't know that the "legacy" auth flow actually handles 60% of your enterprise customers and can't be deprecated. It doesn't know that the last time someone refactored the notification system without understanding the retry semantics, it caused a three-day incident.

All that process investment, every harness, every eval, every guardrail, is operating on blind input.

Process depreciates. Context compounds.

Models keep getting better. Nobody disputes that. GPT-5 is smarter than GPT-4. Opus 4.6 is smarter than Opus 3. But you've been waiting for the model that finally understands your codebase for three years now. Each release is a little better at reasoning. The problems haven't gone away. Your agents still duplicate code. They still miss cross-service dependencies. They still break patterns they can't see. The problem was never that the model wasn't smart enough. The problem is that your context was never structured enough for any model to use.

Over 80 major model releases from the top labs in three years. Not minor patches. Full new architectures, new capabilities, new pricing, new context windows. The model you built your agent workflows around last quarter is already mid-tier. The three frontier models your team relies on right now? Look at the chart. Every one of them is ticking.

Figure: model shelf life, 2023 to 2026, from frontier debut to replacement. GPT-4: 14 months. Claude 3 Opus: 3 months. GPT-4o: 15 months. Claude 3.5 Sonnet: 4 months. GPT-4.5: 5 months. GPT-5: 3 months. Opus 4.5 and Opus 4.6: still ticking. 14 months → 3 months → 2 months. The shelf life is collapsing.

And the agent layer is commoditizing just as fast. Everyone gets the same harnesses. The same evals. The same frameworks. The agent framework you standardized on six months ago? Already being replaced. The custom integrations, the workflow automations, the tooling your team built around a specific agent's quirks... all of it depreciates on 3-to-6 month cycles. What's not commoditizing is the knowledge specific to your codebase.

Figure: process investment vs. context investment over time. Process investment resets with every stack change, net flat, always starting over. Context investment is always compounding.

"As AI becomes more capable and agentic, models themselves become more of a commodity, and all value gets created by how you steer, ground, and finetune these models with your business data and workflow."

Satya Nadella, CEO, Microsoft

Context is the food your agents consume

Your agent is only as good as what you feed it. Structured codebase knowledge, architecture maps, dependency graphs, service boundaries, decision history, pattern libraries. The kind of knowledge that used to live only in your senior engineers' heads, made persistent, queryable, and available to every agent on every task.
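To make the idea concrete, here is a minimal sketch of what "persistent, queryable" codebase knowledge could look like. Everything here is hypothetical: the record kinds, field names, and example entries are illustrative, not ProdE's actual schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class ContextRecord:
    """One unit of codebase knowledge, written down instead of tacit."""
    kind: str                # e.g. "decision", "pattern", "boundary" (illustrative)
    paths: list              # path prefixes this knowledge applies to
    summary: str             # the knowledge itself, in plain language
    tags: set = field(default_factory=set)

class ContextStore:
    """Hypothetical in-memory store an agent harness could query per task."""
    def __init__(self):
        self.records = []

    def add(self, record: ContextRecord):
        self.records.append(record)

    def for_task(self, touched_files):
        """Return every record whose path prefixes match a file the task touches."""
        return [r for r in self.records
                if any(f.startswith(p) for p in r.paths for f in touched_files)]

store = ContextStore()
store.add(ContextRecord(
    kind="decision",
    paths=["services/billing/", "packages/shared-types/"],
    summary="Billing was split from payments; shared types still live in the old package.",
    tags={"billing", "migration"},
))
store.add(ContextRecord(
    kind="pattern",
    paths=["services/auth/"],
    summary="Use the post-January auth flow; the legacy flow serves enterprise customers.",
))

# An agent task touching billing pulls the relevant decision before writing code.
relevant = store.for_task(["services/billing/invoice.py"])
```

The point of the sketch is the shape, not the implementation: knowledge is keyed to code paths, so any agent touching those paths gets it automatically instead of depending on a senior engineer remembering to mention it.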

Better context makes every other investment work harder. Your harnesses make smarter routing decisions. Your evals catch real issues instead of false positives. Your tools operate on the right files with the right patterns. Your agents write code that fits, not just code that compiles.

Take two teams. Same size. Same models. Same agent framework. Same IDE. Give both agents the same task: refactor the billing service.

Team A has the best stack money can buy and zero structured context. Their agent produces code that compiles, passes tests, and causes an incident two weeks later. It duplicated a utility that already existed in a shared package. It used the old auth pattern that was deprecated after the January migration. It didn't know.

Team B has the same stack and rich structured context. Their agent produces code that fits like a senior engineer wrote it. It used the existing shared utility. It followed the new auth pattern. It understood the service boundaries because it could see them.
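The Team A / Team B gap comes down to what actually reaches the model. A minimal sketch, with all names and example facts hypothetical, of grounding the same task with retrieved context before the agent sees it:

```python
def build_prompt(task, context_records):
    """Prepend retrieved codebase knowledge to an agent's task.
    Hypothetical sketch; real harnesses route this through their own templates."""
    if not context_records:
        return f"Task: {task}"          # Team A: the model guesses
    knowledge = "\n".join(f"- {r}" for r in context_records)
    return (
        "Codebase knowledge relevant to this task:\n"
        f"{knowledge}\n\n"
        f"Task: {task}"                 # Team B: the model sees the boundaries
    )

prompt = build_prompt(
    "Refactor the billing service",
    [
        "Billing was split from payments; shared types still live in the old package.",
        "The January migration deprecated the old auth pattern; use the new one.",
        "A shared rounding utility already exists in packages/shared-utils.",
    ],
)
```

Same model, same task; the only difference between the two branches is the retrieved knowledge, which is exactly the difference between code that compiles and code that fits.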

Context isn't competing with your process investments. It's the ingredient that makes them work.

40%+ of the top 1,000 open-source repos have a bus factor of one.

65% have a bus factor of two or less. One person leaves. The project stalls.

These are the most popular, most scrutinized projects in the world. Now think about your proprietary codebase, where agents just wrote half of it and nobody on the team fully understands the other half.

Without structured context, your agent is a brilliant stranger on their first day. Every single time.

The real bet

Agents will keep getting better. That's table stakes. Everyone will have access to them. The differentiator isn't which agent you use. It's what the agent knows about your codebase.

You've built an incredible kitchen. The best models, the best harnesses, the best evals. But the kitchen is only as good as the ingredients. Most teams are feeding their agents whatever context they can scrape together, and wondering why the output tastes generic.

ProdE builds the food layer. It structures your architecture, dependencies, service boundaries, and decision history into knowledge that agents and engineers can query instantly. It stays current as your codebase changes. It works with whatever model or framework comes next. When the stack shifts again, and it will, your context layer just plugs into the new tools and keeps compounding.

See how ProdE builds your context layer

Get a demo