
LLMs Come and Go. Your Context Is Forever.

Your coding agent automated the entire code-writing part. Congrats, you've earned some serious LinkedIn bragging rights. Go ahead, make that post. But when you come back to your broken code, I have a suggestion for you.

Your agents shipped more code last quarter than your entire team did the year before.

How much of it does anyone actually understand?

Six months ago, AI was a copilot. It suggested a line. Let's be honest, you mostly accepted everything the agent gave you. And when you accepted it, did you actually understand what the agent was about to do? Did you trace the dependencies? The change impact? Did you check if it fit the pattern three files over?

Or did you just punt the comprehension downstream to another agent to do the code review? And another one to write the tests?

Now agents write entire features and open pull requests across services you haven't looked at in months. The developer who would have built the mental model while writing the code... never writes the code.

Here's what it looks like on a Wednesday morning. Agent A refactors a shared authentication utility in PR #247. Agent B, working two hours later, builds a new feature using the old auth pattern in PR #251. Your senior engineer reviews both. They're individually correct. She approves both. Three weeks later, a production incident traces back to the inconsistency. The postmortem takes two days. Nobody saw it coming because nobody had the full picture.

That's not a hypothetical. That's the shape of every agentic codebase incident we've seen. The code was correct. The context was missing.
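To make the PR #247 / PR #251 scenario concrete, here is a minimal sketch of the kind of cross-PR check that a shared view of the codebase makes possible. Everything in it is an illustrative assumption: the `PullRequest` record, the `find_cross_pr_conflicts` helper, and the symbol names `auth.verify_token` and `billing.charge` are hypothetical, and a real system would have to derive the "modifies" and "uses" sets from the actual diffs.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical summary of an open PR: what it changes and what it relies on."""
    number: int
    modifies: set[str] = field(default_factory=set)  # symbols this PR changes
    uses: set[str] = field(default_factory=set)      # symbols this PR depends on


def find_cross_pr_conflicts(prs):
    """Return (modifier, user, symbol) triples where one open PR
    depends on a symbol that a different open PR changes."""
    conflicts = []
    for a in prs:
        for b in prs:
            if a.number == b.number:
                continue
            # Symbols that PR a rewrites and PR b still relies on.
            for symbol in sorted(a.modifies & b.uses):
                conflicts.append((a.number, b.number, symbol))
    return conflicts


# Assumed example data mirroring the Wednesday-morning story.
open_prs = [
    PullRequest(247, modifies={"auth.verify_token"}),
    PullRequest(251, uses={"auth.verify_token", "billing.charge"}),
]

conflicts = find_cross_pr_conflicts(open_prs)
for modifier, user, symbol in conflicts:
    print(f"PR #{user} uses {symbol}, which PR #{modifier} changes")
```

Each PR still passes review on its own. The flag only appears when both are looked at together, which is the full picture the reviewer in the story never had.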

You solved the production bottleneck. Congratulations. You created a comprehension crisis.

Developers already spent 70% of their time understanding existing code. Only 5% was actually writing it. AI agents automated the 5%. The 70% just got way worse. Code is shipping faster than ever, and less of it is in anyone's head.

All the way back in 2025 (ages ago in AI years), METR ran a randomized controlled trial on experienced developers using AI tools on real codebases. The developers expected AI to make them 24% faster. They were 19% slower. And they still felt faster.

Greenfield task: 55% faster with AI on clean, isolated tasks (GitHub / Peng et al. 2023)

vs.

Real codebase: 19% slower on familiar repos with existing architecture (METR 2025, randomized controlled trial)

The agents aren't broken. They're context-blind. And now they're writing all the code.

You might assume agents make new hires productive on day one. They write code from the first commit, so what's the problem? The problem is that no one is building the mental model anymore. The agent isn't. It starts cold every session, rediscovers your codebase from scratch, then forgets. Your harness isn't either. Skills, rules, evals: those are process, not knowledge. And your engineers aren't, because they're not the ones writing the code anymore. Every shipped PR widens the gap between the codebase and any persistent understanding of it. The code grows. The understanding lives nowhere. And nobody is tracking the compounding.

And your incomplete documentation won't save you. Let's be honest, no one reads the docs anyway. They still need to exist, but for your agents now, not for you. 90% of engineering knowledge is tacit. Never written down. Your agents shipped thousands of lines since the last doc update. Average developer tenure: two years. When they leave, their context leaves with them.

You've already invested millions in the process

If you're reading this in 2026, you're not starting from zero. You've picked your models. GPT-5.5, Claude Opus 4.7, Gemini 3, maybe all three depending on the task. You've built harnesses, prompt chains that route tasks through multi-step workflows. You've wired up skills: code search, file editing, test running, deployment. You've built evals to measure agent output quality and catch regressions. You've standardized on an agent framework with orchestration, routing, and fallbacks. You've added guardrails for safety, cost controls, and human-in-the-loop review.

That's millions of dollars in process investment. The machinery around your agents is sophisticated.

But what are you feeding them?

Skills aren't food. Rules aren't food. Prompts aren't food. Your context is the food.

Your agent has access to every tool. It can search code, read files, run tests, open pull requests. What it doesn't have is understanding. It doesn't know that the billing service was split from the payments service eight months ago and the shared types still live in the old package. It doesn't know that the "legacy" auth flow actually handles 60% of your enterprise customers and can't be deprecated. It doesn't know that the last time someone refactored the notification system without understanding the retry semantics, it caused a three-day incident.

All that process investment, every harness, every eval, every guardrail, is operating on blind input.

Process depreciates. Context compounds.

Models keep getting better. Nobody disputes that. GPT-5.5 is smarter than GPT-4. Opus 4.7 is smarter than Opus 3. But you've been waiting three years now for the model that finally understands your codebase. Each release is a little better at reasoning. The problems haven't gone away. Your agents still duplicate code. They still miss cross-service dependencies. They still break patterns they can't see. The problem was never that the model wasn't smart enough. The problem is that your context was never structured enough for any model to use.

Heard Claude Code quality went sideways recently? Whole subreddits about it. Anthropic admitted it. The model didn't get dumber overnight. The context layer didn't get smarter overnight. Same problem, new month. The next provider drama is already in the queue.

Over 80 major model releases from the top labs in three years. Not minor patches. Full new architectures, new capabilities, new pricing, new context windows. The model you built your agent workflows around last quarter is already mid-tier. The three frontier models your team relies on right now? Look at the chart. Every one of them is ticking.

[Chart: Model shelf life, 2023 to 2026. Legend: frontier, users left, killed, ticking. GPT-4: 14 mo. Claude 3 Opus: 3 mo. GPT-4o: 15 mo. 3.5 Sonnet: 4 mo. GPT-4.5: 5 mo. GPT-5: 3 mo. Opus 4.5: ? Opus 4.6: ?]

14 months → 3 months → 2 months. The shelf life is collapsing.

And the agent layer is commoditizing just as fast. Everyone gets the same harnesses. The same evals. The same frameworks. The agent framework you standardized on six months ago? Already being replaced. The custom integrations, the workflow automations, the tooling your team built around a specific agent's quirks... all of it depreciates on 3-to-6 month cycles.

What's not commoditizing is the knowledge specific to your codebase.

[Chart: value over time. Process investments: net flat, always starting over. Context investment: always compounding.]

"As AI becomes more capable and agentic, models themselves become more of a commodity, and all value gets created by how you steer, ground, and finetune these models with your business data and workflow."

Satya Nadella, CEO, Microsoft

Context is the food your agents consume

Your agent is only as good as what you feed it. Structured codebase knowledge, architecture maps, dependency graphs, service boundaries, decision history, pattern libraries. The kind of knowledge that used to live only in your senior engineers' heads, made persistent, queryable, and available to every agent on every task.
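As a rough illustration of what "persistent, queryable" could mean in practice, here is a minimal sketch of a context store that an agent harness might consult before starting a task. It is not ProdE's API or data model. The `ContextNote` and `ContextStore` names, the `kind` labels, and the prompt format are all assumptions made for the example, and the notes themselves are the kind of facts an agent typically doesn't know, such as a service split or a legacy flow that can't be deprecated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextNote:
    """One piece of codebase knowledge (hypothetical schema)."""
    service: str
    kind: str  # assumed labels, e.g. "boundary", "constraint", "incident"
    note: str


class ContextStore:
    """In-memory stand-in for a persistent, queryable context layer."""

    def __init__(self):
        self._notes = []

    def add(self, note):
        self._notes.append(note)

    def for_services(self, services):
        """Return notes attached to any of the given services, in insertion order."""
        wanted = set(services)
        return [n for n in self._notes if n.service in wanted]

    def as_prompt(self, services):
        """Render the relevant notes as plain text to prepend to an agent task."""
        return "\n".join(
            f"- [{n.service}/{n.kind}] {n.note}" for n in self.for_services(services)
        )


store = ContextStore()
store.add(ContextNote("billing", "boundary",
                      "Split from payments eight months ago; shared types still live in the old package."))
store.add(ContextNote("auth", "constraint",
                      "The legacy auth flow handles 60% of enterprise customers and can't be deprecated."))
store.add(ContextNote("notifications", "incident",
                      "A refactor that ignored retry semantics caused a three-day incident."))

# An agent about to touch billing and auth gets only the notes that apply.
briefing = store.as_prompt(["billing", "auth"])
print(briefing)
```

The design point is that the notes outlive any one session or model: the same store can brief whichever agent runs the next task.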

Better context makes every other investment work harder. Your harnesses make smarter routing decisions. Your evals catch real issues instead of false positives. Your tools operate on the right files with the right patterns. Your agents write code that fits, not just code that compiles.

Take two teams. Same size. Same models. Same agent framework. Same IDE. Give both agents the same task: refactor the billing service.

Team A has the best stack money can buy and zero structured context. Their agent produces code that compiles, passes tests, and causes an incident two weeks later. It duplicated a utility that already existed in a shared package. It used the old auth pattern that was deprecated after the January migration. It didn't know.

Team B has the same stack and rich structured context. Their agent produces code that fits like a senior engineer wrote it. It used the existing shared utility. It followed the new auth pattern. It understood the service boundaries because it could see them.

Your food isn't competing with your kitchen. The fuel in your Toyota isn't competing with the engine. Your context isn't competing with your harnesses, your evals, your agent framework. They need each other. One is useless without the other.

40%+ of the top 1,000 open-source repos would stall if one person quit tomorrow.

For 65% of them, two people quitting does it. The whole project lives in that handful of heads.

These are the most popular, most scrutinized projects in the world. Now think about your proprietary codebase, where agents just wrote half of it and nobody on the team fully understands the other half.

Without structured context, your agent is a brilliant stranger on their first day. Every single time.

The real bet

Agents will keep getting better. That's table stakes. Everyone will have access to them. The differentiator isn't which agent you use. It's what the agent knows about your codebase.

You've built an incredible kitchen. The best models, the best harnesses, the best evals. But the kitchen is only as good as the ingredients. Most teams are feeding their agents whatever context they can scrape together, and wondering why the output tastes generic.

Don't confuse the kitchen with the ingredients.

ProdE is the food. For your engineers when a new service lands on their plate. For Claude when it's about to open a PR. For Codex, for Gemini, for whatever agent you're running tomorrow. It structures your architecture, dependencies, service boundaries, and decision history into knowledge that agents and engineers can query instantly. It stays current as your codebase changes. It works with whatever model or framework comes next. When the stack shifts again, and it will, your context layer just plugs into the new tools and keeps compounding. (Check out our open benchmarks.)

Okay. Now let's prepare the food for when the next model drops.

See how ProdE builds your context layer
