Same Model: 78% vs 42% — The Harness Made the Difference
Claude Code and Codex are making different architectural bets — and your team is compounding one of them every week.

Nate's Newsletter | March 2026
Source: https://natesnewsletter.substack.com/p/same-model-78-vs-42-the-harness-made
Thesis
Stop debating which AI model won. The harness is what actually determines outcomes — not the model itself. The harness is where the AI runs, what it remembers between sessions, which tools it can touch, how it manages multiple tasks, and how deep your team's dependency grows every week you build around it.
The Headline Stat
The same model scored 78% in one harness vs 42% in another. Most evaluation processes wouldn't catch this difference — because they're comparing models when they should be comparing harnesses.
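One way to make an evaluation catch this is to record the harness next to the model for every run, then look at the spread per model. The sketch below is illustrative only: the names `model_x`, `harness_a`, and `harness_b` are placeholders, and only the 78% and 42% figures come from the article.

```python
from collections import defaultdict

# Hypothetical eval records: (model, harness, pass_rate).
# Only the 0.78 / 0.42 pair comes from the article; the names are placeholders.
results = [
    ("model_x", "harness_a", 0.78),
    ("model_x", "harness_b", 0.42),
]

def harness_spread(records):
    """Return, per model, the gap between its best and worst harness score."""
    by_model = defaultdict(dict)
    for model, harness, score in records:
        by_model[model][harness] = score
    return {
        model: round(max(scores.values()) - min(scores.values()), 4)
        for model, scores in by_model.items()
    }

spread = harness_spread(results)
print(spread)
```

A large spread for a single model suggests the harness, not the model, is driving the result.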
Claude Code vs Codex — Two Different Bets
| | Claude Code | Codex |
| --- | --- | --- |
| Access | Full machine access | Sealed sandbox |
| Memory | Builds project memory over time | Stateless |
| Style | Works in your environment | Slides results under the door |
Neither is converging toward the other. Both bets are working.
The Compounding Lock-in Problem
One developer built six layers of workflow automation over a few months, each layer depending on the previous one. He couldn't have started with the final version — it only works because it accumulated. If he switched harnesses tomorrow, every layer resets to zero.
Now multiply that by every engineer on your team. That's the lock-in nobody is pricing into their decisions, and it's the most expensive blind spot in software tooling right now.
Five architectural decisions are locking teams in right now. Each compounds on its own, and all five compound together:
- Where the AI runs — your machine vs. a remote sandbox
- Session memory — persistent project context vs. stateless
- Tool surface — what the AI can actually touch
- Multi-task handling — parallel vs. single-stream
- Depth of team dependency — conventions, scripts, onboarding docs built around the harness
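A rough self-assessment across these five dimensions can be sketched as a small scoring structure. This is not the article's audit prompt; the 0 to 5 scale, the field names, and the example scores are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class LockInAudit:
    # Each field is a self-assessed score: 0 (no dependency) to 5 (deep dependency).
    # The scale and field names are hypothetical, not taken from the article.
    execution_environment: int
    session_memory: int
    tool_surface: int
    multi_task_handling: int
    team_dependency: int

    def total(self) -> int:
        """Sum of all five dimension scores."""
        return sum(getattr(self, f.name) for f in fields(self))

    def ratio(self) -> float:
        """Total as a fraction of the maximum possible score."""
        return self.total() / (5 * len(fields(self)))

# Example scores are placeholders for one team's self-assessment.
audit = LockInAudit(4, 5, 3, 2, 4)
print(audit.total(), audit.ratio())
```

A plain sum treats the dimensions as independent, which understates the article's point that they compound together; it is only a starting point for a conversation.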
The Cursor Economics Warning
Cursor is the cautionary example: a $2 billion company spending 100% of its revenue on API costs. Its trajectory shows what happens when harness economics are ignored until they become very expensive. The lock-in isn't just technical. It's financial.
Key Takeaway
The model is the brain. The harness is everything else. Every week you build around one harness, switching costs go up. Most teams haven't made this choice consciously — they've just been making it by default.
Practical Tools (in the article)
- Harness audit prompt — scores your lock-in across 5 dimensions and routes your actual work to the right tool this week
- Executive brief generator — translates the audit into engineering-weeks and dollars for getting leadership aligned
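The translation into engineering-weeks and dollars is simple arithmetic. The sketch below assumes every engineer would have to rebuild every workflow layer after a switch; the function name, the one-week-per-layer estimate, the team size, and the weekly cost are all hypothetical placeholders, and only the count of six layers echoes the developer example above.

```python
def switching_cost(layer_weeks, engineers, weekly_cost):
    """Estimate the cost of rebuilding every workflow layer for every engineer."""
    weeks_per_engineer = sum(layer_weeks)
    total_weeks = weeks_per_engineer * engineers
    return total_weeks, total_weeks * weekly_cost

# Six layers, as in the developer example; all numbers below are placeholders.
layer_weeks = [1, 1, 1, 1, 1, 1]
total_weeks, total_dollars = switching_cost(layer_weeks, engineers=10, weekly_cost=4000)
print(f"{total_weeks} engineering-weeks, ${total_dollars:,}")
```

If some layers are shared across the team rather than rebuilt per engineer, the real figure would be lower, so treat this as an upper-bound estimate.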