Same Model: 78% vs 42% — The Harness Made the Difference

Nate's Newsletter | March 2026
Source: https://natesnewsletter.substack.com/p/same-model-78-vs-42-the-harness-made

Thesis

Stop debating which AI model won. The harness, not the model, is what actually determines outcomes. The harness is where the AI runs, what it remembers between sessions, which tools it can touch, how it manages multiple tasks, and how deep your team's dependency grows every week you build around it.

The Headline Stat

The same model scored 78% in one harness vs 42% in another. Most evaluation processes wouldn't catch this difference — because they're comparing models when they should be comparing harnesses.
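One way to make that comparison concrete is to record scores per (model, harness) pair rather than per model, then look at the gap within each model. The sketch below is only an illustration: the labels `model-a`, `harness-x`, and `harness-y` are placeholders, the function `harness_spread` is hypothetical, and the only real figures are the 78% and 42% quoted above.

```python
from collections import defaultdict

# Scores keyed by (model, harness). Only the two figures come from the
# article; the model and harness labels are placeholders.
scores = {
    ("model-a", "harness-x"): 0.78,
    ("model-a", "harness-y"): 0.42,
}


def harness_spread(scores):
    """Return, per model, the gap between its best and worst harness score."""
    by_model = defaultdict(list)
    for (model, _harness), score in scores.items():
        by_model[model].append(score)
    # Round to avoid floating-point noise in the reported gap.
    return {model: round(max(vals) - min(vals), 4) for model, vals in by_model.items()}


if __name__ == "__main__":
    print(harness_spread(scores))
```

An evaluation that averages each model across harnesses, or tests each model in only one harness, would never surface this 36-point gap.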

Claude Code vs Codex — Two Different Bets

Dimension	Claude Code	Codex
Access	Full machine access	Sealed sandbox
Memory	Builds project memory over time	Stateless
Style	Works in your environment	Slides results under the door

Neither is converging toward the other. Both bets are working.

The Compounding Lock-in Problem

One developer built six layers of workflow automation over a few months, each layer depending on the previous one. He couldn't have started with the final version — it only works because it accumulated. If he switched harnesses tomorrow, every layer resets to zero.

Now multiply that by every engineer on your team. That's the lock-in nobody is pricing into their decisions, and it's the most expensive blind spot in software tooling right now.
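The multiplication can be sketched as simple arithmetic. Everything in the example is an assumption for illustration: the function name `switching_cost`, the idea that rebuilding a layer takes as long as building it did, and all of the input numbers. The article gives no per-layer or per-engineer figures.

```python
def switching_cost(layer_weeks, engineers, cost_per_engineer_week):
    """Estimate the cost of rebuilding every workflow layer after a harness switch.

    layer_weeks: weeks one engineer spent building each layer (assumed to
        equal the time needed to rebuild it from zero).
    engineers: number of engineers who each maintain their own stack of layers.
    cost_per_engineer_week: loaded cost of one engineering-week, in dollars.
    """
    weeks_per_engineer = sum(layer_weeks)
    total_weeks = weeks_per_engineer * engineers
    return total_weeks, total_weeks * cost_per_engineer_week


if __name__ == "__main__":
    # Hypothetical inputs: six layers at 2 weeks each, 10 engineers, $4,000 per week.
    weeks, dollars = switching_cost([2] * 6, engineers=10, cost_per_engineer_week=4000)
    print(f"{weeks} engineering-weeks, ${dollars:,}")
```

Under those made-up inputs the switch costs 120 engineering-weeks, which is the kind of number that rarely appears in a tooling decision.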

Five architectural decisions are actively locking teams in right now. Each compounds on its own, and all five compound together:

Where the AI runs — your machine vs. a remote sandbox
Session memory — persistent project context vs. stateless
Tool surface — what the AI can actually touch
Multi-task handling — parallel vs. single-stream
Depth of team dependency — conventions, scripts, onboarding docs built around the harness
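The five dimensions above can be tracked as a simple scorecard. This is a minimal sketch, not the audit prompt from the article: the class name `LockInAudit`, the 0 to 5 scale, and the field names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LockInAudit:
    """Hypothetical lock-in scorecard: 0 = no dependency, 5 = deeply locked in."""

    where_it_runs: int
    session_memory: int
    tool_surface: int
    multi_task_handling: int
    team_dependency: int

    def __post_init__(self):
        # Reject scores outside the assumed 0-5 scale.
        for name, value in asdict(self).items():
            if not 0 <= value <= 5:
                raise ValueError(f"{name} must be between 0 and 5, got {value}")

    def total(self) -> int:
        """Sum of all five dimension scores (maximum 25)."""
        return sum(asdict(self).values())

    def deepest(self) -> str:
        """Name of the dimension with the highest score."""
        scores = asdict(self)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    audit = LockInAudit(1, 4, 2, 0, 5)  # made-up scores for one team
    print(audit.total(), audit.deepest())
```

Re-scoring on a regular cadence would show whether a dimension is compounding, which a single snapshot cannot.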

The Cursor Economics Warning

Consider a $2 billion company spending 100% of its revenue on API costs. Cursor's trajectory shows what happens when harness economics get ignored until they become very expensive. The lock-in isn't just technical. It's financial.

Key Takeaway

The model is the brain. The harness is everything else. Every week you build around one harness, switching costs go up. Most teams haven't made this choice consciously — they've just been making it by default.

Practical Tools (in the article)

Harness audit prompt — scores your lock-in across 5 dimensions and routes your actual work to the right tool this week
Executive brief generator — translates the audit into engineering-weeks and dollars for getting leadership aligned