Same Model: 78% vs 42% — The Harness Made the Difference

Claude Code and Codex are making different architectural bets — and your team is compounding one of them every week.

Nate's Newsletter | March 2026
Source: https://natesnewsletter.substack.com/p/same-model-78-vs-42-the-harness-made


Thesis

Stop debating which AI model won. The harness is what actually determines outcomes — not the model itself. The harness is where the AI runs, what it remembers between sessions, which tools it can touch, how it manages multiple tasks, and how deep your team's dependency grows every week you build around it.

The Headline Stat

The same model scored 78% in one harness vs 42% in another. Most evaluation processes wouldn't catch this difference — because they're comparing models when they should be comparing harnesses.
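One way to make that comparison concrete is to record scores per (model, harness) pair rather than per model, then look at the spread within each model. The sketch below is illustrative only: the names `model-a`, `harness-x`, and `harness-y` are hypothetical placeholders, and the two scores are simply the headline figures above, not a reproduction of the article's benchmark.

```python
from collections import defaultdict

# Hypothetical labels; the scores are the headline percentages quoted above.
results = {
    ("model-a", "harness-x"): 78,
    ("model-a", "harness-y"): 42,
}


def harness_spread(results):
    """For each model, return (best harness, worst harness, gap in points)."""
    by_model = defaultdict(dict)
    for (model, harness), score in results.items():
        by_model[model][harness] = score

    spread = {}
    for model, scores in by_model.items():
        best = max(scores, key=scores.get)
        worst = min(scores, key=scores.get)
        spread[model] = (best, worst, scores[best] - scores[worst])
    return spread


print(harness_spread(results))
```

A leaderboard keyed only by model would collapse these two rows into one number and hide the 36-point gap.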


Claude Code vs Codex — Two Different Bets

              Claude Code                       Codex
Access        Full machine access               Sealed sandbox
Memory        Builds project memory over time   Stateless
Style         Works in your environment         Slides results under the door

Neither is converging toward the other. Both bets are working.


The Compounding Lock-in Problem

One developer built six layers of workflow automation over a few months, each layer depending on the previous one. He couldn't have started with the final version — it only works because it accumulated. If he switched harnesses tomorrow, every layer resets to zero.

Now multiply that by every engineer on your team. That's the lock-in nobody is pricing into their decisions, and it's the most expensive blind spot in software tooling right now.

Five architectural decisions are locking teams in right now. Each compounds separately, and all five compound together:

  1. Where the AI runs — your machine vs. a remote sandbox
  2. Session memory — persistent project context vs. stateless
  3. Tool surface — what the AI can actually touch
  4. Multi-task handling — parallel vs. single-stream
  5. Depth of team dependency — conventions, scripts, onboarding docs built around the harness
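The five decisions above could be tracked with a small self-assessment record. This is a rough sketch, not the article's audit prompt: the `HarnessAudit` class, its field names, and the 0-to-5 scale are all assumptions made for illustration, and a plain sum does not model how the dimensions compound.

```python
from dataclasses import dataclass, fields


@dataclass
class HarnessAudit:
    """Self-assessed lock-in per dimension: 0 (none) to 5 (deep). Scale is an assumption."""
    execution_environment: int  # where the AI runs
    session_memory: int         # persistent project context vs. stateless
    tool_surface: int           # what the AI can actually touch
    multi_task_handling: int    # parallel vs. single-stream
    team_dependency: int        # conventions, scripts, onboarding docs

    def scores(self):
        """Return the five scores keyed by dimension name."""
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def total(self):
        """Simple additive total; ignores any compounding between dimensions."""
        return sum(self.scores().values())

    def most_locked_in(self):
        """Name of the highest-scoring dimension (first one wins on ties)."""
        s = self.scores()
        return max(s, key=s.get)


audit = HarnessAudit(2, 5, 3, 1, 4)  # hypothetical team
print(audit.total(), audit.most_locked_in())
```

Re-running the same record every few weeks would show whether the total is creeping up, which is the default drift the article warns about.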

The Cursor Economics Warning

Consider a $2 billion company spending 100% of its revenue on API costs. Cursor's trajectory shows what happens when harness economics are ignored until they become very expensive. The lock-in isn't just technical. It's financial.


Key Takeaway

The model is the brain. The harness is everything else. Every week you build around one harness, switching costs go up. Most teams haven't made this choice consciously — they've just been making it by default.


Practical Tools (in the article)

  • Harness audit prompt — scores your lock-in across 5 dimensions and routes your actual work to the right tool this week
  • Executive brief generator — translates the audit into engineering-weeks and dollars for getting leadership aligned
