๐ค AI Agents Weekly: Claude Code Review, AutoHarness, Perplexity Personal Computer, Cloudflare /crawl, Context7 CLI, and More

Original article: https://nlp.elvissaravia.com/p/ai-agents-weekly-claude-code-review
Processed: March 15, 2026 | Source: Elvis Saravia โ AI Newsletter
โ ๏ธ Note: The full article is paywalled. This summary is based on the free preview and table of contents.
Summary
This week's AI Agents Weekly covers a broad range of agent tooling, infrastructure, and research developments across the industry.
In this issue:
- Claude ships multi-agent Code Review
- AutoHarness makes small agents beat large ones
- Perplexity launches an always-on Personal Computer
- Cloudflare ships a one-call /crawl endpoint
- Context7 CLI brings docs to any agent
- Andrew Ng launches Context Hub
- Cursor Marketplace adds 30+ plugins
- OpenAI shares Skills for Agents SDK
- Google launches Gemini Embedding 2
- Meta ships four MTIA chips in two years
- Codex agent files taxes, catches $20K error
Top Stories
Claude Code Review
Anthropic launched Code Review for Claude Code โ an automated system that dispatches multiple AI agents to examine every pull request. Instead of a single pass, parallel agents identify potential issues, verify findings to eliminate false positives, and rank bugs by severity, delivering a consolidated overview comment plus targeted inline annotations.
Key details:
- Multi-agent architecture: The system operates in parallel โ agents scan, verify, and prioritize issues independently, producing both a summary comment and inline code annotations for specific problems.
- Scales with complexity: Review depth adjusts based on PR size. Large PRs (over 1,000 lines) received findings 84% of the time, averaging 7.5 issues per PR. Small PRs (under 50 lines) had findings 31% of the time.
- High precision: Less than 1% of flagged issues were marked incorrect by Anthropic engineers, with the system catching production-critical bugs that appeared routine in diffs.
- Pricing and access: Available now as a research preview for Team and Enterprise customers. Reviews average $15โ25 per PR, billed on token usage, with configurable monthly caps and per-repo controls.
๐ Blog
AutoHarness: Automated Agent Constraint Synthesis
Researchers introduced AutoHarness, a technique that lets LLMs automatically synthesize protective code harnesses around themselves, preventing illegal actions without human-written constraints. Instead of relying on larger, more expensive models, the approach uses iterative code refinement with environmental feedback to generate custom safeguards โ making smaller models outperform bigger unconstrained ones.
Key details:
- Massive illegal action problem: In a recent LLM chess competition, 78% of Gemini-2.5-Flash losses were attributed to illegal moves. AutoHarness eliminates this class of failure entirely by generating harnesses that enforce valid actions across 145 different TextArena games.
- Small beats large: Gemini-2.5-Flash with a synthesized harness exceeded Gemini-2.5-Pro's performance while reducing costs โ demonstrating that proper constraints are more valuable than raw model scale for agent environments.
- Zero-shot generalization: The technique extends beyond game-playing to generating full policies in code, eliminating runtime LLM decision-making entirely and achieving higher rewards than GPT-5.2-High on certain benchmarks.
- Practical agent pattern: The core insight applies broadly to any agent deployment โ rather than trusting a model to self-constrain, auto-generate a verified harness that makes illegal states unreachable, shifting safety from model behavior to environment design.
๐ Paper
More This Week (Preview โ Paywalled)
- Perplexity Personal Computer โ always-on AI computer product
- Cloudflare /crawl โ one-call endpoint for web crawling
- Context7 CLI โ brings documentation to any agent
- Andrew Ng Context Hub โ new platform from the AI pioneer
- Cursor Marketplace โ 30+ new plugins for the AI editor
- OpenAI Skills for Agents SDK โ new SDK capabilities
- Google Gemini Embedding 2 โ next-gen embedding model
- Meta MTIA chips โ four chips shipped in two years
- Codex filing taxes โ agent catches a $20K error
Key Takeaways
- Multi-agent verification is production-ready โ Claude Code Review demonstrates that parallel agent pipelines with cross-verification can achieve less than 1% false positive rates at scale, making agentic code review commercially viable.
- Constraints beat raw capability โ AutoHarness shows that wrapping smaller models with auto-generated behavioral harnesses can outperform larger unconstrained models, suggesting the future of reliable agents is in environment design, not just model scaling.
- Agent infrastructure is maturing rapidly โ Cloudflare /crawl, Context7 CLI, OpenAI Skills SDK, and Cursor Marketplace all signal a converging ecosystem of primitives designed specifically for agentic workflows.
- Custom silicon is accelerating โ Meta shipping four MTIA chips in two years reflects the industry-wide push to build dedicated inference hardware for AI workloads.
Infographics








