GPT-5 Codex Works Best When You Strip Your Harness Down
Learn why GPT-5 Codex underperforms in custom tool harnesses and how stripping down to native tooling like apply_patch and ripgrep eliminates the attention tax for better coding agent results.
How teams use agents to iterate, review, and ship PRs with proof.
Why code-correctness benchmarks miss critical agent failure modes and how to evaluate AI coding agents using work-style metrics like proactivity, context management, and communication.
Learn why reasoning models repeat file searches across prompts and how preserving reasoning tokens between API calls can boost coding benchmark performance by 4-5 percentage points.
Why code-specific LLMs fail at pair programming: they optimize for syntax prediction but strip out the world understanding needed to build software that actually serves users.
Why AI coding tools live or die on the first response, and how engineering teams can evaluate tools beyond the initial impression.
Learn why AI integrations built in 2023 need a complete rewrite, not patches. The scaffolding you built to work around model limitations now prevents you from using current capabilities.
Why clickable prototypes eliminate the guesswork and alignment meetings that specs create, and how AI coding agents make prototype-first workflows the new default.
In many workflows, quality at scale beats speed in series. That sentence sounds wrong until you stop running one agent at a time.
AI is generating over half of Google's production code. The bottleneck didn't disappear; it moved. Here's how teams are adapting review workflows to handle the volume.
Why non-technical builders hit a migration wall with AI coding tools and how engineering artifacts make the difference between throwaway prototypes and production-ready handoffs.
Throwaway apps aren't a bug. They're the workflow. Understanding when rebuilding beats migrating, and what makes prototypes production-ready.
Learn how dev plans transform inexpensive AI models into reliable code generators by providing explicit specifications instead of vague prompts.
Cloud Agents review code, catch issues, and suggest fixes before you open the diff. You review the results, not the process.