The Opus Cost Paradox: Expensive Models Can Be Cheaper
The expensive model is cheaper.
Sounds backwards. But if you've watched a smaller model spiral through the same wrong fix three times, you already know.
The spiral pattern
You pick the smaller model because the token price is lower. Budget looks reasonable. You tell yourself you'll just re-prompt if it doesn't work.
Then it doesn't work. You re-prompt. The model suggests a different wrong change. You're an hour in, watching it add unrelated code, confidently iterating toward nowhere.
The token meter keeps running. Your time keeps burning. And you're still no closer to a working diff.
This is the spiral: the model is fast, the model is confident, and the model is wrong in ways that take twenty minutes to diagnose each time.
The math inverts
Using a model that costs more per token often costs less per task.
One developer spent hours in a loop with a smaller model on a performance optimization. The model kept suggesting changes. The changes kept not working. Eventually, mid-task, they switched to Opus.
Four dollars. Four minutes. Done.
"So yeah, it might look scary because it is expensive, but you ended up spending less because it gets the job done quicker. $4 and four minutes instead of $4 and four hours, I guess."
Dan
The pattern repeats: cheaper model spirals, adding unrelated code until you abandon it, then Opus solves the same task in one pass. The difference isn't that Opus is smarter. The difference is that it finishes.
The tradeoff
This isn't "always use Opus." The tradeoff is real, and it cuts in a specific direction.
Lazy prompts punish you harder on Opus. Token cost scales with context. If you're vague, Opus will be confidently vague back at you, and you'll pay premium rates for the privilege.
"But I would caution anyone listening or watching, if you're using Opus, really take the time to make sure your prompts are thorough. 100% a thing. If you don't, that's when if you do lazy coding where you're just like, try this. Try that. It can work, but it gets very expensive with Opus."
Dan
The investment is upfront: write a thorough prompt, get a thorough result. Smaller models work fine for straightforward tasks where the failure mode is obvious. The spiral problem hits hardest on ambiguous tasks, the ones where you won't know the model is wrong until you've run the code and checked the output.
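To see why lazy prompting gets expensive, price out a single retry loop. A rough sketch in TypeScript; the per-token rates and token counts are illustrative assumptions, so check your provider's current published pricing:

```typescript
// Cost of a task = sum over attempts of (input tokens * input rate + output tokens * output rate).
// Rates and token counts below are assumptions for illustration only.
const INPUT_RATE = 15 / 1_000_000;  // $ per input token (assumed)
const OUTPUT_RATE = 75 / 1_000_000; // $ per output token (assumed)

function attemptCost(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// One thorough prompt: big context up front, one pass.
const thorough = attemptCost(40_000, 8_000); // ~$1.20

// Lazy prompting: each "try this" retry re-sends the growing context.
const lazy = [20_000, 35_000, 50_000, 65_000]
  .map((ctx) => attemptCost(ctx, 6_000))
  .reduce((a, b) => a + b); // ~$4.35

console.log({ thorough, lazy });
```

The retries dominate: each "try this" attempt re-sends an ever-larger context at premium rates, which is exactly where the spiral gets costly.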
Why this matters for your team
For a five-person engineering team shipping daily, the spiral problem compounds. If two developers each hit a spiral loop per week, burning an hour apiece before switching models, that's roughly 8-9 hours of lost productivity per month. Time that could have gone into the next feature.
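The arithmetic is worth making explicit. A back-of-the-envelope sketch; every number here is an illustrative assumption, not measured data:

```typescript
// Monthly cost of spiral loops for a small team.
// All inputs are illustrative assumptions, not measured data.
const developersHittingSpirals = 2; // devs who hit one spiral loop per week
const hoursLostPerSpiral = 1;       // time burned before switching models
const weeksPerMonth = 4.33;
const loadedHourlyRate = 100;       // assumed fully loaded engineering cost, $/hr

const hoursLostPerMonth =
  developersHittingSpirals * hoursLostPerSpiral * weeksPerMonth; // ≈ 8.7 hours

const dollarsLostPerMonth = hoursLostPerMonth * loadedHourlyRate; // ≈ $870

console.log(
  `~${hoursLostPerMonth.toFixed(1)} hours/month, ~$${dollarsLostPerMonth.toFixed(0)} in engineering time`
);
```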
The counterintuitive finding: embracing the expensive model can actually reduce your bill.
"Like if you just embrace Opus, like at first you might see the price go up, like it is an expensive model, but I think you can end up spending a lot less because it doesn't mess up as much."
Dan
For Series A through C teams with limited engineering bandwidth, this math matters. You're not optimizing for lowest token price. You're optimizing for tasks completed per dollar spent.
The shift
If you've prompted the same task three times and the model keeps missing the point, stop. The token cost of switching models is almost always less than the token cost of another failed attempt.
Track cost-per-completed-task, not cost-per-token. The budget metric changes when you count the retries.
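One way to make the metric concrete is to log every attempt against a task and compute the cost directly. A minimal sketch; the data shape is an assumption, and the three-retry threshold echoes the rule above:

```typescript
// Track cost per completed task instead of cost per token.
interface Attempt {
  model: string;
  tokenCost: number; // dollars billed by the provider for this attempt
  succeeded: boolean;
}

const MAX_RETRIES_BEFORE_ESCALATING = 3; // per the three-failed-prompts rule above

// Total spend to get one task done, counting every failed retry.
// Returns null if the task never completed: that spend is pure loss.
function costPerCompletedTask(attempts: Attempt[]): number | null {
  const total = attempts.reduce((sum, a) => sum + a.tokenCost, 0);
  const completed = attempts.some((a) => a.succeeded);
  return completed ? total : null;
}

function shouldEscalate(attempts: Attempt[]): boolean {
  const failures = attempts.filter((a) => !a.succeeded).length;
  return failures >= MAX_RETRIES_BEFORE_ESCALATING;
}
```

Counted this way, a $4 single-pass Opus run routinely beats three $1.50 failed attempts on a cheaper model, before you even price in your time.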
How Roo Code closes the loop on model economics
Roo Code operates on a BYOK (Bring Your Own Key) model, meaning you pay the provider directly at their published rates with no token markup. This transparency matters when optimizing for cost-per-task rather than cost-per-token.
Because Roo Code closes the loop (proposing diffs, running commands and tests, and iterating on the results), you see immediately whether a model is spiraling or converging. You can switch models mid-task without losing context, so you can start with a smaller model for straightforward work and escalate to Opus when the task demands it.
The key insight for AI coding agents: model flexibility with direct provider pricing lets you optimize for completed tasks, not token budgets.
Model selection comparison
| Dimension | Optimize for Token Price | Optimize for Task Completion |
|---|---|---|
| Primary metric | Cost per token | Cost per completed task |
| Model selection | Always use cheapest | Match model to task complexity |
| Failure handling | Re-prompt same model | Escalate to capable model |
| Time accounting | Ignored | Included in total cost |
| Best for | Simple, well-defined tasks | Ambiguous, complex tasks |
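The right-hand column of the table can be encoded as a simple selection policy. A sketch under assumptions: the model identifiers, the complexity signal, and the threshold are all illustrative, not a prescribed configuration:

```typescript
// Match model to task complexity and escalate on repeated failure.
// Model identifiers and the threshold are illustrative assumptions.
type Complexity = "well-defined" | "ambiguous";

function pickModel(complexity: Complexity, failedAttempts: number): string {
  // Escalate to the capable model after repeated failures,
  // no matter how simple the task first looked.
  if (failedAttempts >= 3) return "claude-opus-4";
  return complexity === "ambiguous" ? "claude-opus-4" : "claude-sonnet-4";
}
```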