Skip Thinking Mode on Opus and Save Your Tokens
Thinking mode on Opus is overkill.
After a month of running Opus with and without the extra reasoning step, the pattern is consistent: the output quality stays the same, but the cost and latency go up.
The experiment
Hypothesis: Opus with thinking enabled produces measurably higher-quality results on coding tasks than Opus without it.
Setup: Same model. Same tasks. Toggle thinking mode on and off across a month of real work.
Result: No measurable difference in output quality. The thinking step adds tokens and wait time without changing what ships.
"I've used Opus for like a month now without thinking, and I've tried it with thinking. I think for Opus, thinking is overkill and pretty much a waste."
Hannes Rudolph
The mechanism makes sense when you think about what thinking mode actually does. It gives the model a scratch pad to reason through a problem before responding. For models that need that extra step to handle complexity, it helps. For Opus, the base model already handles complex reasoning well enough that the scratch pad becomes redundant compute.
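At the API level, that scratch pad is just a request parameter. The sketch below assumes the Anthropic Messages API via its TypeScript SDK; the model ID and thinking budget are placeholders for illustration, not recommendations.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const prompt = "Refactor this function to remove the nested loops.";

// With thinking: the model writes a reasoning trace (billed as output tokens)
// before its answer. budget_tokens caps how long that trace can get.
const withThinking = await client.messages.create({
  model: "claude-opus-4-20250514", // placeholder ID: use whichever Opus version you run
  max_tokens: 4096,
  thinking: { type: "enabled", budget_tokens: 2048 },
  messages: [{ role: "user", content: prompt }],
});

// Without thinking: leave the parameter out and the model answers directly.
const withoutThinking = await client.messages.create({
  model: "claude-opus-4-20250514",
  max_tokens: 4096,
  messages: [{ role: "user", content: prompt }],
});

console.log(withThinking.usage, withoutThinking.usage); // compare token spend per request
```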
The tier-based rule
The pattern extends beyond just Opus. Match thinking mode to model tier, not task complexity:
"When I run Opus, I don't do thinking. When I run Sonnet, I use minimal thinking."
Hannes Rudolph
Opus: Skip thinking entirely. The base reasoning is strong enough.
Sonnet: Use minimal thinking. Just enough to compensate for the lighter base reasoning, not enough to balloon the token count.
The intuition that "complex task = enable thinking" is backwards. Complex task + capable model = skip thinking. The reasoning is already there.
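If you drive these models from your own scripts rather than a UI toggle, the tier rule fits in a few lines of configuration. A minimal sketch, assuming the Anthropic Messages API request shape; the 1,024-token Sonnet budget and the model IDs are illustrative placeholders.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Tier rule: skip thinking on Opus, give Sonnet a minimal budget.
// The 1024-token budget is an illustrative floor, not a benchmarked value.
function thinkingFor(model: string): { type: "enabled"; budget_tokens: number } | undefined {
  if (model.includes("opus")) return undefined;   // Opus: base reasoning is enough
  if (model.includes("sonnet")) return { type: "enabled", budget_tokens: 1024 }; // Sonnet: minimal
  return undefined;                               // default: off
}

const model = "claude-sonnet-4-20250514"; // placeholder model ID
const thinking = thinkingFor(model);

const response = await client.messages.create({
  model,
  max_tokens: 4096,
  ...(thinking ? { thinking } : {}), // "off" means the parameter is absent, not zeroed out
  messages: [{ role: "user", content: "Fix the flaky retry logic in httpClient.ts" }],
});
```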
The tradeoff
This is not "thinking mode is useless." Thinking mode exists because some models need it to handle multi-step reasoning. The question is whether this specific model needs it for your specific tasks.
For Opus on coding tasks, the evidence points to no.
"I feel like Opus is just so good at doing everything that having thinking enabled doesn't improve anything at all."
Hannes Rudolph
The cost is concrete: thinking mode consumes tokens for the reasoning trace. If the reasoning trace does not change the final output, those tokens are overhead. Multiply that across a day of coding tasks and the waste compounds.
Latency matters too. Every task waits for the thinking step to complete before you see the response. If the thinking step is not improving the result, you are paying in time for nothing.
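To make the compounding concrete, here is a back-of-envelope sketch. The task count, trace length, and price are assumptions for illustration only; plug in your own numbers.

```typescript
// Back-of-envelope overhead estimate. Every number here is an illustrative
// assumption, not a measurement: substitute your own task count, average
// trace length, and your provider's current per-token pricing.
const tasksPerDay = 15;
const thinkingTokensPerTask = 1_500;   // assumed average reasoning-trace length
const outputPricePerMTok = 75;         // assumed $/million output tokens for Opus

const wastedTokensPerDay = tasksPerDay * thinkingTokensPerTask;                   // 22,500 tokens
const wastedDollarsPerDay = (wastedTokensPerDay / 1_000_000) * outputPricePerMTok;

console.log(`~${wastedTokensPerDay} tokens/day, ~$${wastedDollarsPerDay.toFixed(2)}/day`);
// ~22500 tokens/day, ~$1.69/day: small per task, noticeable across a team and a month
```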
When thinking mode might still help
The pattern likely differs by model and by task type. If you are running a model that struggles with multi-step reasoning without the scratch pad, thinking mode earns its cost. If you are running Opus on tasks that require long chains of dependent logic (mathematical proofs, complex algorithm design), it might still add value.
But for typical coding tasks - refactors, bug fixes, feature implementation, PR review context - Opus handles these without the extra step.
How Roo Code enables intentional token spending
Roo Code's BYOK (Bring Your Own Key) model means you pay your AI provider directly for exactly the tokens you use: no markup, no hidden fees, no intermediary padding the bill.
This makes optimization choices like disabling thinking mode on Opus directly actionable. When you configure Roo Code to use Opus without thinking mode, you immediately see the cost reduction in your provider dashboard. The savings compound across every task the agent runs.
Roo Code lets you close the loop on token optimization: configure your model settings, run tasks, and verify your actual spend against your expected spend - all without leaving your editor.
The transparency matters. When you bring your own API key, you control the model, the parameters, and the cost. Roo Code executes your configuration without adding overhead.
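If you want to verify the numbers yourself rather than eyeball the dashboard, the usage block returned with each Messages API response is enough. A minimal sketch, assuming the Anthropic TypeScript SDK; the `runAndLog` helper and the per-token rates are placeholders, not part of Roo Code.

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { appendFileSync } from "node:fs";

const client = new Anthropic();

// Hypothetical helper: run one task and append its real token usage to a CSV
// so expected spend can be checked against actual spend.
async function runAndLog(task: string, model: string) {
  const msg = await client.messages.create({
    model,
    max_tokens: 4096,
    messages: [{ role: "user", content: task }],
  });

  const { input_tokens, output_tokens } = msg.usage;
  const cost = (input_tokens / 1e6) * 15 + (output_tokens / 1e6) * 75; // assumed Opus $/MTok rates
  appendFileSync(
    "token-log.csv",
    `${new Date().toISOString()},${model},${input_tokens},${output_tokens},${cost.toFixed(4)}\n`
  );
  return msg;
}
```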
Thinking mode comparison by model tier
| Dimension | Opus with thinking | Opus without thinking | Sonnet with minimal thinking |
|---|---|---|---|
| Output quality (coding tasks) | Baseline | Same as baseline | Slightly below Opus |
| Token cost per task | Higher (reasoning trace) | Lower | Moderate |
| Response latency | Longer wait | Faster response | Moderate wait |
| When to use | Complex mathematical proofs | Standard coding work | General development tasks |
| Cost efficiency | Lower (redundant compute) | Higher | Balanced |
Why this matters for your workflow
For an engineer running 10-20 coding tasks per day on Opus, disabling thinking mode could shave 10-20% off your token spend without changing your output quality. That is a meaningful efficiency gain, and a meaningful latency win: you get the same result in less wall-clock time.
The setting is a toggle. The experiment is straightforward. Disable thinking mode for a week. Track whether your outputs degrade. If they do not, you have been paying for compute that was not helping.
The test
Disable thinking mode on Opus. Run your normal tasks for a week.
If output quality drops, re-enable it. If it does not, you just cut your Opus spend without losing anything.
The setting is reversible. The experiment costs nothing but attention.