Skip Thinking Mode on Opus and Save Your Tokens

2026-01-12 · 6 min read
ai-models · token-optimization · claude-opus · developer-productivity

Thinking mode on Opus is overkill.

After a month of running Opus with and without the extra reasoning step, the pattern is consistent: the output quality stays the same, but the cost and latency go up.

The experiment

Hypothesis: Opus with thinking enabled produces measurably higher-quality results on coding tasks than Opus without it.

Setup: Same model. Same tasks. Toggle thinking mode on and off across a month of real work.

Result: No measurable difference in output quality. The thinking step adds tokens and wait time without changing what ships.

"I've used Opus for like a month now without thinking, and I've tried it with thinking. I think for Opus, thinking is overkill and pretty much a waste."

Hannes Rudolph

The mechanism makes sense when you think about what thinking mode actually does. It gives the model a scratch pad to reason through a problem before responding. For models that need that extra step to handle complexity, it helps. For Opus, the base model already handles complex reasoning well enough that the scratch pad becomes redundant compute.

The tier-based rule

The pattern extends beyond just Opus. Match thinking mode to model tier, not task complexity:

"When I run Opus, I don't do thinking. When I run Sonnet, I use minimal thinking."

Hannes Rudolph

Opus: Skip thinking entirely. The base reasoning is strong enough.

Sonnet: Use minimal thinking. Just enough to compensate for the lighter base reasoning, not enough to balloon the token count.
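
To make the rule concrete, here is roughly what it looks like at the API level. This is a minimal sketch, assuming the official Anthropic Python SDK; the model IDs and the 1,024-token budget are illustrative placeholders, not numbers from the experiment above.

```python
import anthropic  # assumes the official Anthropic Python SDK is installed

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = "Refactor this function to remove the duplicated branch."

# Opus tier: skip thinking entirely by omitting the `thinking` parameter.
opus_response = client.messages.create(
    model="claude-opus-4-20250514",  # illustrative model ID; use your provider's current one
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)

# Sonnet tier: minimal thinking, using the smallest budget the API accepts (1,024 tokens).
sonnet_response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=4096,                   # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": prompt}],
)
```

With thinking enabled, the reasoning trace comes back as separate thinking blocks in the response and counts toward output tokens, which is exactly where the extra cost shows up.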

The intuition that "complex task = enable thinking" is backwards. Complex task + capable model = skip thinking. The reasoning is already there.

The tradeoff

This is not "thinking mode is useless." Thinking mode exists because some models need it to handle multi-step reasoning. The question is whether this specific model needs it for your specific tasks.

For Opus on coding tasks, the evidence points to no.

"I feel like Opus is just so good at doing everything that having thinking enabled doesn't improve anything at all."

Hannes Rudolph

The cost is concrete: thinking mode consumes tokens for the reasoning trace. If the reasoning trace does not change the final output, those tokens are overhead. Multiply that across a day of coding tasks and the waste compounds.

Latency matters too. Every task waits for the thinking step to complete before you see the response. If the thinking step is not improving the result, you are paying in time for nothing.
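
To see how that adds up, a back-of-the-envelope calculation helps. Every number below is a placeholder, not a measurement from the experiment; substitute your own task volume, typical reasoning-trace size, and provider pricing.

```python
# Hypothetical inputs -- substitute your own provider pricing and usage.
tasks_per_day = 15                 # coding tasks routed to Opus
thinking_tokens_per_task = 2_000   # reasoning-trace tokens the scratch pad might consume
output_price_per_mtok = 75.00      # USD per million output tokens (placeholder rate)

daily_overhead_tokens = tasks_per_day * thinking_tokens_per_task
daily_overhead_usd = daily_overhead_tokens / 1_000_000 * output_price_per_mtok

print(f"{daily_overhead_tokens:,} extra tokens/day ~= ${daily_overhead_usd:.2f}/day")
# 30,000 extra tokens/day ~= $2.25/day -- pure overhead if the trace never changes the output
```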

When thinking mode might still help

The pattern likely differs by model and by task type. If you are running a model that struggles with multi-step reasoning without the scratch pad, thinking mode earns its cost. If you are running Opus on tasks that require long chains of dependent logic (mathematical proofs, complex algorithm design), it might still add value.

But Opus handles typical coding tasks - refactors, bug fixes, feature implementation, PR review - without the extra step.

How Roo Code enables intentional token spending

Roo Code's BYOK (Bring Your Own Key) model means you pay your AI provider directly for exactly the tokens you use. There is no markup, no hidden fees, and no intermediary padding the bill.

This makes optimization choices like disabling thinking mode on Opus directly actionable. When you configure Roo Code to use Opus without thinking mode, you immediately see the cost reduction in your provider dashboard. The savings compound across every task the agent runs.

Roo Code lets you close the loop on token optimization: configure your model settings, run tasks, and verify your actual spend against your expected spend - all without leaving your editor.
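
If you want to verify the spend programmatically rather than in the dashboard, the provider response already carries the token counts you need. A minimal sketch, again assuming the Anthropic Python SDK and placeholder prices; thinking tokens, when enabled, are counted in the output total, so comparing runs with and without the feature exposes the overhead directly.

```python
import anthropic  # assumes the official Anthropic Python SDK

# Placeholder prices in USD per million tokens -- check your provider's current rate card.
INPUT_PRICE_PER_MTOK = 15.00
OUTPUT_PRICE_PER_MTOK = 75.00

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",  # illustrative model ID
    max_tokens=4096,
    messages=[{"role": "user", "content": "Review this diff for off-by-one errors."}],
)

# Price the task from the reported usage rather than waiting for the dashboard.
usage = response.usage
cost = (
    usage.input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    + usage.output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
)
print(f"in={usage.input_tokens} out={usage.output_tokens} cost=${cost:.4f}")
```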

The transparency matters. When you bring your own API key, you control the model, the parameters, and the cost. Roo Code executes your configuration without adding overhead.

Thinking mode comparison by model tier

| Dimension | Opus with thinking | Opus without thinking | Sonnet with minimal thinking |
|---|---|---|---|
| Output quality (coding tasks) | Baseline | Same as baseline | Slightly below Opus |
| Token cost per task | Higher (reasoning trace) | Lower | Moderate |
| Response latency | Longer wait | Faster response | Moderate wait |
| When to use | Complex mathematical proofs | Standard coding work | General development tasks |
| Cost efficiency | Lower (redundant compute) | Higher | Balanced |

Why this matters for your workflow

If you run 10-20 coding tasks per day on Opus, disabling thinking mode could shave 10-20% off your token spend without changing your output quality. That is not a small efficiency gain, and the latency improvement comes along for free: you get the same result in less wall-clock time.

The setting is a toggle. The experiment is straightforward. Disable thinking mode for a week. Track whether your outputs degrade. If they do not, you have been paying for compute that was not helping.

The test

Disable thinking mode on Opus. Run your normal tasks for a week.

If output quality drops, re-enable it. If it does not, you just cut your Opus spend without losing anything.

The setting is reversible. The experiment costs nothing but attention.
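
If you want something more rigorous than memory for that week, a few lines of logging are enough. This is a hypothetical helper, not a feature of any tool mentioned above: one row per task, so at the end of the week you can compare quality ratings and token counts across the two configurations.

```python
import csv
from datetime import date
from pathlib import Path

LOG = Path("opus_thinking_test.csv")  # hypothetical log file for the week-long test

def log_task(description: str, thinking_enabled: bool, output_tokens: int, quality_1_to_5: int) -> None:
    """Append one task's result so the two configurations can be compared later."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "task", "thinking", "output_tokens", "quality"])
        writer.writerow([date.today().isoformat(), description, thinking_enabled,
                         output_tokens, quality_1_to_5])

# Example: rate each task right after reviewing its output.
log_task("bug fix in retry logic", thinking_enabled=False, output_tokens=1842, quality_1_to_5=4)
```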

Frequently asked questions

Does thinking mode improve Opus output quality on coding tasks?
For Opus on typical coding tasks, the evidence shows no measurable difference in output quality. The base Opus model handles complex reasoning without the additional scratch pad step. Run your own week-long test to verify this holds for your specific workload.

How much can disabling thinking mode save?
The savings depend on your task volume and complexity. For engineers running 10-20 coding tasks per day on Opus, disabling thinking mode can reduce token spend by 10-20%. The reasoning trace tokens that thinking mode generates become pure overhead when they do not improve the final output.

Should I enable thinking mode for complex tasks?
No. Match thinking mode to model capability, not task complexity. Opus can skip thinking entirely. Sonnet benefits from minimal thinking. Models with weaker base reasoning may need full thinking mode to handle multi-step problems. The tier-based rule is: stronger base model, less thinking needed.

How do I configure thinking mode in Roo Code?
Roo Code uses your API key directly with your chosen provider. Configure model parameters, including thinking mode, through your provider's API settings or through Roo Code's model configuration. Because Roo Code uses BYOK, your settings apply exactly as specified with no intermediary modifications.

When should I keep thinking mode enabled?
Consider keeping thinking mode for tasks requiring long chains of dependent logic, such as mathematical proofs or complex algorithm design. For standard coding work - refactors, bug fixes, feature implementation, and PR reviews - the evidence suggests thinking mode adds cost without improving results.
