Manage AI Spend by Measuring Return, Not Cost

2025-11-14 · 7 min read
ai-spend · engineering-leadership · productivity · developer-tools

"I own it just so that I can tell them I want you to focus on your return, not your cost."

That's JB Brown from Smartsheet, explaining why he consolidated all AI tool spend into a single account under his control.

The budget trap

When AI token costs sit in team budgets, engineers optimize for the wrong metric. They watch the spend. They pick smaller models. They skip the task that might cost fifteen dollars even when it would save three hours.
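
To make that mismatch concrete, here is the per-task arithmetic as a minimal sketch. The $100/hour loaded engineering cost is an illustrative assumption, not a figure from the post:

```python
# Per-task break-even: token cost vs. value of engineer time saved.
# The $100/hour loaded cost is an illustrative assumption, not from the source.
HOURLY_COST = 100.00

token_cost = 15.00   # the "expensive" task that gets skipped
hours_saved = 3.0    # time the task would have saved

value_recovered = hours_saved * HOURLY_COST   # $300.00
net_return = value_recovered - token_cost     # $285.00
roi_multiple = value_recovered / token_cost   # 20x

print(f"Spend ${token_cost:.2f}, recover ${value_recovered:.2f} "
      f"({roi_multiple:.0f}x return, ${net_return:.2f} net)")
```

At any plausible hourly rate, skipping the fifteen-dollar task is the expensive choice.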

The incentive structure is backwards. You're measuring input (tokens consumed) instead of output (work completed). Every team manages their own line item, and every team gets cautious.

This is the predictable outcome of distributed AI budgets: usage goes down, and so does the productivity gain you were trying to unlock.

Why not track per-person?

The technical capability to track individual spend exists. You authenticate to use tokens. The data is there. Smartsheet deliberately stays away from it.

"I could actually get down to it because you have to authenticate to use tokens and so then there's a tracking to amount of tokens per person but I don't. I'm kind of staying away from that. I think it will lead to bad mindset and bad behavior."

JB Brown, Smartsheet

The tracking is possible. The question is whether you should. Per-person dashboards create the exact cost anxiety that undermines the productivity gain you're paying for.

The Smartsheet approach

Smartsheet took the opposite path. They moved all AI tool spend into a single account owned by engineering leadership. Not to track costs more closely, but to remove cost from the team-level conversation entirely.

"Here we're trying to drive return, not trying to reduce cost. And so to get that mindset shift and behavior and practice shift, I'm sort of precluding people from thinking about the cost too much."

JB Brown, Smartsheet

The goal is explicit: shift the mental model from "how do I spend less?" to "how do I ship more?"

The metric that matters

If you're not measuring cost, what are you measuring?

Smartsheet's answer: MR throughput. Merge requests completed. Commits merged. Work shipped.

"We would measure it by MR throughput. And that's what we're trying to drive towards as that outcome."

JB Brown, Smartsheet

This is the difference between treating AI as an expense line and treating it as a productivity lever. Expenses get minimized. Levers get pulled.

The tradeoff

Centralizing spend requires leadership to take ownership of a growing line item. That's a real commitment. You're betting that the productivity gains justify the cost, and you're removing the natural friction that distributed budgets create.

This works when you have the instrumentation to measure output. If you can't track MR throughput (or your equivalent of work completed), you're flying blind. The model only makes sense if you have visibility into what you're getting for the spend.

The other risk: engineers might overconsume without constraints. Smartsheet's approach relies on trust and a focus on outcomes. If your teams aren't outcome-oriented, centralizing spend without guardrails could backfire.

Why this matters for your organization

If you're evaluating AI coding tools at the org level, the budget question comes early. Finance wants to know where the costs sit. Engineering wants to experiment. Someone has to decide who owns the number.

For a 20-person engineering team, the difference between cautious usage and full adoption compounds. If engineers second-guess every expensive task, you're leaving the productivity gain on the table. If they're told "focus on output, I'll handle the spend," you unlock a different behavior entirely.
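
Here is a minimal sketch of how that compounds at team scale. Every input (tasks per week, hours saved, the hourly rate) is an illustrative assumption, not Smartsheet data:

```python
# Team-level compounding: cautious vs. full adoption over a year.
# Every input below is an illustrative assumption, not Smartsheet data.
TEAM_SIZE = 20
WEEKS_PER_YEAR = 48
HOURLY_COST = 100.00

def annual_net_value(tasks_per_week: float, hours_saved_per_task: float,
                     token_cost_per_task: float) -> float:
    """Net value of AI-assisted tasks across the whole team for one year."""
    tasks = TEAM_SIZE * WEEKS_PER_YEAR * tasks_per_week
    return tasks * (hours_saved_per_task * HOURLY_COST - token_cost_per_task)

cautious = annual_net_value(tasks_per_week=1, hours_saved_per_task=3,
                            token_cost_per_task=15)
full = annual_net_value(tasks_per_week=5, hours_saved_per_task=3,
                        token_cost_per_task=15)

print(f"Cautious usage: ${cautious:,.0f}/year")
print(f"Full adoption:  ${full:,.0f}/year")
print(f"Left on the table: ${full - cautious:,.0f}/year")
```

The exact numbers will vary; the shape will not. The gap between hesitant and confident usage scales with headcount and time.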

The question isn't whether to spend on AI tools. It's whether your budget structure encourages the usage patterns that drive return.

How Roo Code enables return-focused AI spend

Roo Code's BYOK (Bring Your Own Key) model aligns directly with this return-over-cost philosophy. When you connect your own API keys, you get transparent token costs without markup, making it straightforward to consolidate spend under a single organizational account.

Because Roo Code closes the loop - proposing diffs, running commands and tests, and iterating on failures - engineers spend tokens on completed work rather than fragmented context-switching. The agent handles the iteration cycle that would otherwise require manual intervention, which means token spend translates more directly to merged code.

Organizations using Roo Code with centralized API accounts can measure return by tracking merge request throughput against token consumption, creating a clear cost-per-outcome metric.
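
A minimal sketch of that cost-per-outcome calculation. The weekly figures are placeholders; in practice, merged-MR counts would come from your Git host and token spend from your provider's billing export:

```python
# Cost per merged MR: correlate centralized token spend with shipped work.
# Weekly figures below are placeholders, not real data.
weekly = [
    # (week, token_spend_usd, merged_mrs)
    ("2025-W40", 1200.00, 38),
    ("2025-W41", 1450.00, 52),
    ("2025-W42", 1600.00, 61),
]

for week, spend, mrs in weekly:
    print(f"{week}: {mrs} MRs merged, ${spend:,.0f} spent, "
          f"${spend / mrs:.2f} per merged MR")

# The signal to watch: cost per merged MR trending down
# as total usage (and total spend) goes up.
```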

Cost anxiety vs. outcome focus: a comparison

| Dimension | Distributed budgets (cost focus) | Centralized spend (return focus) |
| --- | --- | --- |
| Engineer behavior | Avoids expensive tasks even when high-value | Uses the right model for the job |
| Optimization target | Minimize token consumption | Maximize merge request throughput |
| Model selection | Defaults to cheaper, smaller models | Selects based on task complexity |
| Leadership visibility | Fragmented across team ledgers | Single account with outcome correlation |
| Risk profile | Under-utilization of AI capability | Requires output instrumentation |

The decision

Audit where your AI spend currently sits. If it's distributed across team budgets, ask: are engineers optimizing for cost or for output?

If the answer is cost, consider consolidating. Own the spend at a level where someone can credibly say: "I want you to focus on your return, not your cost."

Then measure MR throughput.

Frequently asked questions

How do you get finance on board with centralizing AI spend?

Frame the conversation around measurable output, not cost containment. Present a pilot where you track merge request throughput before and after removing per-team budget constraints. Finance responds to productivity metrics with ROI attached. Show them the cost-per-merged-MR math, not just the monthly token bill.

What should you measure instead of cost?

Merge request throughput is the most direct proxy because it measures completed work. Other valid metrics include commits merged, story points delivered, or time-to-first-commit on new tasks. The key is choosing a metric that captures finished work rather than activity. Avoid measuring tokens consumed or hours using the tool - these are inputs, not outputs.

Can Roo Code consolidate AI spend across a team?

Yes. Roo Code's BYOK model lets you connect a single organizational API account that all team members use. This consolidates token spend into one billing relationship while giving engineers full access to capable models. The transparent pricing - no token markup - makes it easier to correlate spend with output when you measure at the organizational level.

What are the risks of removing per-team budgets?

Without output instrumentation, you lose visibility into what you're getting for the spend. The mitigation is measurement: if you track merge request throughput alongside token consumption, you can identify both inefficient usage patterns and high-value workflows. The risk of over-consumption is lower than the risk of under-utilization when engineers self-censor to avoid costs.

How do you start measuring return instead of cost?

Begin by consolidating AI tool spend into a single account with clear billing visibility. Then establish a baseline: track merge requests per engineer per week before and after centralizing spend. Correlate changes in throughput with changes in token consumption. Most teams see throughput increase faster than cost, which is the return you're measuring.
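
A minimal sketch of that before/after comparison, with placeholder figures standing in for your real throughput and billing data:

```python
# Before/after baseline: is throughput rising faster than token spend?
# Placeholder figures; real data comes from your Git host and billing export.
before = {"mrs_per_week": 62, "token_spend_usd": 900.00}   # distributed budgets
after = {"mrs_per_week": 89, "token_spend_usd": 1100.00}   # centralized spend

throughput_gain = after["mrs_per_week"] / before["mrs_per_week"] - 1
spend_growth = after["token_spend_usd"] / before["token_spend_usd"] - 1

print(f"Throughput: +{throughput_gain:.0%}, spend: +{spend_growth:.0%}")
if throughput_gain > spend_growth:
    # Throughput rising faster than spend means cost per merged MR is falling.
    print("Return is outpacing cost: the centralized model is paying off.")
```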
