The new token economy
Measuring productivity by token consumption
In April 2026, an engineer at Meta built an internal dashboard called Claudeonomics. It tracked how many AI tokens employees had consumed. The top 250 users were given titles like “Token Legend”, “Cache Wizard” and “Session Immortal”. In thirty days Meta employees collectively consumed 60.2 trillion tokens. The top individual user consumed 281 billion tokens that month - a level of usage that would translate into millions of dollars in compute at typical enterprise pricing. Mark Zuckerberg was reportedly not among the top 250 users.
Gergely Orosz at The Pragmatic Engineer had already documented that this was not a Meta-only phenomenon. Multiple large technology companies have built variants of the same dashboard. Some are framed as leaderboards, others as adoption monitors, others as cost-tracking tools. The details differ by company. The pattern is consistent: AI consumption has become visible, comparable across engineers and in many cases gamified. The reader who wants the specific company-by-company reporting will find it in Orosz’s piece. What follows is built on his reporting but asks a different question: not whether the dashboards exist, but what their existence tells us about how organizations are measuring AI value and what the right alternative looks like.
The pattern across these companies has acquired a name: tokenmaxxing. The premise is that the more tokens an engineer burns through AI coding assistants, the more productive they are. In some organizations token usage is being surfaced in ways that could influence perceptions of performance. The framing has support from the top of the industry. Nvidia CEO Jensen Huang said publicly on the All-In Podcast that he would be “deeply alarmed” if an engineer earning $500,000 a year did not consume at least $250,000 worth of tokens annually, treating token consumption as a signal of engagement and productivity. Huang’s position, fairly stated, is not that tokens should be the metric - it is that not using them at scale is a sign of underutilization. That is a defensible view of AI as essential tooling. The problem is what happens when organizations downstream of that view turn the “use lots of tokens” framing into “measure people by token consumption,” which is a different position.
Goodhart’s Law, applied
In 1975, British economist Charles Goodhart observed that any statistical regularity used by central banks for control purposes would tend to collapse once the control was applied. Anthropologist Marilyn Strathern restated his observation in 1997 in the form most people now know: “When a measure becomes a target, it ceases to be a good measure.”
Goodhart’s Law is one of the most reliably observed dynamics in any system that measures human behavior. Schools measured by standardized test scores teach to the test. Sales teams measured by call volume make shorter, less useful calls. The pattern is universal. Every measure that becomes a target produces a behavior that maximizes the measure while degrading the underlying outcome the measure was supposed to track.
Software engineering has its own well-documented version of this. For decades, the industry tried to measure developer productivity by counting lines of code. The metric had the same fatal property as token consumption. It measured an input - the volume of text produced - rather than an outcome, namely whether the resulting software solved the right problem reliably. Engineers who were measured by lines of code wrote more lines of code. They added boilerplate. They preferred verbose solutions over concise ones. The industry eventually abandoned the metric, not because it was hard to compute, but because the resulting incentive produced worse software.
Token consumption is the same metric in a new costume. It measures how much AI processing an engineer triggered, not whether the resulting code worked, was maintainable, or solved a real problem. And like lines of code, it can be gamed in ways that are visible to anyone who looks.
Why this metric is uniquely dangerous
Lines of code, when it was the dominant productivity metric, at least had the virtue of being free. The cost of the bad metric was distortion, not money. An engineer who wrote ten percent more code than they should have did not bankrupt the company.
Token consumption is not free. The metric is being applied at a moment when AI tooling has become the largest single line item of software spending per engineer at many organizations. To put this in perspective: the typical engineering team uses some combination of project management software, a wiki, a messaging platform, an email service, and an integrated development environment. The combined per-seat cost of all of these, even at premium tiers, is rarely more than $100-150 per month per engineer. AI tooling at the levels these companies are now incentivizing routinely runs several times that on its own. Some organizations have begun setting per-engineer AI budgets in the hundreds of dollars per month, and reports of individual power users running personal token bills well into five figures in a single month are no longer rare in the industry press.
This means tokenmaxxing has a financial dimension that lines-of-code never had. When engineers game lines of code, the company gets worse software at the same cost. When they game token consumption, the company gets worse software and a much larger bill. The empirical evidence on this is now substantial. Faros AI’s March 2026 report on customer data found that code churn - the rate at which freshly written code is deleted or rewritten - increased 861% under high AI adoption. GitClear reported that regular AI users averaged 9.4x higher code churn than non-AI counterparts. Jellyfish’s analysis of 7,548 engineers in Q1 2026 found that engineers with the largest token budgets achieved twice the throughput at ten times the token cost - which works out to five times the cost per unit of output delivered. The exact magnitudes vary by dataset, but the direction is consistent across reports: more tokens, more code generated, lower fraction of that code surviving review.
And there is a supply problem on top of the demand problem. The AI providers cannot keep up with the consumption their own customers are now incentivizing. Anthropic introduced weekly rate limits on Claude Code in August 2025, then reduced peak-hour quotas in March 2026, publicly acknowledging that they are compute-constrained. Enterprise demand for tokens is now outstripping the supply that exists.
The counter-position is forming
Before listing the executives now publicly arguing against tokenmaxxing, it is worth saying clearly why the metric exists in the first place. The reason organizations adopt these metrics is structural rather than malicious. Token consumption is easy to measure, comparable across engineers, available in real time, and produces a number that can be reported up the chain. Outcomes - actual problems solved, actual code shipped that survives review, actual business value created - are none of these things. Faced with the choice between a measurable proxy and an unmeasurable target, every organization defaults to the proxy. This is not a problem specific to tokenmaxxing. It is the same dynamic that produces every bad metric in management history.
A growing chorus of executives is publicly arguing that the entire frame is wrong. HubSpot CEO Yamini Rangan posted on LinkedIn:
“Outcome maxxing >> token maxxing.”
Linear COO Cristina Cordova:
“Ranking engineers by token spend is like me ranking my marketing team by who spent the most money. We may not have hit our KPIs, but Joe spent $200k on a branded blimp that only flies over his own house, so he's getting promoted to VP! Don't mistake a high burn rate for a high success rate.”
Appian CEO Matt Calkins compared tokenmaxxing to Soviet-style metrics, where output was judged by weight rather than quality.
Some companies are already experimenting with outcome-based alternatives. Instead of tracking AI usage, they are setting explicit delivery targets and incorporating expected productivity gains from AI into those targets. In some cases these targets influence compensation and team-level incentives, shifting the focus from input metrics to delivered results. These are not yet best practices. They are the early experiments of executives who have noticed that the input metric is producing the wrong behavior and are trying to design something better. Each is making a slightly different bet on what the right output measure looks like. None has solved the problem cleanly. But they have at least correctly identified that the problem exists and that consumption is not a substitute for value.
What good token discipline looks like
If tokens are the wrong metric, the practical question for any engineering leader is what the right one is. There is no universal answer yet, but there are useful starting points, drawn from how good engineering organizations have always handled inputs that are easy to count and outcomes that are hard to measure.
First, measure cost-per-resolved-task rather than cost-per-token. A senior engineer who solves a complex bug with a single well-designed prompt and a junior engineer who solves the same bug with thirty iterative prompts have produced the same outcome at very different costs. The token game celebrates the junior engineer for having burned more tokens. The right metric should celebrate the senior engineer for having achieved the same outcome at a fraction of the cost. This requires actually defining what counts as a resolved task, which is hard, but it is the kind of hard that good engineering organizations have always had to do.
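To make the metric concrete, here is a minimal sketch of cost-per-resolved-task, assuming you already have per-task usage records joined against the issue tracker. The record fields, the flat per-million-token price, and the example numbers are all illustrative assumptions, not anyone’s production schema or real pricing.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TaskRecord:
    engineer: str     # who worked the task
    task_id: str      # identifier in the issue tracker (hypothetical field)
    tokens_used: int  # total tokens billed against this task
    resolved: bool    # did the task close with the fix surviving review?

# Illustrative flat price; real pricing varies by model and provider.
PRICE_PER_MILLION_TOKENS = 15.00

def cost_per_resolved_task(records: list[TaskRecord]) -> dict[str, float]:
    """AI spend divided by resolved-task count, per engineer."""
    spend: defaultdict[str, float] = defaultdict(float)
    resolved: defaultdict[str, int] = defaultdict(int)
    for r in records:
        spend[r.engineer] += r.tokens_used / 1_000_000 * PRICE_PER_MILLION_TOKENS
        if r.resolved:
            resolved[r.engineer] += 1
    # Zero resolved tasks reads as infinite cost per outcome - exactly
    # what the metric should say about consumption with nothing to show.
    return {e: spend[e] / resolved[e] if resolved[e] else float("inf")
            for e in spend}

# The senior engineer's single well-designed prompt versus the junior's
# thirty iterations, both resolving an equivalent bug:
records = [
    TaskRecord("senior", "BUG-101", tokens_used=40_000, resolved=True),
    TaskRecord("junior", "BUG-102", tokens_used=1_200_000, resolved=True),
]
print(cost_per_resolved_task(records))
# {'senior': 0.6, 'junior': 18.0} - same outcome, thirty times the cost
```

The design choice that matters is the denominator: an engineer with heavy consumption and no resolved tasks shows up as infinitely expensive, which is precisely the signal a token leaderboard hides.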
Second, measure the engineering practices that reduce token consumption while improving outcomes. Better prompt design. Better context management. Better output validation. Better understanding of which model to use for which task - the most powerful model is rarely the right choice for the simplest problem. These are the practices that the current metric is actively discouraging, because each of them reduces the input number while improving the output. Any organization that takes AI engineering seriously will eventually need to reward these practices explicitly.
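As one sketch of what rewarding those practices might look like in tooling terms, consider a routing layer that defaults cheap, well-bounded work to a smaller model. Everything in the snippet below is an assumption made for illustration - the model names, the task categories, and the context threshold are invented and would need tuning against real workloads.

```python
# Hypothetical model tiers and task categories, invented for this example.
SMALL_MODEL = "small-fast-model"        # cheap tier (assumed)
LARGE_MODEL = "large-reasoning-model"   # expensive tier (assumed)
SIMPLE_TASKS = {"rename", "format", "docstring", "changelog"}

def pick_model(task_kind: str, context_tokens: int) -> str:
    """Route well-bounded, low-context work to the small model and
    reserve the large model for tasks that need its reasoning."""
    if task_kind in SIMPLE_TASKS and context_tokens < 8_000:
        return SMALL_MODEL
    return LARGE_MODEL

assert pick_model("docstring", context_tokens=2_000) == SMALL_MODEL
assert pick_model("refactor", context_tokens=50_000) == LARGE_MODEL
```

Even a rule this crude encodes the practice the paragraph describes: the most powerful model stops being the default, and the input number goes down without the output getting worse.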
Third, treat AI spending as a real budget category - not a free-tier experiment. The era when AI was a curiosity that engineers dabbled with on personal accounts is over. AI spend is now operational expenditure at scale. It deserves the same financial discipline as any other major operational cost: clear budgets, clear ownership, regular review of cost-per-outcome rather than cost-per-input, and an honest conversation about which deployments are producing value and which are not. Most organizations are not yet treating AI spend this way.
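As a sketch of what that discipline could look like at its simplest, the snippet below reviews a team’s monthly AI spend against a budget and reports cost per resolved task alongside it. The budget figure and the team numbers are illustrative, and the assumption is that spend and resolved-task counts are already collected from provider billing exports and the issue tracker.

```python
# All figures are illustrative; the budget is not a recommendation.
MONTHLY_BUDGET_PER_ENGINEER = 400.00

def review_team(team: str, engineers: int,
                spend: float, tasks_resolved: int) -> str:
    """One line of monthly review: spend vs. budget, plus cost per outcome."""
    budget = engineers * MONTHLY_BUDGET_PER_ENGINEER
    cost_per_outcome = spend / tasks_resolved if tasks_resolved else float("inf")
    status = "over budget" if spend > budget else "within budget"
    return (f"{team}: ${spend:,.0f} of ${budget:,.0f} ({status}), "
            f"${cost_per_outcome:,.2f} per resolved task")

print(review_team("payments", engineers=8, spend=4_100, tasks_resolved=120))
# payments: $4,100 of $3,200 (over budget), $34.17 per resolved task
```

The point of pairing the two numbers is the honest conversation the paragraph calls for: a team over budget with a low cost per resolved task is a very different problem from a team under budget with nothing delivered.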
The new business category
One business category has emerged in roughly the last decade to manage exactly this kind of problem. When cloud computing transformed from a developer convenience into a major operational cost category, organizations discovered that nobody was responsible for managing the bill. Engineers spun up resources without thinking about cost. Finance teams saw the aggregate number going up but had no visibility into what was driving it. The tools that existed for traditional infrastructure procurement did not work because cloud spending was distributed across thousands of small decisions made daily by engineers who did not see themselves as making financial decisions. Out of that gap, FinOps emerged as a discipline - a practice that combines engineering, finance, and product to bring cost visibility into engineering decisions and engineering judgment into financial planning. With it came a category of tooling, each vendor trying to surface what cloud spending was actually paying for. The category did not exist in 2015. By 2025 it was a multi-billion-dollar industry.
Token economy management is on the same trajectory and looks structurally similar in every important way. The spending is distributed across thousands of small decisions made daily by engineers who do not see themselves as making financial decisions. Finance teams see the aggregate going up without visibility into what is driving it. The traditional procurement tools do not work because the spending is not centralized. The tools to manage it well do not yet exist at the maturity that FinOps tools have reached, but they will. The category of model management tools - software that measures cost-per-outcome rather than cost-per-token, that helps organizations choose the right model for each task, that surfaces the engineering practices that reduce consumption without sacrificing quality - is going to be a meaningful business category in the next few years. Some of the analytics companies in the developer productivity space are already moving in this direction. None of them has yet built the full equivalent of what FinOps tools do for cloud spending, but the demand is now obvious enough that someone will.
The structural connection
There is one more observation worth making because it connects this practical problem to a larger pattern in how AI is currently being deployed. The reason organizations measure tokens rather than outcomes is the same reason capital allocators measure AI spending rather than AI value. Tokens are easy to count. Outcomes are hard to define. The financial system, at every level from the venture capital funding the AI labs to the engineering manager evaluating the AI engineer, defaults to the measure that can be counted, even when the thing that should be counted is something else entirely.
Tokenmaxxing is what happens when this default is applied at the engineering team level. The trillions flowing into AI infrastructure on the basis of measurable inputs rather than demonstrable outputs is the same dynamic at the macroeconomic level. Both are downstream of the same structural preference: the financial and management systems that surround AI consistently choose the metric that is easy to measure over the metric that captures what is actually valuable. The cost of this preference, at every scale, is the same. We optimize for what we can count and we get worse outcomes than we would have if we had optimized for what we actually wanted.
The good news is that token economy management is something an individual engineering organization can actually fix. The discipline can be developed. The competitive advantage of being among the first to take this seriously is large enough to justify the investment. The companies that recognize tokens as the wrong metric, name what they want to measure instead, and build the discipline to do so, will operate at a structural advantage over the ones running leaderboards. That advantage is available right now to any organization willing to do the work.
Sources. The Meta Claudeonomics dashboard is documented by The Information and Fortune (April 2026). The empirical productivity data is from Faros AI (March 2026), GitClear (January 2026), and Jellyfish (Q1 2026 report on 7,548 engineers), as summarized in TechCrunch coverage. The Anthropic rate limit history draws on TechCrunch and The Register (2025-2026). Goodhart’s Law in its canonical form is from Marilyn Strathern’s 1997 paper “Improving Ratings: Audit in the British University System,” restating Charles Goodhart’s 1975 economic observation. Counter-position quotes from Yamini Rangan, Cristina Cordova, and Matt Calkins are drawn from public statements.
Shoutout to Towfiqu barbhuiya for the image!

