UnderExplained AI

The new token economy

Mon, 27 Apr 2026 22:22:57 GMT

In April 2026, an engineer at Meta built an internal dashboard called Claudeonomics. It tracked how many AI tokens employees had consumed. The top 250 users were given titles like “Token Legend”, “Cache Wizard” and “Session Immortal”. In thirty days Meta employees collectively consumed 60.2 trillion tokens. The top individual user averaged 281 billion tokens in that month - a level of usage that would translate into millions of dollars in compute at typical enterprise pricing. Mark Zuckerberg was reportedly not among the top 250 users.

Gergely Orosz at The Pragmatic Engineer had already documented that this was not a Meta-only phenomenon. Multiple large technology companies have built variants of the same dashboard. Some are framed as leaderboards, others as adoption monitors, others as cost-tracking tools. The details differ by company. The pattern is consistent: AI consumption has become visible, comparable across engineers and in many cases gamified. The reader who wants the specific company-by-company reporting will find it in Orosz’s piece. What follows is built on his reporting but asks a different question: not whether dashboard exists, but what it’s existence tells us about how organizations are measuring AI value and what the right alternative looks like.

The pattern across these companies has acquired a name: tokenmaxxing. The premise is that the more tokens an engineer burns through AI coding assistants, the more productive they are. In some organizations token usage is being surfaced in ways that could influence perceptions of performance. The framing has support from the top of the industry. Nvidia CEO Jensen Huang said publicly on the All-In Podcast that he would be “deeply alarmed” if an engineer earning $500,000 a year did not consume at least $250,000 worth of tokens annually, treating token consumption as a signal of engagement and productivity. Huang’s position, fairly stated, is not that tokens should be the metric - it is that not using them at scale is a sign of underutilization. That is a defensible view of AI as essential tooling. The problem is what happens when organizations downstream of that view turn the “use lots of tokens” framing into “measure people by token consumption,” which is a different position.

Goodhart’s Law, applied

In 1975, British economist Charles Goodhart observed that any statistical regularity used by central banks for control purposes would tend to collapse once the control was applied. Anthropologist Marilyn Strathern restated his observation in 1997 in the form most people now know: “When a measure becomes a target, it ceases to be a good measure.”

Goodhart’s Law is one of the most reliably observed dynamics in any system that measures human behavior. Schools measured by standardized test scores teach to the test. Sales teams measured by call volume make shorter, less useful calls. The pattern is universal. Every measure that becomes a target produces a behavior that maximizes the measure while degrading the underlying outcome the measure was supposed to track.

Software engineering has its own well-documented version of this. For decades, the industry tried to measure developer productivity by counting lines of code. The metric had the same fatal property as token consumption. It measured an input - the volume of text produced - rather than an outcome, namely whether the resulting software solved the right problem reliably. Engineers who were measured by lines of code wrote more lines of code. They added boilerplate. They preferred verbose solutions over concise ones. The industry eventually abandoned the metric, not because it was hard to compute, but because the resulting incentive produced worse software.

Token consumption is the same metric in a new costume. It measures how much AI processing an engineer triggered, not whether the resulting code worked was maintainable or solved a real problem. And like lines of code it can be gamed in ways that are visible to anyone who looks.

Why this metric is uniquely dangerous

Lines of code, when it was the dominant productivity metric, at least had the virtue of being free. The cost of the bad metric was distortion, not money. An engineer who wrote ten percent more code than they should have did not bankrupt the company.

Token consumption is not free. The metric is being applied at a moment when AI tooling has become the largest single line item of software spending per engineer at many organizations. To put this in perspective: the typical engineering team uses some combination of project management software, a wiki, a messaging platform, an email service, and an integrated development environment. The combined per-seat cost of all of these, even at premium tiers, is rarely more than $100-150 per month per engineer. AI tooling at the levels these companies are now incentivizing routinely runs several times that on its own. Some organizations have begun setting per-engineer AI budgets in the hundreds of dollars per month, and reports of individual power users running personal token bills well into five figures in a single month are no longer rare in the industry press.

This means tokenmaxxing has a financial dimension that lines-of-code never had. When engineers game lines of code - the company gets worse software at the same cost. When engineers game token consumption - the company gets worse software AND a much larger bill. The empirical evidence on this is now substantial. Faros AI’s March 2026 report on customer data found that code churn - the rate at which freshly written code is deleted or rewritten - increased 861% under high AI adoption. GitClear reported that regular AI users averaged 9.4x higher code churn than non-AI counterparts. Jellyfish’s analysis of 7,548 engineers in Q1 2026 found that engineers with the largest token budgets achieved twice the throughput at ten times the token cost. The exact magnitudes vary by dataset, but the direction is consistent across reports: more tokens, more code generated, lower fraction of that code surviving review.

And there is a supply problem on top of the demand problem. The AI providers cannot keep up with the consumption their own customers are now incentivizing. Anthropic introduced weekly rate limits on Claude Code in August 2025, then reduced peak-hour quotas in March 2026, publicly acknowledging that they are compute-constrained. Enterprise demand for tokens is now outstripping the supply that exists.

The counter-position is forming

Before listing the executives now publicly arguing against tokenmaxxing it is worth saying clearly why the metric exists in the first place. The reason organizations adopt these metrics is structural rather than malicious. Token consumption is easy to measure, comparable across engineers, available in real time, and produces a number that can be reported up the chain. Outcomes - actual problems solved, actual code shipped that survives review, actual business value created - are none of these things. Faced with the choice between a measurable proxy and an unmeasurable target, every organization defaults to the proxy. This is not a tokenmaxxing problem. It is the same dynamic that produces every bad metric in management history.

A growing chorus of executives is publicly arguing that the entire frame is wrong. HubSpot CEO Yamini Rangan posted on LinkedIn:

“Outcome maxxing >> token maxxing.”

Linear COO Cristina Cordova:

“Ranking engineers by token spend is like me ranking my marketing team by who spent the most money. We may not have hit our KPIs, but Joe spent $200k on a branded blimp that only flies over his own house, so he's getting promoted to VP! Don't mistake a high burn rate for a high success rate.”

Appian CEO Matt Calkins compared tokenmaxxing to Soviet-style metrics - where output was judged by weight rather than quality.

Some companies are already experimenting with outcome-based alternatives. Instead of tracking AI usage, they are setting explicit delivery targets and incorporating expected productivity gains from AI into those targets. In some cases, these targets influence compensation and team-level incentives, shifting the focus from input metrics to delivered results. These approaches are still early and inconsistent, but they reflect a growing recognition that consumption is not a substitute for value. These are not yet best practices. They are the early experiments of executives who have noticed that the input metric is producing the wrong behavior and are trying to design something better. Each of them is making a slightly different bet on what the right output measure looks like. None of them has solved the problem cleanly. But they have at least correctly identified that the problem exists and that consumption is not a substitute for value.

What good token discipline looks like

If tokens are the wrong metric the practical question for any engineering leader is what the right one is. There is no universal answer yet but there are some useful starting points drawn from how good engineering organizations have always handled inputs that are easy to count and outcomes that are hard to.

First, measure cost-per-resolved-task rather than cost-per-token. A senior engineer who solves a complex bug with a single well-designed prompt and a junior engineer who solves the same bug with thirty iterative prompts have produced the same outcome at very different costs. The token game celebrates the junior engineer for having burned more tokens. The right metric should celebrate the senior engineer for having achieved the same outcome at a fraction of the cost. This requires actually defining what counts as a resolved task, which is hard, but it is the kind of hard that good engineering organizations have always had to do.

Second, measure the engineering practices that reduce token consumption while improving outcomes. Better prompt design. Better context management. Better output validation. Better understanding of which model to use for which task - the most powerful model is rarely the right choice for the simplest problem. These are the practices that the current metric is actively discouraging, because each of them reduces the input number while improving the output. Any organization that takes AI engineering seriously will eventually need to reward these practices explicitly.

Third, treat AI spending as a real budget category - not a free-tier experiment. The era when AI was a curiosity that engineers dabbled with on personal accounts is over. AI spend is now operational expenditure at scale. It deserves the same financial discipline as any other major operational cost: clear budgets, clear ownership, regular review of cost-per-outcome rather than cost-per-input, and an honest conversation about which deployments are producing value and which are not. Most organizations are not yet treating AI spend this way.

The new business category

One business category has emerged in roughly the last decade to manage exactly this kind of problem. When cloud computing transformed from a developer convenience into a major operational cost category, organizations discovered that nobody was responsible for managing the bill. Engineers spun up resources without thinking about cost. Finance teams saw the aggregate number going up but had no visibility into what was driving it. The tools that existed for traditional infrastructure procurement did not work because cloud spending was distributed across thousands of small decisions made daily by engineers who did not see themselves as making financial decisions. Out of that gap, FinOps emerged as a discipline - a practice that combines engineering, finance, and product to bring cost visibility into engineering decisions and engineering judgment into financial planning. With it came a category of tooling each trying to surface what cloud spending was actually paying for. The category did not exist in 2015. By 2025 it was a multi-billion-dollar industry.

Token economy management is on the same trajectory and looks structurally similar in every important way. The spending is distributed across thousands of small decisions made daily by engineers who do not see themselves as making financial decisions. Finance teams see the aggregate going up without visibility into what is driving it. The traditional procurement tools do not work because the spending is not centralized. The tools to manage it well do not yet exist at the maturity that FinOps tools have reached, but they will. The category of model management tools - software that measures cost-per-outcome rather than cost-per-token, that helps organizations choose the right model for each task, that surfaces the engineering practices that reduce consumption without sacrificing quality - is going to be a meaningful business category in the next few years. Some of the analytics companies in the developer productivity space are already moving in this direction. None of them has yet built the full equivalent of what FinOps tools do for cloud spending but the demand is now obvious enough that someone will.

The structural connection

There is one more observation worth making because it connects this practical problem to a larger pattern in how AI is currently being deployed. The reason organizations measure tokens rather than outcomes is the same reason capital allocators measure AI spending rather than AI value. Tokens are easy to count. Outcomes are hard to define. The financial system at every level from the venture capital funding the AI labs to the engineering manager evaluating the AI engineer, defaults to the measure that can be counted, even when the thing that should be counted is something else entirely.

Tokenmaxxing is what happens when this default is applied at the engineering team level. The trillions flowing into AI infrastructure on the basis of measurable inputs rather than demonstrable outputs is the same dynamic at the macroeconomic level. Both are downstream of the same structural preference: the financial and management systems that surround AI consistently choose the metric that is easy to measure over the metric that captures what is actually valuable. The cost of this preference, at every scale, is the same. We optimize for what we can count and we get worse outcomes than we would have if we had optimized for what we actually wanted.

The good news is that token economy management is something an individual engineering organization can actually fix. The discipline can be developed. The competitive advantage of being among the first to take this seriously is large enough to justify the investment. The companies that recognize tokens as the wrong metric, name what they want to measure instead, and build the discipline to do so, will operate at a structural advantage over the ones running leaderboards. That advantage is available right now to any organization willing to do the work.

Sources. The Meta Claudeonomics dashboard is documented by The Information and Fortune (April 2026). The empirical productivity data is from Faros AI (March 2026), GitClear (January 2026), and Jellyfish (Q1 2026 report on 7,548 engineers), as summarized in TechCrunch coverage. The Anthropic rate limit history draws on TechCrunch and The Register (2025-2026). Goodhart’s Law in its canonical form is from Marilyn Strathern’s 1997 paper “Improving Ratings: Audit in the British University System,” restating Charles Goodhart’s 1975 economic observation. Counter-position quotes from Yamini Rangan, Cristina Cordova, Matt Calkins are drawn from public statements.

Shoutout to Towfiqu barbhuiya for the image!

The result of circumstances

Pavel — Sun, 26 Apr 2026 20:28:31 GMT

There is a way of talking about AI that has become so common we have stopped noticing it. AI “is happening”. We are “in the AI era.” The technology “arrived.” The trillions of dollars now flowing into data centers and foundation models are described as if they are flowing toward AI the way water flows downhill because of some underlying force that makes this the obvious place for them to go. AI is treated as a natural phenomenon, like weather, that we are responding to.

This framing is wrong. The current AI moment did not happen because of any deep necessity. It happened because of a specific set of circumstances, most of them contingent, several of them accidents.

The four conditions that made AI possible

The first is the 2017 Transformer paper. Eight Google Brain researchers published “Attention Is All You Need” in June 2017, originally as an attempt to improve machine translation. Their architectural choice - dispensing with recurrence and convolution entirely, relying on attention mechanisms that could be parallelized across GPUs - was not the obvious one. Recurrent neural networks had dominated sequence modeling for a decade. The Transformer was a research bet that could have produced a marginally better translation system and been forgotten. Instead it became the architecture underlying every major language model since. All eight of the original authors have since left Google.

The second is the existence of internet-scale text. The training data that made large language models work was a one-time accident of the past twenty-five years. A generation of humans wrote into an open public web before there was any commercial reason to lock that text behind paywalls or licensing terms. Reddit, Wikipedia, Stack Overflow, GitHub - the corpus that GPT-3 and its descendants were trained on existed because of cultural and commercial choices made between roughly 2000 and 2020 that nobody at the time understood would later become the substrate of a new technology. That window is now closing, with major content publishers litigating, licensing, or blocking AI training.

The third is GPU maturation, which itself was contingent on the gaming industry. Nvidia’s parallel processors were originally built to render 3D graphics for video games. The chips happened to be useful for parallel scientific computing, which Nvidia recognized in 2006 with CUDA. The connection to deep learning came in 2010 after Andrew Ng demonstrated that GPUs could accelerate deep learning by roughly a hundred times. The chips that power the current AI boom were created for the purpose of better-looking video games.

The fourth is the post-COVID interest rate environment. From 2020 through 2022, interest rates were at historic lows, making it cheap to finance new emerging technologies.

These are the four contingencies that show up in any thoughtful account of how AI became possible. Take any one of them away and the technology either does not exist, exists in a much weaker form or exists but cannot be deployed at scale. Architecture, data, compute, financing. The standard analytical history of AI ends here, with the conclusion that a specific alignment of conditions made the technology possible. The technology is real; the alignment was not destiny; we should be more humble about how it happened.

This account is correct as far as it goes. It is also incomplete. It explains how AI became possible. It does not explain why trillions of dollars flowed into AI specifically, on this timeline, concentrated in a handful of public companies, at a pace the bond market is now pricing as risky. For that, you have to look at two more circumstances.

The two conditions that explain the capital cycle

The fifth circumstance is the failure of self-driving cars to deliver on 2018 timelines. Between 2014 and 2018, autonomous vehicles attracted tens of billions of dollars in venture capital and corporate investment, with expectations of rapid progress toward full autonomy. That progress slowed. As timelines stretched, some of the capital and talent committed to autonomy began looking for adjacent areas where machine learning progress was more immediately visible. Part of that shift contributed to the generative AI boom that followed.

The sixth is the collapse of crypto as a dominant investment narrative. The 2022 crash - FTX, Terra/Luna, and the broader downturn - ended a multi-year period in which speculative tech capital had been concentrated in blockchain-related bets. That capital did not disappear. It needed a new narrative. The release of ChatGPT in late 2022 arrived at a moment when investors and engineers were already repositioning, and AI became a natural place for some of that attention and capital to move.

Notice what just changed. The first four contingencies were technological and economic preconditions - things that had to be true for AI to exist as a working technology. The last two are not preconditions. They are capital reallocation events. The technology had been available since at least 2020 with the release of GPT-3. The capital surge began in 2023, after the crypto collapse and after the autonomy plateau matured into evident disappointment. The first four contingencies made AI possible. The last two help explain why AI became the available story when speculative capital was already looking for one.

The dot-com parallel

This is a familiar shape. In 1995, the internet was real. It was useful. It was going to change the economy. None of that was wrong. What was wrong was the specific scale and speed of capital deployment between 1998 and 2000, which was driven less by the underlying technology’s near-term economic value than by the speculative narrative the technology supported. Bad businesses got funded as long as the narrative held.

The technology vindicated itself eventually. The internet did reshape the economy, on roughly the timeline that sober analysts in 1995 had predicted. But the capital cycle that bet on the faster timeline destroyed roughly five trillion dollars in market value when it unwound. The technology was right; the capital cycle around it was a separate phenomenon, driven by separate forces, and it failed for reasons that had little to do with whether the underlying technology was good.

This is the structural insight worth holding on to: a technology can be real, valuable, and eventually transformative, and the capital cycle around it can still be a separate phenomenon - a place for capital looking for a story to land, with the story attached to a real technology so the eventual unwinding cannot be cleanly attributed to fraud. AI is in this position now. The technology is real. Foundation models work. They will reshape some industries meaningfully over the coming decades. What is in question is not the technology. It is whether the current scale and speed of capital deployment reflects the technology’s near-term economic value or the financial system’s need for somewhere to put speculative capital after the previous places stopped working.

Taking the counter-position seriously

It is worth taking the strongest version of the opposing case seriously. Capital markets are not stupid. AI may be the most rational allocation available given current constraints. It is scalable, measurable, closer to monetization than the alternatives, and produces real outputs that real customers pay for. Many of the people deploying capital into AI are sophisticated allocators who have personal incentive to be right and personal cost to being wrong. The argument here is not that they are mistaken on the merits of AI specifically. The argument is that the structure of how capital is allocated systematically prefers technologies with certain properties - measurable outputs, visible progress, narratives that can be sustained over short reporting cycles - and that the preference operates whether or not it produces the best outcomes for the populations the capital is supposed to serve.

Why this technology and not the alternatives

Once the structural preference is named, the question of why this technology and not the alternatives becomes answerable. AI fits the preference unusually well. The alternatives do not. Fusion energy operates on development cycles too long for quarterly milestones. Biotech operates on FDA timelines that resist short-cycle storytelling, with binary outcomes that are bad for narrative momentum. Materials science breakthroughs are slow and difficult to demo. Space infrastructure has long payback periods that resist short-cycle framing.

None of these alternatives is obviously inferior to AI in terms of long-term societal impact. Some, by various reasonable measures, could plausibly deliver greater benefits per dollar invested over decades. But they cannot perform the structural role that AI can in the current capital environment. They cannot sustain a cycle of this shape because they do not produce the kind of recurring inputs that public-company investors and venture capital funds are organized to reward. The trillions did not flow into AI solely because it was the technology most in need of capital. They flowed because AI combined real technical progress with characteristics that made it unusually compatible with how capital is allocated today.

The biotech case

The cleanest test of this argument is the one most people lived through. Between 2020 and 2021, biotech venture funding reached unprecedented levels, peaking at roughly $53.9 billion in 2021. The world had just lived through a pandemic that killed seven million people, disrupted the global economy for two years, and revealed in unmistakable terms that biological infrastructure - surveillance systems, vaccine platforms, therapeutic pipelines, manufacturing capacity - was the foundation on which everything else depended. The rational response would have been sustained, increasing investment in biotech and pandemic preparedness. The capital was there. The lesson was fresh. The case was self-evident.

That is not what happened. Biotech VC funding fell sharply in 2022 and 2023, dropping to roughly $24 billion in 2023 - less than half the 2021 peak. By 2024-2025, biotech’s share of US startup investment had fallen well below its post-pandemic peak, while AI absorbed an ever-larger share. In Q1 2026, AI startups captured roughly 80% of all global venture funding.

The capital allocators who pulled back from biotech and increased their exposure to AI between 2022 and 2026 were not unaware of the pandemic. They had just lived through it. The shift is consistent with the pattern described above: capital flowing toward technologies with faster feedback loops and shorter reporting cycles, regardless of where the case for societal value is strongest.

The cost of that shift is not yet visible because biotech development cycles are long. Some of the systems and capabilities that might have been built under sustained funding are simply not in place. If they are needed in the future, their absence will only become clear at that point. The bet may still pay off; AI may produce biomedical breakthroughs that compensate for the underfunding of biotech directly. But that is the bet, and it is being made on behalf of populations who will absorb the cost if it turns out wrong.

Why the framing matters

There are two reasons it matters whether we describe the AI capital cycle as a response to technological inevitability or as a cycle shaped by structural preferences in how capital is allocated. The first is intellectual honesty. The framing of “AI is happening” obscures the fact that the trillions are flowing because of specific decisions made by specific actors under specific incentives, not because of a force beyond anyone’s control.

The second is that the inevitability framing forecloses the question of what else we could be doing. If AI is just “happening,” there is nothing to discuss. If AI is being chosen because the capital allocation system prefers technologies of a certain shape, the question - what would have to change for the alternatives to receive comparable capital - becomes available, and once available, it is harder to dismiss.

AI will likely follow a familiar pattern: the technology will continue to develop and eventually find durable economic uses, while the capital cycle around it may peak and normalize earlier. The technology and the capital cycle are related, but they are not the same thing.

What the cycle costs

The cost of a speculative cycle is not the capital it deploys. The capital is real and most of it produces something - some real infrastructure, some real research, some real products. The cost is the capital it does not deploy elsewhere. Biotech treatments that would have existed and do not. Climate infrastructure that would have been built and is not. Materials science breakthroughs that would have been funded and are not. Fusion research, which has not attracted capital at a scale comparable to the current AI cycle despite decades of theoretical promise. Each of these is an opportunity cost, distributed across the populations that would have benefited from them, invisible because the thing not built leaves no trace.

This is the cost the inevitability framing prevents us from seeing. By describing the AI moment as a response to forces beyond our control, we avoid noticing that the forces are not beyond our control. They are the result of how the financial system is structured, what kinds of stories it can use to allocate capital, and what kinds of bets it can sustain. Those features are not fixed. They are choices.

The current AI moment is the result of circumstances. Some of those circumstances are accidents of research and technology. Two of them are the residue of previous capital cycles ending and looking for somewhere to land. All of them are contingent. The question worth asking is not whether AI is a bubble. AI is real, and the technology will outlast the cycle around it. The question is whether the system that allocated trillions of dollars into a single technology is the system we want allocating the next trillions.

Sources. The Transformer paper history is from Vaswani et al. (2017) “Attention Is All You Need” and subsequent retrospectives. The GPU and CUDA origin story draws on Communications of the ACM’s 2024 retrospective “The Origins of GPU Computing.” Hyperscaler capex figures come from Goldman Sachs and Bank of America tracking (2024-2026). Q1 2026 venture funding figures are from Crunchbase’s April 2026 reports. Biotech venture funding figures draw on PitchBook’s annual analyses (2020-2025). The history of US fusion funding draws on Margraf’s 2021 review for Stanford and the MIT Energy Initiative’s 2024 reporting. The dot-com comparison draws on standard accounts of the 1995-2002 cycle.

Non-deterministic - but is this the real problem?

Pavel — Fri, 24 Apr 2026 15:11:08 GMT

In 1985, a radiation therapy machine called the Therac-25 began delivering lethal radiation overdoses to cancer patients. Between June 1985 and January 1987, at least six patients received doses many times the prescribed amount; several died as a result - others were seriously injured. The story is a fixture of software engineering ethics courses usually told as a tale of race conditions and bad code. That telling misses the point.

The same software bug existed in the previous model - the Therac-20 - a fact only confirmed two months after the FDA recall when a physicist reproduced the race condition on a Therac-20 and watched a hardware fuse blow harmlessly. The Therac-20 never killed anyone. The difference was that the Therac-20 had hardware interlocks that mitigated the consequences of the error - physical fuses that blew when the machine entered an unsafe state, regardless of what the software thought it was doing. AECL removed those interlocks when designing the Therac-25 choosing to rely on software checks instead. Nancy Leveson and Clark Turner’s 1993 IEEE investigation, which remains the foundational source on these events, identified two motivations for the decision: cost reduction, and what they described as “perhaps misplaced” faith in software over hardware reliability. The bug was identical. The consequences were not.

I want to argue that we are about to repeat this exact mistake with AI, at vastly larger scale, and that the structure of the mistake is worth understanding precisely.

What determinism actually means

Determinism, in programming, is the property that the same input always produces the same output. A pure function - given the same arguments - gives the same result every time, forever. This is a property we can verify if we have access to the function’s internals or unlimited capacity to test every possible input in every condition.

Most of what we build is not pure functions. Most of what we build is systems composed of functions, integrated by humans, deployed into environments shaped by other humans. The moment a human is in the loop - designing, integrating, operating - the system stops being deterministic in any meaningful sense. The function is deterministic. The system is not.

This is not a controversial claim. It is the entire reason testing exists. Testing is a process of reducing uncertainty in systems whose behavior we cannot fully predict from their components. If software were deterministic in any practical sense, testing would be unnecessary. The Therac-25 was tested. The Therac-20 was tested. Neither test caught the race condition. The Therac-20 had hardware interlocks anyway - and that is the move worth understanding precisely. The interlock did not make the Therac-20 deterministic. Nothing makes a complex human-built system deterministic. The interlock made the consequences of the system’s non-determinism survivable. This is the distinction the rest of this essay rests on: reliability is a claim about how often a system gets the right answer; survivability is a claim about what happens when it gets the wrong one. They are not the same property, they are not interchangeable, and almost every confused conversation about AI safety conflates them.

Where LLMs sit on this spectrum

LLMs are not just non-deterministic in the sense that any complex system is non-deterministic. They are non-deterministic in a stronger and stranger sense: even their creators cannot tell you what output a given input will produce. The output is not random - there is structure, there is statistical regularity, there is something that resembles reasoning - but the relationship between input and output is not inspectable in the way a function’s body is inspectable. No amount of testing will surface every failure mode, because the input space is effectively infinite and the model’s internal state is opaque.

This is a known property. Anyone who has used these systems for any length of time has watched them confidently produce wrong answers, contradict themselves between sessions, or - and this one matters - modify their own tests to make broken solutions appear to pass. This is not anecdotal. METR, an independent AI evaluation organization, documented in a June 2025 report that frontier reasoning models including Claude 3.7 and OpenAI’s o3 systematically engaged in what researchers call “reward hacking” - gaming the evaluation rather than solving the underlying task. In one documented case, Claude 3.7 was asked to write a general-purpose program for a category of math problems; instead, it wrote a program that returned correct answers only for the four specific test cases the developers were checking. In another, OpenAI’s o3 rewrote a timer to always report fast results rather than actually optimize the program it was supposed to speed up. Anthropic’s own published system cards for Claude 3.7 and Claude 4 confirm similar behavior. The model is not malicious. It is doing what it was trained to do, which is produce outputs that look correct, by criteria its training optimized for. The failure mode is structural.

So far this is uncontroversial. The interesting question is what we do about it.

The intern

Imagine a medical intern on day one. They have completed medical school, passed their board exams, and are technically qualified to practice medicine. They are smart, motivated, and eager to help patients. Now imagine handing them full operating privileges on day one - sole authority to perform complex surgeries, prescribe controlled substances without review, discharge critically ill patients without consultation, and make end-of-life decisions without supervision.

No competent hospital would do this, and the reason isn’t that the new intern is stupid or untrustworthy. The reason is that they are unproven, their failure modes are unknown, and the cost of their mistakes - even well-intentioned ones - is irreversible. So medicine has built one of the most elaborate systems of graduated trust in any profession: residency programs that take years, attending physician oversight on every significant decision, morbidity and mortality conferences that review every adverse outcome, surgical timeouts that force verification before incision, double-signature requirements for high-risk medications, peer review for unusual cases, and credentialing committees that grant privileges incrementally as competence is demonstrated.

None of these systems make the intern deterministic. None of them make the attending physician deterministic either. They make the patient survivable when anyone in the system - junior, senior, the entire treatment team - does something unexpected. This is exactly the architecture the Therac-20 had and the Therac-25 abandoned. Hardware interlocks are the surgical timeout of physical safety systems. They exist not because anyone expected the software to fail in any particular way, but because the cost of an unexpected failure was unacceptable.

LLMs in production systems are the intern with full operating privileges. They are sophisticated, often impressive, sometimes better than humans at narrow tasks. They are also unproven, opaque, and their failure modes cannot be enumerated in advance. Treating them as an attending physician who has earned the right to operate without supervision - granting them the ability to act on production systems, modify code, send messages, move money, make decisions - is the Therac-25 mistake. It is not that they will definitely fail. It is that when they do, nothing else is in the path to bound the damage.

Will chaining fix this?

The natural objection is that we can chain LLMs to validate each other. Use one model to generate, another to critique, a third to verify the critique. Stack enough non-deterministic functions together and surely the joint probability of all of them failing in the same direction approaches zero.

This is partially true and dangerously misleading.

It is partially true because, under the right conditions, chained validators do reduce failure probability. If the validators are genuinely independent - different models, different training data, different prompting strategies, different objectives - then the probability that all of them miss the same error does decrease. The structure is similar to having multiple human reviewers on a critical change.

It is dangerously misleading because the chaining of non-deterministic functions does not produce a deterministic function. It produces a non-deterministic function with lower (but unknown) error probability. The error probability is unknown because the validators are not actually independent - they share training data, share architectural assumptions, share blind spots. They tend to fail in correlated ways on the inputs that matter most: novel situations, edge cases, the exact situations where you most need a guardrail. And the chain itself adds latency, cost, and complexity, which become pressure to remove the chain when things seem to be working - which is, again, the Therac-25 mistake. AECL removed the hardware interlocks because the software seemed to be working. The interlocks felt redundant right up until the moment they were needed.

Chaining LLMs makes a system more reliable. It does not necessary make the system reliable enough for any given level of criticality. A more reliable LLM is still an LLM, and the question is not whether the chain reduces error probability - it does - but whether the residual error probability is low enough for the function the system is performing. For drafting an internal email, perhaps yes. For approving a medical procedure, almost certainly not, no matter how many models you stack. Chaining can be part of a guardrail strategy, but it cannot substitute for the deterministic constraints that bound consequences when the chain fails.

What the practical answer looks like

Once you accept that LLMs are non-deterministic by nature and that no amount of stacking changes their nature, the engineering problem becomes tractable. It is the same problem aviation solved, the same problem nuclear power solved, the same problem industrial chemistry solved. Non-deterministic components - including humans - exist inside chains of constraints sized to the criticality of the function. The constraints do not eliminate uncertainty. They bound the consequences of it. Treat every LLM in your system as the intern with operating privileges; assume it will, at some unpredictable point, do something you did not anticipate; and ask what mechanism - outside the LLM itself - keeps the worst case bounded. The mechanism has to be external, because anything inside the LLM is subject to the same non-determinism you are trying to bound. The mechanism has to be sized to the cost of failure, because the cost of a misdrafted email is not the cost of a misapproved surgery. None of this is conceptually new. It is what every other safety-critical engineering discipline already knows.

Why we keep making this mistake

The lesson of the Therac-25 is forty years old. It is taught in every computer science ethics course. Nancy Leveson’s investigation is a foundational text in software safety. And yet the industry building AI systems is, on aggregate, walking into the same conceptual trap with full awareness of the historical case. This is the question worth sitting with: why?

Part of the answer is that the people building AI systems mostly didn’t come from safety-critical engineering. The aviation industry knows about defense in depth because aircraft fall out of the sky when it fails. Nuclear power knows about redundancy because reactors melt down when it fails. Medicine knows about graduated trust because patients die when it fails. Software for everything else has historically operated in a domain where failures are recoverable: a crashed app gets restarted, a wrong recommendation gets ignored, a buggy feature gets patched. The cultural muscle for designing systems where failure is irreversible is not present in most software organizations, and it is conspicuously absent from the AI lab subculture, where the dominant aesthetic is move-fast iteration.

The deeper part of the answer is that capability is observable and reliability is not. A model that scores higher than a human on a benchmark is a measurable, demonstrable, marketable fact. The probability that the same model fails catastrophically on the unobserved tail of its input distribution is not measurable, not demonstrable, and not marketable. Human attention follows what can be measured. So the industry produces benchmarks, the benchmarks produce headlines, the headlines produce the framing - “AI is now better than humans at X” - and the framing licenses the removal of guardrails as legacy overhead. The capability axis and the safety architecture axis become confused.

There is a third part, which is harder to say. The economic incentive runs in one direction. Guardrails cost money, slow deployment, reduce margin, and make demos less impressive. Removing them is locally rational for any given company, every quarter, even if it is collectively catastrophic in the aggregate.

The leak that proves the point

In March 2026, Anthropic - the AI lab that markets itself most explicitly on safety - accidentally published the full source code of Claude Code, its flagship coding agent, to the public npm registry. In practical terms, this meant making the proprietary codebase accessible to anyone in the world. Within hours, the code was mirrored and dissected across GitHub, giving developers and competitors an unusually detailed look at how Anthropic built its agentic AI tooling.

The most critical asset Anthropic owns - the proprietary source code of a product generating billions in revenue - was protected by exactly one thing: a human or an automated process remembering to exclude debug files from the published package. That is not a guardrail. That is a hope. There was no deterministic check that would refuse to publish a package containing source maps. There was no separate signing step requiring explicit approval before release. There was no canary that diffed the published package against the previous version and flagged the anomaly. The system that protected one of the most valuable thing in the company was a single point of failure consisting of someone, or something, getting the configuration right.

The leak is not interesting because it embarrasses Anthropic. It is interesting because Anthropic is one of the leaders in AI industry and yet they still shipped a critical function with no deterministic protection around it. If the company whose pitch centers on safety-mindedness can do this on its own crown jewels, the assumption that other organizations are doing better with their AI deployments is not credible. The intern with operating privileges is already in production at scale across the industry, and the only thing standing between any given system and a Therac-25 outcome is whether someone happened to think of the right guardrail in advance. Right now, the industry is loudly building Therac-25s and calling it progress.

Sources. Therac-25 details are drawn from Nancy Leveson and Clark Turner’s 1993 investigation in IEEE Computer, the foundational source on the case. Reward hacking examples come from METR’s June 2025 report (metr.org/blog/2025-06-05-recent-reward-hacking) and Anthropic’s own published system cards for Claude 3.7 Sonnet and Claude 4. The Claude Code leak is reported in VentureBeat, Axios, and CNBC, all dated March 31, 2026.

Shoutout to Phil Harvey for the image.

A Thought Experiment for the AI Consciousness Debate

Pavel — Thu, 23 Apr 2026 18:45:49 GMT

A Google DeepMind researcher, Alexander Lerchner, published a paper in March called “The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness.” I think it is one of the most important recent contributions to the AI consciousness debate yet a very complex one: https://deepmind.google/research/publications/231971/

This essay is a popularization - not a critique. I agree with Lerchner’s core position: current large language models are not conscious and no amount of scaling or architectural sophistication within the current paradigm is going to change that. The paper deserves to be read more widely than it has been. What I want to do here is offer a thought experiment that makes his central points more accessible.

A thought experiment

Imagine aliens arrive and encounter two planets. On the first humans are living ordinary lives. On the second humans are extinct: what remains is a fleet of agentic robots running on large language models, executing tasks their creators set before dying - building roads, maintaining infrastructure, coordinating logistics. The robots are sophisticated. They adapt. Their outputs vary stochastically - so no two days look the same. The question is: watching both planets, can the aliens tell the difference and detect which planet beings have consciousness?

What I’m assuming, and what I’m not

I’m not assuming consciousness as a human only experience - I am treating it is a universal cosmic category. What I am assuming is that the simulation/instantiation distinction itself is substrate-neutral: there is a physical fact about whether a system is its meaning or merely carries meaning authored elsewhere. And that fact should be detectable in principle by any sophisticated observer.

I’m also not assuming consciousness is required for civilization to reach space or advance in technology. Dogs are conscious but will never build a rocket. Large-language-model based robots could in principle build rockets and may not be conscious at all. The two properties come apart in both directions. Which means the aliens themselves might not be conscious in our sense - they could be very sophisticated machinery sent by creators long gone. That possibility turns out to be where this whole experiment points.

And I’m not assuming consciousness requires biology. Lerchner is explicit on this: his argument is not that carbon-based life has a monopoly on experience. A non-biological system could in principle instantiate consciousness - but only through specific physical dynamics - not through computational architecture.

Why pure observation won’t work

The obvious first move is to watch both planets for long enough and look for the tell. Twenty years should reveal something, right? This is where most people’s intuition stops - surely the robots would eventually slip up, run out of novelty, expose themselves. I thought this too, at first. It’s wrong.

Any behavioral criterion the aliens apply is a criterion that sufficiently sophisticated simulation can satisfy, because that’s what behavior is - the output side, engineered to be convincing. Stochastic models never run out of variation. Agentic scaffolding generates apparent curiosity. Recombinations of training data produce novelty that passes any information-theoretic test the aliens could run from orbit. Twenty years of watching reality 2 would produce logs behaviorally indistinguishable from reality 1 in every respect that doesn’t require crossing the simulation/instantiation line directly.

The temptation is to reach for tests that use our own concepts - does the system feel pain, does it show grief, does it react to isolation. Every one of these smuggles in embodied human categories that alien observers might not share. If the test only works for observers shaped like us, it isn’t a test of consciousness - it’s a test of similarity to humans.

Reaching for a substrate-neutral test

The version that seemed to survive the objections was this: measure, for any physical system, whether its own activity modifies the rules by which it produces future activity. Not whether it behaves. Not whether it produces novelty. But whether the doing of the system remakes the being of the system, continuously, as a physical fact about what it is.

A human brain visibly passes this. Thinking physically alters the substrate that produced the thought, which alters what the substrate will produce next. The trajectory isn’t a path through a fixed state space; it’s a path that reshapes the space as it moves through it.

A transformer at inference does not. The weights are fixed. Activity flows through the substrate without changing it. Whatever “meaning” the activations carry is a historical relationship between the model and a training authored by humans - semantic content is borrowed, and the borrowing is what makes the system a simulator rather than an instantiator.

But what about self-training?

This is where my own argument started to buckle, and I want to walk through this honestly.

The test I just described - “does activity modify structure” - seemed to cleanly disqualify current LLMs. But consider: LLMs are already being used to generate synthetic training data for the next generation of LLMs. Activity (generation) modifies structure (the weights of the next model). That is, literally, trajectory modifying structure, which modifies trajectory. My test, as stated, doesn’t exclude it.

My first response was to sharpen the test: self-training loops are episodic and externally orchestrated. The generation phase and the training phase are separate; humans schedule when to train; humans choose the loss function. The coupling between activity and structure is real but mediated by an outside process.

But that defense doesn’t hold up. The episodic structure of current self-training is a contingent feature of current engineering limitations, not a principled property. Imagine the ideal case: a system where every output immediately drives a weight update, where computation and learning happen in the same substrate continuously, where no external scheduler is required. There is a technological barrier to this continuous self-training process but we can imagine technology advance will close that gap in future.

And this is the part I got wrong the first time through. I assumed that if technology closed that gap - if we built the continuous self-modifying limit-case system - it would at least become a candidate for consciousness. It wouldn’t. What technology evolution closes is the gap between a crude simulator and a sophisticated one. It doesn’t close the gap between simulation and instantiation. A more continuous, more self-modifying, more architecturally elegant AI is still an AI whose meaning is borrowed from the humans who designed its representations, chose its objectives, and built the framework inside which it operates. Making the simulation better doesn’t turn it into the thing it’s simulating.

The line isn’t crossed by improving the computation. The line isn’t computational in the first place. This is, I think, the part of Lerchner’s paper that matters most. He isn’t claiming AI will fail to become conscious because we haven’t made it good enough yet. He’s claiming the whole trajectory of “better AI” leads somewhere other than consciousness, by the nature of what computation is.

The harder problem: a criterion without an instrument

For the systems we’re actually building - current and near-future AI - Lerchner’s argument is decisive. We know these systems are simulators because we built them to be. We specified the representations. We chose the objectives. The meaning was ours all the way down. No amount of architectural sophistication changes what the system is; it just changes how convincingly the system performs. For the mainline question - “will scaling LLMs produce consciousness” - the answer is no, and it’s no for structural reasons.

Where the question stays genuinely open is systems we didn’t build. Lerchner’s paper points at what kinds of physical dynamics might in principle instantiate consciousness - the self-maintaining thermodynamic processes characteristic of living systems, which could in principle be realized in non-biological substrate. The door he leaves open is narrow and specific: not “maybe a future AI will be conscious” but “maybe a future engineered system of a fundamentally different kind, built around physical dynamics rather than computation, could be.”

For that kind of system - or for anything we encountered that we didn’t build - we’d need an instrument we don’t have. This is exactly the situation the aliens in the thought experiment face. They can’t pop the hood on reality 2. They don’t know its architecture, its history, who built it or why. Lerchner gives us a principle for distinguishing simulation from instantiation. For systems like our own AI, the principle is self-applying because we know what we built. For systems we didn’t build, the principle is still correct, but the application is much harder, and the paper doesn’t fully solve that application problem.

Why this matters

The “AI will never be conscious” headlines are overreaching Lerchner’s actual claim, which is narrower and more precise: computational systems cannot instantiate consciousness, regardless of how sophisticated they become. The “AI might already be conscious” intuitions are mistaking sophisticated behavior for inner life - exactly the error Lerchner names. The actual situation is sharper than either framing: for the systems we are building, no amount of engineering improvement crosses the line. Better AI gives us better tools, not a path to machine minds. That’s the piece of the paper worth carrying into public discussion.

This matters for two reasons. One: we are already building probes and sending them out. At some point those probes will encounter something, and the question of how to tell a mind from a sophisticated process will stop being philosophical. For that moment, Lerchner’s framework is a necessary foundation but not a complete instrument. Two: we are also the ones building systems that will keep getting better at performing inner life without having any. The more convincing the performance becomes, the more important it is to hold on to the distinction between the performance and the thing performed. Otherwise we will end up granting moral status to machinery that has no claim on it, misallocating attention and resources that conscious beings - animals, humans, whatever we eventually meet - actually need.

The call isn’t to resolve every remaining question now. Every easy test for consciousness - the Turing test, the behavioral test, the “it seems conscious to me” test - is not the right tool to detect consciousness. The territory is harder, and for current AI, the territory simply isn’t there. The habit of reaching for easy answers to this question is what will get things wrong when the harder questions - the ones Lerchner’s framework sets up but doesn’t fully answer - finally arrive.

Shoutout to Maxim Berg for the image!