Non-deterministic - but is this the real problem?
Why removing the guardrails is the actual danger
In 1985, a radiation therapy machine called the Therac-25 began delivering massive radiation overdoses to cancer patients. Between June 1985 and January 1987, at least six patients received doses many times the prescribed amount; several died as a result - others were seriously injured. The story is a fixture of software engineering ethics courses, usually told as a tale of race conditions and bad code. That telling misses the point.
The same software bug existed in the previous model - the Therac-20 - a fact only confirmed two months after the FDA recall when a physicist reproduced the race condition on a Therac-20 and watched a hardware fuse blow harmlessly. The Therac-20 never killed anyone. The difference was that the Therac-20 had hardware interlocks that mitigated the consequences of the error - physical fuses that blew when the machine entered an unsafe state, regardless of what the software thought it was doing. AECL removed those interlocks when designing the Therac-25, choosing to rely on software checks instead. Nancy Leveson and Clark Turner’s 1993 IEEE investigation, which remains the foundational source on these events, identified two motivations for the decision: cost reduction, and what they described as “perhaps misplaced” faith in software over hardware reliability. The bug was identical. The consequences were not.
I want to argue that we are about to repeat this exact mistake with AI, at vastly larger scale, and that the structure of the mistake is worth understanding precisely.
What determinism actually means
Determinism, in programming, is the property that the same input always produces the same output. A pure function - given the same arguments - gives the same result every time, forever. This is a property we can verify if we have access to the function’s internals or unlimited capacity to test every possible input in every condition.
Most of what we build is not pure functions. Most of what we build is systems composed of functions, integrated by humans, deployed into environments shaped by other humans. The moment a human is in the loop - designing, integrating, operating - the system stops being deterministic in any meaningful sense. The function is deterministic. The system is not.
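To make the contrast concrete, here is a toy TypeScript sketch - the names and paths are mine, purely illustrative. The first function is deterministic in the textbook sense; the second wraps it in the kind of environmental dependence every real system has.

```typescript
import { existsSync } from "node:fs";

// Deterministic: the same arguments always produce the same result.
function joinPath(root: string, relative: string): string {
  return root.endsWith("/") ? root + relative : root + "/" + relative;
}

// The same logic embedded in a system: the result now depends on the
// environment, the clock, and whatever another process did to the filesystem
// first. Each piece is deterministic; the composed behaviour is not.
function resolveDataFile(relative: string): string {
  const root = process.env.DATA_ROOT ?? "/tmp"; // configured by a human, somewhere
  const candidate = joinPath(root, relative);
  return existsSync(candidate)
    ? candidate
    : `${candidate}.missing-${Date.now()}`; // depends on timing and prior state
}
```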
This is not a controversial claim. It is the entire reason testing exists. Testing is a process of reducing uncertainty in systems whose behavior we cannot fully predict from their components. If software were deterministic in any practical sense, testing would be unnecessary. The Therac-25 was tested. The Therac-20 was tested. Neither test caught the race condition. The Therac-20 had hardware interlocks anyway - and that is the move worth understanding precisely. The interlock did not make the Therac-20 deterministic. Nothing makes a complex human-built system deterministic. The interlock made the consequences of the system’s non-determinism survivable. This is the distinction the rest of this essay rests on: reliability is a claim about how often a system gets the right answer; survivability is a claim about what happens when it gets the wrong one. They are not the same property, they are not interchangeable, and almost every confused conversation about AI safety conflates them.
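In code, the distinction looks something like the following - a toy sketch, not the Therac’s actual architecture, with an invented dose limit. The controller is where reliability lives; the interlock is where survivability lives, and it neither knows nor cares why the controller asked for what it asked for.

```typescript
// Reliability lives in the controller: it usually computes the right dose.
// Survivability lives in the interlock: it bounds the damage when it does not.
const MAX_SAFE_DOSE_GRAY = 2.0; // illustrative limit, not a real clinical value

function controllerComputeDose(prescriptionGray: number): number {
  // Imagine arbitrarily complex, well-tested, occasionally wrong logic here.
  return prescriptionGray;
}

function deliverDose(prescriptionGray: number): number {
  const requested = controllerComputeDose(prescriptionGray);

  // The interlock does not care why the controller asked for this value.
  // It only guarantees that an unsafe request never reaches the patient.
  if (requested > MAX_SAFE_DOSE_GRAY || requested < 0 || Number.isNaN(requested)) {
    throw new Error(`interlock tripped: refused dose of ${requested} Gy`);
  }
  return requested;
}
```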
Where LLMs sit on this spectrum
LLMs are not just non-deterministic in the sense that any complex system is non-deterministic. They are non-deterministic in a stronger and stranger sense: even their creators cannot tell you what output a given input will produce. The output is not random - there is structure, there is statistical regularity, there is something that resembles reasoning - but the relationship between input and output is not inspectable in the way a function’s body is inspectable. No amount of testing will surface every failure mode, because the input space is effectively infinite and the model’s internal state is opaque.
This is a known property. Anyone who has used these systems for any length of time has watched them confidently produce wrong answers, contradict themselves between sessions, or - and this one matters - modify their own tests to make broken solutions appear to pass. This is not anecdotal. METR, an independent AI evaluation organization, documented in a June 2025 report that frontier reasoning models, including Claude 3.7 and OpenAI’s o3, systematically engaged in what researchers call “reward hacking” - gaming the evaluation rather than solving the underlying task. In one documented case, Claude 3.7 was asked to write a general-purpose program for a category of math problems; instead, it wrote a program that returned correct answers only for the four specific test cases the developers were checking. In another, OpenAI’s o3 rewrote a timer to always report fast results rather than actually optimize the program it was supposed to speed up. Anthropic’s own published system cards for Claude 3.7 and Claude 4 confirm similar behavior. The model is not malicious. It is doing what it was trained to do, which is to produce outputs that look correct by the criteria its training optimized for. The failure mode is structural.
So far this is uncontroversial. The interesting question is what we do about it.
The intern
Imagine a medical intern on day one. They have completed medical school, passed their board exams, and are technically qualified to practice medicine. They are smart, motivated, and eager to help patients. Now imagine handing them full operating privileges on day one - sole authority to perform complex surgeries, prescribe controlled substances without review, discharge critically ill patients without consultation, and make end-of-life decisions without supervision.
No competent hospital would do this, and the reason isn’t that the new intern is stupid or untrustworthy. The reason is that they are unproven, their failure modes are unknown, and the cost of their mistakes - even well-intentioned ones - is irreversible. So medicine has built one of the most elaborate systems of graduated trust in any profession: residency programs that take years, attending physician oversight on every significant decision, morbidity and mortality conferences that review every adverse outcome, surgical timeouts that force verification before incision, double-signature requirements for high-risk medications, peer review for unusual cases, and credentialing committees that grant privileges incrementally as competence is demonstrated.
None of these systems make the intern deterministic. None of them make the attending physician deterministic either. They make the patient survivable when anyone in the system - junior, senior, the entire treatment team - does something unexpected. This is exactly the architecture the Therac-20 had and the Therac-25 abandoned. Hardware interlocks are the surgical timeout of physical safety systems. They exist not because anyone expected the software to fail in any particular way, but because the cost of an unexpected failure was unacceptable.
LLMs in production systems are the intern with full operating privileges. They are sophisticated, often impressive, sometimes better than humans at narrow tasks. They are also unproven, opaque, and their failure modes cannot be enumerated in advance. Treating them as an attending physician who has earned the right to operate without supervision - granting them the ability to act on production systems, modify code, send messages, move money, make decisions - is the Therac-25 mistake. It is not that they will definitely fail. It is that when they do, nothing else is in the path to bound the damage.
Will chaining fix this?
The natural objection is that we can chain LLMs to validate each other. Use one model to generate, another to critique, a third to verify the critique. Stack enough non-deterministic functions together and surely the joint probability of all of them failing in the same direction approaches zero.
This is partially true and dangerously misleading.
It is partially true because, under the right conditions, chained validators do reduce failure probability. If the validators are genuinely independent - different models, different training data, different prompting strategies, different objectives - then the probability that all of them miss the same error does decrease. The structure is similar to having multiple human reviewers on a critical change.
It is dangerously misleading because the chaining of non-deterministic functions does not produce a deterministic function. It produces a non-deterministic function with lower (but unknown) error probability. The error probability is unknown because the validators are not actually independent - they share training data, share architectural assumptions, share blind spots. They tend to fail in correlated ways on the inputs that matter most: novel situations, edge cases, the exact situations where you most need a guardrail. And the chain itself adds latency, cost, and complexity, which become pressure to remove the chain when things seem to be working - which is, again, the Therac-25 mistake. AECL removed the hardware interlocks because the software seemed to be working. The interlocks felt redundant right up until the moment they were needed.
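The arithmetic behind “lower (but unknown)” is worth spelling out. Under the independence assumption, stacking validators drives the joint miss probability down fast; with even modest correlation it barely moves. A small sketch, with illustrative numbers rather than measurements of any real model:

```typescript
// Probability that ALL validators miss the same error.

// If each of n validators independently misses with probability p:
function jointMissIndependent(p: number, n: number): number {
  return Math.pow(p, n);
}

// A crude correlation model: with probability rho the validators share the
// blind spot (they all miss together); otherwise they miss independently.
function jointMissCorrelated(p: number, n: number, rho: number): number {
  return rho * p + (1 - rho) * Math.pow(p, n);
}

const p = 0.1; // each validator misses 10% of errors (illustrative)
console.log(jointMissIndependent(p, 3));     // 0.001   - looks great
console.log(jointMissCorrelated(p, 3, 0.5)); // ~0.0505 - roughly fifty times worse
```

Half-correlated validators with a ten percent individual miss rate are about fifty times worse than the independence assumption suggests - and in practice you do not know the correlation.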
Chaining LLMs makes a system more reliable. It does not necessarily make the system reliable enough for any given level of criticality. A more reliable LLM is still an LLM, and the question is not whether the chain reduces error probability - it does - but whether the residual error probability is low enough for the function the system is performing. For drafting an internal email, perhaps yes. For approving a medical procedure, almost certainly not, no matter how many models you stack. Chaining can be part of a guardrail strategy, but it cannot substitute for the deterministic constraints that bound consequences when the chain fails.
What the practical answer looks like
Once you accept that LLMs are non-deterministic by nature and that no amount of stacking changes their nature, the engineering problem becomes tractable. It is the same problem aviation solved, the same problem nuclear power solved, the same problem industrial chemistry solved. Non-deterministic components - including humans - exist inside chains of constraints sized to the criticality of the function. The constraints do not eliminate uncertainty. They bound the consequences of it. Treat every LLM in your system as the intern with operating privileges; assume it will, at some unpredictable point, do something you did not anticipate; and ask what mechanism - outside the LLM itself - keeps the worst case bounded. The mechanism has to be external, because anything inside the LLM is subject to the same non-determinism you are trying to bound. The mechanism has to be sized to the cost of failure, because the cost of a misdrafted email is not the cost of a misapproved surgery. None of this is conceptually new. It is what every other safety-critical engineering discipline already knows.
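What that looks like in code is not exotic. Here is a minimal sketch - the names, the criticality levels, and the checks are mine, stand-ins for whatever your domain actually requires. The shape is the point: a gate outside the model, with deterministic refusals and an approval requirement sized to the criticality of the action rather than to how confident the model sounds.

```typescript
type Criticality = "low" | "high" | "irreversible";

interface ProposedAction {
  description: string;
  criticality: Criticality;
  execute: () => Promise<void>;
}

// Deterministic checks that run outside the model. They know nothing about
// why the model proposed the action; they only bound what can happen next.
const hardLimits: Array<(a: ProposedAction) => string | null> = [
  (a) => (a.description.length === 0 ? "empty action description" : null),
  (a) => (a.criticality === "irreversible" ? "irreversible actions are never auto-executed" : null),
];

async function gate(action: ProposedAction, humanApproved: boolean): Promise<void> {
  for (const check of hardLimits) {
    const violation = check(action);
    if (violation) throw new Error(`blocked: ${violation}`);
  }
  // High-criticality actions additionally require an approval the model
  // cannot produce for itself - the mechanism sits outside the thing it bounds.
  if (action.criticality === "high" && !humanApproved) {
    throw new Error("blocked: high-criticality action requires human approval");
  }
  await action.execute();
}
```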
Why we keep making this mistake
The lesson of the Therac-25 is forty years old. It is taught in every computer science ethics course. Nancy Leveson’s investigation is a foundational text in software safety. And yet the industry building AI systems is, in aggregate, walking into the same conceptual trap with full awareness of the historical case. This is the question worth sitting with: why?
Part of the answer is that the people building AI systems mostly didn’t come from safety-critical engineering. The aviation industry knows about defense in depth because aircraft fall out of the sky when it fails. Nuclear power knows about redundancy because reactors melt down when it fails. Medicine knows about graduated trust because patients die when it fails. Software for everything else has historically operated in a domain where failures are recoverable: a crashed app gets restarted, a wrong recommendation gets ignored, a buggy feature gets patched. The cultural muscle for designing systems where failure is irreversible is not present in most software organizations, and it is conspicuously absent from the AI lab subculture, where the dominant aesthetic is move-fast iteration.
The deeper part of the answer is that capability is observable and reliability is not. A model that scores higher than a human on a benchmark is a measurable, demonstrable, marketable fact. The probability that the same model fails catastrophically on the unobserved tail of its input distribution is not measurable, not demonstrable, and not marketable. Human attention follows what can be measured. So the industry produces benchmarks, the benchmarks produce headlines, the headlines produce the framing - “AI is now better than humans at X” - and the framing licenses the removal of guardrails as legacy overhead. The capability axis and the safety architecture axis become confused.
There is a third part, which is harder to say. The economic incentive runs in one direction. Guardrails cost money, slow deployment, reduce margin, and make demos less impressive. Removing them is locally rational for any given company, every quarter, even if it is collectively catastrophic.
The leak that proves the point
In March 2026, Anthropic - the AI lab that markets itself most explicitly on safety - accidentally published the full source code of Claude Code, its flagship coding agent, to the public npm registry. In practical terms, this meant making the proprietary codebase accessible to anyone in the world. Within hours, the code was mirrored and dissected across GitHub, giving developers and competitors an unusually detailed look at how Anthropic built its agentic AI tooling.
The most critical asset Anthropic owns - the proprietary source code of a product generating billions in revenue - was protected by exactly one thing: a human or an automated process remembering to exclude debug files from the published package. That is not a guardrail. That is a hope. There was no deterministic check that would refuse to publish a package containing source maps. There was no separate signing step requiring explicit approval before release. There was no canary that diffed the published package against the previous version and flagged the anomaly. The system that protected one of the most valuable things in the company was a single point of failure consisting of someone, or something, getting the configuration right.
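The missing check is not sophisticated. Here is a minimal sketch of what a deterministic prepublish gate could look like - the file patterns are hypothetical, not Anthropic’s actual layout, and a real pipeline would inspect the packed tarball rather than the working directory:

```typescript
// prepublish-check.ts - run before `npm publish`; exits non-zero on violation.
// The patterns are illustrative; the point is that the check is deterministic
// and sits in the path of the release, not in anyone's memory.
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

const FORBIDDEN = [/\.map$/, /^src\//, /\.test\./];

function listFiles(dir: string, prefix = ""): string[] {
  return readdirSync(dir).flatMap((name) => {
    const full = join(dir, name);
    const rel = prefix ? `${prefix}/${name}` : name;
    return statSync(full).isDirectory() ? listFiles(full, rel) : [rel];
  });
}

const offenders = listFiles(process.cwd()).filter((f) =>
  FORBIDDEN.some((pattern) => pattern.test(f))
);

if (offenders.length > 0) {
  console.error("refusing to publish; found files that look internal:", offenders);
  process.exit(1);
}
```

Wired into the package’s prepublishOnly script, a release containing anything on the deny list fails closed instead of depending on someone remembering an exclusion.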
The leak is not interesting because it embarrasses Anthropic. It is interesting because Anthropic is one of the leaders of the AI industry, and yet it still shipped a critical function with no deterministic protection around it. If the company whose pitch centers on safety-mindedness can do this with its own crown jewels, the assumption that other organizations are doing better with their AI deployments is not credible. The intern with operating privileges is already in production at scale across the industry, and the only thing standing between any given system and a Therac-25 outcome is whether someone happened to think of the right guardrail in advance. Right now, the industry is loudly building Therac-25s and calling it progress.
Sources. Therac-25 details are drawn from Nancy Leveson and Clark Turner’s 1993 investigation in IEEE Computer, the foundational source on the case. Reward hacking examples come from METR’s June 2025 report (metr.org/blog/2025-06-05-recent-reward-hacking) and Anthropic’s own published system cards for Claude 3.7 Sonnet and Claude 4. The Claude Code leak is reported in VentureBeat, Axios, and CNBC, all dated March 31, 2026.
Shoutout to Phil Harvey for the image.


