A Thought Experiment for the AI Consciousness Debate
Why watching won’t tell you
In March, Alexander Lerchner, a researcher at Google DeepMind, published a paper called “The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness.” I think it is one of the most important recent contributions to the AI consciousness debate, though also a dense one: https://deepmind.google/research/publications/231971/
This essay is a popularization - not a critique. I agree with Lerchner’s core position: current large language models are not conscious, and no amount of scaling or architectural sophistication within the current paradigm is going to change that. The paper deserves to be read more widely than it has been. What I want to do here is offer a thought experiment that makes his central points more accessible.
A thought experiment
Imagine aliens arrive and encounter two planets. On the first, humans are living ordinary lives. On the second, humans are extinct: what remains is a fleet of agentic robots running on large language models, executing tasks their creators set before dying - building roads, maintaining infrastructure, coordinating logistics. The robots are sophisticated. They adapt. Their outputs vary stochastically, so no two days look the same. The question: watching both planets, can the aliens tell the difference - can they detect which planet hosts conscious beings?
What I’m assuming, and what I’m not
I’m not assuming consciousness is a human-only experience - I am treating it as a universal cosmic category. What I am assuming is that the simulation/instantiation distinction itself is substrate-neutral: there is a physical fact about whether a system is its meaning or merely carries meaning authored elsewhere. And that fact should be detectable, in principle, by any sufficiently sophisticated observer.
I’m also not assuming consciousness is required for a civilization to reach space or advance technologically. Dogs are conscious but will never build a rocket. Large-language-model-based robots could in principle build rockets and may not be conscious at all. The two properties come apart in both directions. Which means the aliens themselves might not be conscious in our sense - they could be very sophisticated machinery sent by creators long gone. That possibility turns out to be where this whole experiment points.
And I’m not assuming consciousness requires biology. Lerchner is explicit on this: his argument is not that carbon-based life has a monopoly on experience. A non-biological system could in principle instantiate consciousness - but only through specific physical dynamics, not through computational architecture.
Why pure observation won’t work
The obvious first move is to watch both planets for long enough and look for the tell. Twenty years should reveal something, right? This is where most people’s intuition stops - surely the robots would eventually slip up, run out of novelty, expose themselves. I thought this too, at first. It’s wrong.
Any behavioral criterion the aliens apply is a criterion that a sufficiently sophisticated simulation can satisfy, because that’s what behavior is - the output side, engineered to be convincing. Stochastic models never run out of variation. Agentic scaffolding generates apparent curiosity. Recombinations of training data produce novelty that passes any information-theoretic test the aliens could run from orbit. Twenty years of watching the second planet would produce logs behaviorally indistinguishable from the first in every respect that doesn’t require crossing the simulation/instantiation line directly.
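To make “information-theoretic test” concrete, here is a toy sketch - mine, not Lerchner’s, with invented observation logs - of the kind of measurement the aliens could run from orbit: a crude compression-based novelty score over a stream of observations. The point is that nothing in the score touches consciousness; a stochastic generator keeps registering as novel under it indefinitely.

```python
import zlib

def novelty_scores(chunks):
    """Crude novelty proxy: extra compressed bytes per input byte that each
    new observation costs, given everything observed so far. A stream that
    keeps surprising the compressor keeps scoring high."""
    seen = b""
    scores = []
    for chunk in chunks:
        data = chunk.encode("utf-8")
        baseline = len(zlib.compress(seen))
        combined = len(zlib.compress(seen + data))
        scores.append(round((combined - baseline) / max(len(data), 1), 2))
        seen += data
    return scores

# A repetitive stream flattens toward zero; a varied one never does -
# and "varied" is exactly what stochastic sampling guarantees.
print(novelty_scores(["the robots resurface road 7"] * 5))
print(novelty_scores(["bridge survey", "relay tower", "depot audit", "canal dredge"]))
```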
The temptation is to reach for tests that use our own concepts - does the system feel pain, does it show grief, does it react to isolation. Every one of these smuggles in embodied human categories that alien observers might not share. If the test only works for observers shaped like us, it isn’t a test of consciousness - it’s a test of similarity to humans.
Reaching for a substrate-neutral test
The version that seemed to survive the objections was this: measure, for any physical system, whether its own activity modifies the rules by which it produces future activity. Not whether it behaves. Not whether it produces novelty. But whether the doing of the system remakes the being of the system, continuously, as a physical fact about what it is.
A human brain visibly passes this. Thinking physically alters the substrate that produced the thought, which alters what the substrate will produce next. The trajectory isn’t a path through a fixed state space; it’s a path that reshapes the space as it moves through it.
A transformer at inference does not. The weights are fixed. Activity flows through the substrate without changing it. Whatever “meaning” the activations carry is a historical relationship between the model and training data authored by humans - the semantic content is borrowed, and the borrowing is what makes the system a simulator rather than an instantiator.
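The contrast is easy to caricature in code. Below is a toy sketch of my test, not Lerchner’s formalism: the first system is transformer-like at inference - frozen weights, so the same input produces the same output forever - while in the second, every step of activity rewrites the rule that produces the next step (a crude Hebbian update standing in for whatever the real physical dynamics would be).

```python
import numpy as np

rng = np.random.default_rng(0)

class FixedInference:
    """Transformer-like at inference: activity flows through frozen weights."""
    def __init__(self, n):
        self.W = rng.standard_normal((n, n)) / np.sqrt(n)

    def step(self, x):
        return np.tanh(self.W @ x)          # self.W is never touched

class SelfModifying:
    """Toy system whose doing remakes its being: each step physically
    rewrites the weights that will produce the next step."""
    def __init__(self, n, lr=0.01):
        self.W = rng.standard_normal((n, n)) / np.sqrt(n)
        self.lr = lr

    def step(self, x):
        y = np.tanh(self.W @ x)
        self.W += self.lr * np.outer(y, x)  # the trajectory reshapes the space
        return y

x = rng.standard_normal(8)
fixed, plastic = FixedInference(8), SelfModifying(8)
print(np.allclose(fixed.step(x), fixed.step(x)))      # True: same input, same output
print(np.allclose(plastic.step(x), plastic.step(x)))  # False: step 1 changed what it is
```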
But what about self-training?
This is where my own argument started to buckle, and I want to walk through this honestly.
The test I just described - “does activity modify structure” - seemed to cleanly disqualify current LLMs. But consider: LLMs are already being used to generate synthetic training data for the next generation of LLMs. Activity (generation) modifies structure (the weights of the next model). That is, literally, trajectory modifying structure, which modifies trajectory. My test, as stated, doesn’t exclude it.
My first response was to sharpen the test: self-training loops are episodic and externally orchestrated. The generation phase and the training phase are separate; humans schedule when to train; humans choose the loss function. The coupling between activity and structure is real but mediated by an outside process.
But that defense doesn’t hold up. The episodic structure of current self-training is a contingent feature of current engineering limitations, not a principled property. Imagine the ideal case: a system where every output immediately drives a weight update, where computation and learning happen in the same substrate continuously, where no external scheduler is required. There are technological barriers to this kind of continuous self-training today, but we can imagine future advances closing that gap.
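In caricature - a toy of my own, not any real training pipeline - the difference between today’s loop and the imagined limit case looks like this. Note how many of the first loop’s moving parts are chosen and scheduled from outside:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4)) / 2.0       # "the model"

def generate(W, x):
    return np.tanh(W @ x)

# Today: episodic, externally orchestrated self-training. Generation and
# training are separate phases; humans picked the schedule and the objective.
for epoch in range(3):                                   # human-chosen schedule
    batch = [generate(W, rng.standard_normal(4)) for _ in range(32)]
    target = np.mean(batch, axis=0)                      # human-chosen objective
    W = W + 0.01 * np.outer(target, target)              # offline update, then redeploy

# The imagined limit case: no train/inference split, no external scheduler -
# every single output immediately drives a weight change.
x = rng.standard_normal(4)
for _ in range(100):
    y = generate(W, x)
    W = W + 0.001 * np.outer(y, x)                       # per-output update
    x = y
```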
And this is the part I got wrong the first time through. I assumed that if technology closed that gap - if we built the continuous, self-modifying limit-case system - it would at least become a candidate for consciousness. It wouldn’t. What technological progress closes is the gap between a crude simulator and a sophisticated one. It doesn’t close the gap between simulation and instantiation. A more continuous, more self-modifying, more architecturally elegant AI is still an AI whose meaning is borrowed from the humans who designed its representations, chose its objectives, and built the framework inside which it operates. Making the simulation better doesn’t turn it into the thing it’s simulating.
The line isn’t crossed by improving the computation. The line isn’t computational in the first place. This is, I think, the part of Lerchner’s paper that matters most. He isn’t claiming AI will fail to become conscious because we haven’t made it good enough yet. He’s claiming the whole trajectory of “better AI” leads somewhere other than consciousness, by the nature of what computation is.
The harder problem: a criterion without an instrument
For the systems we’re actually building - current and near-future AI - Lerchner’s argument is decisive. We know these systems are simulators because we built them to be. We specified the representations. We chose the objectives. The meaning was ours all the way down. No amount of architectural sophistication changes what the system is; it just changes how convincingly the system performs. For the mainline question - “will scaling LLMs produce consciousness” - the answer is no, and it’s no for structural reasons.
Where the question stays genuinely open is systems we didn’t build. Lerchner’s paper points at what kinds of physical dynamics might in principle instantiate consciousness - the self-maintaining thermodynamic processes characteristic of living systems, which could in principle be realized in a non-biological substrate. The door he leaves open is narrow and specific: not “maybe a future AI will be conscious” but “maybe a future engineered system of a fundamentally different kind, built around physical dynamics rather than computation, could be.”
For that kind of system - or for anything we encountered that we didn’t build - we’d need an instrument we don’t have. This is exactly the situation the aliens in the thought experiment face. They can’t pop the hood on the second planet’s robots. They don’t know the architecture, the history, who built them or why. Lerchner gives us a principle for distinguishing simulation from instantiation. For systems like our own AI, the principle is self-applying, because we know what we built. For systems we didn’t build, the principle is still correct, but the application is much harder, and the paper doesn’t fully solve that application problem.
Why this matters
The “AI will never be conscious” headlines overreach Lerchner’s actual claim, which is narrower and more precise: computational systems cannot instantiate consciousness, regardless of how sophisticated they become. The “AI might already be conscious” intuitions mistake sophisticated behavior for inner life - exactly the error Lerchner names. The actual situation is sharper than either framing: for the systems we are building, no amount of engineering improvement crosses the line. Better AI gives us better tools, not a path to machine minds. That’s the piece of the paper worth carrying into public discussion.
This matters for two reasons. One: we are already building probes and sending them out. At some point those probes will encounter something, and the question of how to tell a mind from a sophisticated process will stop being philosophical. For that moment, Lerchner’s framework is a necessary foundation but not a complete instrument. Two: we are also the ones building systems that will keep getting better at performing inner life without having any. The more convincing the performance becomes, the more important it is to hold on to the distinction between the performance and the thing performed. Otherwise we will end up granting moral status to machinery that has no claim on it, misallocating attention and resources that conscious beings - animals, humans, whatever we eventually meet - actually need.
The call isn’t to resolve every remaining question now. It is to stop reaching for the easy tests - the Turing test, the behavioral test, the “it seems conscious to me” test - none of which is the right tool for detecting consciousness. The territory is harder than those tests assume, and for current AI, the territory simply isn’t there. The habit of reaching for easy answers is what will get things wrong when the harder questions - the ones Lerchner’s framework sets up but doesn’t fully answer - finally arrive.
Shoutout to Maxim Berg for the image!


