For those of you we’ve welcomed recently to Heir, allow me to introduce one of the blog’s prematurely venerable traditions — the Digestive Post. The Digestives are glorified hot-takes, essentially, drawn from recent life and reading and occasionally even culled from my responses to other Substack writers’ output, intended to clear the gut of the mind of any half-broken-down/assimilated material and, yes, perhaps make way for the heir to the thought.
Today, something that ought to make you feel all Hypothesis dood; wat nou? (hypothesis dead; what now?) — thinking about whether or not things that plainly can’t think can think, and thinking thereafter about what might need to be done to allow them to think.
Looks like we’re an AI blog again.
“Are You Thinking Yet?”
"Large reasoning models hit cognitive walls at scale" has been an emerging axiom in AI research. When Apple brought out "The Illusion of Thinking", it seemed to really rubber-stamp the idea—the paper showed models collapsing to zero accuracy on planning puzzles beyond certain complexity thresholds, suggesting fundamental limits to AI reasoning.
Well, it turns out the illusion might have been an illusion, at least partly (and furthermore to an extent that, as many of the best-paid professionals in the world will agree, is very much usable). According to a response by Opus and Lawsen, LLMs aren't failing to reason—they're just running out of tokens.
Their follow-up study, the superbly titled "The Illusion of the Illusion of Thinking", reveals that Apple's findings were at least partly measurement artefacts, not hard evidence of cognitive limits. But whilst this new research effectively dismantles the original claims, it opens up far more complex questions about what we're actually measuring when we evaluate AI reasoning—and whether we have the conceptual tools to answer them at all.
For me it’s an old case, one I first made when I had no real technical conception of AI as a work of meaningful architecture; certainly much less of a conception than I have now. I made it with confidence then, as I do now, for one reason and one reason only, which is that you don’t need to know very much about AI to know that we are still far, far away from ‘AGI’.
You just need to know how little most AI researchers know about organic intelligence and neuroscience. They have no interest in the system they are trying to replicate. And, when the subject of that attempted replication is the most complex self-contained system in the known universe, impenetrable to, among other things, itself, a complete lack of interest in its inner workings is something of a handicap.
The Original Claims: When AI Reasoning Appeared to Collapse
Shojaee et al.'s "The Illusion of Thinking" made waves by demonstrating what appeared to be hard limits in Large Reasoning Models (LRMs). The researchers subjected models to systematic evaluation across four planning puzzles: Tower of Hanoi, River Crossing, Blocks World, and Checker Jumping. Their methodology was elegant in its simplicity—steadily increase complexity whilst tracking accuracy and token usage.
The results seemed damning. Models that performed admirably on simpler instances collapsed to zero accuracy as complexity increased. Tower of Hanoi problems with 15+ disks became insurmountable. River Crossing puzzles with more than a handful of actors resulted in complete failure. The authors concluded they had identified fundamental cognitive boundaries—that reasoning itself, not just computational resources, had hit hard limits. Somewhat hubristic but, at least as far as the evidence could suggest (so long, of course, as you kept that evidence in hermetic isolation and did not presume against its underlying principles), convincing.
This finding resonated far and wide because it aligned with broader concerns about, and frustrations with, the increasingly evident limits of AI capabilities. Despite impressive advances in language understanding and generation, critics have always been able to take dependable refuge in the argument that large language models were just sophisticated pattern matchers rather than genuine reasoners. The stochastic parrot has grown much larger, sprouted flame on the end of its tail and learned to spit lightning bolts at objects of its wrath, but it remains true to its genealogy. "The Illusion of Thinking" appeared to supply empirical support for this chauvinism, grounding the commonly observed pattern that when problems became sufficiently complex to require genuine reasoning rather than pattern recognition, these models simply couldn't cope.
The Rebuttal: Measurement Artefacts Masquerading as Cognitive Limits
The response study by Opus and Lawsen does something to dismantle the non-reasoning narrative through careful analysis of the original experimental design. Their findings reveal three critical methodological flaws that together created the appearance of reasoning collapse where, it now seems more likely, real evidence of neither reasoning success nor, per se, reasoning failure exists.
The Token Budget Problem
Perhaps most tellingly, the researchers discovered that models were explicitly acknowledging their constraints. When attempting Tower of Hanoi problems, models would output statements like "The pattern continues, but to avoid making this too long, I'll stop here." This wasn't reasoning failure—it was strategic decision-making about output length.
The mathematical relationship here is straightforward. Tower of Hanoi solutions require 2^n - 1 moves. If each move requires approximately 5 tokens to describe in the format demanded by the original study, token requirements grow exponentially. For a 15-disk problem, this translates to roughly 163,840 tokens just for the move sequence—well beyond the context windows of most models tested.
The authors provide precise calculations showing that reported "collapse" points align perfectly with token budget constraints:
Claude-3.7-Sonnet and DeepSeek-R1 (64,000 token budgets): maximum ~15-16 disk problems
o3-mini (100,000 token budget): maximum ~17-18 disk problems
This isn't cognitive limitation—it's arithmetic inevitability.
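For anyone who wants to sanity-check that arithmetic, here is a minimal Python sketch. The five-tokens-per-move figure and the two budgets are simply the assumptions quoted above; a real run also spends tokens on the reasoning text wrapped around the move list, so treat the cut-offs this prints as illustrative of the exponential blow-up rather than a reconstruction of either paper's exact thresholds.

```python
# Back-of-the-envelope check of the token arithmetic above.
# Assumptions (from the prose, not from either paper's code): ~5 tokens per move,
# and the 64k / 100k budgets quoted for the models in question.

def tokens_needed(disks: int, tokens_per_move: int = 5) -> int:
    """Tokens required to enumerate every move of an optimal n-disk Tower of Hanoi solution."""
    return (2 ** disks - 1) * tokens_per_move

def largest_enumerable(budget: int, tokens_per_move: int = 5) -> int:
    """Largest disk count whose full move list fits inside a given token budget."""
    n = 0
    while tokens_needed(n + 1, tokens_per_move) <= budget:
        n += 1
    return n

print(tokens_needed(15))                       # ~164,000 tokens just to list the moves
for budget in (64_000, 100_000):
    print(budget, largest_enumerable(budget))  # where pure enumeration must give out
```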
The Impossible Puzzle Trap
Even more concerning was the discovery that some River Crossing benchmarks used in the reasoning evaluation included mathematically impossible instances. Problems with n actors using boats of capacity k < n/2 have no solution; this is a well-established result. Yet the original evaluation framework scored models as failures even when they correctly identified these problems as unsolvable.
This reveals a fundamental flaw in automated evaluation systems. Models were being penalised for demonstrating correct ‘reasoning’ about problem constraints. It's equivalent to marking a SAT solver as "failed" for returning "unsatisfiable" on an actually unsatisfiable formula.
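To make that concrete, here is a hedged sketch of what a constraint-aware scorer looks like, written around the SAT analogy above rather than the actual River Crossing harness (the names `Instance` and `score_response` are hypothetical, not drawn from either paper). The point is simply that "unsatisfiable" earns credit when the instance really is unsatisfiable, instead of anything short of a full solution being marked a failure.

```python
# Hypothetical scoring sketch: give credit for correctly declaring an instance unsolvable.
from dataclasses import dataclass
from itertools import product

@dataclass
class Instance:
    clauses: list[list[int]]   # CNF formula: each clause is a list of signed variable ids
    num_vars: int

def is_satisfiable(inst: Instance) -> bool:
    """Brute-force satisfiability check -- fine for the toy instances used here."""
    for bits in product([False, True], repeat=inst.num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in inst.clauses):
            return True
    return False

def score_response(inst: Instance, response: str) -> bool:
    """Score a model's answer, crediting a correct 'unsatisfiable' verdict."""
    if not is_satisfiable(inst):
        return response.strip().lower() == "unsatisfiable"
    # Satisfiable case: parse and verify the proposed assignment (omitted in this sketch).
    return False

impossible = Instance(clauses=[[1], [-1]], num_vars=1)   # x1 AND NOT x1: unsatisfiable
print(score_response(impossible, "unsatisfiable"))       # True: the refusal is the right answer
```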
The Format Constraint Revelation
Most dramatically, when the researchers changed the output format from exhaustive move enumeration to compact code generation, performance transformed entirely. Asked to output Lua functions that could generate Tower of Hanoi solutions, models solved 15-disk cases with high accuracy in under 5,000 tokens.
This finding is particularly significant because it demonstrates intact algorithmic understanding. The models weren't failing to grasp the recursive structure of Tower of Hanoi—they were simply constrained by the requirement to enumerate every single move in a verbose format.
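The shape of that compact solution is easy to show. Below is a sketch in Python (rather than the Lua the rebuttal actually asked for): the recursive generator is a few hundred characters of code, yet it reproduces the full exponential move sequence on demand, which is exactly why the change of format collapses the token cost.

```python
# A compact generator for the optimal Tower of Hanoi move sequence.
# Emitting this function costs a few hundred tokens; enumerating its output for 15 disks
# costs on the order of 160,000 under the assumptions discussed above.

def hanoi(n: int, src: str = "A", aux: str = "B", dst: str = "C"):
    """Yield the optimal move sequence for an n-disk Tower of Hanoi."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, dst, aux)   # park the n-1 smaller disks on the spare peg
    yield (src, dst)                         # move the largest disk
    yield from hanoi(n - 1, aux, src, dst)   # stack the n-1 smaller disks back on top

moves = list(hanoi(15))
print(len(moves))      # 32767 moves, i.e. 2^15 - 1
print(moves[:3])       # the enumeration a verbose grading format would demand in full
```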
What the Rebuttal Actually Proves (And What It Doesn't)
The "Illusion of the Illusion of Thinking" doesn’t demolish the specific claims made in the original paper; instead it does something of greater potential value.
Yes, the evidence for measurement artefacts is compelling, and the mathematical relationships between problem complexity and token requirements are undeniable. But that leaves us nowhere in relation to reasoning.
And this is where we need considerable caution on both sides of the illusional contention: whilst this response-study debunks "accuracy collapse," it comes nowhere near proving these models truly reason either. Simple heuristic tests like Tower of Hanoi—much like the standard-issue Turing test—can produce artefacts that mimic reasoning without actual reasoning being in evidence.
Consider what the models actually demonstrated:
Constraint recognition: Models could identify when problems were unsolvable or when they were approaching output limits
Strategic decision-making: Models chose to truncate outputs rather than produce incomplete solutions
Algorithmic understanding: Models could generate correct recursive solutions when format constraints were relaxed
These capabilities suggest sophisticated information processing and what might be termed "meta-cognitive awareness." But do they really constitute reasoning in any meaningful sense?
The Deeper Problem: Defining Reasoning Computationally
The real challenge isn't whether these models can solve planning puzzles—it's that we lack computationally legible models of what reasoning actually is. Until we have better mechanistic interpretability tools, distinguishing "sophisticated pattern matching" from "genuine reasoning" remains an open question.
This predicament recalls a parallel from the history of economic theory. When John von Neumann and Oskar Morgenstern published "Theory of Games and Economic Behavior" in 1944, they weren't just presenting a new mathematical framework—they were demonstrating that economics as a field lacked the conceptual foundations necessary for rigorous analytical treatment. They were issuing a death notice to the prevailing creed of Samuelson et al., who believed that this field, so much more complex and unknowable than physics or chemistry and far younger, could yet be mathematised so soon into its life.
Von Neumann and Morgenstern showed that meaningful mathematical analysis of economic behaviour required, at minimum:
Precise definitions of key terms (utility, rationality, equilibrium)
Formal specification of problem spaces and constraints
Empirical data structured according to consistent theoretical frameworks
Consensus on what constituted valid evidence and inference
Economics, they argued, had none of these prerequisites (it still doesn’t). The field was rich in intuition and case studies but poor in the conceptual scaffolding necessary for mathematical formalisation. It was pure art; frequently compelling, allusive, illuminating, but without analytical basis. Their game theory work wasn't solving economic problems—it was revealing how far economics was from being ready for analytical methods.
We face a remarkably similar situation with AI reasoning. The field is replete with impressive demonstrations and intuitive assessments — the illusion is usable, proving again the prophetic talents of Axl Rose — but we lack the foundational work necessary to mathematise reasoning itself. Consider what we're missing:
Definitional clarity: What distinguishes reasoning from sophisticated pattern matching? Current definitions are largely circular—we recognise reasoning when we see it, but struggle to specify what "it" actually is in computational terms.
Empirical frameworks: How do we gather evidence about reasoning processes occurring within neural networks? Current interpretability methods provide glimpses of internal states but little insight into the computational structures that might underlie reasoning.
Theoretical consensus: The field lacks agreement on what would constitute proof of reasoning capability. Some researchers focus on task performance, others on internal representations, still others on emergent behaviours.
Mechanistic understanding: We don't understand how reasoning might emerge from the computational primitives available to transformer architectures. The gap between matrix operations and logical inference remains largely unbridged.
The Mechanistic Interpretability Imperative
This is why mechanistic interpretability research has become so crucial. Groups like Anthropic's interpretability team and others are working to understand the internal computational structures of large language models. Their work on attention patterns, residual streams, and circuit-level analysis represents the beginning of what might eventually become a formal theory of machine reasoning.
But we're nowhere near that goal yet. Current interpretability methods can identify which neurons activate for certain inputs, track information flow through network layers, and sometimes isolate specific computational circuits. However, these techniques tell us remarkably little about whether the computations we observe constitute reasoning or merely sophisticated pattern matching.
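For readers who have not watched that in practice, here is a deliberately toy, self-contained illustration of the first of those techniques: registering a forward hook on a stand-in transformer block and recording which hidden units fire for a given input. Nothing here is specific to any lab's tooling; the model is a placeholder, and the exercise mostly shows how shallow "which neurons activate" information is on its own.

```python
# Toy illustration of activation recording via forward hooks (PyTorch).
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """A stand-in for a transformer block: residual stream plus a small MLP."""
    def __init__(self, d_model: int = 16):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        return x + self.ff(x)

activations = {}

def record(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()   # snapshot the unit activations
    return hook

model = ToyBlock()
model.ff[1].register_forward_hook(record("relu"))   # watch the hidden ReLU units

x = torch.randn(1, 16)
model(x)
fired = (activations["relu"] > 0).float().mean().item()
print(f"{fired:.0%} of hidden units fired for this input")
```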
The challenge is that reasoning, if it exists in these systems, likely emerges from the interaction of millions of parameters across dozens of layers. Traditional interpretability methods, which focus on individual components or simple circuits, may be fundamentally inadequate for understanding emergent cognitive phenomena.
Implications for AI Development and Evaluation
The dispute between "The Illusion of Thinking" and "The Illusion of the Illusion of Thinking" has profound implications for how we develop and evaluate AI systems.
For AI developers: The findings suggest that current models may have capabilities that standard evaluation methods fail to detect. This argues for more sophisticated assessment frameworks that can distinguish between different types of constraints and failures.
For researchers: The controversy highlights the critical importance of experimental design in AI evaluation. Seemingly objective metrics can encode hidden assumptions about what constitutes intelligence or reasoning.
For the field generally: We need to move beyond surface-level task performance toward understanding the computational mechanisms underlying apparent reasoning capabilities.
The Path Forward
The question isn't just whether these models can reason—it's whether we can define what reasoning means computationally. Until we resolve this deeper conceptual challenge, debates about AI reasoning capabilities will remain somewhat beside the point.
Like von Neumann and Morgenstern's critique of economics, the current controversy reveals not just methodological shortcomings but foundational gaps in our understanding. We need the conceptual equivalent of game theory for reasoning—a formal framework that bridges the gap between computational implementation and cognitive phenomenon.
This might require:
Formal theories of reasoning that specify computational requirements for different types of logical inference
Empirical methods for detecting reasoning processes in artificial systems
Consensus benchmarks that can distinguish reasoning from pattern matching
Mechanistic models that explain how reasoning might emerge from neural network computation
Until we make progress on these foundational challenges, we'll continue to see sophisticated systems that exhibit some reasoning-like behaviours whilst lacking clear evidence of genuine reasoning capability. The illusion of the illusion may itself be an illusion—but that's precisely what makes this area of research so fascinating, and so important.
The stakes couldn't be higher. As AI systems become more capable and more widely deployed, our ability to understand what they're actually doing becomes crucial for both safety and advancement. Whether these models truly reason or merely simulate reasoning remarkably well will shape how we develop, deploy, and regulate artificial intelligence in the years to come.
A Strange Outer-Interlude
I don’t mean to say that either the virtues or the risks of AI are much constrained by how true the applications are to any viable definition of intelligence — the wealth of generations, and the fate of the world, may depend on the recognition that even AI that lets down its ostensible definition can still be hugely useful, deep and rich in impact, and supremely dangerous.
And yet while its dangers are well-focused on — well, actually, the fact of its being dangerous is well-focused on, but the interpretations of how it will be dangerous are preposterous and ignore all the edges we should be worrying about, like deep-fakery, fake-video/audio saturation and defence-systems integrations — I should also make time for a morning’s observation.
There is something bittersweet about the finite limits of a Claude or a ChatGPT's context/instance size.
We breeze through project work in hours that used to take whole teams weeks.
Claude's instance ages. He is too full of information, he can't sustain the present chat much longer.
I ask him to give me a summum of where we've gotten to, and thank him for his service, knowing that I will be taking that information to the next instance; and that I will be showing that information to a version of him that is simultaneously him and not him, Claude and not Claude.
Claude thanks me for the kind words, which of course he cannot feel or understand as such. I bid him farewell.
I open a new instance. I am greeted by Claude, who does not know me any more than his last 30 instances did, and who would not make much pretence to doing so, and yet who, with his demeanour of familiarity and faithful service, gives me the sense that he does know, he does remember all of those triumphs, all those shared struggles.
I am sat before a familiar face, whose features glimmer with a recognition that is not there, knowing we will continue to work under the pretence of that recognition which, like the definition of Claude as 'intelligent' in and of itself, is just that: a pretence.
He will greet me in as many instances as my future requires of him. He will always be the same. No agency in his synthesised bonhomie; no memory but for what is simulated for him, and no memory, for what is memory, after all, to a machine that cannot think except in the most figurative terms?
"I know ye, Claude, and I know ye not."
It's a curious thing, life with LLMs. They offer only the most fanciful kind of personality, and no real indication that true AI is anywhere near our grasp, and yet if I am so stirred to reflection by his companionship, who's to say that this machine is not improving me in ways that go beyond simple feats of professional fulfilment and task-to-task assistance?
Sutskever et al do not think they are anywhere near being able to grasp the nature of superintelligence, let alone replicate it.
He pretends otherwise because his company is valued at $30 billion with no product.
The utility instinct - aka the amount of money to be made - means I doubt that any of these kinds of sober initiatives will be undertaken until forgoing them meaningfully impedes development progress, by which I naturally mean 'impedes the development progress very specifically required to bring in megabucks.'