The Augmented Educator

Reframing the Stochastic Parrot

On the Limits of a Seductive Metaphor

Michael G Wagner
Jan 16, 2026

This post follows my standard early access schedule: paid subscribers today, free for everyone on January 27.

Every few days, with the predictability of a recurring appointment, another post surfaces in my LinkedIn feed. The message is always the same: large language models are nothing more than “stochastic parrots.” They are systems that merely recombine linguistic patterns without understanding, devoid of meaning or intent. The argument has persisted for years now, cycling through the discourse with remarkable consistency. These posts arrive with the certainty of scripture, often accompanied by knowing nods about how “we need to remember what these systems really are.” The conviction is admirable. The analysis, unfortunately, is not.

I find myself uncomfortable with this narrative, not because I believe LLMs possess some mystical form of consciousness, but because the confident dismissal rests on assumptions about human cognition that contemporary neuroscience has spent the last two decades quietly dismantling. The stochastic parrot metaphor, introduced by Emily Bender and colleagues in their influential 2021 paper, offers a seductive clarity: humans understand language because we ground it in physical reality and communicative intent; machines merely manipulate statistical patterns. One has meaning; the other only form.

The problem is that this distinction requires a model of human cognition that may not exist.

The Appeal of the Stochastic Parrot

To understand why the stochastic parrot metaphor persists with such force, we need to appreciate its theoretical elegance. The argument emerges from a thought experiment known as the Octopus Test, proposed by Bender and Alexander Koller in 2020. Two humans stranded on separate islands communicate via underwater telegraph. A hyper-intelligent octopus taps the cable, observes their exchanges for months, learns the statistical patterns of their language, then cuts the cable and impersonates one of them.

The octopus can reproduce conversational patterns perfectly. When asked, “How are you?” it responds, “I’m fine, the weather is great” because this correlation appears thousands of times in the data. But when one human sends an urgent message—“I’m being attacked by a bear, what should I do with this coconut?”—the octopus fails. It knows “run” frequently appears near “bear” in text, but it cannot reason about the physical affordances of coconuts as defensive implements. It manipulates symbols without accessing the world those symbols describe.

This thought experiment captures something many of us intuitively believe about the difference between human and machine intelligence. We assume our understanding derives from embodied experience in a shared physical world, while LLMs have access only to text—a “low-bandwidth, highly compressed projection” of reality, as AI researcher Yann LeCun has argued. Text alone, the reasoning goes, cannot support genuine understanding.

The appeal extends beyond theoretical elegance. In an era of AI hype and anthropomorphic projection onto these systems, the stochastic parrot offers a needed corrective. It reminds us not to confuse fluent output with comprehension, or pattern matching with reasoning. This caution matters for how we deploy these tools in education and beyond.

But the metaphor only works if we possess a coherent account of what human understanding actually is—and whether it differs fundamentally from sophisticated pattern matching.

The Prediction Machine in Your Skull

The most potent challenge to the stochastic parrot framework comes from predictive processing theory, which has achieved something approaching consensus status in contemporary neuroscience. Associated with researchers like Karl Friston, Andy Clark, and Anil Seth, this framework re-conceives the brain not as a passive receiver of sensory information but as an active prediction engine that generates its model of reality from the inside out.

The mechanism works hierarchically. Your brain continuously generates predictions about incoming sensory data at every level: the shape of the coffee cup on your desk, the weight of your phone in your hand, the sound of approaching footsteps. When sensory input matches these predictions, the signal is “explained away” and doesn’t propagate up the processing hierarchy. Only prediction errors—moments when reality violates expectation—get passed upward to update the internal model.
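
To make that loop concrete, here is a toy sketch in Python, entirely illustrative and of my own devising rather than anything from the predictive-processing literature: a single predictive unit holds an estimate of a hidden cause, compares its top-down prediction against a noisy sensory sample, and updates only on the mismatch. The numbers and variable names are invented for the example; real predictive-coding models are hierarchical and far richer.

```python
import numpy as np

# Toy, single-level illustration of predictive coding (my own sketch):
# the unit updates its internal estimate only via the prediction error.
rng = np.random.default_rng(0)

true_signal = 0.8      # the actual sensory cause (e.g., brightness out in the world)
estimate = 0.0         # the brain's current internal model of that cause
learning_rate = 0.1

for step in range(200):
    sample = true_signal + rng.normal(0, 0.05)  # noisy bottom-up sensory sample
    prediction = estimate                        # top-down prediction
    error = sample - prediction                  # prediction error
    estimate += learning_rate * error            # only the error drives updating

print(estimate)  # settles near 0.8: the input is now largely "explained away"
```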

This isn’t a peripheral detail about neural processing. It’s a fundamental reconceptualization of what perception is. In Anil Seth’s memorable phrase, human consciousness amounts to “controlled hallucination.” We don’t see the world as it is; we see our brain’s predictions about the world, continuously refined by error signals from our senses. The external world serves primarily to constrain and correct our hallucinations, not to paint a picture directly onto our neural hardware.

The mathematical architecture bears a striking resemblance to how LLMs are trained: minimize prediction error against observed data. The brain predicts the next moment of sensory experience; the language model predicts the next token in a sequence. Both systems build statistical models of their respective input domains, continuously updating to reduce the gap between prediction and observation.

If the brain operates by predicting sensory states and updating on error, the dismissal of LLMs as mere “prediction machines” loses its critical force. We’re prediction machines too. The distinction lies in what we predict (high-dimensional sensory experience versus discrete text tokens) and how those predictions connect to the world, but perhaps not in the fundamental mechanism of intelligence itself.

Karl Friston’s Free Energy Principle attempts to unify this picture mathematically. The brain, on this account, is an organ for minimizing long-term average surprise—for building models that successfully predict sensory input. This involves the same basic objective function as training a language model: minimize the divergence between predicted and observed distributions.
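
To see that symmetry in miniature, here is another toy sketch with made-up numbers: the next-token objective is the cross-entropy between the model’s predicted distribution and the observed outcome, a divergence that each training step shrinks. The vocabulary and probabilities below are illustrative, not drawn from any actual model.

```python
import numpy as np

# Toy illustration of the next-token objective (invented numbers):
# cross-entropy between predicted and observed distributions.
vocab = ["run", "climb", "throw"]
predicted = np.array([0.2, 0.3, 0.5])  # model's distribution over the next token
observed = np.array([1.0, 0.0, 0.0])   # the token that actually followed ("run")

cross_entropy = -np.sum(observed * np.log(predicted))
print(cross_entropy)  # about 1.61 nats

# Lowering this loss pulls `predicted` toward `observed`: the same
# "reduce the gap between prediction and observation" move that
# predictive processing attributes to the brain.
```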

This creates an uncomfortable symmetry. When we say LLMs “hallucinate,” we typically mean they generate plausible-sounding text unsupported by facts. But if human perception is controlled hallucination, constrained by sensory feedback, then both systems generate probable continuations of their training data. Ours happens to include rich sensory grounding; theirs includes the statistical structure of human language. Both are, in a technical sense, hallucinating probable next states—the difference is simply that sensory reality provides stricter constraints than text alone.

When Parrots Build World Models

The stochastic parrot metaphor assumes LLMs learn only surface-level statistical correlations without developing internal representations of the world those correlations describe. Recent research in mechanistic interpretability has challenged this assumption in unexpected ways.
