The Paradox of AI Detection
Why We Cannot Use Probabilistic Tools to Police Probabilistic Systems
Since I began arguing that AI detection is a technological escalation of a problem that demands pedagogical solutions, I have received a steady stream of messages. Some come from educators genuinely wrestling with how to identify AI-generated work in their classrooms. Others arrive as thinly disguised marketing pitches: “Have you tried [Product X]? It’s 99% accurate.” Each message, whether sincere or commercial, rests on the same premise: that somewhere out there is a detection tool capable of reliably distinguishing human text from machine text.
I understand the appeal of this hope. The emergence of large language models has upended familiar assessment practices, and the desire for a technological fix is natural. But my resistance to AI detection tools is not simply pragmatic skepticism about current products. It stems from something much more fundamental: a theoretical impossibility rooted in the very nature of the systems we are trying to detect.
Large language models are probabilistic systems. They generate text through controlled randomness, selecting each word based on statistical likelihoods rather than fixed rules. This non-deterministic nature is precisely what allows them to produce fluid, varied, human-like writing. Yet this same characteristic creates an insurmountable problem for detection. Any tool designed to identify the output of a probabilistic system must itself operate probabilistically. And here lies the fatal contradiction: in high-stakes educational contexts where false accusations can permanently damage a student’s academic career, probabilistic detection is fundamentally inadequate.
It is important to emphasize that this is not a problem waiting for a technological solution. It is a theoretical limit as binding as the laws of mathematics themselves.
When Systems Roll the Dice: Understanding Non-Determinism
To grasp why detection fails, we must first understand what makes large language models distinct from earlier text-processing technologies. When you run a traditional plagiarism checker, such as Turnitin’s original matching engine, you are working with a deterministic system. Give it the same document twice, and it produces identical results. It compares strings of text against a database, looking for exact or near-exact matches. The process is mechanical, predictable, and reproducible.
Large language models operate on entirely different principles. They do not retrieve pre-written answers from a database. Instead, they predict the next word in a sequence based on statistical patterns learned from vast training data. At each step in generating text, the model calculates probability distributions across its entire vocabulary. The word “sky” might be followed by “blue” with a probability of 40%, “is” with 25%, “above” with 15%, and thousands of other possibilities with decreasing likelihood.
If the model simply chose the highest probability option at each step, its output would be repetitive and mechanical. To produce varied, creative text, it introduces controlled randomness through parameters like temperature and nucleus sampling. Temperature flattens or sharpens the probability distribution. Low temperature produces conservative, predictable text, while high temperature yields more creative variations. Nucleus sampling dynamically adjusts which words the model considers at each step, cutting off unlikely options while preserving meaningful diversity.
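To make this concrete, here is a minimal sketch of how temperature and nucleus sampling reshape a toy next-word distribution. The vocabulary and probabilities are invented for illustration; real models work over tens of thousands of tokens, but the mechanics are the same.

```python
import numpy as np

rng = np.random.default_rng()

# Toy next-word distribution after the word "sky" (illustrative values only).
words = ["blue", "is", "above", "darkens", "clears"]
probs = np.array([0.40, 0.25, 0.15, 0.12, 0.08])

def apply_temperature(p, temperature):
    """Sharpen (T < 1) or flatten (T > 1) the distribution."""
    logits = np.log(p) / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def nucleus_filter(p, top_p=0.9):
    """Keep only the smallest set of words whose cumulative probability reaches top_p."""
    order = np.argsort(p)[::-1]
    cumulative = np.cumsum(p[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    filtered = np.zeros_like(p)
    filtered[keep] = p[keep]
    return filtered / filtered.sum()

# Two runs with identical settings can still choose different words.
for _ in range(2):
    p = nucleus_filter(apply_temperature(probs, temperature=1.2), top_p=0.9)
    print(rng.choice(words, p=p))
```

Run it a few times and the continuation changes, even though nothing about the prompt or the settings has changed.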
The consequences for detection are profound. A student can generate ten different essays from the same prompt simply by adjusting these parameters or clicking “regenerate.” There is no single “AI signature” to detect. The statistical fingerprint of the text shifts with each generation, creating a moving target that no fixed detection algorithm can reliably track.
Traditional plagiarism detection worked because it sought exact matches in a deterministic system. By contrast, AI detection seeks patterns in a system designed specifically to vary its patterns.
The Mechanics of Detection: Reading Statistical Tea Leaves
AI detectors typically rely on two primary metrics: perplexity and burstiness. Understanding these measures reveals both how detection works and why it fails.
Perplexity measures how surprising a piece of text is to a language model. Low perplexity means the text is predictable. Each word follows naturally from what came before according to the model’s training. High perplexity indicates unpredictability: unusual word choices, creative leaps, or contextual knowledge the model lacks. Detectors operate on the assumption that AI-generated text exhibits low perplexity (because the AI chose statistically probable words) while human text shows higher perplexity (because humans don’t always select the most likely next word).
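To make the metric concrete, here is a minimal sketch of the calculation, assuming we already have the probability a scoring model assigned to each word (in real detectors those probabilities come from running a language model over the text). The numbers are invented for illustration.

```python
import math

def perplexity(word_probs):
    """Perplexity is the exponential of the average negative log-probability per word.
    Low values mean the text looked predictable to the scoring model."""
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-word probabilities assigned by a scoring model.
predictable_text = [0.42, 0.35, 0.51, 0.38, 0.47]   # "safe", expected word choices
surprising_text  = [0.05, 0.21, 0.02, 0.33, 0.04]   # unusual, creative word choices

print(perplexity(predictable_text))  # low perplexity -> flagged as "AI-like"
print(perplexity(surprising_text))   # high perplexity -> read as "human-like"
```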
Burstiness measures the variation in sentence structure and complexity throughout a document. AI-generated text often displays uniform sentence patterns, such as similar lengths, consistent grammatical structures, or predictable rhythm. Human writers naturally vary their cadence, mixing short declarative sentences with longer complex constructions. Detectors flag low burstiness as potentially artificial.
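Burstiness can be approximated even more crudely. The sketch below uses nothing but variation in sentence length as a proxy; real detectors combine richer syntactic features, so treat this as the intuition rather than an actual implementation.

```python
import statistics

def burstiness(text):
    """A crude burstiness proxy: the standard deviation of sentence lengths in words."""
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in normalized.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.stdev(lengths) if len(lengths) > 1 else 0.0

uniform = ("The model writes a sentence. The model writes another sentence. "
           "The model writes a third sentence.")
varied = ("Short. Then the writer wanders into a long, winding construction "
          "that piles clause upon clause. Why? Because people do.")

print(burstiness(uniform))  # low variation -> flagged as potentially artificial
print(burstiness(varied))   # high variation -> read as human cadence
```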
Some detection systems also utilize watermarking, a more sophisticated approach that embeds hidden statistical signals during text generation. The model divides its vocabulary into “green list” and “red list” words based on cryptographic hashing, then preferentially selects green list words. To a human reader, the text appears normal. To a detector with the correct algorithm, an improbably high frequency of green list words reveals the watermark’s presence.
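The detection side of watermarking can be sketched in a few lines, loosely following the published green-list idea. The hash-based split and the 50/50 green fraction below are illustrative stand-ins, not the algorithm any vendor actually ships, and real schemes operate on token IDs inside the model rather than on surface words.

```python
import hashlib
import math

def is_green(previous_word, word, green_fraction=0.5):
    """Pseudo-randomly assign `word` to the green list, seeded by the preceding word."""
    digest = hashlib.sha256(f"{previous_word}|{word}".encode()).digest()
    return (digest[0] / 255) < green_fraction

def watermark_z_score(words, green_fraction=0.5):
    """How many standard deviations the observed green-word count sits above chance.
    Unwatermarked text should hover near zero; watermarked text scores high."""
    hits = sum(is_green(prev, word) for prev, word in zip(words, words[1:]))
    n = len(words) - 1
    expected = green_fraction * n
    stddev = math.sqrt(n * green_fraction * (1 - green_fraction))
    return (hits - expected) / stddev

words = "a student can generate ten different essays from the same prompt".split()
print(f"z = {watermark_z_score(words):.2f}")
# A large positive z suggests a watermark; paraphrasing the text erodes it quickly.
```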
This brief summary understates the technical complexity involved, but the core principle remains: all detectors search for statistical patterns they associate with machine generation. Whether analyzing perplexity, burstiness, watermark frequencies, or something else, they make probabilistic judgments about authorship based on textual features.
The Theoretical Trap: Why Detection Must Fail
Here we reach the heart of the problem. The theoretical limits constraining AI detection are not temporary obstacles that better engineering will overcome. They are fundamental, insurmountable properties of the systems involved.
Statistical Indistinguishability
The ultimate objective of training a large language model is to minimize the statistical distance between its output and human text. In technical terms, the model’s probability distribution is optimized to approach the probability distribution of human language. As the gap narrows, detection becomes mathematically equivalent to random guessing.
Sadasivan et al. formalized this through a “hardness theorem.” Their proof relies on the total variation distance, a metric that quantifies how far the distribution of AI-generated text lies from the distribution of human writing. As AI models improve and this distance shrinks, the performance of any detector inevitably collapses toward that of a coin flip (a random classifier). This collapse is not a failure of current detection software; it is an intrinsic property of the mathematical foundations of the detection problem itself.
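Stated loosely in symbols (I am paraphrasing the published result from memory, so the notation may differ slightly from the paper), the bound takes the following form, where $\mathcal{M}$ and $\mathcal{H}$ are the distributions of machine-generated and human-written text:

```latex
% Upper bound on any detector D's area under the ROC curve (paraphrased):
\[
  \mathrm{AUROC}(D) \;\le\; \tfrac{1}{2}
  + \mathrm{TV}(\mathcal{M}, \mathcal{H})
  - \tfrac{1}{2}\,\mathrm{TV}(\mathcal{M}, \mathcal{H})^{2}
\]
% As TV(M, H) -> 0, i.e. as the model's output distribution approaches human text,
% the bound collapses to 1/2: no detector can beat a coin flip.
```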
In practice this simply means that if an AI perfectly mimics human language, it generates text that falls entirely within the statistical bounds of human expression. At that point, any feature used to identify the AI, such as grammatical correctness, vocabulary choice, or logical flow, is also a feature of high-quality human writing. The detector has nothing left to distinguish them.
We are not yet at perfect mimicry. But we are close enough that the error margins have become unmanageable for practical application in high-stakes environments.
Rice’s Theorem and the Limits of Semantic Analysis
Rice’s Theorem, a foundational result in computability theory, states that determining whether a program exhibits any non-trivial semantic property is undecidable. While this theorem technically applies to program behavior rather than natural language, it offers another powerful framework for understanding detection’s theoretical limits.
AI detection asks a program (the detector) to analyze the output of another program (the generator) and determine a semantic property: “Was this written by a human?” The problem is that “human-like writing” cannot be strictly formalized. It has no single definition, no rigid boundaries. It encompasses everything from Shakespeare’s sonnets to casual text messages. The input space of language is infinite, and the property of authorship is not intrinsic to the text string itself.
We cannot reverse-engineer the thought process of a black-box neural network solely by examining its output. The internal state that generated the text is lost, leaving only a probabilistic artifact. Just as the Halting Problem teaches us we cannot predict a complex system’s behavior without running it, we cannot definitively determine a text’s origin from the text alone.
Goodhart’s Law and the Observer Effect
Goodhart’s Law states: “When a measure becomes a target, it ceases to be a good measure.” This principle guarantees that the detection arms race is unwinnable.
The cycle works like this: Detectors identify metrics like perplexity and burstiness to measure “AI-ness.” Students and AI developers recognize these as the criteria being judged. They adapt by using “humanizer” tools, prompt engineering (“write with high burstiness”), or paraphrasing to manipulate the specific metrics. The measure collapses. High burstiness now indicates either a human writer or a sophisticated AI user who knows how to game the system.
Every time a new detection metric emerges, it becomes a target for the next generation of bypass tools. The act of observation changes the system being observed, creating a feedback loop that destroys the detector’s utility. This is not a problem that can be patched with updates. It is a fundamental dynamic of adversarial systems.
Watermarking illustrates this limit clearly. While mathematically elegant, watermarks fragment when text is paraphrased. Change a few words, reorder a sentence, or run the text through a translation loop (English to German and back), and the watermark evaporates. Research demonstrates that watermarks remain robust only against verbatim copying, which is precisely the behavior sophisticated users avoid. Any watermark imperceptible to humans necessarily hides within the model’s natural randomness, making it vulnerable to any human-like paraphrasing that preserves meaning while altering word choice.
These theoretical limits might seem abstract, but their practical consequences are devastatingly concrete.
The Human Cost: Why Probabilistic Detection Is Ethically Indefensible
The theoretical impossibility of reliable detection would be merely an academic curiosity if the stakes were low. But in educational contexts, the consequences of error are severe and disproportionately distributed.
The Bias Against Linguistic Simplicity
A landmark study by Liang et al. at Stanford exposed systematic bias in AI detectors against non-native English speakers. When researchers tested seven major detectors on human-written TOEFL essays, the results were damning. More than 61% of these essays were flagged as AI-generated, and nineteen percent were flagged by all seven detectors. When the same tools analyzed essays by U.S.-born eighth graders, the false positive rate approached zero.
The cause is straightforward. Non-native speakers typically have more limited vocabulary ranges. They rely on standard grammatical structures and avoid complex idiomatic expressions to ensure correctness. This produces text with low perplexity and low burstiness, which is exactly the statistical profile detectors associate with AI.
The study confirmed this mechanism experimentally. When researchers prompted ChatGPT to “enhance the word choices to sound more like a native speaker” for the flagged TOEFL essays, detection rates dropped significantly. When they asked ChatGPT to simplify native-speaker essays, false positive rates skyrocketed.
AI detectors effectively impose a vocabulary tax. They penalize linguistic simplicity and reward linguistic complexity. In diverse classrooms, this means international students and English Language Learners face disproportionately high risks of false accusations, simply for writing clearly and carefully as they were taught.
The Base Rate Fallacy and the Mathematics of False Accusations
Even if a detector achieves the 99% accuracy often cited in marketing materials (and rarely substantiated by independent review), the mathematics of rare events produces grim results.
Consider a medical analogy. A test for a rare disease with 99% accuracy (a 1% false positive rate) is applied to 1,000 people, of whom only 5 actually have the disease. The test catches all 5 sick people, but it also incorrectly flags approximately 10 of the 995 healthy people. For every true positive, there are two false positives. A positive result is more likely wrong than right.
The classroom application is worse. In a university processing 20,000 student papers, even a 1% false positive rate produces 200 false accusations. Unlike medical testing, where follow-up biopsies can confirm diagnoses, there is no definitive test for AI authorship. Students cannot prove a negative. They cannot demonstrate they did not use AI, especially when accused based on proprietary black-box algorithms they cannot examine or challenge.
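To see how quickly the numbers turn against the accused, here is a small sketch of the arithmetic. The 5% share of genuinely AI-written papers and the 90% detection rate are assumptions I have added for illustration; only the 1% false positive rate comes from the scenario above.

```python
def flag_outcomes(population, prevalence, sensitivity, false_positive_rate):
    """Return (true positives, false positives, probability that a flag is correct)."""
    positives = population * prevalence
    negatives = population - positives
    true_pos = positives * sensitivity
    false_pos = negatives * false_positive_rate
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

# The medical analogy above: 1,000 people, 5 of them sick, a "99% accurate" test.
print(flag_outcomes(1_000, prevalence=0.005, sensitivity=1.0, false_positive_rate=0.01))
# -> roughly (5, 10, 0.33): two false alarms for every real case.

# The classroom: 20,000 papers, assuming 5% genuine AI use caught 90% of the time.
true_pos, false_pos, ppv = flag_outcomes(20_000, prevalence=0.05,
                                         sensitivity=0.90, false_positive_rate=0.01)
print(f"{false_pos:.0f} students falsely accused; "
      f"{1 - ppv:.0%} of all accusations land on innocent students")
```

The exact percentages shift with the assumed rate of genuine AI use, but the false accusations never disappear, and the accused have no way to demonstrate their innocence.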
This statistical reality renders non-deterministic detectors ethically indefensible in high-stakes environments. The potential damage to students’ reputations, mental health, and academic records vastly outweighs any utility in catching actual violations.
The Surveillance Path: What Detection Really Requires
If we accept that text-based detection is theoretically impossible yet still insist on pursuing technological solutions, only one path remains: comprehensive surveillance of the writing process itself.
This is not hypothetical. It is already emerging. Turnitin Clarity, promoted as a solution to detection’s limitations, monitors students’ writing processes in real time. It tracks revision histories, typing patterns, pause durations, and source consultations. The system aims to flag patterns inconsistent with observed human writing behavior, such as extended periods without activity, sudden text insertions, or rhythms atypical of human composition.
This approach shifts the question from “Is this text AI-generated?” to “Did we observe this student writing this text?” It transforms assessment into a forensic audit. Students must produce version histories, intermediate drafts, or screen recordings to prove authorship. The teacher-student relationship becomes adversarial, with educators positioned as investigators and students as suspects.
This surveillance infrastructure drives legitimate AI use underground. By banning AI and relying on detection, institutions create what researchers call “Shadow AI”—rampant usage that remains unguided, uncredited, and uncritical. Students do not learn to use these tools ethically and effectively. They learn to hide their use. The result is an educational environment defined by distrust, with technology deployed not to support learning but to police it.
I have explored this trajectory extensively in “The Detection Deception,” which I have serialized on this platform. The surveillance education emerging from detection’s failure represents a fundamental betrayal of educational values, even if the technology worked as advertised.
The Pedagogical Imperative: Why There Is No Alternative
We face a stark reality. Detection of AI-generated text is not merely difficult or imperfect. It is theoretically impossible to achieve with the certainty required for high-stakes academic integrity decisions. The non-deterministic nature of large language models ensures that any detection system will produce unacceptable rates of false positives and false negatives. The bias against linguistic simplicity ensures these errors will fall disproportionately on the most vulnerable students.
Yet the response from many quarters has been to double down on detection, to seek better algorithms, to deploy surveillance systems that monitor every keystroke. This represents a profound hypocrisy. We condemn AI tools for being probabilistic and non-deterministic, then embrace those same properties in the systems we use to detect AI. We decry the uncertainty of machine-generated text while accepting equivalent uncertainty in our accusations against students.
The technological escalation path leads nowhere productive. It transforms education into surveillance, teaching into investigation, and learning environments into zones of mutual suspicion. It is also unnecessary. We already possess the tools to address the challenges posed by generative AI, but they are pedagogical rather than technological.
Authentic assessment renders the question “Is this AI-generated?” largely irrelevant. When students demonstrate understanding through oral examinations, when they explain their reasoning in real-time discussions, or when they apply knowledge to novel contexts that cannot be addressed through simple prompts, the authenticity question dissolves. These approaches are not AI-resistant because they deploy better detection; they are AI-resistant because they assess what detection cannot measure: the development of human understanding.
I have written extensively about these pedagogical solutions in previous essays, and I will continue to develop them. The point I want to make in this essay is simpler: there is no choice between technological and pedagogical approaches. The technological path is closed. It is blocked not by insufficient innovation but by mathematical impossibility. But the pedagogical path remains open, proven, and aligned with education’s core purposes.
We can continue chasing the illusion of certainty, investing in detection tools that cannot work as a matter of principle and surveillance systems that corrode educational relationships. Or we can accept what the mathematics makes clear: the solution to generative AI in education lies in how we teach, how we assess, and how we conceive of learning itself.
The choice seems obvious to me. I hope it becomes equally obvious to others.
The images in this article were generated with Nano Banana Pro.
P.S. I believe transparency builds the trust that AI detection systems fail to enforce. That’s why I’ve published an ethics and AI disclosure statement, which outlines how I integrate AI tools into my intellectual work.





