The Cognitive Laundromat

Clean Room Engineering, Semantic Plagiarism, and the Crisis of Academic Originality

Apr 10, 2026


In early March 2026, a website called Malus began circulating through developer forums and open-source communities with a pitch so brazen it stopped people mid-scroll. “Finally,” the site announced, “liberation from open-source license obligations.” The premise was simple: tell the service which open-source software your product relies on, and Malus’s AI systems would remove the license by independently recreating every component from scratch. One set of AI agents would analyze only public documentation, producing a detailed functional specification containing no code. A completely separate set of agents, which had never communicated with the first, would build the software anew. The resulting code would arrive under MalusCorp’s proprietary license, which had zero attribution requirements and zero obligation to share improvements. In other words, it had zero legal strings of any kind.

The project around this fictional company grew out of a presentation by Dylan Ayrey and Mike Nolan at FOSDEM 2026, the annual free software conference in Brussels, titled with characteristic bluntness: “Let’s end open source together with this one simple trick.” Malus by MalusCorp was satire, a deliberately provocative thought experiment designed to demonstrate how AI-driven reverse engineering could render open-source licensing unenforceable. The name of the service hinted at its purpose, as “malus” is Latin for “bad” or “harmful.” Commenters on Hacker News captured the prevailing mood. “I almost went crazy until I realized it was satire,” wrote one; “I understand this is satire,” replied another, “but in six months it might not be so far from reality.”

The reason Malus landed so hard was that reality had already proved the joke prophetic. Just days before the site went viral, a real controversy erupted over a widely used piece of software called chardet, a Python library downloaded millions of times per month. Dan Blanchard, who had maintained the project for over a decade, used Anthropic’s Claude Code to rewrite the entire library from scratch in five days. He then changed its license, removing the original requirement that anyone building on the code must share their improvements under the same terms.

His argument was straightforward. Since the AI had produced entirely new code, the old license no longer applied. Mark Pilgrim, chardet’s original creator, who had largely withdrawn from public life since 2011, resurfaced to contest this premise. The maintainers had spent years immersed in the original code, Pilgrim argued, and “adding a fancy code generator into the mix does not somehow grant them any additional rights.” The developer community split. Simon Willison, co-creator of the Django web framework, captured the uncertainty when he wrote he was leaning toward the rewrite being legitimate, but that the arguments on both sides were “entirely credible.”

What strikes me most about this sequence of events is that the core operation Malus satirized and Blanchard actually performed is identical in structure to something students do every day. Take someone else’s work. Extract the underlying ideas. Regenerate a new version that looks nothing like the original. Claim it as your own. In the software world, this process has a name, a legal history, and a body of case law stretching back four decades. It is called “clean room engineering,” and its migration from corporate law into the educational landscape represents one of the most consequential, and least discussed, threats to academic integrity that educators currently face.

The architecture of the information firewall

The clean room concept has a specific and revealing history, one that illuminates why its educational implications are so troubling.

The method was pioneered in the early 1980s, when competitors sought to manufacture IBM PC-compatible computers without infringing IBM’s copyrighted BIOS firmware. Companies like Phoenix Technologies divided their engineers into two strictly isolated groups. The first group, the “dirty room,” examined the copyrighted IBM code and produced a detailed functional specification describing what the software did, stripped entirely of how the original authors expressed it. The second group, the “clean room,” consisted of engineers who had never seen the IBM code. Working only from the specification, they wrote entirely new software from scratch. Because the implementation team had no access to the original copyrighted expression, any functional similarity in the final product could be legally defended as independent creation.

The legal architecture supporting this practice rests on a principle codified in Section 102(b) of the Copyright Act of 1976: copyright protects the expression of an idea, not the idea itself. This idea/expression dichotomy means that a plaintiff alleging infringement must prove both that the defendant had access to the copyrighted work and that the resulting product is substantially similar to the original expression. Clean room engineering surgically severs the access component of this test. If the implementation team never saw the original, there is no access, and without access, functional equivalence does not constitute infringement.

Courts have repeatedly validated this logic. In Sega Enterprises, Ltd. v. Accolade, Inc., the Ninth Circuit recognized the necessity of intermediate copying during reverse-engineering to uncover unprotectable functional specifications. In NEC Corp. v. Intel Corp., the clean room procedure served as evidence that similarities in microcode were dictated by functional constraints rather than illicit copying. The judicial consensus has been remarkably consistent. If independent creation can be verified through rigorous documentation and isolation protocols, the resulting product is lawful regardless of how closely it mirrors the original.

For decades, this remained an expensive, slow, and labor-intensive process accessible only to large corporations. The requirement to maintain two entirely separate teams of highly paid engineers, coupled with the legal overhead of auditing functional specifications, made clean room engineering prohibitive for smaller organizations. Phoenix Technologies’ original IBM BIOS clone took months of painstaking work.

Then generative AI compressed the process from months to minutes. As Malus showed (satirically, but accurately), an AI system can now ingest an entire codebase, generate an exhaustive functional specification, pass that specification through an automated information firewall to a second, isolated model, and produce a functionally equivalent but legally distinct output. The chardet rewrite, which Blanchard described as a five-day project, is one of the first real-world tests of this capability applied to a consequential piece of software.

The educational clean room

The implications for education emerge once you recognize that the clean room process is not inherently about software. It is about separating ideas from their expression. And that separation is precisely what students have started to do with academic work.

Consider how the process translates. A student takes a copyrighted academic paper, a well-graded essay from a peer, or a complex argument from a published source. They feed it into an AI system (the dirty room), prompting the model to extract the abstract concepts, foundational arguments, and empirical data points while stripping away all original linguistic expression. They then prompt a second model, or the same model in a fresh context (the clean room), to generate a completely new essay based solely on those abstract concepts. The output is structurally and linguistically distinct from the source material. It passes all traditional plagiarism detection. By the legal definition of independent creation, it is original work.

I have written at length in previous essays about the limitations of AI plagiarism detection, and the dynamics here confirm those concerns in a particularly unsettling way. Traditional detection tools like Turnitin rely on identifying structural or character-preserving plagiarism: near-verbatim copies with minor synonym substitutions and cut-and-paste rearrangements. Against semantically laundered text, these text-matching systems are nearly useless. The practice, which researchers have termed “semantic plagiarism,” involves the theft of ideas and arguments presented through entirely novel expression. The research literature suggests detection rates as low as 40 percent, precisely because the AI’s output is, by its very design, a structurally independent creation.
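To make the mechanics concrete, here is a minimal sketch of surface-level overlap scoring, written in Python. It is an illustration under my own assumptions, not a description of how Turnitin or any commercial detector actually works: the function names, the choice of word trigrams, and the three sample sentences are all hypothetical. It compares texts by the share of three-word sequences they have in common (a Jaccard score).

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences in the text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Share of n-word sequences the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts, invented for this illustration.
source = "Copyright protects the expression of an idea, not the idea itself."
light_edit = "Copyright protects the expression of an idea, never the idea itself."
rewrite = (
    "The law shields how a concept is worded, "
    "while the concept stays free for anyone to use."
)

# A one-word substitution still shares most trigrams with the source.
print("light edit:", jaccard(source, light_edit))
# A full re-expression of the same idea shares none.
print("rewrite:   ", jaccard(source, rewrite))
```

The lightly edited sentence still shares half of its trigrams with the source, so a matcher of this kind flags it. The full rewrite carries the same idea and shares none, so its score is zero. Real detectors are far more elaborate, but any method that counts shared surface strings inherits this blind spot.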

Next-generation detectors using natural language processing claim significantly higher rates by mapping semantic vectors and deep linguistic features rather than raw text. I do not dismiss these tools entirely. But as I have discussed many times on the Augmented Educator, they remain plagued by false positives and the fundamental problem of an arms race in which the generative models constantly improve and learn to evade detection. The detector chases the generator in a cycle with no stable equilibrium.

The precision distinction

The crisis facing educators is not primarily technical. It is philosophical, and it exposes a friction between two definitions of originality that most people have never needed to distinguish.

In copyright law, the output of a clean room is acceptable. It is independent creation, and independent creation is precisely what copyright is designed to protect and encourage. The entire framework assumes that if you arrive at a functionally equivalent product without copying the original expression, you have demonstrated the creative independence that the law rewards. The chardet dispute, whatever its eventual resolution, operates entirely within this legal logic.

Academic integrity requires something fundamentally different. It demands not merely novel expression but cognitive authorship: evidence that the student engaged in the critical labor of reading, synthesizing, analyzing, and drafting. An AI-generated essay may be entirely free of copyright infringement in a court of law and simultaneously represent a profound violation of academic integrity, because the student did not perform the intellectual work that the assignment was designed to develop. The clean room method, when applied by a student, does not merely circumvent plagiarism detection. It outsources the learning process itself.

This distinction is important because it reveals why the reactive approach of detection and punishment is structurally inadequate. The problem is not that students are copying. The problem is that the very concept of “copying” has become insufficient to capture what is happening. Semantic plagiarism is not copying in any traditional sense. It is closer to what I would call “cognitive laundering.” This is the extraction of intellectual value from a source, processed through an algorithmic intermediary, and delivered in a form that bears no traceable resemblance to the original. The metaphor of laundering is apt because, like financial laundering, the process severs the chain of provenance. The ideas emerge clean on the other side.

The case for the defense

I need to acknowledge that the clean room metaphor, applied to education, is not entirely a story of threat and loss. There are legitimate and constructive uses of the same underlying capability.

Students who use AI to help them understand a complex argument, break it into its parts, and then reconstruct it in their own language are engaging in a process that, done transparently, closely resembles good scholarly practice. The act of abstracting an argument to its core logic and re-expressing it is, after all, what we ask students to do when we assign summary and synthesis exercises. What separates this from laundering is transparency and intent. The question is whether the student is using the tool to deepen their understanding or to bypass it.

Similarly, institutions that deploy AI systems trained with rigorous differential privacy protections can create what amounts to an ethical clean room for student ideation, a space where generative tools can assist brainstorming and drafting without the risk of inadvertently reproducing copyrighted material from the training set. The research literature on differential privacy suggests that such systems can provide a mathematical guarantee that the model’s output cannot be reverse-engineered to reveal specific training data, which has genuine value in academic contexts where both originality and ethical AI use matter.

These are actual possibilities, and I do not want to dismiss them. The question is whether they represent the dominant use case or the exception.

From product to process

I have argued before that if the final product can no longer be trusted as an authentic proxy for student learning, the assessment method must shift to evaluate the cognitive process itself. This is not a new argument either; educators have been making versions of it for years. But the clean room problem gives it a new urgency and a sharper theoretical foundation.

The shift from product-based to process-based assessment involves several concrete adaptations. Hybrid models that combine AI tools with traditional instruction can measure a student’s ability to critically evaluate, iterate upon, and improve AI-generated output, transforming the AI from a covert ghostwriter into an explicit object of analysis. AI-resistant assessments, including design critiques, oral examinations, in-class writing exercises, real-time debates, and hands-on projects, resist the clean room problem because they require immediate, embodied cognitive performance that cannot be outsourced to an algorithm.

Many educators have started to require students to maintain extensive design logs, version histories, and reflective journals that track the evolution of their thinking. The irony here is striking. This pedagogical strategy directly mirrors the exact documentation that corporate lawyers require to prove independent creation in a commercial clean room defense. The student’s design log serves the same evidentiary function as Phoenix Technologies’ specification documents, but in reverse. Instead of proving that the final product was independently created (and therefore legally clean), the log proves that the student actually performed the cognitive work (and therefore genuinely learned something). The same documentation framework that enables evasion in one context enables authentication in another.

There is also a compelling case for what I would call the return of the physical. Experiential, embodied learning in controlled environments provides a pedagogical baseline that AI cannot currently fake. When students are required to physically manipulate instruments, solve problems in real time, demonstrate psychomotor skills, or collaborate face-to-face under observation, they cannot outsource the task to a clean room. Universities investing in laboratories, maker spaces, simulation environments, and hands-on assessment are therefore not retreating from technology. They are building the spaces where human capability can be observed and authenticated.

The laundromat closes

The deeper lesson of the clean room metaphor is not that students are cheating more cleverly, though some are. It is that the fundamental architecture of academic assessment was built on an assumption that has quietly become false: that the difficulty of producing polished, original-seeming text served as a natural barrier, ensuring cognitive engagement. Writing a good essay was hard enough that doing it honestly was, for most students, the path of least resistance. Clean room engineering in its original corporate form depended on a similar assumption. It required that the expense and difficulty of maintaining two isolated teams would limit the practice to cases where the economic stakes justified the investment.

Generative AI has demolished both barriers simultaneously. The corporate clean room that once took months and cost millions can now be executed in minutes for pennies. The academic equivalent, the production of semantically original text from laundered ideas, is available to any student with a browser. In both cases, the collapse of practical difficulty has exposed the gap between what the rules technically prohibit and what they can actually prevent.

The response cannot be nostalgia for a world where the barriers still held. Nor can it be a purely technological arms race between ever-more-sophisticated generators and ever-more-sophisticated detectors. The response must be architectural. It requires a redesign of educational assessment that evaluates the thinking, not just the text; that requires demonstration, not just documentation; and that treats transparency about AI use as an ethical baseline rather than a confession.

Institutions that move decisively toward process-based assessment, embodied learning, and transparent AI integration will not merely survive the clean room problem. They will emerge with pedagogical models that are more rigorous and more honest about what learning actually requires. Those that continue to treat the final submitted artifact as a reliable proxy for cognition will find themselves, like an open-source license in the age of AI, technically still in force but practically unenforceable.

The clean room was invented to prove that something was independently created. Education’s task is the opposite. It demands proof that something was dependently learned, shaped by a genuine encounter with difficulty, mediated by human judgment, and earned through cognitive labor that no algorithm can perform on a student’s behalf.

The laundering machine can clean the text. It cannot clean the mind.

The images in this article were generated with Nano Banana 2.

P.S. I believe transparency builds the trust that AI detection systems fail to enforce. That’s why I’ve published an ethics and AI disclosure statement, which outlines how I integrate AI tools into my intellectual work.
