After Detection Fails
Five Fundamental Challenges That Will Reshape Education in the Age of Artificial Intelligence
Last week I wrote about a convergence I observed at the 18th annual International Conference of Education, Research and Innovation in Seville, where educators were moving past debates about policing AI toward conversations about what assessment might look like in a post-detection world. But as I’ve reflected on those conversations and on the broader research emerging from conferences worldwide, I’ve realized that the consensus forming among educators extends beyond assessment methods. We’re beginning to agree not just on what won’t work but also on the nature of the larger pedagogical and institutional challenges we face.
We’re confronting problems far more fundamental than cheating detection. The questions now occupying serious attention concern cognitive development, assessment validity, structural equity, and the basic mechanics of how human beings learn. What strikes me most about this shift is what it reveals: the technological arms race to detect AI-generated work wasn’t just ineffective; it was a distraction from the deeper crisis we now must confront.
The Contours of Consensus
The emerging agreement isn’t about what to do with AI in education; that remains fiercely contested. Instead, it concerns which questions we need to address. The research findings now converging would have appeared pessimistic, if not alarmist, only a year and a half ago. Today they look measured, empirically grounded, and increasingly difficult to dismiss.
First, there’s broad acknowledgment that generative AI fundamentally differs from previous educational technologies. Unlike calculators, which offload arithmetic computation, or search engines, which externalize information retrieval, large language models offload the synthesis and generation of ideas themselves. This distinction matters because it shifts the locus of cognitive work in ways that directly challenge traditional learning theories. A tool that can create coherent essays doesn’t just improve students’ abilities—it potentially substitutes for the very cognitive processes we’re trying to develop.
Second, the field has largely abandoned the fantasy of reliable detection. The technical limitations of AI detection tools, including problematic false positive rates and a vulnerability to simple workarounds, have become undeniable. More significantly, the legal vulnerabilities these tools create for institutions are now well documented. What promised to preserve the existing system intact turned out to be a conceptual dead end, not just a technical setback.
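To see why even seemingly modest error rates become untenable at institutional scale, consider a rough back-of-the-envelope calculation. Every number below is an illustrative assumption, not a figure reported for any particular detection tool: a 2% false positive rate, a 90% detection rate, and a cohort in which 30% of submissions are substantially AI-generated.

```python
# Illustrative base-rate sketch: why a "low" false positive rate still
# produces many wrongly flagged students at scale.
# All numbers are assumptions chosen for illustration, not measured
# properties of any real detection tool.

submissions = 10_000          # assignments screened in one term (assumed)
ai_share = 0.30               # fraction substantially AI-generated (assumed)
true_positive_rate = 0.90     # detector catches 90% of AI work (assumed)
false_positive_rate = 0.02    # detector flags 2% of honest work (assumed)

ai_written = submissions * ai_share
honest = submissions - ai_written

flagged_ai = ai_written * true_positive_rate      # correctly flagged
flagged_honest = honest * false_positive_rate     # wrongly accused

total_flagged = flagged_ai + flagged_honest
precision = flagged_ai / total_flagged

print(f"Wrongly flagged honest submissions: {flagged_honest:.0f}")
print(f"Share of flags that are correct:    {precision:.1%}")
```

Under these assumed numbers, roughly 140 honest submissions are flagged in a single term, and the institution has no principled way to tell which flags are which. That arithmetic, not any particular lawsuit, is why the legal exposure has become impossible to ignore.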
Third, educators and researchers increasingly recognize that adoption has drastically outpaced institutional readiness. Global surveys show that up to 86% of students now use AI tools in their studies, with more than half engaging with these tools weekly. Meanwhile, 45% of educators report receiving no formal training on AI integration. This literacy gap creates a dangerous asymmetry. Students operate with tools that act as cognitive force multipliers, while educators lack the frameworks to assess the actual learning beneath the polished outputs.
These three points of consensus establish the landscape we’re operating in. But they also open onto deeper questions about what happens when students have constant access to tools that can think for them. The challenges that follow—five of them, each fundamental in its own way—require us to reconsider assumptions about learning, assessment, and institutional structure that have shaped education for decades.
1. The Cognitive Development Challenge
Before examining how we might assess learning in this environment, we need to understand what’s at stake. The concern extends beyond academic integrity into questions about cognitive development itself.
Learning requires what researchers call “desirable difficulty,” a productive struggle that forces learners to construct and integrate knowledge actively. Generative AI, by providing immediate synthesized responses, removes this friction. Students can complete assignments, solve problems, and produce sophisticated outputs without engaging in the cognitive work that those tasks were designed to develop.
Recent research has documented what this looks like in practice. Studies of students using AI coding assistants find that while they complete programming assignments more quickly, many show significantly impaired problem-solving abilities when the AI scaffolding is removed. Investigations of AI use in writing find that students who rely heavily on these tools often struggle to explain or defend the arguments present in their own submissions. Perhaps most concerning, evidence suggests that students who use AI extensively overestimate their own competence. They conflate the tool’s capabilities with their own knowledge, creating what researchers call an “illusion of knowing.”
This cognitive challenge intersects with the assessment crisis in revealing ways. If students are using AI to complete assignments in ways that bypass the intended learning, then the problem isn’t merely that we can’t detect this use. Rather, it’s that the assignments themselves have become disconnected from the learning outcomes they were meant to assess. The actual issue isn’t cheating; it’s the decoupling of performance from cognition.
The implication is that AI-resistant assessment serves a dual purpose. First, it maintains the validity of our credentials by ensuring that demonstrated competence reflects actual capability. Second, and perhaps more importantly, it preserves the conditions under which learning can occur by creating spaces where students must engage in productive cognitive struggle without the option of AI-mediated shortcuts. This means prioritizing process over product, ensuring that we document and assess not just what students produce but how they arrive at their conclusions.
2. The Assessment Validity Challenge
This brings us to what I consider the most consequential challenge now confronting education: the destabilization of our assessment infrastructure. This crisis extends far beyond plagiarism concerns into questions about what our credentials actually certify.
Traditional assessment methods were designed for an environment in which producing sophisticated written or analytical work required actually possessing sophisticated thinking skills. That assumption no longer holds. A student can now submit work that meets every rubric criterion for demonstrating knowledge without having engaged in the cognitive processes we believe that work represents.
The immediate response from many institutions has been to retreat toward proctored, timed examinations and controlled environments. This is a pedagogical regression. Timed, proctored examinations systematically disadvantage students with disabilities and non-native speakers. They also measure a narrow band of cognitive skills, mainly rapid recall and performance under pressure, while failing to assess the higher-order thinking we value. In other words, surveillance-based assessment is a retreat not merely to pre-AI conditions but to pre-21st-century pedagogical principles.
The research literature now acknowledges what many of us have recognized through practice. We need assessment methods that are AI-resistant not through technological barriers but through their fundamental design. These approaches share several characteristics. They require human interaction, such as dialogue, defense, or collaborative exploration, revealing not just what a student knows but how they know it. They emphasize process documentation alongside final products, making the learning journey itself an object of assessment. They incorporate iterative revision and feedback cycles that expose gaps in understanding that polished first drafts can conceal. And they center on authentic tasks that connect to contexts outside the artificial bubble of academic evaluation.
In short, AI-resistant methods are built on one fundamental principle: when assessment requires students to document their thinking process, to engage in dialogue about their understanding, or to defend their conclusions in conversation, it becomes significantly more difficult to substitute AI outputs for genuine learning.
3. The Equity Challenge
The consensus also acknowledges that AI in education raises profound equity concerns that extend far beyond questions of access to tools. While early discussions focused on whether all students had equal access to AI assistants, more recent analysis reveals deeper structural issues.
One dimension concerns what researchers are calling “data colonialism.” Large language models are trained predominantly on English-language text from Western sources, which means they perform dramatically better for students from those linguistic and cultural contexts. Students writing in English receive sophisticated assistance, while those working in lower-resource languages receive outputs of significantly lower quality. The epistemological frameworks, citation practices, and rhetorical conventions embedded in these models reflect particular cultural traditions, potentially marginalizing alternative knowledge systems and ways of knowing.
Perhaps more insidious is what might be called the “expertise gap” in AI use. Early evidence suggests that students who already possess strong foundational knowledge use AI to extend and enhance their capabilities, while those with weaker foundations use it as a replacement for developing those capabilities. In other words, AI may increase rather than reduce educational inequality. This has direct implications for how we think about AI integration. If access to AI tools disproportionately benefits already-advantaged students, then unrestricted AI access without careful pedagogical scaffolding could widen achievement gaps rather than narrow them.
The assessment approach I’ve been advocating on this Substack, which is fundamentally built on human interaction, has the potential to mitigate some of these equity concerns. It creates spaces where students must demonstrate their actual understanding rather than their ability to deploy sophisticated tools. It allows educators to identify and address gaps in foundational knowledge before they compound. And it resists the automated discrimination that can occur when assessment is delegated to algorithmic systems that encode historical biases.
4. The Teacher Agency Challenge
The next area of consensus worth noting concerns the impact of AI on the teaching profession itself. The literacy gap between student tool proficiency and educator preparation represents more than a practical problem. It reflects a fundamental challenge to teacher agency and professional identity.
When students command tools that educators don’t understand, it disrupts the epistemic authority that structures the teacher-student relationship. When administrative pressure to integrate AI technologies isn’t accompanied by adequate support and training, it creates conditions for burnout and professional alienation. And when institutional responses to AI emphasize surveillance and restriction rather than pedagogical innovation, it positions teachers as enforcers rather than mentors.
The emerging consensus recognizes that sustainable AI integration in education requires centering teacher expertise and professional judgment. This means moving beyond training focused merely on how to use tools toward deeper engagement with questions about when and why particular tools serve pedagogical purposes. It means creating space for educators to experiment, fail, and iterate without punitive consequences. And it especially means resisting the impulse to automate teaching functions that require human judgment.
5. The Scalability Challenge
Here we encounter what may be the most uncomfortable implication of the emerging consensus: the assessment methods that can meaningfully verify learning in an AI-saturated environment do not scale in the ways our current educational system requires.
Consider what AI-resistant assessment demands. Oral examinations require dedicated faculty time and cannot be automated. Portfolio-based assessment with sustained mentorship and feedback necessitates low student-to-instructor ratios. Authentic projects embedded in real-world contexts require individualized design and evaluation. And collaborative assessments with peer review and discussion demand facilitation and careful observation. All of these approaches share a common feature: they are labor-intensive precisely because they resist mechanization.
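A rough calculation makes the labor arithmetic concrete. The figures below are assumptions chosen for illustration, not data from any study: a 200-student course, a 20-minute oral examination per student, and two rounds of portfolio feedback at 15 minutes per student per round.

```python
# Back-of-the-envelope faculty workload for AI-resistant assessment
# in a single course. Every figure is an illustrative assumption.

students = 200                  # enrollment in one course (assumed)
oral_exam_minutes = 20          # per-student oral defense (assumed)
feedback_rounds = 2             # portfolio feedback cycles (assumed)
minutes_per_feedback = 15       # per student, per round (assumed)

oral_hours = students * oral_exam_minutes / 60
feedback_hours = students * feedback_rounds * minutes_per_feedback / 60

print(f"Oral examinations:  {oral_hours:.0f} faculty hours")
print(f"Portfolio feedback: {feedback_hours:.0f} faculty hours")
print(f"Total:              {oral_hours + feedback_hours:.0f} faculty hours")
```

Under these assumptions, one 200-student course demands roughly 167 hours of direct assessment time, before any teaching happens. That is precisely the cost structure the industrialized university was built to avoid.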
This creates a fundamental tension with the model of higher education that has dominated for the past half-century. The industrialized university, with its large lecture courses, standardized assessments, and emphasis on credential production at scale, was predicated on the assumption that learning could be verified through standardized instruments administered to large cohorts. AI doesn’t merely challenge this assumption—it obliterates it.
The implications extend well beyond assessment. If meaningful learning verification requires human interaction, then the student-to-faculty ratios that currently prevail in many institutions become pedagogically untenable. If authentic assessment requires individualized design, then the standardized curriculum model faces existential pressure. And if we can no longer rely on take-home assignments to demonstrate learning, then the asynchronous, self-paced modality that has characterized online education must be fundamentally reconsidered.
This is not an argument for abandoning technology or returning to some romanticized pre-digital pedagogy. Rather, it’s recognition that AI forces us to confront questions about the purposes and structures of education that we’ve been deferring for decades. The industrialized model emerged not because it represented optimal pedagogy but because it allowed institutions to scale credential production within particular economic constraints. AI makes visible what was always true: we scaled by accepting significant compromises in the quality and depth of learning verification.
The uncomfortable question now is whether we’re willing to accept the implications of doing assessment well. The research consensus suggests that the pedagogical path forward requires smaller classes, more intensive mentorship, and greater emphasis on dialogue and process to enable assessment methods that resist automation. These are no longer luxuries to be pursued when resources allow. They are fundamental requirements for maintaining the validity of educational credentials in an AI-enhanced environment.
Living With Uncertainty
The consensus I’ve been describing is not a consensus about solutions. It’s a consensus about the nature and severity of the challenges we face. This might seem like cold comfort, but I find it oddly encouraging. We’ve moved past the initial phase of reactive panic and the subsequent phase of magical thinking about technological fixes. We’re now confronting, with some degree of honesty, the actual difficulties ahead.
This shift creates space for more substantive conversations about education’s purposes and structures. If we can’t preserve existing assessment methods through surveillance, we must design better ones. If our current model doesn’t scale, we must consider what would justify the resources required to do education well. And if AI challenges our assumptions about cognition and learning, we must develop more sophisticated pedagogical frameworks.
None of this will be easy, and much of it will be uncomfortable. Institutions will resist changes that require greater investment of resources. Faculty will struggle with assessment methods that demand more time and different skills. Students will initially balk at approaches that require them to show far more of their work than submitting a polished essay does. But the alternative, clinging to assessment methods that have lost their validity while hoping that better detection tools will emerge, is untenable.
Facing these challenges requires us to be honest about what we actually want education to accomplish. If we want credentials that meaningfully certify knowledge and capability, we must assess in ways that actually verify those things, regardless of scalability concerns. If we want students to develop robust cognitive skills rather than proficiency at deploying AI tools, we must create learning environments that require genuine intellectual struggle. And if we want a fair educational system, we must ensure that our responses to AI don’t simply amplify existing advantages.
These are difficult commitments, and they fly in the face of decades of policy emphasizing efficiency, scale, and standardization. But the emerging consensus suggests we’ve reached a point where we can no longer defer these questions. The future of education in an AI-saturated world will be determined by the clarity of our values and our willingness to build systems that reflect them. The consensus emerging now gives us the conceptual foundations to do that work. What remains is the harder task: summoning the institutional will to act on what we’re beginning to understand.
What are you seeing in your own teaching? Have AI-resistant assessment methods proven workable at your institution’s scale, or are you finding them unsustainable? Are the cognitive development concerns I’ve described showing up in your students’ work? Share your experiences in the comments. We need honest conversations about what’s actually possible given the constraints we’re operating under.
P.S. I believe transparency builds the trust that AI detection systems fail to enforce. That’s why I’ve published an ethics and AI disclosure statement, which outlines how I integrate AI tools into my intellectual work.