Fourteen AI-Proof Assessment Methods for the Age of Generative Intelligence
How Dialogic Classrooms Challenge the Industrial Model of Higher Education
I just returned from the 18th annual International Conference of Education, Research and Innovation (ICERI2025) in Seville, Spain, where I observed something quite remarkable. In multiple academic sessions, I experienced a growing consensus among researchers concerning the future of assessment within the context of artificial intelligence. The conversation was no longer about whether generative AI needs to be controlled and policed in order to preserve our current approach to the assessment of learning. Instead, most educators were converging on what might replace our crumbling assessment infrastructure. The answer emerging from these discussions was both radical and ancient: dialogic forms of assessment represent our most promising path through the AI disruption.
This convergence felt particularly meaningful to me given my current work on “The Detection Deception,” which I’m serializing here on this Substack. In that book, I argue that educational institutions have been seduced into a surveillance arms race they cannot win. Each new generation of detection software promises to identify AI-generated work, yet these tools fundamentally misunderstand the problem. We’ve been trying to forensically analyze student-created products after the fact rather than addressing the broken chain of custody that makes traditional take-home assessments obsolete. As I have argued many times, the solution isn’t technological; it’s pedagogical. We need to shift our focus from policing outputs to observing processes, from analyzing artifacts to engaging in dialogue, and, most importantly, from what students produce in isolation to how they think in interaction.
The timing of this shift matters. As I walked through Seville’s medieval streets, past buildings that have weathered centuries of technological disruption, I kept thinking about permanence and change in education. The printing press didn’t destroy the university; it transformed it. The internet didn’t eliminate the classroom; it extended it. Now, generative AI arrives not as education’s executioner but as its most demanding reformer. It forces us to abandon what was always a convenient fiction—that a submitted essay definitively shows individual understanding—and return to what has always been education’s true foundation: the irreducibly human interaction between teacher and student and the situated learning that emerges from lived experience. We need to return to the understanding that true knowledge reveals itself only through dialogue.
What follows is a practical compendium of fourteen dialogic assessment methods that embody this shift. These aren’t experimental proposals or theoretical frameworks; they’re time-tested approaches already being deployed in classrooms from community colleges to research universities. Each method works because it evaluates the essence of human learning: real-time reasoning, the integration of personal experience with academic knowledge, and the metacognitive awareness that allows humans to recognize and correct their own thinking as it unfolds.
Consider this article a recipe book for educators ready to move beyond the detection deception toward assessment practices that honor what makes human learning distinctive. And I realize that this post is longer than usual, but I felt it was important to keep everything in one piece.
1. Whiteboard Defense
The whiteboard defense transforms problem-solving from a private cognitive act into a public performance of reasoning. Most commonly deployed in STEM fields—particularly engineering and physics—this method requires students to work through complex problems at a whiteboard while articulating their thinking process to an instructor who serves as both audience and interlocutor. The assessment focuses not on reaching the correct answer but on demonstrating the cognitive journey: how students identify relevant principles, select problem-solving strategies, recognize dead ends, and pivot toward productive approaches. The instructor employs Socratic questioning throughout, asking students to justify their assumptions, explain their notation choices, and predict the implications of their emerging solution. This creates what researchers call “cognitive-process-verification.” The instructor directly observes reasoning as it occurs rather than attempting to infer it from a polished final product.
Implementing the whiteboard defense requires thoughtful consideration of several factors. The individual attention necessary for Socratic facilitation means institutions must commit to smaller class sizes, particularly in upper-division courses. Instructors need training in balancing supportive guidance with evaluative rigor—a skill that develops through practice and mentorship. Some students may initially feel uncomfortable with public problem-solving, though most adapt quickly when the classroom culture emphasizes learning over performance. The key challenge lies in developing instructors’ ability to distinguish conceptual understanding from procedural memorization, a nuanced skill that represents the true art of dialogic assessment.
2. Mathematical Proof Discussion
Mathematical proof discussion elevates the traditional proof-writing exercise into a dynamic exchange about mathematical reasoning itself. Primarily used in pure mathematics courses like real analysis or abstract algebra, this method requires students not merely to construct proofs but to defend their logical architecture in real-time dialogue. Students present their proof strategy, explain their choice of proof technique (direct, contradiction, induction), and articulate why certain approaches fail while others succeed. The instructor probes the boundaries of the proof, asking students to identify where arguments might break down, which conditions are necessary versus sufficient, and how slight modifications to hypotheses would affect the conclusion. This transforms proof from a static artifact into a living argument that must withstand intellectual pressure.
Mathematical proof discussion demands sophisticated facilitation that challenges instructors to deepen their own practice. Faculty must develop the ability to recognize subtle logical gaps and formulate questions that probe conceptual foundations—skills that enhance their own mathematical thinking. While many students initially struggle with the transition from calculation to argumentation, this method helps bridge that gap by making the reasoning process explicit and collaborative. The time investment, often twenty to thirty minutes per student, pays dividends in developing the mathematical maturity that defines genuine understanding.
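To make this concrete, here is the kind of exchange a proof discussion might center on. The claim and argument below are the standard textbook proof; the probing questions are hypothetical illustrations of the instructor's role, not a prescribed script.

```latex
\textbf{Claim.} $\sqrt{2}$ is irrational.

\textbf{Proof (by contradiction).} Suppose $\sqrt{2} = p/q$ with integers $p, q$,
$q \neq 0$, and $\gcd(p, q) = 1$. Then $p^2 = 2q^2$, so $p^2$ is even and hence
$p$ is even; write $p = 2k$. Substituting gives $4k^2 = 2q^2$, so $q^2 = 2k^2$
and $q$ is even as well, contradicting $\gcd(p, q) = 1$.

% Illustrative probing questions for the discussion:
% - Where does the argument use that 2 is prime? Would it survive with 4 in
%   place of 2? (It cannot, since $\sqrt{4}$ is rational; which step fails?)
% - Why is assuming $\gcd(p, q) = 1$ legitimate rather than a loss of generality?
% - Could the same idea be recast as a proof by infinite descent, and what
%   would change?
```

The proof itself is easy to transcribe or generate; what the dialogue assesses is whether the student can locate, under pressure, exactly where the argument depends on its hypotheses.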
3. Experimental Design Defense
The oral defense of an experimental design shifts assessment from experimental results to experimental reasoning. Used extensively in chemistry, biology, and experimental physics courses, this method requires students to justify their design choices, defend their selection of controls and variables, and interpret unexpected data in real time. Students must explain why they chose specific reagent concentrations, how they determined appropriate sample sizes, and what alternative methodologies they considered and rejected. When presented with anomalous results, whether their own or hypothetical ones posed by the instructor, students must generate plausible explanations and propose follow-up experiments. This assessment captures what written lab reports often obscure: whether students understand the logic of scientific inquiry or merely follow procedural recipes.
Successfully implementing an oral defense requires strategic resource allocation and training. Institutions can address scalability through well-trained teaching assistants who undergo calibration sessions to ensure consistent evaluation standards. While students from varied backgrounds may initially approach scientific discourse differently, the method’s emphasis on reasoning over vocabulary helps level the playing field when implemented thoughtfully. The requirement of actual laboratory experience becomes an opportunity to ensure all students gain hands-on research skills. The investment in this assessment method yields graduates who understand scientific thinking at a fundamental level, distinguishing those who grasp the logic of inquiry from those merely following procedures.
4. Studio Critique
The studio critique represents perhaps the oldest form of dialogic assessment, with roots stretching back to Renaissance artist workshops. In contemporary art, architecture, and design programs, the “crit” requires students to present their creative work to peers and instructors who provide immediate, often challenging feedback. But the assessment goes beyond evaluating the artifact itself. It examines students’ capacity to articulate their creative choices, respond to criticism constructively, and integrate multiple perspectives into their artistic vision. Students must explain not just what they created but why, discussing influences, techniques, conceptual frameworks, and the relationship between intention and execution. The public nature of the critique adds another layer: students learn to give feedback as well as receive it, developing the critical vocabulary essential to creative fields.
The studio critique’s success hinges on establishing a classroom culture that balances honest feedback with a supportive community. This is a skill instructors can develop through practice and peer observation. While aesthetic judgment varies among evaluators, this diversity enriches the learning experience when framed as multiple valid perspectives rather than conflicting authorities. Time requirements, though substantial, create space for deep engagement with creative work. Students from different cultural backgrounds bring valuable perspectives on criticism and creativity, enriching the dialogue when instructors create inclusive frameworks for participation. The method’s strength lies precisely in navigating these productive tensions, teaching students to defend creative choices while remaining open to growth.
5. Patient Consultation Performance
Patient consultation performance assessment immerses healthcare students in the complex dynamics of clinical interaction. Used throughout medical and nursing education, this method—often employing standardized patients (trained actors)—evaluates not just diagnostic accuracy but the full spectrum of clinical communication: taking patient histories, explaining conditions in accessible language, demonstrating empathy, and navigating difficult conversations about prognosis or treatment options. Students must integrate biomedical knowledge with interpersonal skills, adapting their communication style to patient backgrounds, emotional states, and health literacy levels. The assessment captures competencies that written exams cannot measure: reading nonverbal cues, managing time while being thorough, and maintaining professionalism under pressure.
Building a robust patient consultation assessment requires an institutional commitment that yields significant returns. Training standardized patients represents an investment in creating consistent, high-quality learning experiences that benefit multiple cohorts. Programs can enhance reliability through team-based evaluation and clear rubrics while reducing the individual faculty burden. And developing culturally diverse scenarios becomes an opportunity to prepare students for increasingly multicultural patient populations. While standardized encounters differ from actual clinical situations, they provide safe spaces for students to develop skills without risk to real patients. The method’s value multiplies as healthcare increasingly recognizes that effective care depends on communication skills developed through structured practice and feedback.
6. Field Research Presentation
Field research presentation assessment evaluates students’ ability to conduct, analyze, and defend empirical research in disciplines ranging from anthropology to ecology to urban planning. Students must present their research design, data collection methods, analytical approaches, and findings to an audience that challenges their methodological choices and interpretive conclusions. The assessment examines not just what students discovered but how they navigated the messy realities of field research: dealing with unexpected obstacles, adjusting methods mid-study, recognizing their own biases, and acknowledging the limitations of their data. Students must defend their sampling strategies, explain how they established rapport with research subjects or communities, and justify their analytical frameworks while remaining open to alternative interpretations.
Field research presentations reveal authentic research challenges that strengthen students’ adaptability and methodological sophistication. The assessment requires instructors with methodological expertise who can recognize creative problem-solving in students’ field adaptations. While resource requirements for fieldwork can be substantial, partnerships with community organizations and creative use of local sites can expand opportunities. Students from diverse backgrounds often bring unique advantages to different research settings, enriching collective understanding when their perspectives are valued. The time investment in supervision and evaluation produces researchers capable of transforming messy field observations into systematic knowledge, developing the intellectual independence essential for original scholarship.
7. Case Study Defense
Case study defense transforms the popular business school method into a rigorous dialogic assessment. Students receive complex, often ambiguous scenarios drawn from real organizational challenges. This can be a company facing disruption, a nonprofit navigating mission drift, or a public agency managing competing stakeholder demands. But unlike traditional case discussions where participation might be perfunctory, the defense requires individual students to present their analysis and recommendations while fielding intensive questioning from instructors and peers. Students must identify key problems, analyze contributing factors, propose solutions, and defend their recommendations against challenges about feasibility, unintended consequences, and alternative approaches. The assessment evaluates not just analytical sophistication but judgment under uncertainty, and thus the capacity to make defensible decisions with incomplete information.
The case study defense method navigates the productive tension between theoretical analysis and practical application. Instructors with professional experience enrich assessments by introducing real-world complexity that textbooks often simplify. While high-quality cases require investment to develop or buy, they become reusable resources that benefit multiple cohorts. Students from varied backgrounds bring different problem-solving approaches that enrich discussion when skillfully facilitated. The time required for individual defenses can be managed through careful scheduling and peer evaluation components. This method develops professional judgment that emerges only through defending reasoning against informed challenge.
8. Historical Role-Playing Exercise
Historical role-playing exercises transport students into past moments of decision and conflict, requiring them to embody historical actors while defending positions shaped by period-specific constraints and worldviews. Students might become delegates at the Congress of Vienna, advisors during the Cultural Revolution, or philosophers at medieval universities, arguing positions that might oppose their personal beliefs but align with their character’s historical context. The assessment evaluates students’ historical knowledge, their understanding of period mentalities and structural constraints, and their ability to think within unfamiliar frameworks. Through dialogue and debate with peers playing other historical figures, students must show not just what happened but why historical actors made choices that might seem inexplicable from contemporary perspectives.
Historical role-playing requires careful design to navigate ethical complexities while maximizing pedagogical value. Instructors must thoughtfully select scenarios that develop historical empathy without causing harm, often focusing on decision-makers facing difficult choices rather than perpetrators of violence. The extensive preparation required deepens students’ engagement with historical sources and contexts. Assessment rubrics can effectively distinguish historical understanding from theatrical performance when clearly articulated. Students who initially feel uncomfortable with performance often discover that embodying historical perspectives enhances their analytical insights. The method develops a crucial capacity for understanding how structural forces and cultural contexts shape human action.
9. Simulated Client Consultation
Simulated client consultation assessment bridges the gap between classroom learning and professional practice in fields from law to counseling to financial planning. Students conduct mock consultations with trained actors or volunteers playing clients facing realistic professional scenarios: a small business owner needing legal advice about intellectual property, a family navigating retirement planning, a couple seeking relationship counseling. The assessment evaluates multiple competencies simultaneously: technical knowledge application, communication skills, ethical reasoning, and professional judgment. Students must gather relevant information through strategic questioning, explain complex concepts in accessible terms, manage emotional dynamics, and provide actionable guidance while maintaining appropriate boundaries.
Creating authentic client simulations requires an investment that yields lasting educational value. Well-trained actors can portray multiple client types across different courses, maximizing resource efficiency. Developing diverse scenarios prepares students for professional practice in multicultural contexts, which is a crucial skill in globalized professions. Clear rubrics and team evaluation enhance reliability while distributing the assessment workload. Students from varied backgrounds often bring valuable perspectives on client communication that enrich collective learning. While simulations differ from actual practice, they provide essential safe spaces for developing professional competence through guided experience and reflection, building the judgment that emerges only through human interaction.
10. Mock Trial/Moot Court
Mock trial and moot court exercises immerse students in the adversarial dynamics of legal reasoning, requiring them to construct and defend arguments under the pressure of opposition and judicial scrutiny. Students must master not just legal doctrine but legal performance: presenting opening statements that frame the narrative, conducting examinations that elicit favorable testimony, raising objections that demonstrate procedural knowledge, and delivering closing arguments that synthesize evidence into persuasion. During the appellate phase of a moot court, students must field hypothetical questions from the judges that probe the scope and implications of their legal arguments. The assessment evaluates capacities that written briefs cannot capture: thinking on one’s feet, adapting arguments to judicial concerns, and maintaining composure under intellectual pressure.
Mock trial and moot court programs can be scaled effectively through peer judges, alumni volunteers, and law school partnerships. The extensive preparation required deepens students’ understanding of both legal doctrine and procedure. Carefully designed cases that balance competing arguments teach students that legal reasoning involves weighing legitimate competing interests rather than finding a single correct answer. Students from diverse backgrounds often bring fresh perspectives to legal argumentation when the classroom culture values different advocacy styles. The competitive element, when properly channeled, motivates deep engagement with material. These exercises develop essential capacities that define effective legal practice.
11. Policy Simulation
Policy simulation exercises place students in complex governance scenarios where they must negotiate competing interests, incomplete information, and unintended consequences. Students might role-play climate treaty negotiations, public health emergency responses, or urban development planning processes, each representing different stakeholder perspectives with conflicting goals and constraints. The assessment goes beyond evaluating policy knowledge to examine skills essential for governance: building coalitions, making compromises, communicating across ideological divides, and adapting to changing circumstances. Through structured rounds of negotiation, crisis injection, and policy implementation, students demonstrate not just what policies they favor but how they navigate the political process of making policy reality.
Policy simulation design requires a sophisticated understanding of governance that instructors can develop through collaboration with practitioners and policy experts. The method teaches valuable lessons about compromise and coalition-building when framed as skills for effective governance rather than cynical manipulation. Students from diverse political backgrounds enrich simulations by introducing unexpected alliances and creative solutions. Time investments can be managed by integrating simulations across multiple class sessions or courses. Clear individual assessment criteria within group exercises ensure accountability. The method develops essential governance competencies: the ability to navigate complex political processes that require human judgment, negotiation, and ethical reasoning.
12. Structured Academic Debate
Structured academic debate elevates classroom discussion into rigorous intellectual competition, requiring students to research, argue, and defend positions on complex scholarly questions. Unlike informal discussions where participation might be superficial, structured debate demands deep preparation: students must master relevant literature, anticipate counterarguments, and prepare rebuttals. Formats vary, but all require students to engage in real-time intellectual combat. The assessment evaluates not just argument quality but intellectual agility: responding to unexpected challenges, identifying logical fallacies, and synthesizing complex information under time pressure.
Structured debate’s competitive framework requires careful management to emphasize truth-seeking over mere victory. Clear evaluation criteria that reward evidence quality, logical rigor, and intellectual honesty guide students toward scholarly argumentation. The method accommodates different cognitive styles through varied formats, some emphasizing preparation, others quick thinking. Debates on complex issues can explicitly require nuanced positions that transcend simple binaries. Research requirements deepen engagement with scholarly literature while teaching critical evaluation of sources. The public performance aspect, initially challenging for some, builds confidence and develops the communication skills that define scholarly discourse.
13. Socratic Seminar
The Socratic seminar transforms classroom discussion into philosophical inquiry, using carefully crafted questions to lead students toward deeper understanding rather than providing direct answers. Students engage with complex texts or problems through dialogue, with the instructor serving as facilitator rather than authority, guiding conversation through strategic questioning that reveals assumptions, exposes contradictions, and builds toward insight. The assessment focuses not on reaching predetermined conclusions but on demonstrating intellectual courage: acknowledging ignorance, changing positions when confronted with better arguments, and pursuing truth rather than defending ego. Through sustained dialogue, students develop what Socrates called intellectual midwifery—the capacity to help ideas emerge through questioning rather than imposing them through assertion.
Socratic seminars challenge instructors to embrace productive uncertainty while maintaining intellectual rigor. The facilitation skills required—knowing when to probe, when to allow silence, when to redirect—develop through practice and reflection. Different cultural communication styles enrich discussions when instructors create frameworks that value both direct challenge and indirect suggestion. The open-ended nature that some students initially find frustrating ultimately teaches them to tolerate ambiguity and pursue deeper understanding. Assessment can effectively evaluate process through clear criteria for intellectual courage, quality of questions, and openness to changing positions. The time investment required for genuine inquiry yields students capable of self-examination and the collaborative thinking that characterizes philosophical maturity.
14. Peer Code Review
Peer code review adapts software industry practices into dialogic assessment for computer science education. Students present their code to peers who examine logic, efficiency, style, and documentation, creating a dialogue about not just whether code works but whether it represents good craftsmanship. The assessment goes beyond automated testing to examine choices that matter in professional development: variable naming conventions that enhance readability, architectural decisions that affect maintainability, algorithm selections that balance efficiency with clarity. Through defending their implementation choices and responding to peer critiques, students develop metacognitive awareness about programming as both a technical and communicative practice.
Peer code review develops both technical and interpersonal skills essential for professional programming. Students learn to provide constructive feedback, a skill as valuable as coding itself in collaborative environments. Thoughtfully balanced review groups allow beginners to learn from advanced students while contributing fresh perspectives on code clarity. Assessment can distinguish between code quality and review quality through separate rubrics, recognizing both as important competencies. And cultural diversity in communication styles enriches discussions about code as human communication when instructors establish norms for respectful critique. The time invested in review sessions teaches students that professional programming is inherently collaborative, requiring code that colleagues can understand and maintain.
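To make the dialogue concrete, here is a minimal, hypothetical sketch in Python. The function, the sample text, and the reviewer questions are invented for illustration; they are not drawn from any particular course or rubric.

```python
# A hypothetical student submission, annotated with the kinds of questions
# peer reviewers typically raise about naming, intent, and efficiency.

def count_word_frequencies(text: str) -> dict[str, int]:
    """Return how often each whitespace-separated word appears in the text."""
    # Reviewer: should punctuation be stripped before counting, or is splitting
    # on whitespace alone a deliberate design choice worth documenting?
    frequencies: dict[str, int] = {}
    for word in text.lower().split():
        # Reviewer: dict.get avoids a separate membership test, but would
        # collections.Counter state the intent more directly?
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog the fox"
    print(count_word_frequencies(sample))
```

The code runs correctly whichever way those questions are resolved; what the review assesses is whether the author can defend the choices aloud and revise them in response to critique.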
The Question of Scale
At ICERI2025, one question persistently surfaced in the discussions following presentations about dialogic assessment methods: “This sounds wonderful for small seminars, but how does it scale?” The question usually came from administrators or educators at large public institutions where introductory courses routinely enroll hundreds of students in a single section. Behind the practical concern lay a deeper anxiety: if AI makes traditional assessment obsolete and dialogic methods require small classes, how can mass higher education survive? The honest answer challenges comfortable assumptions about educational efficiency.
We should refrain from using AI to help in the assessment itself since AI-proof assessments, by their very definition, should not involve AI. This isn’t technological Luddism but logical necessity. The methods described above work precisely because they assess capabilities that emerge only through human interaction: the integration of lived experience with academic knowledge, the real-time navigation of unexpected challenges, the metacognitive awareness that allows students to recognize and correct their thinking as it unfolds. These assessments depend on what happens in the classroom during the learning process. AI lacks access to this situated context that makes dialogic assessment meaningful. Using AI to evaluate human dialogue would be like using a colorblind judge to assess a painting competition. The tool lacks the sensory apparatus to perceive what matters.
This incompatibility between dialogic assessment and algorithmic evaluation points toward a more profound transformation. We might witness the end of the industrialized approach to education, particularly in higher education. The large lecture hall becomes increasingly untenable when students can bypass traditional assessment methods. And as large classes prove almost impossible to assess in an AI-proof manner, students and their parents are likely to value smaller class sizes in their college selection decisions. This isn’t speculation but economic logic. University education, particularly in the United States, requires an enormous financial investment from families. Spending tens or hundreds of thousands of dollars on something that AI can complete with no learning taking place will increasingly be seen as a waste of resources. And the human connection between students and teachers will gain value precisely because it represents something unique and desired.
AI therefore emerges not as education’s destroyer but as its most demanding change agent. It forces institutions to abandon the convenient fiction that education can be delivered on an industrial scale without sacrificing quality. It requires us to acknowledge what we’ve always known but found economically inconvenient to admit: real education happens in relationship, through dialogue, in communities small enough for teachers to know their students as individuals. The methods described in this article aren’t stopgap measures until better detection software arrives. They represent a return to education’s dialogic foundations—foundations that predate the industrial university and will outlast it. The institutions that survive the AI transformation won’t be those with the best detection algorithms but those that restructure themselves around human-scale learning communities where dialogue, not surveillance, verifies understanding.
What has been your experience with these, or other, dialogic assessment methods in your own classroom? As generative AI disrupts traditional practice, what conversations are you hearing at your institution? Is the focus on the “detection deception” and surveillance, or is there a genuine shift toward pedagogical solutions? How are you grappling with the challenge of “scale”?
I’d welcome your observations, strategies, and reflections in the comments below.
P.S. I believe transparency builds the trust that AI detection systems fail to enforce. That’s why I’ve published an ethics and AI disclosure statement, which outlines how I integrate AI tools into my intellectual work.

Michael, thank you for these notes. It has been a while since I taught, but after leaving academia for industry I came back to teach software engineering for a decade, and I did it through role-plays, client consultations, scenario exploration, technical critiques and project critiques as much as possible. It didn't just teach critical thought and technical analysis but conflict management, negotiation, teamwork, adaptation and consensus-building. It drew out students who had previously hidden behind books and screens.
I was delighted to see these represented, and the student response was very warm because they could see immediate application; course feedback would include comments like "the most useful thing I studied in my degree." From an employer's perspective, these were all critical capabilities that tertiary graduates often lack.
I'm hoping you're right about maximum class sizes, but I doubt the higher education sector in Australia will revert to small classes. You should see our workload calculations! They're all spreadsheet formulas!
Respectfully, I disagree that the educator/student relationship is always going to be positive. It's structurally unequal, so assuming positivity as an outcome feels to me like an act of power by those who hold it.
Anyway, I was introduced to programmatic assessment very recently and I feel like that's a game changer. In the degrees we offer, our most secure, authentic and professionally valid assessment is a practicum placement. It's pass/fail.
Program-based assessment means we're going to look at assessment design over 8, 10 or 12 subjects rather than focus on each individual subject. It also reframes the conversation from 'beating AI' to how best to support students to become qualified professionals. Many of the ideas you talk about for assessment will be included, but at key program points, not in each subject. I'll keep reading to learn more!