The Augmented Educator

The Theater of Competence: Role-Playing Assessment

Deep Dives Into Assessment Methods for the AI Age, Part 5

Michael G Wagner
Feb 06, 2026

This post follows my standard early access schedule: paid subscribers today, free for everyone on February 17.

This series has traced a common thread through its examination of AI-resistant assessment: the shift from evaluating artifacts to observing performance, or, more accurately, from what students produce in isolation to what they demonstrate through presence. Part 1 explored the design critique, where thinking becomes visible through real-time defense. In Part 2, we examined video logs as multimodal evidence anchored in the body. Part 3 turned to the Socratic Seminar, where the conversation itself becomes the object of assessment. And Part 4 investigated the whiteboard defense, where problem-solving becomes visible as it unfolds in real time.

Each of these methods addresses a particular dimension of the assessment problem in the age of generative AI. Yet they still position students as themselves: a designer defending their design choices, or a student working through a problem they have been assigned. What none of these methods fully captures is the capacity to act under conditions of genuine social uncertainty. They cannot assess whether a student can step into an unfamiliar role, encounter another person whose responses cannot be predicted, and navigate toward a good outcome using knowledge, judgment, and interpersonal skills simultaneously.

This fifth and final installment examines a method that places students directly inside the problem they are trying to solve: Role-Playing Assessment. Unlike methods that position students as commentators on their own work, role-playing asks students to become practitioners. They need to inhabit a professional or social identity and respond to an unfolding scenario as if the stakes were real. The assessment, therefore, is not what they say about what they would do. It is what they actually do, right now, with another person watching.

The Archaeology of Enactment: From Psychodrama to Professional Licensure

Role-playing assessment did not emerge from educational theory. It was born from the recognition that traditional testing systematically failed to predict performance under real-world conditions. To understand why this method works, we need to trace its convergent origins across three distinct fields: psychotherapy, military intelligence, and medical education.

The intellectual bedrock lies in the work of Jacob L. Moreno, a Viennese psychiatrist who, in the 1920s, developed psychodrama as a therapeutic technique. While Freud emphasized the excavation of the past through talk, Moreno insisted on the power of the present through action. Patients did not merely describe their conflicts; they enacted them on a stage, with trained auxiliaries playing significant figures in their lives. Moreno’s insight was that we learn who we are by trying on different social masks—that role-playing initiates the emergence of the self. This insight would prove foundational. He also developed the structural vocabulary that role-playing assessment still uses: the protagonist (the subject being assessed), the auxiliary ego (the supporting actor who facilitates the scenario, often also called the confederate), and the director (the educator who controls the simulation).

The transition from therapy to assessment occurred during World War II, when the US Office of Strategic Services faced a problem that no written test could solve: how to select spies. The OSS needed people who could operate behind enemy lines, maintain cover identities under interrogation, and lead resistance cells in hostile territory. Intelligence quotient was insufficient. What mattered was performance under stress.

At a secret facility in Virginia, OSS psychologists designed simulations that would later become templates for corporate and educational assessments. One exercise required candidates to move a heavy load across a stream using limited resources while observers evaluated leadership and cooperation. Another asked candidates to direct two “helpers” in building a wooden structure. The helpers were trained actors instructed to be obstructive and insulting. The goal was not to complete the task but to observe how candidates managed their frustration. This marked the first systematic use of role-play to measure personality traits and behavioral competencies for high-stakes selection.

After the war, these techniques migrated to the corporate world. AT&T’s landmark Management Progress Study in 1956 formalized what would become the Assessment Center method: a multi-day evaluation where candidates participated in business games, in-basket exercises, and role-plays designed to reveal competencies that interviews and tests could not detect. The innovation was methodological rigor. AT&T developed Behaviorally Anchored Rating Scales that moved assessment away from vague impressions toward specific observable behaviors.

But the most consequential development came from medicine. In 1975, Ronald Harden and his team at the University of Dundee introduced the Objective Structured Clinical Examination (OSCE), originally dubbed the “Steeplechase” because students moved from station to station like horses jumping fences. Harden’s genius was to break clinical competence into discrete, observable units. Instead of one long, variable interaction with an actual patient, where a student might pass because they drew a simple case or a lenient examiner, the OSCE industrialized clinical assessment. A student might complete twenty short, standardized stations: taking a history from an actor at Station 1, interpreting an ECG at Station 2, demonstrating CPR on a mannequin at Station 3. The format offered objectivity through standardized scripts, structure through specific checklists, and clinical relevance through performance-based tasks.

This history reveals role-playing assessment as a response to a failure that predates AI by decades: the gap between knowing that and knowing how. Written examinations measure whether a student can recall information and synthesize concepts. They cannot measure whether that student can navigate the embodied, interpersonal, emotionally volatile reality of professional practice. A surgeon may possess encyclopedic knowledge of anatomy, yet fail to manage a palliative care consultation. A lawyer may cite precedent with precision, yet crumble under judicial scrutiny. Role-playing assessment exists because competence is not merely cognitive. It is somatic, relational, and performative.

Why Role-Playing Assessment Resists Artificial Intelligence

The resilience of role-playing assessment does not rest on the claim that AI is merely “not good enough yet” at conversation. Current large language models can sustain coherent, persona-driven dialogue for extended interactions. They can adopt the voice of a patient with chronic back pain, a disgruntled employee, or a skeptical investor. The challenge is not that AI cannot participate in role-plays, but that it cannot demonstrate the specific human capacities that role-plays are designed to assess.
