Pre-Con. Workshop

#1405: An Evaluation Framework for Pedagogical Large Language Models

Mon Jun 15, 9:00 AM–5:00 PM · DCE 3020

Part of Pre-Conference Workshops - Full Day

Intelligent Tutoring & Adaptive Systems · Generative AI & Large Language Models · Assessment, Feedback & Formative Practices · Computational Thinking & CS Education

Large language models (LLMs) are increasingly adopted as AI tutors, providing scalable, interactive assistance across a range of topics. Yet the field still lacks a unified, comprehensive framework for evaluating the pedagogical behaviors these models exhibit across multiple tutoring sessions and varying learning scenarios. This workshop addresses that gap by introducing a two-layer evaluation framework that distinguishes between (1) foundational behaviors that LLMs must demonstrate to be considered pedagogically qualified and (2) stylistic instructional behaviors that shape their effectiveness as tutors. Participants will refine and apply the framework through hands-on interaction with an LLM-powered tutoring system, working through a five-session sequence of introductory Python programming modules. Through this workshop, we aim to advance the safe and pedagogically effective application of generative AI tools in educational settings.

Authors

Eugene Park, Daniel Wendel, Grace Lin, Sharifa Alghowinem