McMining: Automated Discovery of Misconceptions in Student Code
2510.08827v1
cs.SE, cs.AI, cs.CL, cs.CY
2025-10-14
Authors:
Erfan Al-Hossami, Razvan Bunescu
Abstract
When learning to code, students often develop misconceptions about various
programming language concepts. These can not only lead to bugs or inefficient
code, but also slow down the learning of related concepts. In this paper, we
introduce McMining, the task of mining programming misconceptions from samples
of code from a student. To enable the training and evaluation of McMining
systems, we develop an extensible benchmark dataset of misconceptions together
with a large set of code samples where these misconceptions are manifested. We
then introduce two LLM-based McMiner approaches and, through extensive
evaluations, show that models from the Gemini, Claude, and GPT families are
effective at discovering misconceptions in student code.