Diffstat (limited to 'aied2017')
-rw-r--r-- | aied2017/aied2017.tex      | 15 |
-rw-r--r-- | aied2017/introduction.tex  | 22 |
2 files changed, 20 insertions, 17 deletions
diff --git a/aied2017/aied2017.tex b/aied2017/aied2017.tex
index 5309c76..41bcbb7 100644
--- a/aied2017/aied2017.tex
+++ b/aied2017/aied2017.tex
@@ -13,15 +13,22 @@
 \begin{document}
-\title{Patterns for debugging student programs}
-\author{TODO}
+\title{Automatic extraction of AST patterns \\ for debugging student programs}
+\author{Timotej Lazar, Martin Možina, Ivan Bratko}
 \institute{University of Ljubljana, Faculty of Computer and Information Science, Slovenia}
 \maketitle
 \begin{abstract}
-We propose new program features to support mining data from student submissions in a programming tutor. We extract syntax-tree patterns from student programs, and use them as features to induce rules for predicting program correctness. Discovered rules allow us to correctly classify a large majority of submissions based only on their structural features. Rules can be used to recognize intent, and provide hints in a programming tutor by pointing out incorrect or missing patterns. Evaluating out approach on past student data, we were able to find errors in over 80\% of incorrect submissions.
+% motivation
+When implementing a programming tutor, it is often difficult to consider all possible errors encountered by students. A possible alternative is to automatically learn a bug library of erroneous patterns from students’ programs.
+% learning
+We propose using abstract-syntax-tree patterns as features for learning rules to distinguish between correct and incorrect programs. These rules can be used for debugging student programs: rules for incorrect programs (buggy rules) contain patterns indicating mistakes, whereas each rule for correct programs covers a subset of submissions sharing the same solution strategy.
+% generating hints
+To generate hints, we first check all buggy rules and point out incorrect patterns. If no buggy rule matches, rules for correct programs are used to recognize the student’s intent and suggest patterns that still need to be implemented.
+% evaluation
+We evaluated our approach on past student programming data for a number of Prolog problems. For many problems, the induced rules correctly classified over 90\% of programs based only on their structural features. For approximately 75\% of incorrect submissions we were able to generate hints that were implemented by the student in some subsequent submission.
 \\\\
-\textbf{Keywords:} Intelligent tutoring systems · Programming · Hint generation
+\textbf{Keywords:} Programming tutors · Error diagnosis · Hint generation · Abstract syntax tree · Syntactic features
 \end{abstract}
 \input{introduction}
diff --git a/aied2017/introduction.tex b/aied2017/introduction.tex
index 3a81767..2b2362d 100644
--- a/aied2017/introduction.tex
+++ b/aied2017/introduction.tex
@@ -1,26 +1,22 @@
 \section{Introduction}
-% programming feedback: difficult and important
-Programming is an increasingly useful skill even for those not directly involved in software development. Open online courses are making programming education accessible to many. Since thousands of students can attend such courses, it is impossible for teachers to evaluate each participant’s work individually. On the other hand, in-time feedback directly referring to the specific errors can play an important role in the learning process. Providing such feedback automatically would greatly enhance these courses.
+% why automatic feedback
+Programming education is becoming increasingly accessible with massive online courses. Since thousands of students can attend such courses, it is impossible for teachers to individually evaluate each participant’s work. On the other hand, timely feedback directly addressing the student’s errors can aid the learning process. Providing feedback automatically could thus greatly enhance these courses.
 % ITS background
-Intelligent tutoring systems have traditionally used manually constructed domain models to find and explain errors in student work. Developing a sufficiently detailed model requires significant knowledge-engineering effort~\cite{folsom-kovarik2010plan}; this is particularly true for programming tutors, where most problems have several alternative solutions and many possible implementations~\cite{le2013operationalizing}.
+Traditional intelligent tutoring systems use manually constructed domain models to generate feedback. Cognitive tutors model the problem-solving \emph{process}: how students write programs. This is challenging because there are no well-defined steps when writing a program (as there are in chess, for example). Many tutors instead only analyze individual versions of the student’s program, and disregard its evolution. These models are often coded in terms of constraints or bug libraries~\cite{keuning2016towards}.
 % data-driven domain modeling
-Large amounts of available of educational data -- particularly from online courses -- have made data-driven domain models feasible. By mining student data a tutor can learn important concepts and common errors, allowing it to generate feedback automatically.
+Developing the domain model requires significant knowledge-engineering effort~\cite{folsom-kovarik2010plan}. This is particularly true for programming tutors, where most problems have several alternative solutions and many possible implementations~\cite{le2013operationalizing}. Data-driven tutors reduce the necessary effort by mining educational data -- often from online courses -- to learn how to fix common errors and generate feedback.
+
 % problem statement
-This paper addresses the problem of finding useful “program features” to support data mining. Such features should be robust against superficial or irrelevant variations in program code, and relatable to the knowledge components of the target (programming) skill in order to support hint generation.
+This paper addresses the problem of finding useful features to support data mining in programming tutors. Features should be robust against superficial or irrelevant variations in program code, and relatable to the knowledge components of the target skill (programming), so as to support hint generation.
-% our approach
-We use structural patterns to encode relations between variables in a program’s abstract syntax tree. Patterns describe paths between selected leaf nodes. Instead of storing the entire path, each pattern only includes certain internal nodes, allowing it to match in different programs containing the same relation. We have induced rules to predict program correctness based on the presence of these patterns. Rules allow us to generate hints for incorrect programs by looking for missing or incorrect patterns.
+% our approach: patterns + rules
+We describe features with \emph{patterns} that encode relations between variables in a program’s abstract syntax tree (AST). Patterns describe paths between certain “interesting” leaf nodes. By omitting some nodes on these paths, patterns match different programs containing the same relation. We then induce rules to predict program correctness based on AST patterns, allowing us to generate hints based on missing or incorrect patterns.
 % contributions
-The present paper contributes:
-\begin{itemize}
-\item structural patterns as program features;
-\item rules to predict the correctness of student programs; and
-\item evaluation of the effectiveness of those rules for generating hints.
-\end{itemize}
+The main contributions presented in this paper are: AST patterns as features for machine learning, a rule-based model for predicting program correctness, and hints generated from incorrect or missing patterns in student programs.
 %%% Local Variables:
 %%% mode: latex
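The approach described in the new abstract and introduction — AST patterns as paths between variable leaves, plus a buggy-patterns-first hint policy — can be sketched roughly as follows. This is an illustrative Python toy, not the paper's implementation: all names (`leaves`, `patterns`, `hint`), the tuple-based AST, and the example clause are assumptions made for the sketch.

```python
# Illustrative sketch only (hypothetical names and data structures, not the
# paper's code). An AST is a (label, children) tuple; leaves have no children.
from itertools import combinations

def leaves(tree, path=()):
    """Yield (leaf_label, ancestor_labels) for every leaf in the tree."""
    label, children = tree
    if not children:
        yield label, path
    for child in children:
        yield from leaves(child, path + (label,))

def patterns(tree):
    """For each pair of variable leaves (Prolog variables: uppercase names),
    keep only the labels below their lowest common ancestor -- a crude
    stand-in for the AST patterns described in the abstract."""
    variables = [(n, p) for n, p in leaves(tree) if n[:1].isupper()]
    found = set()
    for (a, pa), (b, pb) in combinations(variables, 2):
        i = 0  # length of the shared ancestor prefix
        while i < min(len(pa), len(pb)) and pa[i] == pb[i]:
            i += 1
        lca = pa[i - 1] if i else "root"
        found.add((lca, pa[i:] + (a,), pb[i:] + (b,)))
    return found

def hint(prog_patterns, buggy, expected):
    """Mimic the two-step strategy from the abstract: report a matching
    buggy pattern first, otherwise suggest one missing expected pattern."""
    for p in buggy:
        if p in prog_patterns:
            return ("incorrect pattern", p)
    missing = expected - prog_patterns
    if missing:
        return ("missing pattern", sorted(missing)[0])
    return None

# Toy AST for a clause like  a(X, Y) :- b(X, Z).
clause = ("clause",
          (("head", (("X", ()), ("Y", ()))),
           ("body", (("X", ()), ("Z", ())))))
ps = patterns(clause)
# Sharing of X between head and body appears as a pattern rooted at "clause",
# e.g. ("clause", ("head", "X"), ("body", "X")).
```

In the paper's formulation a pattern also omits some internal nodes on the path, so that one pattern matches many implementations of the same relation; the sketch keeps full paths below the common ancestor for brevity.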