Diffstat (limited to 'aied2017')
-rw-r--r--  aied2017/aied2017.tex     | 18
-rw-r--r--  aied2017/background.tex   | 92
-rw-r--r--  aied2017/introduction.tex | 29
3 files changed, 135 insertions, 4 deletions
diff --git a/aied2017/aied2017.tex b/aied2017/aied2017.tex
index d342a9c..5309c76 100644
--- a/aied2017/aied2017.tex
+++ b/aied2017/aied2017.tex
@@ -2,15 +2,26 @@
 \usepackage[utf8]{inputenc}

+\usepackage{fancyvrb}
+\fvset{commandchars=\\\{\},baselinestretch=0.98,samepage=true,xleftmargin=5mm}
+
+\usepackage{forest}
+
+\newcommand\code[1]{\texttt{#1}}
+\newcommand\red[1]{{\begingroup\color[rgb]{0.8,0.15,0.15}#1\endgroup}}
+\newcommand\hl[1]{\textbf{#1}}
+
 \begin{document}

-\title{TODO}
+\title{Patterns for debugging student programs}
 \author{TODO}
 \institute{University of Ljubljana, Faculty of Computer and Information Science, Slovenia}

 \maketitle

 \begin{abstract}
-TODO
+We propose new program features to support mining data from student submissions in a programming tutor. We extract syntax-tree patterns from student programs, and use them as features to induce rules for predicting program correctness. Discovered rules allow us to correctly classify a large majority of submissions based only on their structural features. Rules can be used to recognize intent, and provide hints in a programming tutor by pointing out incorrect or missing patterns. Evaluating our approach on past student data, we were able to find errors in over 80\% of incorrect submissions.
+\\\\
+\textbf{Keywords:} Intelligent tutoring systems · Programming · Hint generation
 \end{abstract}

 \input{introduction}
@@ -20,4 +31,7 @@ TODO
 \input{evaluation}
 \input{conclusion}

+\bibliographystyle{splncs}
+\bibliography{aied2017}
+
 \end{document}
diff --git a/aied2017/background.tex b/aied2017/background.tex
index 563e5ea..715267b 100644
--- a/aied2017/background.tex
+++ b/aied2017/background.tex
@@ -1 +1,91 @@
-\section{Background}
\ No newline at end of file
+\section{Background}
+
+ITSs perform two main functions: suggesting problems for the student to solve, and providing immediate feedback as the student works on each problem~\cite{vanlehn2006behavior}. In this paper we focus on the latter: how to provide guidance to a student solving a programming exercise? Furthermore, we wish to do so in a data-driven fashion -- the domain model underlying our analysis should be learned from student data, without manual intervention for each problem.
+
+In order to mine data from past student submissions, we need to define code features or metrics that a) can easily be extracted from the code, b) carry information allowing us to distinguish programs, and c) are insensitive to superficial differences (such as different identifiers) between otherwise equivalent programs. This section reviews some of the existing approaches to defining such code features.
+
+% program-level features
+%Considering only entire programs, and not their individual parts, avoids this issue as we always have a “full picture” of a program. The main challenge then is to account for the superficial differences between programs.
+
+Rivers and Koedinger reduce syntactic and semantic variations by \emph{canonicalizing} programs into normal form~\cite{rivers2015data-driven}. Among other transformations they inline functions, compute constant expressions and reorder commutative expressions. To generate hints, they find the correct program closest to the student’s submission, and construct a token-by-token edit path between those programs.
+Jin et al. do not compare programs directly, but construct a \emph{linkage matrix} for each program, describing data dependencies between individual statements in a program~\cite{jin2012program}. This yields a simplified representation that is robust against variations and can be used to extract crucial differences between programs.
+
+% solution strategies
+Several authors manually define \emph{plans} or \emph{strategies} to describe possible solutions for each problem~\cite{gerdes2016ask-elle,johnson1990understanding,hong2004guided}. These specifications are usually written in a domain-specific language and allow a tutor to discover intent from partial programs.
+
+% test-case outputs
+Another possibility is to disregard the program’s structure and instead focus on dynamic analysis. Nguyen et al. consider two programs equivalent when they return the same outputs for a large battery of test cases~\cite{nguyen2014codewebs}. Li et al. generate informative test cases dynamically by selecting inputs that exercise different program paths~\cite{li2016measuring}.
+
+% proper features
+In constraint-based modeling pioneered by Ohlsson and Mitrovic, student submissions are analyzed in terms of invariant domain principles~\cite{mitrovic2012fifteen}. Constraints typically take the form of if-then rules, such as “if a function has a non-void return type, then it must contain a return statement”~\cite{holland2009j-latte}. Tutors using this approach look for unsatisfied constraints in submitted programs to generate feedback.
+% TODO note pattern hierarchy used in Le’s tutor
+Le’s Prolog tutor improves constraint-based diagnosis by assigning weights to different types of constraints~\cite{le2009using}.
+
+% code- / syntax-based features
+The methods discussed so far only consider entire programs (after normalization), or focus on epiphenomenal features such as run-time behavior. Defining program features in terms of the program’s code or structure is difficult, as each part of a program can potentially affect -- or be affected by -- any other part. In other words, a statement that is correct in the context of one submission might be erroneous in another.
+
+% text / token runs
+Several attempts have been made at defining such features. The simplest option is to consider fragments of program text; the MOSS system for detecting plagiarism is based on $k$-grams~\cite{schleimer2003winnowing}. CCFinder looks for similar token sequences in different programs to discover code clones~\cite{kamiya2002ccfinder}. OverCode uses individual lines of code as features, allowing a teacher to visualize the similarities between many student submissions~\cite{glassman2015overcode}.
+
+% features in programming ITS should be meaningful
+While features defined in terms of strings or tokens are easy to implement, they lack the structural information that would allow us to reason about the program’s behavior. Our goal is to define features that capture individual concepts or recurring patterns in student submissions. Nguyen et al. used the Codewebs tool to discover the different AST subtrees that have the same function in a given context~\cite{nguyen2014codewebs}.
+
+% tree-regular expressions
+The features used in the present paper are inspired by \emph{tree regular expressions} used in the Tregex tool by Levy and Andrew~\cite{levy2006tregex}, and the \emph{trx} system by Bagrak and Shivers~\cite{bagrak2004trx}. Extending ordinary regular expressions to trees allows us to define structural patterns with varying levels of specificity.
+
+
+%% dynamic / temporal programming model
+%The first major ITS paradigm employs a \emph{dynamic} domain model to support hint generation. Here, the problem-solving process is typically modeled as a search through a state space of partial and complete solutions. Solving a problem is represented by a sequence of well-defined steps from the initial state to one of the possible goal states. Cognitive and other model-tracing tutors use this approach~\cite{anderson1995cognitive,vanlehn2005andes}.
+%
+%% dynamic model: data-driven
+%The Hint Factory uses student data to model the solution space as a Markov decision process~\cite{barnes2008toward}. Several data-driven tutors~\cite{jin2012program,rivers2013automatic,piech2015autonomously} adapt and extend this approach to the programming domain. This allows them to generate hints for errors that students have made (and fixed) in the past.
+%
+%% dynamic model: issues
+%One of the issues when developing a dynamic domain model for programming is the lack of directly observable, interpretable problem-solving steps. Some tutors like the Lisp-ITS deal with this by restricting student input: students must write programs in top-to-bottom order, and fix any erroneous symbols before continuing. Data-driven tutors typically only include submissions -- versions of the program the student submitted for testing -- in the solution space, and do not ascribe any meaning (in domain concepts) to transitions between submissions.
+%
+%%E.g. model-tracing tutors. Not well-suited for programming tutors because:
+%% - no directly interpretable problem-solving steps
+%% - student traces are basically insane, so learning from them is not a good idea anyway (Piech 2015)
+%% - human tutors are usually able to comment without referring to previous program versions
+%%Since there are no meaningful observable actions in a free-form programming tutor, the “programming steps” in these models are simply arrows connecting programs in the solution space of incorrect and correct submissions. Commonly occurring steps are arrows spanning pairs of submissions that were often observed in sequence.
+%
+%% static programming model
+%The other major tutoring paradigm uses a \emph{static} domain model. Instead of representing the entire problem-solving process, this approach only models individual solution states. Since a) programming steps are difficult to define and observe, and b) human teachers are often able to resolve student errors based only on the current program, many programming tutors opt for a static domain model.
+%%Avoid the need to understand student actions by only considering each submission individually.
+%
+%% static model: constraint-based modeling
+%Probably the most commonly used approach is constraint-based modeling~\cite{mitrovic2012fifteen}, where programs are checked against a set of constraints. Constraints typically take the form of rules like “if a function has a non-void return type, then it must contain a return statement”~\cite{holland2009j-latte}. Tutors using this approach base hints on unsatisfied constraints in a submitted program. Le’s Prolog tutor improves constraint-based diagnosis by assigning different weights to constraints~\cite{le2009using}.
+%
+%% static model: ad hoc
+%While model-tracing and CBM are established paradigms with somewhat standardized programming models, many tutors use ad hoc models. These can be defined manually or learned using machine-learning techniques.
+%% - plan-based
+%Johnson identified the key concepts when writing a Pascal program for a single exercise, encoding them into programming plans that allow discovering student intentions and diagnosing errors~\cite{johnson1990understanding}.
+%% - grammar-based (techniques) (Brna/Huang/Le)
+%Hong’s Prolog tutor uses a similar approach to analyze programs, based on programming techniques~\cite{brna1991prolog}.
+%% - strategies (Gerdes)
+%Similarly, Ask-Elle by Gerdes et al. uses manually defined programming strategies to automatically derive possible solutions for a given Haskell problem~\cite{gerdes2016ask-elle}.
+%
+%% data-driven models
+%Data-driven methods have also been used for constructing static programming models. The idea is to extract information from a large set of correct and incorrect submissions; the main challenge is how to distinguish superficial variations between programs from semantically important differences.
+%% canonicalization (Rivers 2015)
+%Rivers et al. reduce syntactic and semantic variations by \emph{canonicalizing} programs into a normal form: inlining functions, computing constant expressions, reordering commutative expressions, and so on~\cite{rivers2015data-driven}. To generate hints, they find the correct program closest to the student’s submission, and construct a token-by-token edit path between those programs.
+%% AST subtrees (Nguyen 2014)
+%Nguyen et al. analyzed submissions in a machine-learning course to learn a vocabulary of \emph{code phrases}: subtrees of submissions’ abstract syntax trees that perform the same function in a given context.
+
+
+%Generate hints based on the differences between incorrect and correct programs. Must distinguish between syntactic (irrelevant) and semantic (relevant) variations.
+% - n-grams or other purely textual attributes
+% - mostly used for plagiarism (MOSS) / code clone detection
+%
+%** Authoring tools (10%)
+%An interpretable model can help with classifying errors and discovering concepts.
+% - reference?
+%
+%** Regex for trees (20%)
+%Syntactic n-grams (Sidorov).
+%Tgrep / trx for structural patterns in NLP - use the same for describing code!
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "aied2017"
+%%% End:
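
A minimal sketch of the structural-pattern idea described in this section: an AST path to a variable leaf is pruned down to a few informative internal nodes, so the same pattern can match superficially different programs. This is only an illustration, not the commit's code -- the tutor works on Prolog syntax trees, whereas Python's ast module is used here to keep the sketch self-contained, and the names KEEP and variable_patterns are made up.

import ast
from collections import defaultdict

# internal node types considered informative enough to keep in a pattern
KEEP = (ast.FunctionDef, ast.If, ast.For, ast.While, ast.Return,
        ast.Compare, ast.BinOp, ast.Call)

def variable_patterns(source):
    """Map each variable name to its set of pruned root-to-leaf paths."""
    patterns = defaultdict(set)

    def walk(node, trail):
        if isinstance(node, ast.Name):              # variable occurrence (leaf)
            patterns[node.id].add(tuple(trail + ["Name"]))
        # keep only selected internal nodes, so that one pattern can match
        # structurally similar programs that differ in minor details
        kept = (trail + [type(node).__name__]) if isinstance(node, KEEP) else trail
        for child in ast.iter_child_nodes(node):
            walk(child, kept)

    walk(ast.parse(source), [])
    return patterns

if __name__ == "__main__":
    program = "def fac(n):\n    if n == 0:\n        return 1\n    return n * fac(n - 1)\n"
    for name, paths in sorted(variable_patterns(program).items()):
        print(name, sorted(paths))

Pruning the path to a small set of node types is what gives the varying levels of specificity mentioned above: a pattern records only the relation between selected nodes and a leaf, not the whole subtree around it.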
\ No newline at end of file
diff --git a/aied2017/introduction.tex b/aied2017/introduction.tex
index ac26833..3a81767 100644
--- a/aied2017/introduction.tex
+++ b/aied2017/introduction.tex
@@ -1 +1,28 @@
-\section{Introduction}
\ No newline at end of file
+\section{Introduction}
+
+% programming feedback: difficult and important
+Programming is an increasingly useful skill even for those not directly involved in software development. Open online courses are making programming education accessible to many. Since thousands of students can attend such courses, it is impossible for teachers to evaluate each participant’s work individually. On the other hand, timely feedback that refers directly to specific errors can play an important role in the learning process. Providing such feedback automatically would greatly enhance these courses.
+
+% ITS background
+Intelligent tutoring systems have traditionally used manually constructed domain models to find and explain errors in student work. Developing a sufficiently detailed model requires significant knowledge-engineering effort~\cite{folsom-kovarik2010plan}; this is particularly true for programming tutors, where most problems have several alternative solutions and many possible implementations~\cite{le2013operationalizing}.
+
+% data-driven domain modeling
+Large amounts of available educational data -- particularly from online courses -- have made data-driven domain models feasible. By mining student data a tutor can learn important concepts and common errors, allowing it to generate feedback automatically.
+% problem statement
+This paper addresses the problem of finding useful “program features” to support data mining. Such features should be robust against superficial or irrelevant variations in program code, and relatable to the knowledge components of the target (programming) skill in order to support hint generation.
+
+% our approach
+We use structural patterns to encode relations between variables in a program’s abstract syntax tree. Patterns describe paths between selected leaf nodes. Instead of storing the entire path, each pattern only includes certain internal nodes, allowing it to match in different programs containing the same relation. We induce rules that predict program correctness based on the presence of these patterns. Rules allow us to generate hints for incorrect programs by looking for missing or incorrect patterns.
+
+% contributions
+The present paper contributes:
+\begin{itemize}
+\item structural patterns as program features;
+\item rules to predict the correctness of student programs; and
+\item evaluation of the effectiveness of those rules for generating hints.
+\end{itemize}
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "aied2017"
+%%% End:
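
A minimal sketch of the rule-induction step outlined in the introduction, assuming each submission has already been reduced to the set of patterns it contains. This is not the authors' implementation: the paper induces classification rules, while a shallow scikit-learn decision tree is used here only as an easily inspected stand-in, and the pattern names (p_base_case, p_recursive_call, p_swapped_args) are invented for illustration.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

# toy data: each submission is the set of pattern identifiers found in it,
# paired with a correctness label (1 = correct, 0 = incorrect)
submissions = [
    ({"p_base_case", "p_recursive_call"}, 1),
    ({"p_recursive_call"}, 0),                                   # missing base case
    ({"p_base_case", "p_recursive_call", "p_swapped_args"}, 0),  # buggy pattern present
    ({"p_base_case", "p_recursive_call"}, 1),
]

X = [{pattern: 1 for pattern in patterns} for patterns, _ in submissions]
y = [label for _, label in submissions]

vec = DictVectorizer(sparse=False)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(vec.fit_transform(X), y)

# conditions on the path to an "incorrect" leaf name a missing or
# superfluous pattern -- the information a hint can point at
print(export_text(model, feature_names=vec.feature_names_))

Because the features are plain presence/absence of patterns, every learned condition maps directly to a concrete pattern that is missing from, or wrongly present in, the student's program, which is what makes such rules usable for hint generation.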