\section{Background}

ITSs perform two main functions: suggesting problems for the student to solve, and providing immediate feedback as the student works on each problem~\cite{vanlehn2006behavior}. In this paper we focus on the latter: how can we guide a student solving a programming exercise? Furthermore, we wish to do so in a data-driven fashion: the domain model underlying our analysis should be learned from student data, without manual intervention for each problem.

To mine data from past student submissions, we need to define code features or metrics that a) can easily be extracted from the code, b) carry enough information to distinguish between programs, and c) are insensitive to superficial differences (such as different identifiers) between otherwise equivalent programs. This section reviews some existing approaches to defining such features.

% program-level features
%Considering only entire programs, and not their individual parts, avoids this issue as we always have a “full picture” of a program. The main challenge then is to account for the superficial differences between programs.
 
Rivers and Koedinger reduce syntactic and semantic variations by \emph{canonicalizing} programs into a normal form~\cite{rivers2015data-driven}. Among other transformations, they inline functions, compute constant expressions, and reorder commutative expressions. To generate hints, they find the correct program closest to the student’s submission, and construct a token-by-token edit path between the two programs.
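To illustrate, the following sketch implements two such passes for Python programs using the standard \texttt{ast} module: constant folding and a canonical ordering of commutative operands. It is only a toy version of the canonicalizer described by Rivers and Koedinger; the class name and the choice of passes are ours.

\begin{verbatim}
import ast, operator

OPS = {ast.Add: operator.add, ast.Mult: operator.mul}

class Canonicalizer(ast.NodeTransformer):
    # Two illustrative passes; a real canonicalizer does much more.
    def visit_BinOp(self, node):
        self.generic_visit(node)       # canonicalize children first
        op = OPS.get(type(node.op))
        # Fold constant subexpressions, e.g. 2 * 4 -> 8.
        if (op and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            return ast.copy_location(
                ast.Constant(op(node.left.value, node.right.value)), node)
        # Order commutative operands canonically (by their AST dump);
        # sound for numbers, not e.g. for string concatenation.
        if op and ast.dump(node.left) > ast.dump(node.right):
            node.left, node.right = node.right, node.left
        return node

tree = Canonicalizer().visit(ast.parse("y = x * 3 + 2 * 4"))
print(ast.unparse(tree))               # y = 3 * x + 8 (Python >= 3.9)
\end{verbatim}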
Jin et al. do not compare programs directly, but construct a \emph{linkage matrix} for each program, describing data dependencies between individual statements in a program~\cite{jin2012program}. This yields a simplified representation that is robust against variations and can be used to extract crucial differences between programs.
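A toy version of this idea for straight-line Python code is sketched below: entry $(i, j)$ of the matrix records whether statement $i$ reads a variable assigned by an earlier statement $j$. Jin et al. compute a considerably richer representation; the function name and the restriction to simple assignments are our simplifications.

\begin{verbatim}
import ast

def linkage_matrix(source):
    stmts = ast.parse(source).body
    writes = [{n.id for n in ast.walk(s) if isinstance(n, ast.Name)
               and isinstance(n.ctx, ast.Store)} for s in stmts]
    reads = [{n.id for n in ast.walk(s) if isinstance(n, ast.Name)
              and isinstance(n.ctx, ast.Load)} for s in stmts]
    n = len(stmts)
    return [[j < i and bool(reads[i] & writes[j]) for j in range(n)]
            for i in range(n)]

m = linkage_matrix("a = 1\nb = 2\nc = a + b")
# m[2] == [True, True, False]: the third statement depends on the
# first two, whatever the variables are named.
\end{verbatim}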

% solution strategies
Several authors manually define \emph{plans} or \emph{strategies} to describe possible solutions for each problem~\cite{gerdes2016ask-elle,johnson1990understanding,hong2004guided}. These specifications are usually written in a domain-specific language and allow a tutor to discover intent from partial programs.
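As a hypothetical illustration, a strategy could be encoded as alternative sets of sub-goals and matched against the constructs observed in a partial program; actual systems such as Ask-Elle use far richer, language-aware specifications, so every name below is invented for the example.

\begin{verbatim}
# Each strategy lists the constructs a solution following it uses.
STRATEGIES = {
    'reverse_list': [
        {'name': 'fold', 'uses': {'foldl'}},
        {'name': 'naive', 'uses': {'recursion', 'append'}},
        {'name': 'accumulator', 'uses': {'recursion', 'accumulator'}},
    ],
}

def plausible_strategies(problem, observed):
    # A strategy remains plausible while the partial program uses
    # only constructs that the strategy expects.
    return [s['name'] for s in STRATEGIES[problem]
            if set(observed) <= s['uses']]

print(plausible_strategies('reverse_list', {'recursion'}))
# ['naive', 'accumulator']
\end{verbatim}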

% test-case outputs
Another possibility is to disregard the program’s structure and instead focus on dynamic analysis. Nguyen et al. consider two programs equivalent when they return the same outputs for a large battery of test cases~\cite{nguyen2014codewebs}. Li et al. generate informative test cases dynamically by selecting inputs that exercise different program paths~\cite{li2016measuring}.
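A minimal sketch of this notion of equivalence, assuming single-argument functions and treating raised exceptions as part of the observable behavior:

\begin{verbatim}
def behaviorally_equivalent(f, g, test_inputs):
    def run(fn, x):
        try:
            return ('ok', fn(x))
        except Exception as e:
            return ('err', type(e).__name__)
    return all(run(f, x) == run(g, x) for x in test_inputs)

def fact_rec(n):
    return 1 if n <= 1 else n * fact_rec(n - 1)

def fact_iter(n):
    r = 1
    for i in range(2, n + 1):
        r *= i
    return r

print(behaviorally_equivalent(fact_rec, fact_iter, range(10)))  # True
\end{verbatim}

In this framing, the contribution of Li et al. is to choose \texttt{test\_inputs} so that they exercise distinct execution paths, rather than relying on a fixed battery.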

% proper features
In constraint-based modeling, pioneered by Ohlsson and Mitrovic, student submissions are analyzed in terms of invariant domain principles~\cite{mitrovic2012fifteen}. Constraints typically take the form of if-then rules, such as “if a function has a non-void return type, then it must contain a return statement”~\cite{holland2009j-latte}. Tutors using this approach look for unsatisfied constraints in submitted programs to generate feedback.
% TODO note pattern hierarchy used in Le’s tutor
Le’s Prolog tutor improves constraint-based diagnosis by assigning weights to different types of constraints~\cite{le2009using}.
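To make the if-then form concrete, the sketch below checks a constraint analogous to the one quoted above, recast for Python (where return types are implicit): if a function's result is assigned to a variable, the function must contain a \texttt{return} with a value. The rule and all names are our own illustration, not taken from any of the cited tutors.

\begin{verbatim}
import ast

def unsatisfied_return_constraint(source):
    tree = ast.parse(source)
    funcs = {f.name: f for f in ast.walk(tree)
             if isinstance(f, ast.FunctionDef)}
    bad = []
    for node in ast.walk(tree):
        # Relevance condition: a defined function's result is assigned.
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)
                and node.value.func.id in funcs):
            fn = funcs[node.value.func.id]
            # Satisfaction condition: it returns a value somewhere.
            if not any(isinstance(n, ast.Return) and n.value is not None
                       for n in ast.walk(fn)):
                bad.append(fn.name)
    return bad

src = "def double(x):\n    y = 2 * x\nz = double(3)"
print(unsatisfied_return_constraint(src))  # ['double']
\end{verbatim}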

% code- / syntax-based features
The methods discussed so far only consider entire programs (after normalization), or focus on epiphenomenal features such as run-time behavior. Defining program features in terms of the program’s code or structure is difficult, as each part of a program can potentially affect -- or be affected by -- any other part. In other words, a statement that is correct in the context of one submission might be erroneous in another.

% text / token runs
Several attempts have been made at defining such features. The simplest option is to consider fragments of program text; the MOSS system for detecting plagiarism is based on $k$-grams~\cite{schleimer2003winnowing}. CCFinder looks for similar token sequences in different programs to discover code clones~\cite{kamiya2002ccfinder}. OverCode uses individual lines of code as features, allowing a teacher to visualize the similarities between many student submissions~\cite{glassman2015overcode}.
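For example, $k$-grams over a normalized token stream can be extracted as follows; collapsing identifiers to a single token makes the features insensitive to renaming, in the spirit of MOSS (which additionally fingerprints and winnows the $k$-grams):

\begin{verbatim}
import io, tokenize

def k_grams(source, k=4):
    toks = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            toks.append('ID')  # collapse identifiers (and keywords)
        elif tok.type in (tokenize.OP, tokenize.NUMBER, tokenize.STRING):
            toks.append(tok.string)
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}

print(k_grams("total = total + price") ==
      k_grams("acc = acc + cost"))       # True: renaming is invisible
\end{verbatim}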

% features in programming ITS should be meaningful
While features defined in terms of strings or tokens are easy to implement, they lack the structural information that would allow us to reason about the program’s behavior. Our goal is to define features that capture individual concepts or recurring patterns in student submissions. Nguyen et al. used the Codewebs tool to discover different AST subtrees that perform the same function in a given context~\cite{nguyen2014codewebs}.
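The first step of such an analysis can be sketched as indexing every subtree by the context in which it occurs; subtrees appearing in identical contexts across many submissions then become candidates for serving the same function. Below, ASTs are represented as nested tuples, a deliberate simplification of Codewebs’ actual machinery.

\begin{verbatim}
def subtrees_with_context(tree, path=()):
    # Yield (context, subtree) pairs; a context is the path of
    # (parent label, child index) steps leading from the root.
    yield path, tree
    if not isinstance(tree, str):
        label, *children = tree
        for i, child in enumerate(children):
            yield from subtrees_with_context(child, path + ((label, i),))

expr = ('+', 'x', ('*', '2', 'y'))       # AST of "x + 2 * y"
for context, sub in subtrees_with_context(expr):
    print(context, sub)                  # e.g. (('+', 1), ('*', 0)) 2
\end{verbatim}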

% tree-regular expressions
The features used in the present paper are inspired by \emph{tree regular expressions} used in the Tregex tool by Levy and Andrew~\cite{levy2006tregex}, and the \emph{trx} system by Bagrak and Shivers~\cite{bagrak2004trx}. Extending ordinary regular expressions to trees allows us to define structural patterns with varying levels of specificity.
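As a minimal illustration (reusing the tuple representation from the previous sketch), the matcher below supports a single wildcard that matches any subtree; the expressions supported by Tregex and trx are considerably richer, including for instance descendant and sibling relations.

\begin{verbatim}
def matches(pattern, tree):
    if pattern == '_':                   # wildcard: any subtree
        return True
    if isinstance(pattern, str) or isinstance(tree, str):
        return pattern == tree
    return (len(pattern) == len(tree)
            and all(matches(p, t) for p, t in zip(pattern, tree)))

expr = ('+', 'x', ('*', '2', 'y'))       # AST of "x + 2 * y"
print(matches(('+', '_', ('*', '_', '_')), expr))   # True
print(matches(('-', '_', '_'), expr))               # False
\end{verbatim}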


%% dynamic / temporal programming model
%The first major ITS paradigm employs a \emph{dynamic} domain model to support hint generation. Here, the problem-solving process is typically modeled as a search through a state space of partial and complete solutions. Solving a problem is represented by a sequence of well-defined steps from the initial state to one of the possible goal states. Cognitive and other model-tracing tutors use this approach~\cite{anderson1995cognitive,vanlehn2005andes}.
%
%% dynamic model: data-driven
%The Hint Factory uses student data to model the solution space as a Markov decision process~\cite{barnes2008toward}. Several data-driven tutors~\cite{jin2012program,rivers2013automatic,piech2015autonomously} adapt and extend this approach to the programming domain. This allows them to generate hints for errors that students have made (and fixed) in the past.
%
%% dynamic model: issues
%One of the issues when developing a dynamic domain model for programming is the lack of directly observable, interpretable problem-solving steps. Some tutors like the Lisp-ITS deal with this by restricting student input: students must write programs in top-to-bottom order, and fix any erroneous symbols before continuing. Data-driven tutors typically only include submissions -- versions of the program the student submitted for testing -- in the solution space, and do not ascribe any meaning (in domain concepts) to transitions between submissions.
%
%%E.g. model-tracing tutors. Not well-suited for programming tutors because:
%%  - no directly interpretable problem-solving steps
%%  - student traces are basically insane, so learning from them is not a good idea anyway (Piech 2015)
%%  - human tutors are usually able to comment without referring to previous program versions
%%Since there are no meaningful observable actions in a free-form programming tutor, the “programming steps” in these models are simply arrows connecting programs in the solution space of incorrect and correct submissions. Commonly occurring steps are arrows spanning pairs of submissions that were often observed in sequence.
%
%% static programming model
%The other major tutoring paradigm uses a \emph{static} domain model. Instead of representing the entire problem-solving process, this approach only models individual solution states. Since a) programming steps are difficult to define and observe, and b) human teachers are often able to resolve student errors based only on the current program, many programming tutors opt for a static domain model.
%%Avoid the need to understand student actions by only considering each submission individually.
%
%% static model: constraint-based modeling
%Probably the most commonly used approach is constraint-based modeling~\cite{mitrovic2012fifteen}, where programs are checked against a set of constraints. Constraints typically take the form of rules like “if a function has a non-void return type, then it must contain a return statement”~\cite{holland2009j-latte}. Tutors using this approach base hints on unsatisfied constraints in a submitted program. Le’s Prolog tutor improves constraint-based diagnosis by assigning different weights to constraints~\cite{le2009using}.
%
%% static model: ad hoc
%While model-tracing and CBM are established paradigms with a somewhat standardized programming models, many tutors use ad hoc models. These can be defined manually or learned using machine-learning techniques.
%% - plan-based
%Johnson identified the key concepts when writing a Pascal program for a single exercise, encoding them into programming plans that allow discovering student intentions and diagnosing errors~\cite{johnson1990understanding}.
%%  - grammar-based (techniques) (Brna/Huang/Le)
%Hong’s Prolog tutor uses a similar approach to analyze programs, based on programming techniques~\cite{brna1991prolog}.
%%  - strategies (Gerdes)
%Similarly, Ask-Elle by Gerdes et al. uses manually defined programming strategies to automatically derive possible solutions for a given Haskell problem~\cite{gerdes2016ask-elle}.
%
%% data-driven models
%Data-driven methods have also been used for constructing static programming models. The idea is to extract information from a large set of correct and incorrect submissions; the main challenge is how to distinguish superficial variations between programs from semantically important differences.
%% canonicalization (Rivers 2015)
%Rivers et al. reduce syntactic and semantic variations by \emph{canonicalizing} programs into a normal form: inlining functions, computing constant expressions, reordering commutative expressions, and so on~\cite{rivers2015data-driven}. To generate hints, they find the correct program closest to the student’s submission, and construct a token-by-token edit path between those programs.
%%  AST subtrees (Nguyen 2014)
%Nguyen et al. analyzed submissions in a machine-learning course to learn a vocabulary of \emph{code phrases}: subtrees of submissions’ abstract syntax trees that perform the same function in a given context.


%Generate hints based on the differences between incorrect and correct programs. Must distinguish between syntactic (irrelevant) and semantic (relevant) variations.
%  - n-grams or other purely textual attributes
%    - mostly used for plagiarism (MOSS) / code clone detection 
%      
%** Authoring tools (10%)
%An interpretable model can help with classifying errors and discovering concepts.
%  - reference?
%
%** Regex for trees (20%)
%Syntactic n-grams (Sidorov).
%Tgrep / trx for structural patterns in NLP - use the same for describing code!

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "aied2017"
%%% End: