\section{Background}

% solution strategies
Several programming tutors generate hints from differences between the student’s program and a predefined set of possible solutions. The possible solution strategies for each problem can be given as a set of programs, or specified in a domain-specific language. Both Johnson’s Pascal tutor~\cite{johnson1990understanding} and Hong’s Prolog tutor~\cite{hong2004guided} perform hierarchical goal decomposition based on predefined programming \emph{plans} or \emph{techniques} to determine the student’s intent. Gerdes et al. use a small set of annotated model programs to derive solution strategies, which function in a similar way~\cite{gerdes2016ask-elle}.

% canonicalization
Rivers and Koedinger compare students’ programs directly to a database of previous correct submissions~\cite{rivers2015data-driven}. They reduce program variability using equivalence-preserving transformations, such as inlining functions and reordering binary expressions. Hints are generated by suggesting a minimal correct step leading from the current submission to the closest correct program.

% unit tests
Another option is to compare program behavior. Nguyen et al. classify programming mistakes according to results on a preselected set of test inputs~\cite{nguyen2014codewebs}. Li et al. generate test cases to distinguish between programs by selecting inputs that exercise different code paths in the program~\cite{li2016measuring}. Such tutors can point out pertinent failing test cases, which can be very helpful.

% constraints
\emph{Constraints}~\cite{mitrovic2012fifteen} encode domain principles using if-then rules with relevance and satisfaction conditions, e.g. “if a function has a non-void return type, then it must have a return statement”~\cite{holland2009j-latte}. If a program violates a constraint, the tutor displays a predefined message. Le’s Prolog tutor improves constraint-based diagnosis by assigning weights to different types of constraints~\cite{le2009using}.

% data-driven tutors
Jin et al. use \emph{linkage graphs} to describe data dependencies between the program’s statements~\cite{jin2012program}; we use AST patterns in a similar way. Nguyen et al. analyzed submissions in a large machine-learning course to learn a vocabulary of \emph{code phrases}: subtrees of submissions’ abstract syntax trees that perform the same function in a given context~\cite{nguyen2014codewebs}. By swapping parts between different programs, they built up a search library of functionally equivalent AST subtrees within a given context.

% tree-regular expressions
The idea for AST patterns comes from Tregex -- \emph{tree regular expressions}, mainly used in the field of natural-language processing~\cite{levy2006tregex}. Tregex patterns can encode complex relations between nodes, but can become unwieldy; in this paper we use a simpler s-expression syntax. Another language for describing tree patterns using s-expressions is \emph{trx}, which additionally supports choice, repetition and other operators~\cite{bagrak2004trx}.


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "aied2017"
%%% End: