\section{Method}

This section describes the three main components of our approach: automatically extracting patterns from student submissions, learning classification rules for correct and incorrect programs, and using those rules to generate hints.

\subsection{Extracting patterns}
\label{sec:extracting-patterns}

We construct patterns by connecting pairs of leaf nodes in a program's AST. In this paper we always select both nodes of a pair from the same clause: either two nodes referring to the same variable (like the examples in Fig.~\ref{fig:sister}), or a value (such as the empty list \code{[]} or \code{0}) and another variable or value in the same \textsf{compound} or \textsf{binop} (like the blue dotted pattern in Fig.~\ref{fig:sum}). For example, in the clause\footnote{The second occurrence of the variables \code{A}, \code{B} and \code{C} is marked with \textsf{'} for disambiguation.}

\begin{Verbatim}
a(A,\textsf{ }B):- b(A\textsf{'},\textsf{ }C), B\textsf{'} is C\textsf{'}\textsf{ }+\textsf{ }18.
\end{Verbatim}

\noindent we select the following pairs of nodes: \{\code{A},\,\code{A\textsf{'}}\}, \{\code{B},\,\code{B\textsf{'}}\}, \{\code{C},\,\code{C\textsf{'}}\}, \{\code{B\textsf{'}},\,\code{18}\} and \{\code{C\textsf{'}},\,\code{18}\}.

For each selected pair of leaf nodes $(a,b)$ we build a pattern by walking the AST in depth-first order and recording the nodes that lie on the paths to $a$ and $b$. We omit \textsf{and} nodes, as explained in the previous section. We also include certain nodes that do not lie on a path to either selected leaf: specifically, the functor or operator of every \textsf{compound}, \textsf{binop} or \textsf{unop} node that contains $a$ or $b$. A simplified sketch of this extraction procedure is shown below.

Patterns constructed in this way form the set of features for rule learning. To keep this set at a reasonable size, we only use patterns that have appeared at least five times in submitted programs.

\subsection{Learning rules}

%\begin{figure}[t]
%\noindent \textsc{Procedure $learn\_rules(D, sig, acc)$}
%\\
%\\
%\noindent Given: data \emph{D}, threshold of significance \emph{sig}, threshold of accuracy \emph{acc}, and a
%procedure for learning rules $learn\_target\_rules(target, D, D_{acc}, sig, acc)$ that learns a set of rules for
%class \emph{target} from data \emph{D}, where: a) each attribute-value pair in the condition part
%needs to be significant using the likelihood-ratio test with significance level $p
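To make the extraction step of Sect.~\ref{sec:extracting-patterns} concrete, the following sketch (in Python) implements it for a simplified AST representation. The \code{Node} class, the node-type names and the textual form in which patterns are printed are chosen for this illustration only; they do not correspond to the actual implementation or pattern syntax used in our system.

\begin{Verbatim}
from itertools import combinations

class Node:
    # A simplified AST node: internal nodes have children, leaves carry a
    # value (variable name, constant, functor or operator).
    def __init__(self, type, children=(), val=None):
        self.type = type     # e.g. 'clause', 'compound', 'binop', 'and',
        self.children = list(children)   # 'var', 'literal'
        self.val = val

def leaves(node):
    # All leaf nodes below `node`, in depth-first order.
    if not node.children:
        return [node]
    return [l for c in node.children for l in leaves(c)]

def select_pairs(clause):
    # Pairs of leaves used to build patterns: two occurrences of the same
    # variable, or a literal with another leaf of the same compound/binop.
    pairs, by_name = [], dict()
    for leaf in leaves(clause):
        if leaf.type == 'var':
            by_name.setdefault(leaf.val, []).append(leaf)
    for occurrences in by_name.values():
        pairs += combinations(occurrences, 2)
    def walk(node):
        if node.type in ('compound', 'binop'):
            for a, b in combinations(leaves(node), 2):
                if 'literal' in (a.type, b.type) and (a, b) not in pairs:
                    pairs.append((a, b))
        for child in node.children:
            walk(child)
    walk(clause)
    return pairs

def pattern(clause, a, b):
    # Walk the AST depth-first and keep the nodes on the paths to a and b,
    # omitting 'and' nodes and adding the functor/operator of every
    # compound/binop/unop node that contains a or b.
    targets = (a, b)
    def contains(node):
        return node in targets or any(contains(c) for c in node.children)
    def build(node):
        if node in targets:
            return 'var' if node.type == 'var' else '(literal %s)' % node.val
        if not contains(node):
            return None
        parts = [p for p in map(build, node.children) if p is not None]
        if node.type in ('compound', 'binop', 'unop'):
            parts.insert(0, node.val)      # keep the functor/operator
        inner = ' '.join(parts)
        return inner if node.type == 'and' else '(%s %s)' % (node.type, inner)
    return build(clause)

# The clause a(A, B) :- b(A, C), B is C + 18 from the running example,
# with each variable occurrence represented by a separate leaf node.
A, B   = Node('var', val='A'), Node('var', val='B')
A2, C  = Node('var', val='A'), Node('var', val='C')
B2, C2 = Node('var', val='B'), Node('var', val='C')
n18    = Node('literal', val='18')
clause = Node('clause', [
    Node('compound', [A, B], val='a'),         # head a(A, B)
    Node('and', [
        Node('compound', [A2, C], val='b'),    # b(A', C)
        Node('binop', [B2,
            Node('binop', [C2, n18], val='+')], val='is'),
    ]),
])
for a, b in select_pairs(clause):
    print(pattern(clause, a, b))
\end{Verbatim}

Running the sketch prints one pattern per selected pair; for \{\code{A},\,\code{A\textsf{'}}\}, for instance, it prints \code{(clause (compound a var) (compound b var))}, connecting the two occurrences of \code{A} in the head and in the call to \code{b}.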