From ad732e88887a72485ca03fae288bb9dc39911d15 Mon Sep 17 00:00:00 2001 From: Timotej Lazar Date: Tue, 7 Feb 2017 18:49:34 +0100 Subject: Final version for review --- aied2017/aied2017.tex | 2 +- aied2017/evaluation.tex | 30 +++++++++++++++------------- aied2017/method.tex | 53 ++++++++++++++++++------------------------------- aied2017/patterns.tex | 36 ++++++++++++++++----------------- 4 files changed, 54 insertions(+), 67 deletions(-) diff --git a/aied2017/aied2017.tex b/aied2017/aied2017.tex index a7ed5e1..3faaaf8 100644 --- a/aied2017/aied2017.tex +++ b/aied2017/aied2017.tex @@ -41,7 +41,7 @@ % motivation When implementing a programming tutor, it is often difficult to manually consider all possible errors encountered by students. An alternative is to automatically learn a bug library of erroneous patterns from students’ programs. % learning -We propose using abstract-syntax-tree (AST) patterns as features for learning rules to distinguish between correct and incorrect programs. These rules can be used for debugging student programs: rules for incorrect programs (buggy rules) contain patterns indicating mistakes, whereas each rule for correct programs covers a subset of submissions sharing the same solution strategy. +We propose using abstract-syntax-tree (AST) patterns as features for learning rules to distinguish between correct and incorrect programs. These rules can be used for debugging student programs: rules for incorrect programs (buggy rules) contain patterns indicating mistakes, whereas rules for correct programs cover subsets of submissions sharing the same solution strategy. % generating hints To generate hints, we first check all buggy rules and point out incorrect patterns. If no buggy rule matches, rules for correct programs are used to recognize the student’s intent and suggest patterns that still need to be implemented. 
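The two-stage hint policy described in the abstract (check buggy rules first; if none match, use rules for correct programs to recognize intent and suggest missing patterns) can be sketched as follows. This is a hypothetical simplification, not the authors' implementation: it assumes a program is represented as the set of AST patterns it contains, and a rule as the set of patterns its conditions require.

```python
# Hedged sketch of the two-stage hint policy (hypothetical data model,
# not the authors' code): a program is the set of AST patterns it
# contains; a rule is the set of patterns its conditions require.

def generate_hints(program, i_rules, c_rules):
    """Return buggy hints if any I-rule matches, else intent hints."""
    # Stage 1: every matching buggy rule points out incorrect patterns.
    buggy = {p for rule in i_rules if rule <= program for p in rule}
    if buggy:
        return ("buggy", sorted(buggy))
    # Stage 2: recognize intent via the C-rule sharing the most
    # patterns with the program, and suggest the patterns it lacks.
    best = max(c_rules, key=lambda rule: len(rule & program))
    return ("intent", sorted(best - program))
```

For instance, with `i_rules = [{"bug1"}]` and `c_rules = [{"a", "b"}, {"c"}]`, a program containing `bug1` gets a buggy hint, while a program containing only `a` gets the intent hint `b`.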
% evaluation diff --git a/aied2017/evaluation.tex b/aied2017/evaluation.tex index bb3f4c6..32cb03c 100644 --- a/aied2017/evaluation.tex +++ b/aied2017/evaluation.tex @@ -1,10 +1,18 @@ -\section{Evaluation and discussion} +\section{Evaluation} We evaluated our approach on 44 programming assignments. We preselected 70\% of students whose submissions were used as learning data for rule learning. The submissions from the remaining 30\% of students were used as testing data to evaluate classification accuracy of learned rules, and to retrospectively evaluate quality of given hints. \begin{table}[t] +\caption{Results on five selected domains and averaged results over 44 domains. +Columns 2, 3, and 4 contain classification accuracies of our rule learning method, majority classifier, +and random forest, respectively. Columns 5 and 6 report the number of all generated buggy hints +and the number of hints that were actually implemented by students. The following three columns +contain the number of all generated intent hints (All), +the number of implemented hints (Imp) and the number of implemented alternative hints (Alt). +The numbers in the last column are counts of student submissions where hints could not be generated. +The bottom two rows give aggregated results (total and average) over all 44 domains.} \centering \begin{tabular}{|l|rrr|rr|rrr|r|} \hline @@ -24,22 +32,14 @@ and to retrospectively evaluate quality of given hints. Average & 0.857 & 0.663 & 0.908 & 79.73 & 77.34 & 46.75 & 26.36 & 5.55 & 23.75 \\ \hline \end{tabular} -\caption{Results on five selected domains and averaged results over 44 domains. -Columns 2, 3, and 4 contain classification accuracies of our rule learning method, majority classifier, -and random forest, respectively. Columns 5 and 6 report the number of all generated buggy hints -and the number of hints that were actually implemented by students.
The following three columns -contain the number of all generated intent hints (All), -the number of implemented hints (Imp) and the number of implemented alternative hints (Alt). -The numbers in the last column are student submission where hints could not be generated. -The bottom two rows give aggregated results (total and average) over all 44 domains.} \label{table:eval} \end{table} Table~\ref{table:eval} contains results on five selected problems (each representing a group of problems from one lab session), and averaged results over all 44 problems.\footnote{We report only a subset of results due to space restrictions. Full results and source code can be found at \url{https://ailab.si/ast-patterns/}. } The second, third, and fourth columns provide classification accuracies (CA) of the rule-based, majority, and random-forest classifiers on testing data. The majority classifier and the random forests method, which had the best overall performance, serve as references for bad and good CA on particular data sets. -For example, our rules correctly classified 99\% of testing instance for the \code{sister} problem, -the accuracy of the majority classifier was 66\%, and random forests achieved 98\%. Rule CA is also high for problems \code{del} and \code{sum}. It is lower, however, for \code{is\_sorted} and \code{union}, suggesting that the proposed set of AST patterns is insufficient for certain problems. Indeed, after analyzing the problem \code{is\_sorted}, +For example, our rules correctly classified 99\% of testing instances for the \code{sister} problem, +the accuracy of the majority classifier was 66\%, and random forests achieved 98\%. CA of rules is also high for problems \code{del} and \code{sum}. It is lower, however, for \code{is\_sorted} and \code{union}, suggesting that the proposed set of AST patterns is insufficient for certain problems. 
Indeed, after analyzing the problem \code{is\_sorted}, we observed that our patterns do not cover predicates with a single empty-list (\code{[]}) argument, which occurs as the base case in this problem. For this reason, the rule learning algorithm failed to learn any C-rules and therefore all programs were classified as incorrect. In the case of \code{union}, many solutions use the cut (\code{!}) operator, which is also ignored by our pattern generation algorithm. @@ -47,20 +47,22 @@ is also ignored by our pattern generation algorithm. We evaluated the quality of hints on incorrect submissions from those student traces that resulted in a correct program. In the case of the \code{sister} data set, there were 289 such incorrect submissions out of 403 submissions in total. -The columns capped by “Buggy hints” in Table~\ref{table:eval} contain evaluation of hints generated from rules +The columns captioned “Buggy hints” in Table~\ref{table:eval} contain evaluation of hints generated from rules for incorrect programs (I-rules). For each generated buggy hint we checked whether it was implemented by the student in the final submission. The column “All” is the number of all generated buggy hints, while the column “Imp” is the number of implemented hints. The results show high relevance of generated buggy hints, as 97\% (3508 out of 3613) of them were implemented in the final solution; in other words, the buggy pattern was removed. -The intent hints are generated when the algorithm fails to find any buggy hints. The column “All” contains the number of generated intent hints, “Imp”' the number of implemented main intent hints, and “Alt” is the number +The intent hints are generated when the algorithm fails to find any buggy hints. The column “All” contains the number of generated intent hints, “Imp” the number of implemented main intent hints, and “Alt” is the number of implemented alternative hints.
Notice that the percentage of implemented intent hints is significantly lower when compared to buggy hints: in the case of problem \code{sister} 84 out of 127 (66\%) hints were implemented, whereas in the case of problem \code{union} only 66 out of 182 (36\%) hints were implemented. On average, 56\% of main intent hints were implemented. The last column shows the number of submissions where no hints could be generated. This value is relatively high for the \code{is\_sorted} problem, because the algorithm could not learn any C-rules and thus no intent hints were generated. -To sum up, buggy hints seem to be good and reliable, since they are always implemented when presented, even when we tested them on past data -- the decisions of students were not influenced by these hints. The percentage of implemented intent hints is, on average, lower (56\%), which is still not a bad result, providing that it is difficult to determine the programmer’s intent. In 12\% (244 out 2057) where main intent hints were not implemented, students implemented an alternative hint that was identified by our algorithm. Overall we were able to generate a hint in 84.5\% of cases. In 73\% of all cases, the hints were also implemented. Last but not least, high classification accuracies in many problems imply that it is possible to correctly determine the correctness of a program by simply verifying the presence of a small number of patterns. Our hypothesis is that there exist some crucial patterns in programs that students need to resolve. When they figure out these patterns, the implementation of the remaining of the program is often straightforward. +To sum up, buggy hints seem to be good and reliable, since they were almost always (97\%) implemented when presented, even though we tested them on past data, where the decisions of students could not have been influenced by these hints.
The percentage of implemented intent hints is, on average, lower (56\%), which is still not a bad result, given that it is difficult to determine the programmer’s intent. In 12\% (244 out of 2057) of generated intent hints, students implemented an alternative hint that was identified by our algorithm. Overall we were able to generate hints for 84.5\% of incorrect submissions. Of those hints, 86\% were implemented (73\% of all incorrect submissions). + +Last but not least, high classification accuracies in many problems imply that it is possible to determine the correctness of a program simply by checking for the presence of a small number of patterns. Our hypothesis is that there exist some crucial patterns in programs that students have difficulties with. When they figure out these patterns, implementing the rest of the program is usually straightforward. %%% Local Variables: diff --git a/aied2017/method.tex b/aied2017/method.tex index 55537ab..5e27229 100644 --- a/aied2017/method.tex +++ b/aied2017/method.tex @@ -5,60 +5,42 @@ This section explains the three main components of our approach: automatically e \subsection{Extracting patterns} \label{sec:extracting-patterns} -We construct patterns by connecting pairs of leaf nodes in a program’s AST. For this paper we always select a pair of nodes from the same clause: either two nodes referring to the same variable (like the examples in Fig.~\ref{fig:sister}), or a value (such as the empty list \code{[]} or \code{0}) and another variable or value in the same \textsf{compound} or \textsf{binop} (like the blue dotted pattern in Fig.~\ref{fig:sum}). For example, in the clause\footnote{The second occurrence of variables \code{A}, \code{B} and \code{C} is marked with ’ for disambiguation.} +We construct patterns by connecting pairs of leaf nodes in a program’s AST.
For this paper we always select a pair of nodes from the same clause: either two nodes referring to the same variable (like the examples in Fig.~\ref{fig:sister}), or a value (such as the empty list \code{[]} or the number \code{0}) and another variable or value in the same \textsf{compound} or \textsf{binop} (like the blue dotted pattern in Fig.~\ref{fig:sum}). For example, in the clause (the second occurrence of each variable -- \code{A}, \code{B} and \code{C} -- is marked with~’ for disambiguation) \begin{Verbatim} a(A,\textsf{ }B):- b(A\textsf{'},\textsf{ }C), - B\textsf{'} is C\textsf{'}\textsf{ }+\textsf{ }18. + B\textsf{'} is C\textsf{'}\textsf{ }+\textsf{ }1. \end{Verbatim} \noindent -we select the following pairs of nodes: \{\code{A},\,\code{A\textsf{'}}\}, \{\code{B},\,\code{B\textsf{'}}\}, \{\code{C},\,\code{C\textsf{'}}\}, \{\code{B\textsf{'}},\,\code{18}\} and \{\code{C\textsf{'}},\,\code{18}\}. +we select the following pairs of nodes: \{\code{A},\,\code{A\textsf{'}}\}, \{\code{B},\,\code{B\textsf{'}}\}, \{\code{C},\,\code{C\textsf{'}}\}, \{\code{B\textsf{'}},\,\code{1}\} and \{\code{C\textsf{'}},\,\code{1}\}. For each selected pair of leaf nodes $(a,b)$ we build a pattern by walking the AST in depth-first order, and recording nodes that lie on the paths to $a$ and $b$. We omit \textsf{and} nodes, as explained in the previous section. We also include certain nodes that do not lie on a path to any selected leaf. Specifically, we include the functor or operator of all \textsf{compound}, \textsf{binop} or \textsf{unop} nodes containing $a$ or $b$. -Patterns constructed in this way form the set of features for rule learning. To keep this set at a reasonable size, we only use patterns that have appeared in programs submitted at least five times. +Patterns constructed in this way form the set of features for rule learning. To keep this set at a reasonable size, we only use patterns that have occurred at least five times in submitted programs. 
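To make the depth-first walk concrete, here is a minimal sketch of path-based extraction on a toy AST of nested tuples. This is a hypothetical encoding, not the paper's implementation (the actual AST and node set are richer, and this toy clause has no "and" nodes to omit): nodes on the path to either selected leaf are kept in depth-first order, and the functor or operator of a kept compound/binop node is retained, as described above.

```python
# Minimal sketch of path-based pattern extraction on a toy AST of
# nested tuples (hypothetical encoding, not the paper's implementation).
# An inner node is (label, child, ...); a leaf is a string. Variables
# are pre-disambiguated ("A" vs "A'"), so each selected leaf is unique.

def contains(node, leaf):
    if isinstance(node, str):
        return node == leaf
    return any(contains(child, leaf) for child in node[1:])

def pattern(node, a, b):
    """Depth-first walk keeping nodes that lie on the path to a or b."""
    if isinstance(node, str):
        return [node] if node in (a, b) else []
    label, *children = node
    if not (contains(node, a) or contains(node, b)):
        return []
    out = [label]
    if label == "compound":
        out.append(children[0])      # keep the functor, as in the paper
    elif label == "binop":
        out.append(children[1])      # keep the operator
    return out + [x for child in children for x in pattern(child, a, b)]

# Toy AST of the clause  a(A, B) :- b(A', C), B' is C' + 1.
ast = ("clause",
       ("compound", "a", "A", "B"),
       ("compound", "b", "A'", "C"),
       ("binop", "B'", "is", ("binop", "C'", "+", "1")))
```

On this AST, the pair {A, A'} yields the node sequence `clause, compound, a, A, compound, b, A'`, and the pair {B', 1} yields `clause, binop, is, B', binop, +, 1`.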
\subsection{Learning rules} -%\begin{figure}[t] -%\noindent \textsc{Procedure $learn\_rules(D, sig, acc)$} -%\\ -%\\ -%\noindent Given: data \emph{D}, threshold of significance \emph{sig}, threshold of accuracy \emph{acc}, and a -%procedure for learning rules $learn\_target\_rules(target, D, D_{acc}, sig, acc)$ that learns a set of rules for -%class \emph{target} from data \emph{D}, where: a) each attribute-value pair in the condition part -%needs to be significant using the likelihood-ratio test with significance level $p
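The pseudocode removed above describes a procedure learn_rules(D, sig, acc) that learns rules per class subject to significance and accuracy thresholds. As a rough illustration of the covering (separate-and-conquer) scheme it names, here is a hypothetical simplification; it replaces the likelihood-ratio significance test with a plain accuracy threshold and is not the authors' implementation.

```python
# Rough sketch of a separate-and-conquer rule learner in the spirit of
# learn_rules(D, sig, acc) (hypothetical simplification: the
# likelihood-ratio significance test is replaced by an accuracy
# threshold). An example is a dict of binary pattern features plus a
# "class" label; a rule is a list of features that must all be present.

def covers(rule, example):
    return all(example.get(f) for f in rule)

def accuracy(rule, target, examples):
    covered = [e for e in examples if covers(rule, e)]
    if not covered:
        return 0.0
    return sum(e["class"] == target for e in covered) / len(covered)

def grow_rule(target, examples, features, acc):
    """Greedily add the single feature that most improves accuracy."""
    rule = []
    while accuracy(rule, target, examples) < acc:
        scored = [(accuracy(rule + [f], target, examples), f)
                  for f in features if f not in rule]
        if not scored:
            return None
        best_score, best_f = max(scored)
        if best_score <= accuracy(rule, target, examples):
            return None              # no condition improves the rule
        rule.append(best_f)
    return rule

def learn_rules(target, examples, acc=0.9):
    """Learn rules for `target`; after each rule, drop the target-class
    examples it covers and continue on the rest."""
    features = sorted({f for e in examples for f in e if f != "class"})
    data, rules = list(examples), []
    while any(e["class"] == target for e in data):
        rule = grow_rule(target, data, features, acc)
        if rule is None:
            break
        rules.append(rule)
        remaining = [e for e in data
                     if e["class"] != target or not covers(rule, e)]
        if len(remaining) == len(data):
            break                    # rule covers no new target examples
        data = remaining
    return rules
```

Running this for the "incorrect" class would produce I-rules (buggy patterns) and for the "correct" class C-rules, mirroring the two rule sets used for hint generation.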