summaryrefslogtreecommitdiff
path: root/aied2017/patterns.tex
diff options
context:
space:
mode:
Diffstat (limited to 'aied2017/patterns.tex')
-rw-r--r--aied2017/patterns.tex21
1 files changed, 12 insertions, 9 deletions
diff --git a/aied2017/patterns.tex b/aied2017/patterns.tex
index 25a75e8..fe2ae6c 100644
--- a/aied2017/patterns.tex
+++ b/aied2017/patterns.tex
@@ -1,6 +1,6 @@
\section{AST patterns}
-In this section we describe AST patterns through examples, while Sect.~\ref{sec:extracting-patterns} explains how patterns are extracted from student programs. Consider the following Prolog program implementing the relation \code{sister(X,Y)}\footnote{Meaning “\code{X} is a sister of \code{Y}”.}:
+In this section we describe AST patterns through examples, while Sect.~\ref{sec:extracting-patterns} explains how patterns are extracted from student programs. Consider the following Prolog program implementing the relation \code{sister(X,Y)}\footnote{Binary relations like this one should be read as “\code{X} is a sister/parent/… of \code{Y}”.}:
\begin{Verbatim}
sister(X,Y):- % X is Y’s sister when:
@@ -10,7 +10,7 @@ sister(X,Y):- % X is Y’s sister when:
X \textbackslash{}= Y. % X and Y are not the same person.
\end{Verbatim}
-Figure~\ref{fig:sister} shows the program’s AST with two patterns. The pattern drawn with blue dotted arrows encodes the fact that the first argument to the \code{sister} predicate also appears in the call to \code{female}. In other words, this pattern states that \code{X} must be female to be a sister. We write the pattern as the s-expression
+Figure~\ref{fig:sister} shows the program’s AST with two patterns. The pattern drawn with blue dotted arrows encodes the fact that the first argument to the \code{sister} predicate also appears in the call to \code{female}. In other words, this pattern states that \code{X} must be female to be a sister. We write this pattern as the s-expression
\begin{Verbatim}[fontfamily=sf]
(clause (head (compound (functor \code{sister}) (args var)))
@@ -68,9 +68,11 @@ Figure~\ref{fig:sister} shows the program’s AST with two patterns. The pattern
\label{fig:sister}
\end{figure}
-Patterns describe relations between nodes in a program’s AST. Specifically, the pattern ($a$ $b$ $c$) means that the nodes $b$ and $c$ are descended from $a$, and that $b$ precedes $c$ in a depth-first tree walk. In general, an AST matches the pattern (\textsf{name} $p_1$ … $p_k$) if it contains a node $n$ labeled \textsf{name}; the subtree rooted at $n$ must also contain, in depth-first order, distinct nodes $n_1$ to $n_k$ matching subpatterns $p_1$ to $p_k$.
+Every pattern used in this paper has the same basic structure, and describes paths from a \textsf{clause} node to one or two leaf nodes containing variables or values. All patterns in Figs.~\ref{fig:sister} and~\ref{fig:sum} are induced from such node pairs. For each leaf we also include some local context, for example the predicate name (e.g. \texttt{parent}).
+
+We regard these patterns as the smallest units of meaning in Prolog programs: each pattern encodes some interaction between two parts of the program. Including more than two leaf nodes in a pattern could make it difficult to pinpoint the exact error when generating hints. Since a pattern contains at most two \textsf{var} nodes, we require they both refer to the same variable, since relating two nodes corresponding to different variables would not tell us much about the program. This allows us to omit variable names from patterns.
-We handle syntactic variations in programs by omitting certain nodes from patterns. For example, by not including \code{and} nodes, the above pattern can match predicates with goals in any order (any arrangement of \code{and} nodes in the AST). This way, the same pattern will match any clause containing the “\textsf{sister}-\textsf{female}” relation, regardless of other goals.
+We handle syntactic variations in programs by omitting certain nodes from patterns. For example, by not including \textsf{and} nodes, the above pattern can match a clause regardless of the presence (or order) of other goals in its body (any arrangement of \textsf{and} nodes in the AST). Order \emph{is} important for the nodes specified in the pattern; this is explained below.
The second pattern in Fig.~\ref{fig:sister}, drawn with solid red arrows, encodes the fact that the two calls to \code{parent} share the first argument. In other words, \code{X}~and~\code{Y} must have the same parent~\code{P}.
@@ -79,7 +81,7 @@ The second pattern in Fig.~\ref{fig:sister}, drawn with solid red arrows, encode
(compound (functor \code{parent}) (args var)))
\end{Verbatim}
-All patterns used in this paper have the same basic structure: a union of paths from some \textsf{clause} to a pair of leaf nodes with variables (or values). Some local context is included with each leaf, such as the functor of a compound term. Each pattern only refers to a single variable, so we do not include its name with the \textsf{var} nodes. We regard such patterns as the basic units of meaning in Prolog programs: each pattern encodes an interaction between two parts of the program accessing the same object (variable).
+Patterns describe relations between nodes in a program’s AST. Specifically, the pattern ($a$ $b$ $c$) means that the nodes $b$ and $c$ are descended from $a$, and that $b$ precedes $c$ in a depth-first tree walk. In general, an AST matches the pattern (\textsf{name} $p_1$ … $p_k$) if it contains a node $n$ labeled \textsf{name}; the subtree rooted at $n$ must also contain, in depth-first order, distinct nodes $n_1$ to $n_k$ matching subpatterns $p_1$ to $p_k$.
A relation between any two objects in a program is insufficient to reason about the program’s behavior on the whole. In the tutoring context, however, there are patterns that strongly indicate the presence of certain bugs. Take for example the following incorrect program to sum a list:
@@ -90,7 +92,7 @@ sum([H|T],Sum):- % \textit{recursive case:}
Sum is Sum + H. % add first element (\textit{bug:} reused variable)
\end{Verbatim}
-This error is fairly common with Prolog novices: the variable \code{Sum} is used to represent both the sum of the whole list, and the sum of only the tail elements. The last line fails since Prolog cannot unify \code{Sum} with a (usually) different value of \code{Sum\,+\,H}. The program’s AST is displayed in Fig.~\ref{fig:sum}.
+This error is fairly common with Prolog novices: the variable \code{Sum} is used to represent both the sum of the whole list (line 2), and the sum of only the tail elements (line 3). The last line fails since Prolog cannot unify \code{Sum} with a (usually) different value of \code{Sum\,+\,H}. The program’s AST is displayed in Fig.~\ref{fig:sum}.
\begin{figure}[htbp]
\centering
@@ -146,7 +148,7 @@ for tree={
\path[-{Latex[length=1.5mm,width=1mm]},thick,relative,dashed,orange] (c2bb) edge[transform canvas={xshift=-1mm}] (c2bbv);
\path[-{Latex[length=1.5mm,width=1mm]},thick,relative,dashed,orange] (c2bb) edge[transform canvas={xshift=0.9mm}] (c2bbo);
\end{forest}
- \caption{AST for the buggy \texttt{sum} program. Dotted arrows relate the correct values in the base case. Solid and dashed arrows denote two patterns describing incorrect reuse of the \code{Sum} variable.}
+ \caption{The AST for the buggy \texttt{sum} program. Dotted arrows relate the correct values in the base case. Solid and dashed arrows denote two patterns describing incorrect reuse of the \code{Sum} variable in the recursive case.}
\label{fig:sum}
\end{figure}
@@ -161,7 +163,7 @@ Several patterns capture this mistake. Solid red arrows in Fig.~\ref{fig:sum} sh
The second pattern, drawn with dashed orange arrows in the figure, indicates the likely error in the arithmetic expression:
\begin{Verbatim}[fontfamily=sf]
-(clause (binop (var \code{Sum}) \code{is} (binop (var \code{Sum}) \code{+})))
+(clause (binop var \code{is} (binop var \code{+})))
\end{Verbatim}
The leftmost pattern in Fig.~\ref{fig:sum}, drawn with dotted blue arrows, describes the correct relation between the two constants in the base-case rule:
@@ -173,7 +175,8 @@ The leftmost pattern in Fig.~\ref{fig:sum}, drawn with dotted blue arrows, descr
\noindent
We include such patterns in our feature set to cover the base-case clauses in recursive programs, which often include no variables.
-While the patterns used in this paper appear to be useful for analyzing Prolog programs, it is likely that other kinds of patterns will be needed for other programming languages. In Python, for example, variables can take on different values and be accessed from many places. This will likely require patterns relating more than two instances of a variable, or multiple variables.
+% TODO move to conclusion
+%While the patterns used in this paper appear to be useful for analyzing Prolog programs, it is likely that other kinds of patterns will be needed for other programming languages. In Python, for example, variables can take on different values and be accessed from many places. This will likely require patterns relating more than two instances of a variable, or multiple variables.
%%% Local Variables:
%%% mode: latex