diff options
Diffstat (limited to 'aied2018')
-rw-r--r-- | aied2018/aied2018.tex | 1 | ||||
-rw-r--r-- | aied2018/patterns.tex | 33 | ||||
-rw-r--r-- | aied2018/rules.tex | 3 |
3 files changed, 17 insertions, 20 deletions
diff --git a/aied2018/aied2018.tex b/aied2018/aied2018.tex index 26cfe6e..2214c6e 100644 --- a/aied2018/aied2018.tex +++ b/aied2018/aied2018.tex @@ -9,6 +9,7 @@ \newunicodechar{⋯}{\ensuremath{\cdots}} \usepackage{bold-extra} +\usepackage{bm} \usepackage{hyperref} \usepackage[normalem]{ulem} diff --git a/aied2018/patterns.tex b/aied2018/patterns.tex index f6ef46e..d50a62e 100644 --- a/aied2018/patterns.tex +++ b/aied2018/patterns.tex @@ -2,14 +2,15 @@ We encode structural patterns in ASTs using tree regular expressions (TREs). An ordinary regular expression describes the set of strings matching certain constraints; similarly, a TRE describes the set of trees containing certain nodes and relations. Since TREs describe structure, they are themselves represented as trees. More specifically, both ASTs and TREs are ordered rooted trees. -In this work we used TREs to encode (only) child and sibling relations in ASTs. We write them as S-expressions, such as \pattern{(a (b \textbf{\textasciicircum}~d~\textbf{.}~e~\$) c)}. This expression matches any tree satisfying the following constraints: +In this work we used TREs to encode (only) child and sibling relations in ASTs. We write them as S-expressions, such as \pattern{(a (b \textbf{\textasciicircum}~d~\textbf{.}~e~\$) c)}. This expression matches any tree satisfying the following constraints (see Fig.~\ref{fig:tre-example} for an example): \begin{itemize} -\item the root node \pattern{a} has at least two children, \pattern{b} and \pattern{c}, in that order; and +\item the root \pattern{a} has at least two children, \pattern{b} and \pattern{c}, adjacent and in that order; and \item the node \pattern{b} has three children: \pattern{d}, followed by any node, followed by \pattern{e}. \end{itemize} -Figure~\ref{fig:tre-example} shows one instance of a matching tree, with arrows indicating the pattern. Analogous to ordinary regular expressions, caret (\texttt{\textbf{\textasciicircum}}) and dollar sign (\texttt{\$}) anchor a node to be respectively the first or last child of its parent. A period (\texttt{\textbf{.}}) is a wildcard that matches any node. +\noindent +Analogous to ordinary regular expressions, caret (\texttt{\textbf{\textasciicircum}}) and dollar sign (\texttt{\$}) anchor a node to be respectively the first or last child of its parent. A period (\texttt{\textbf{.}}) is a wildcard that matches any node. \begin{figure}[htbp] \centering @@ -21,24 +22,25 @@ Figure~\ref{fig:tre-example} shows one instance of a matching tree, with arrows l=0.9cm, } [a,name=a + [f] [b,name=b [d,name=d] [g,name=g] [e,name=e]] - [f] [c,name=c [j] [k]] + [h] ] -\path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (a) edge[transform canvas={xshift=-1.1mm,yshift=0.2mm}] (b); -\path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (a) edge[transform canvas={xshift=1.1mm,yshift=0.2mm}] (c); +\path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (a) edge[transform canvas={xshift=0.8mm,yshift=-0.2mm}] (b); +\path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (a) edge[transform canvas={xshift=-0.9mm,yshift=-0.3mm}] (c); \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (b) edge[transform canvas={xshift=-1.1mm,yshift=0.2mm}] (d); \draw[opacity=0] (b) -- node[anchor=east,thick,blue,opacity=1,font=\large] {\texttt{\textasciicircum}} (d); \path[draw,thick,relative,blue,transform canvas={xshift=-0.7mm}] (b) -- ($ (b)!0.6!(g) $); \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (b) edge[transform canvas={xshift=1.1mm,yshift=0.2mm}] (e); \draw[opacity=0] (b) -- node[anchor=west,thick,blue,opacity=1,font=\scriptsize,transform canvas={xshift=0mm,yshift=1mm}] {\texttt{\$}} (e); \end{forest} - \caption{A tree matching a pattern (in blue besides the edges). In the pattern, an arrow $x→y$ means that node $x$ has a child $y$. A shorter line without an arrowhead (e.g. \pattern{b}—\pattern{g}) indicates a wildcard, where the child can be any node. Anchors \texttt{\textbf{\textasciicircum}} and \texttt{\$} mean that the pattern will match only the first or last child.} + \caption{A tree matching a pattern (blue arrows besides the edges). In the pattern, each arrow $x→y$ means that node $x$ has a child $y$. A shorter line without an arrowhead (e.g. \pattern{b}$\boldsymbol{-}$ \pattern{g}) indicates a wildcard, where the child can be any node. Anchors \texttt{\textbf{\textasciicircum}} and \texttt{\$} mean that the pattern will match only the first or last child.} \label{fig:tre-example} \end{figure} -With TREs we encode interesting patterns in a program while disregarding irrelevant parts. Take for example the following Python function that prints the divisors of its argument $n$: +With TREs we encode interesting patterns in a program while disregarding irrelevant parts. Take for example the following, nearly correct Python function that prints the divisors of its argument $n$: \begin{Verbatim} def divisors(n): @@ -47,7 +49,7 @@ def divisors(n): print(d) \end{Verbatim} -Figure~\ref{fig:patterns-example} shows the simplified AST for this program, with two patterns overlaid. As described below, we construct our TREs automatically from certain subsets of nodes in the AST. The dashed pattern in Fig.~\ref{fig:patterns-example} was induced from the control-flow nodes in the program, while the solid pattern was induced from a pair of leaf nodes that represent the same variable. +Figure~\ref{fig:patterns-example} shows the simplified AST for this program, with two patterns overlaid. These patterns are represented by the S-expressions \begin{figure}[htb] \centering @@ -55,11 +57,8 @@ Figure~\ref{fig:patterns-example} shows the simplified AST for this program, wit for tree={ font=\sf, edge=darkgray, - %s sep=0.05cm, l sep=0, l=0.9cm, - %s sep=0, - %s=1.5cm, } [Function, name=def, draw,rectangle,red,dashed [name, name=name1 [divisors, name=divisors, font=\bfseries\sffamily]] @@ -113,16 +112,13 @@ Figure~\ref{fig:patterns-example} shows the simplified AST for this program, wit \label{fig:patterns-example} \end{figure} -These two patterns are represented by the respective S-expressions - \begin{enumerate} \item \pattern{(Function (body (For (body If))))} and \item \pattern{(Function (name divisors) (args \textbf{\textasciicircum}~Var \$)} \item[] ~~~~\pattern{(body (For (iter (Call (func range) (args \textbf{\textasciicircum}~\textbf{.} Var \$))))))}. \end{enumerate} -\noindent -Since this representation is not easy to read, we will instead represent TREs by highlighting relevant text in examples of matching programs. For instance +The first TRE encodes a single path in the AST and describes the program’s block structure: \textsf{Function}$\boldsymbol{-}$\textsf{For}$\boldsymbol{-}$\textsf{If}. The second TRE relates the argument in the definition of \code{divisors} with the last argument to \code{range} that provides the iterator in the for loop. Since S-expressions are not easy to read, we will instead represent TREs by highlighting relevant text in examples of matching programs: \begin{Verbatim} \red{\dashuline{\textbf{def}}} divisors(\blue{\underline{\textbf{n}}}): @@ -131,7 +127,7 @@ Since this representation is not easy to read, we will instead represent TREs by print(d) \end{Verbatim} -While the pattern encoding control flow is correct, the second pattern describes a common mistake for this problem: \code{range(1,n)} will only generate values up to $n-1$, so $n$ will not be printed as its own divisor. A correct pattern would include the binary operator \code{+} on the path to $n$, indicating a call to \code{range(1,n+1)}. +The second pattern describes a common mistake for this problem: \code{range(1,n)} will only generate values up to \code{n-1}, so \code{n} will not be printed as its own divisor. A correct pattern would include the binary operator \code{+} on the path to \code{n}, indicating a call to \code{range(1,n+1)}. \subsection{Constructing patterns} @@ -150,5 +146,4 @@ For each control-flow node $n$ we construct a pattern from the set $\{n\}$; we d For each expression (such as \code{(F-32)*5/9}) we select the different combinations of literal and variable nodes in the expression. In these patterns we include at most one node referring to a variable. \end{enumerate} -We found that patterns constructed from such nodesets are useful for discriminating between programs. As we show in Sect.~\ref{sec:interpreting-rules}, they are also easily interpreted in terms of bugs and strategies for a given problem. Note that in any pattern constructed in this way, all \textsf{Var} nodes refer to the same variable, simplifying interpretation. - +Note that in every constructed pattern, all \textsf{Var} nodes refer to the same variable. We found that patterns constructed from such nodesets are useful for discriminating between programs. As we show in Sect.~\ref{sec:interpreting-rules}, they are also easily interpreted in terms of bugs and strategies for a given problem. diff --git a/aied2018/rules.tex b/aied2018/rules.tex index 3da87ea..ff47939 100644 --- a/aied2018/rules.tex +++ b/aied2018/rules.tex @@ -24,6 +24,7 @@ We use the same constraints as in the case of n-rules and learn rules for correc \section{Interpreting rules} +\label{sec:interpreting-rules} Learned rules can be used to analyze student programing. This section describes several rules induced for two Python exercises: \textsf{Fahrenheit to Celsius}, which reads a value from standard input and calculates the result, and \textsf{Greatest Absolutist}, one of the introductory exercises for functions. @@ -77,7 +78,7 @@ The two most accurate n-rules with missing patterns in their conditions are: \end{Verbatim} \noindent -Pattern \textsf{P0} matches programs with a call to function \texttt{print}. A program without a \texttt{print} is always incorrect, since it will not output anything. +Pattern \textsf{P0} matches programs with a call to function \texttt{print}. A program without a \texttt{print} is always incorrect, since it will not output anything. The second rule covers programs with \textsf{P1} missing and \textsf{P16} present. \textsf{P16} matches programs with a call to the \texttt{print} function, where the argument contains a formula which subtracts 32 from a variable and then further multiplies the result. \textsf{P1} describes a call to the function \texttt{float} as the first item in an expression, i.e. \texttt{= float(...)}. This rule therefore represents programs that have the formula in the \texttt{print} function (\textsf{P16} is present), however fail to cast input from string to float (\textsf{P1} is missing). |