-rw-r--r--   aied2018/aied2018_short.tex   295
1 file changed, 295 insertions, 0 deletions
diff --git a/aied2018/aied2018_short.tex b/aied2018/aied2018_short.tex
new file mode 100644
index 0000000..1272be8
--- /dev/null
+++ b/aied2018/aied2018_short.tex
@@ -0,0 +1,295 @@
+\documentclass{llncs}
+
+\usepackage[utf8]{inputenc}
+\usepackage{newunicodechar}
+\newunicodechar{∧}{\ensuremath{\land}}
+\newunicodechar{¬}{\ensuremath{\lnot}}
+\newunicodechar{⇒}{\ensuremath{\Rightarrow}}
+\newunicodechar{→}{\ensuremath{\rightarrow}}
+\newunicodechar{⋯}{\ensuremath{\cdots}}
+
+\usepackage{bold-extra}
+\usepackage{bm}
+\usepackage{hyperref}
+\usepackage[normalem]{ulem}
+
+\usepackage{color}
+\newcommand\red[1]{{\begingroup\color[rgb]{0.8,0.15,0.15}#1\endgroup}}
+\newcommand\blue[1]{{\begingroup\color[rgb]{0.15,0.15,0.8}#1\endgroup}}
+\newcommand\green[1]{{\begingroup\color[rgb]{0.15,0.8,0.15}#1\endgroup}}
+
+\usepackage{fancyvrb}
+\fvset{commandchars=\\\{\},baselinestretch=0.98,samepage=true,xleftmargin=2.5mm}
+% WAT — I don’t even…
+\makeatletter
+\begingroup
+\catcode`\`=\active
+\gdef\FV@fontfamily@sf{%
+ \def\FV@FontScanPrep{\FV@MakeActive\`}%
+ \def\FV@FontFamily{\sffamily\edef`{{\string`}}}}
+\endgroup
+\makeatother
+
+\usepackage{tikz}
+\usepackage{forest}
+\usetikzlibrary{arrows.meta,calc}
+
+\newcommand\code[1]{\texttt{#1}}
+\newcommand\pattern[1]{\textsf{#1}}
+
+\begin{document}
+\title{Syntax-based analysis of programming concepts in Python}
+\author{Martin Možina, Timotej Lazar}
+\institute{University of Ljubljana, Faculty of Computer and Information Science, Slovenia}
+\maketitle
+
+\begin{abstract}
+% background / problem
+Writing programs is essential to learning programming. Most programming courses encourage students to practice with lab and homework assignments. By analyzing solutions to these exercises, teachers can discover mistakes and concepts students are struggling with, and use that knowledge to improve the content and presentation of the course. Students, however, tend to submit many different programs even for simple exercises, making such analysis difficult.
+% solution
+We propose using tree regular expressions to encode common patterns in programs. Based on these patterns we induce rules describing common approaches and mistakes for a given assignment. In this paper we present a case study of rule-based analysis for an introductory Python exercise. We show that our rules are easy to interpret, and can be learned from a relatively small set of programs.
+\\\\
+\textbf{Keywords:} Learning programming · Educational data analysis · Error diagnosis · Abstract syntax tree · Tree regular expressions
+\end{abstract}
+
+\section{Introduction}
+
+Providing feedback to students is among the most time-consuming tasks when teaching programming. In large courses with hundreds of students, feedback is therefore often limited to automated program testing. While test cases can reliably determine whether a program is correct or not, they cannot easily be associated with specific errors in the code.
+
+Several attempts have been made to automatically discover commonalities in a set of programs~\cite{jin2012program,rivers2015data-driven,nguyen2014codewebs,hovemeyer2016control}. This would allow a teacher to annotate a representative subset of submissions with feedback messages, which could then be automatically propagated to similar programs. These techniques are used for instance by the OverCode tool to visualize variations in student programs~\cite{glassman2015overcode}.
+
+This paper presents a new language for describing patterns in student code. Our approach is based on \emph{tree regular expressions} (TREs) used in natural language processing~\cite{levy2006tregex}. TREs are similar to ordinary regular expressions: they allow us to specify important patterns in a program’s abstract syntax tree (AST) while disregarding irrelevant parts. We found that TREs are sufficiently expressive to represent various concepts and errors in novice programs.
+
+We have previously demonstrated this approach with Prolog programs~\cite{lazar2017automatic}. Here we refine the definition of AST patterns, and show that they can be applied to Python -- representing a different programming paradigm -- with only a few language-specific modifications. The exercises and the solutions were obtained from the online programming environment CodeQ\footnote{Available at \url{https://codeq.si}. Source under AGPL3+ at \url{https://codeq.si/code}.}. In the case study, we demonstrate that rules learned from such patterns can be easily interpreted.
+
+\section{AST patterns}
+
+We encode structural patterns in ASTs using tree regular expressions (TREs). An ordinary regular expression describes the set of strings matching certain constraints; similarly, a TRE describes the set of trees containing certain nodes and relations. Since TREs describe structure, they are themselves represented as trees. More specifically, both ASTs and TREs are ordered rooted trees.
+
+In this work we used TREs to encode (only) child and sibling relations in ASTs. We write them as S-expressions, such as \pattern{(a (b \textbf{\textasciicircum}~d~\textbf{.}~e~\$) c)}. This expression matches any tree satisfying the following constraints (see Fig.~\ref{fig:tre-example} for an example):
+
+\begin{itemize}
+ \item the root \pattern{a} has at least two children, \pattern{b} and \pattern{c}, adjacent and in that order; and
+ \item the node \pattern{b} has three children: \pattern{d}, followed by any node, followed by \pattern{e}.
+\end{itemize}
+
+\noindent
+As in ordinary regular expressions, the caret (\texttt{\textbf{\textasciicircum}}) and the dollar sign (\texttt{\$}) anchor a node to be, respectively, the first or last child of its parent. A period (\texttt{\textbf{.}}) is a wildcard that matches any node.
+
+\begin{figure}[tbp]
+ \centering
+ \begin{forest}
+ for tree={
+ font=\sf,
+ edge=darkgray,
+ l sep=0,
+ l=0.9cm,
+ }
+ [a,name=a
+ [f]
+ [b,name=b
+ [d,name=d] [g,name=g] [e,name=e]]
+ [c,name=c [j] [k]]
+ [h]
+ ]
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (a) edge[transform canvas={xshift=0.8mm,yshift=-0.2mm}] (b);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (a) edge[transform canvas={xshift=-0.9mm,yshift=-0.3mm}] (c);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (b) edge[transform canvas={xshift=-1.1mm,yshift=0.2mm}] (d);
+ \draw[opacity=0] (b) -- node[anchor=east,thick,blue,opacity=1,font=\large] {\texttt{\textasciicircum}} (d);
+ \path[draw,thick,relative,blue,transform canvas={xshift=-0.7mm}] (b) -- ($ (b)!0.6!(g) $);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (b) edge[transform canvas={xshift=1.1mm,yshift=0.2mm}] (e);
+ \draw[opacity=0] (b) -- node[anchor=west,thick,blue,opacity=1,font=\scriptsize,transform canvas={xshift=0mm,yshift=1mm}] {\texttt{\$}} (e);
+ \end{forest}
+  \caption{A tree matching a pattern (blue arrows beside the edges). In the pattern, each arrow $x→y$ means that node $x$ has a child $y$. A shorter line without an arrowhead (e.g. \pattern{b}$\boldsymbol{-}$\pattern{g}) indicates a wildcard, where the child can be any node. The anchors \texttt{\textbf{\textasciicircum}} and \texttt{\$} mean that the pattern matches only the first or last child, respectively.}
+ \label{fig:tre-example}
+\end{figure}
+
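+The following Python sketch makes these matching rules concrete; it is purely illustrative and not part of our implementation. Both the pattern and the subject tree are given as \code{(label, children)} pairs, with anchors written as the strings \code{'\textasciicircum'} and \code{'\$'} and wildcards as \code{'.'}:
+
+\begin{Verbatim}
+def match(pattern, tree):
+    if pattern == '.':                     # wildcard matches any node
+        return True
+    p_label, p_children = pattern
+    t_label, t_children = tree
+    return p_label == t_label and match_children(p_children, t_children)
+
+def match_children(p_children, t_children):
+    if not p_children:                     # no constraints on children
+        return True
+    anchored_start = p_children[0] == '^'
+    anchored_end = p_children[-1] == '$'
+    core = [p for p in p_children if p != '^' and p != '$']
+    if len(core) > len(t_children):
+        return False
+    starts = [0] if anchored_start else range(len(t_children) - len(core) + 1)
+    for start in starts:
+        if anchored_end and start + len(core) != len(t_children):
+            continue                       # pattern must end at the last child
+        window = t_children[start:start + len(core)]
+        if all(match(p, t) for p, t in zip(core, window)):
+            return True                    # adjacent children match in order
+    return False
+\end{Verbatim}
+
+With the tree and pattern from Fig.~\ref{fig:tre-example} encoded in this form, \code{match} returns \code{True}. Searching for a pattern anywhere in a program then amounts to calling \code{match} at every node of the AST.
+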
+With TREs we encode interesting patterns in a program while disregarding irrelevant parts. Take for example the following, nearly correct Python function that prints the divisors of its argument $n$:
+
+\begin{Verbatim}
+def divisors(n):
+    for d in range(1, n):
+        if n % d == 0:
+            print(d)
+\end{Verbatim}
+
+Figure~\ref{fig:patterns-example} shows the simplified AST for this program, with two patterns overlaid. These patterns are represented by the S-expressions
+
+\begin{figure}[htb]
+ \centering
+ \begin{forest}
+ for tree={
+ font=\sf,
+ edge=darkgray,
+ l sep=0,
+ l=0.9cm,
+ }
+ [Function, name=def, draw,rectangle,red,dashed
+ [name, name=name1 [divisors, name=divisors, font=\bfseries\sffamily]]
+ [args, name=args1
+ [Var, name=args1-1, draw,rectangle,blue [n, font=\bfseries\sffamily, l=0.7cm]]]
+ [body, name=body1, before computing xy={s=2cm}
+ [For, name=for, l=1.2cm, draw,rectangle,red,dashed
+ [target
+ [Var, l=0.9cm [d, font=\bfseries\sffamily, l=0.7cm]]]
+ [iter, name=iter
+ [Call, name=call, l=0.9cm
+ [func, name=func, l=0.8cm [range, name=range, font=\bfseries\sffamily, l=0.7cm]]
+ [args, name=args2, l=0.8cm
+ [Num, name=args2-1, l=1cm [1, font=\bfseries\sffamily, l=0.7cm]]
+ [Var, name=args2-2, l=1cm,draw,rectangle,blue [n, font=\bfseries\sffamily, l=0.7cm]]]]]
+ [body, name=body2
+ [If, name=if, draw,rectangle,red,dashed
+ [test, l=0.7cm [Compare, l=0.8cm
+ [BinOp
+ [Var [n, font=\bfseries\sffamily, l=0.6cm]]
+ [\%, font=\bfseries\sffamily]
+ [Var [d, font=\bfseries\sffamily, l=0.6cm]]]
+ [{==}, font=\bfseries\sffamily, l=0.7cm]
+ [Num [0, font=\bfseries\sffamily, l=0.7cm]]]]
+ [body, l=0.7cm [Call, l=0.8cm
+ [func [print, font=\bfseries\sffamily, l=0.7cm]]
+ [args [Var, l=0.7cm [d, font=\bfseries\sffamily, l=0.6cm]]]]]]]]]]
+ % first pattern
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,red,dashed] (def) edge[transform canvas={xshift=1.1mm,yshift=0.2mm}] (body1);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,red,dashed] (body1) edge[transform canvas={xshift=0.8mm}] (for);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,red,dashed] (for) edge[transform canvas={xshift=1.1mm,yshift=0.5mm}] (body2);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,red,dashed] (body2) edge[transform canvas={xshift=0.8mm}] (if);
+ % second pattern
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (def) edge[transform canvas={xshift=-1.1mm,yshift=0.2mm}] (name1);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (def) edge[transform canvas={xshift=-0.8mm}] (args1);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (args1) edge[transform canvas={xshift=-0.8mm}] (args1-1);
+ \draw[opacity=0] (args1) -- node[anchor=east,thick,blue,opacity=1,font=\large] {\texttt{\textasciicircum}} (args1-1);
+ \draw[opacity=0] (args1) -- node[anchor=west,thick,blue,opacity=1,font=\scriptsize,transform canvas={xshift=-0.5mm,yshift=0.6mm}] {\texttt{\$}} (args1-1);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (def) edge[transform canvas={xshift=-1mm,yshift=-0.3mm}] (body1);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (body1) edge[transform canvas={xshift=-0.8mm}] (for);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (for) edge[transform canvas={xshift=1mm,yshift=-0.5mm}] (iter);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (iter) edge[transform canvas={xshift=0.8mm}] (call);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (call) edge[transform canvas={xshift=-1.2mm}] (func);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (call) edge[transform canvas={xshift=1.2mm}] (args2);
+ \path[draw,thick,relative,blue,transform canvas={xshift=-0.7mm}] (args2) -- ($ (args2)!0.55!(args2-1) $);
+ \draw[opacity=0] (args2) -- node[anchor=east,thick,blue,opacity=1,font=\large,transform canvas={xshift=0.6mm,yshift=0.4mm}] {\texttt{\textasciicircum}} (args2-1);
+ \path[-{Latex[length=1.5mm,width=1mm]},thick,relative,blue] (args2) edge[transform canvas={xshift=1mm}] (args2-2);
+ \draw[opacity=0] (args2) -- node[anchor=west,thick,blue,opacity=1,font=\scriptsize,transform canvas={xshift=0mm,yshift=1mm}] {\texttt{\$}} (args2-2);
+ \end{forest}
+ \caption{The AST for the \code{divisors} program with two patterns. Leaf nodes (in bold) correspond to terminals in the program, i.e. names and values. Dashed red arrows represent the pattern describing the control structure of the program. Solid blue arrows encode the incorrect second argument to the \code{range} function.}
+ \label{fig:patterns-example}
+\end{figure}
+
+\begin{enumerate}
+ \item \pattern{(Function (body (For (body If))))} and
+ \item \pattern{(Function (name divisors) (args \textbf{\textasciicircum}~Var \$)}
+ \item[] ~~~~\pattern{(body (For (iter (Call (func range) (args \textbf{\textasciicircum}~\textbf{.} Var \$))))))}.
+\end{enumerate}
+
+The first TRE encodes a single path in the AST and describes the program’s block structure: \textsf{Function}$\boldsymbol{-}$\textsf{For}$\boldsymbol{-}$\textsf{If}. The second TRE relates the argument in the definition of \code{divisors} to the last argument of \code{range}, which provides the iterator in the for loop. Since S-expressions are not easy to read, we will instead represent TREs by highlighting the relevant text in examples of matching programs:
+
+\begin{Verbatim}
+\red{\dashuline{\textbf{def}}} divisors(\blue{\underline{\textbf{n}}}):
+    \red{\dashuline{\textbf{for}}} d in range(1, \blue{\underline{\textbf{n}}}):
+        \red{\dashuline{\textbf{if}}} n % d == 0:
+            print(d)
+\end{Verbatim}
+
+The second pattern shows a common mistake for this problem: \code{range(1,n)} will only generate values up to \code{n-1}, so \code{n} will not be printed as its own divisor. A correct pattern would include the binary operator \code{+} on the path to \code{n}, indicating a call to \code{range(1,n+1)}.
+
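+For reference, the corrected loop from the \code{divisors} example reads:
+
+\begin{Verbatim}
+for d in range(1, n + 1):
+    if n % d == 0:
+        print(d)
+\end{Verbatim}
+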
+\subsection{Constructing patterns}
+
+Patterns are extracted automatically from student programs. We first canonicalize each program~\cite{rivers2015data-driven} using code from ITAP\footnote{Available at \url{https://github.com/krivers/ITAP-django}.}. To construct TREs describing individual patterns, we select a subset of nodes in the AST, and walk the tree from each selected node to the root, including all nodes along those paths.
+
+Depending on node type we also include some nodes adjacent to such paths. For each comparison and unary/binary expression on the path we include the corresponding operator. For function definitions and calls we include the function name. Finally, in all argument lists we include the anchors (\textbf{\texttt{\textasciicircum}}~and~\code{\$}) and a wildcard (\textbf{\texttt{.}}) for each argument not on the path. This allows our TREs to discriminate between e.g. the first and second argument to a function.
+
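+As an illustration of this procedure, the following sketch constructs a pattern of the simplest kind, connecting all occurrences of one variable to the root. It is not our actual extraction code: it uses Python's built-in \code{ast} module directly (so node names differ slightly from our figures, e.g. \textsf{FunctionDef} instead of \textsf{Function}), and it omits canonicalization, operators, function names, anchors and wildcards. The helper name \code{variable\_pattern} is chosen here for illustration only.
+
+\begin{Verbatim}
+import ast
+
+def variable_pattern(source, var_name):
+    root = ast.parse(source)
+    parent = dict()                         # child node -> parent node
+    for node in ast.walk(root):
+        for child in ast.iter_child_nodes(node):
+            parent[child] = node
+    # select every leaf referring to var_name
+    selected = [n for n in ast.walk(root)
+                if isinstance(n, ast.Name) and n.id == var_name]
+    keep = set()                            # nodes on paths from selection to root
+    for node in selected:
+        while node is not None:
+            keep.add(node)
+            node = parent.get(node)
+    def sexp(node):                         # serialize kept nodes as an S-expression
+        label = type(node).__name__
+        parts = [sexp(child) for child in ast.iter_child_nodes(node)
+                 if child in keep]
+        if isinstance(node, ast.Name):
+            parts = [node.id]               # keep the variable name as a leaf
+        return label if not parts else '(' + label + ' ' + ' '.join(parts) + ')'
+    return sexp(root)
+\end{Verbatim}
+
+For the \code{divisors} program above, \code{variable\_pattern(source, 'n')} yields a pattern along the lines of \pattern{(Module (FunctionDef (For (Call (Name n)) (If (Compare (BinOp (Name n)))))))}.
+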
+While pattern extraction is completely automated, we have manually defined the kinds of node subsets that are selected. After analyzing solutions to several programming problems, we decided to use the following kinds of patterns; Fig.~\ref{fig:patterns-example} shows one example of each of the first two kinds.
+
+\begin{enumerate}
+ \item
+ We select each pair of leaf nodes referring to the same variable.
+ \item
+ For each control-flow node $n$ we construct a pattern from the set $\{n\}$; we do the same for each \textsf{Call} node representing a function call.
+ \item
+ For each expression (such as \code{(F-32)*5/9}) we select the different combinations of literal and variable nodes in the expression. In these patterns we include at most one node referring to a variable.
+\end{enumerate}
+
+Note that in every constructed pattern, all \textsf{Var} nodes refer to the same variable. We found that patterns constructed from such node subsets are useful for discriminating between programs. As we show in Sect.~\ref{sec:rules}, they are also easily interpreted in terms of bugs and strategies for a given problem.
+
+\section{Interpreting learned rules: a case study}
+\label{sec:rules}
+
+Learned rules can be used to analyze student programs. This section describes several rules induced for the \textsf{Fahrenheit to Celsius} Python exercise, where the program should read a temperature in degrees Fahrenheit from standard input and print the corresponding temperature in degrees Celsius.
+
+We used a rule learner similar to the one described in~\cite{lazar2017automatic}, implemented within the Orange data mining library~\cite{demsar2013orange}. Each program is represented in the feature space of AST patterns described in the previous section. Based on test results, each program is classified as either \emph{correct} or \emph{incorrect}. Rules explaining incorrect programs are called \emph{n-rules}, and rules explaining correct programs are called \emph{p-rules}.
+
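+To make this setup concrete, the sketch below builds the binary attribute table; \code{has\_pattern} is a hypothetical helper standing in for a matcher such as the one sketched in the previous section, and the scikit-learn decision tree is shown only as a readily available substitute for the rule learner we actually used.
+
+\begin{Verbatim}
+import numpy as np
+from sklearn.tree import DecisionTreeClassifier    # stand-in learner only
+
+def pattern_table(programs, patterns, has_pattern):
+    # programs: source strings; patterns: TREs;
+    # has_pattern(program, pattern) -> bool is assumed to be a TRE matcher
+    return np.array([[1 if has_pattern(prog, pat) else 0 for pat in patterns]
+                     for prog in programs])
+
+# X = pattern_table(programs, patterns, has_pattern)
+# y: 1 for programs that pass all test cases, 0 otherwise
+# model = DecisionTreeClassifier(min_samples_leaf=20).fit(X, y)
+\end{Verbatim}
+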
+The program that solves the \textsf{Fahrenheit to Celsius} exercise should ask the user to input a temperature, and print the result. A sample correct program is:
+
+\begin{Verbatim}
+F = float(input("Fahrenheit: "))
+C = 5 / 9 * (F - 32)
+print("Celsius: ", C)
+\end{Verbatim}
+
+In the CodeQ environment, students submitted 1177 programs for this problem: 495 correct and 682 incorrect. Our system extracted 891 relevant AST patterns, which were used as attributes in rule learning. The rule learner induced 24 n-rules and 16 p-rules.
+
+Two examples of highly accurate n-rules are:
+
+\begin{Verbatim}[fontfamily=sf]
+P20 ⇒ incorrect [208, 1]
+P5 ∧ P35 ⇒ incorrect [72, 0]
+\end{Verbatim}
+
+\noindent
+The first rule covers programs where the pattern \textsf{P20} is present. The rule implies an incorrect program, and covers 208 incorrect and one correct program. \textsf{P20} is the AST pattern describing a call to the \texttt{int} function:
+
+\begin{Verbatim}[fontfamily=sf]
+(Module (body (Assign (value (Call (func (Name (id int) (ctx Load))))))))
+\end{Verbatim}
+
+The pattern \textsf{P5} in the second rule matches programs where the result of the \texttt{input} call is not cast to \texttt{float} but stored as a string. Pattern \textsf{P35} matches programs where the value 32 is subtracted from a variable on the left-hand side of a multiplication. Sample programs matching the first rule (left) and the second rule (right) are:
+
+\begin{Verbatim}
+g2 = input() g2 = \blue{\underline{input}}('Temperature [F]? ')
+g1 = \blue{\underline{int}}(g2) g1 = (\red{\dashuline{(g2 - 32) *}} (5 / 9))
+print(((g1-32)*(5/9))) print(g2, 'F equals', g1, 'C')
+\end{Verbatim}
+
+These rules describe two common student errors. The left program is incorrect, since it fails when the user inputs a decimal. The right program is incorrect because the input string must be cast to a number. Not casting it (pattern \textsf{P5}) and then using it in an expression (pattern \textsf{P35}) will raise an exception.
+
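+Both failures are easy to reproduce in the Python interpreter:
+
+\begin{Verbatim}
+int('98.6')    # ValueError: invalid literal for int() with base 10: '98.6'
+'98.6' - 32    # TypeError: unsupported operand type(s) for -: 'str' and 'int'
+\end{Verbatim}
+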
+In some cases, n-rules imply a missing pattern. For example:
+\begin{Verbatim}[fontfamily=sf]
+¬P0 ⇒ incorrect [106, 0]
+\end{Verbatim}
+
+\noindent
+Pattern \textsf{P0} matches programs with a call to function \texttt{print}. A program without a \texttt{print} is always incorrect, since it will not output anything.
+
+Let us now examine the other type of rules. Two induced p-rules were:
+\begin{Verbatim}[fontfamily=sf]
+P2 ∧ P8 ⇒ correct [1, 200]
+P80 ⇒ correct [0, 38]
+\end{Verbatim}
+
+\noindent
+Patterns in the condition of the first rule, \textsf{P2} and \textsf{P8}, correspond respectively to expressions of the form \texttt{float(input(?))} and \texttt{print((?-32)*?)}. Programs matching both patterns wrap the function \texttt{float} around \texttt{input}, and have an expression that subtracts 32 and then uses multiplication within the \texttt{print}.
+
+This first rule demonstrates an important property of p-rules: although patterns \textsf{P2} and \textsf{P8} are in general not sufficient for a correct program (it is trivial to implement a matching but incorrect program), only one out of 201 student submissions matching these patterns was incorrect. This suggests that the conditions of p-rules represent the critical elements of the solution. Once students have figured out these patterns, they are almost certain to have a correct solution. A sample program matching the first rule is:
+
+\begin{Verbatim}
+g1 = \blue{\underline{float(input(}}'Temperature [F]: '))
+print(((g1 \red{\dashuline{- 32) *}} (5 / 9)))
+\end{Verbatim}
+
+\noindent The second rule describes programs that subtract 32 from a variable cast to float. The following program matches pattern \textsf{P80}:
+
+\begin{Verbatim}
+g1 = input('Fahrenheit?')
+g0 = (\blue{\underline{(float(g1) - 32)}} * (5 / 9))
+print(g0)
+\end{Verbatim}
+
+\section{Discussion and further work}
+Our primary goal in this paper is to support manual analysis of student submissions. We proposed representing submitted programs with patterns extracted from their abstract syntax trees, and then learning classification rules that distinguish between correct and incorrect programs. We showed that both rules and patterns are easy to interpret and can be used to explain typical mistakes and approaches. The accuracy of automatic classification plays a secondary role; it is, however, a useful estimate of the expressiveness of the patterns. Over 12 exercises, our model achieved about 17\% higher overall accuracy than the majority classifier. This result indicates that a significant amount of information can be gleaned from simple syntax-oriented analysis. To further improve the quality of patterns, we intend to analyze misclassified programs and derive new pattern formats that should enable better learning.
+
+We have demonstrated how AST patterns can be described with TREs, and how such patterns can be combined to discover important concepts and errors in student programs. Analyzing patterns and rules is currently quite cumbersome; we plan to develop a tool that lets teachers easily construct and refine patterns and rules based on example programs. Ideally, we would integrate our approach into an existing analysis tool such as OverCode~\cite{glassman2015overcode}.
+
+\bibliographystyle{splncs}
+\bibliography{aied2018}
+\end{document}