From 1dbb57208d2b8163a1c007ad0931f859651fc1c2 Mon Sep 17 00:00:00 2001 From: Timotej Lazar Date: Wed, 4 Apr 2018 18:02:51 +0200 Subject: AIED2018: minor tweaks --- aied2018/aied2018_short.tex | 55 +++++++++++++++++++-------------------------- 1 file changed, 23 insertions(+), 32 deletions(-) diff --git a/aied2018/aied2018_short.tex b/aied2018/aied2018_short.tex index 21a5c24..c5d3265 100644 --- a/aied2018/aied2018_short.tex +++ b/aied2018/aied2018_short.tex @@ -20,19 +20,6 @@ \usepackage{fancyvrb} \fvset{commandchars=\\\{\},baselinestretch=0.98,samepage=true,xleftmargin=2.5mm} -% WAT — I don’t even… -\makeatletter -\begingroup -\catcode`\`=\active -\gdef\FV@fontfamily@sf{% - \def\FV@FontScanPrep{\FV@MakeActive\`}% - \def\FV@FontFamily{\sffamily\edef`{{\string`}}}} -\endgroup -\makeatother - -\usepackage{tikz} -\usepackage{forest} -\usetikzlibrary{arrows.meta,calc} \newcommand\code[1]{\texttt{#1}} \newcommand\pattern[1]{\textsf{#1}} @@ -58,12 +45,12 @@ Providing feedback to students is among the most time-consuming tasks when teach Several attempts have been made to automatically discover common errors in student programs~\cite{jin2012program,rivers2015data-driven,nguyen2014codewebs,hovemeyer2016control}. This would allow a teacher to annotate a representative subset of submissions with feedback messages, which could then be automatically propagated to similar programs. These techniques are used for instance by the OverCode tool to visualize variations in student programs~\cite{glassman2015overcode}. -We use \emph{tree regular expressions} (TREs) to specify important patterns in a program’s abstract syntax tree (AST) while disregarding irrelevant parts. +We use \emph{tree regular expressions} to specify important patterns in a program’s abstract syntax tree (AST) while disregarding irrelevant parts. We have previously demonstrated this approach with Prolog programs~\cite{lazar2017automatic}. Here we refine the AST patterns and show that they can be applied to Python -- a different programming paradigm -- with only a few modifications. \section{AST patterns} -We encode structural patterns in ASTs using tree regular expressions (TREs). In this work we consider (only) child and sibling relations in ASTs. We write them as S-expressions, such as \pattern{(a (b \textbf{\textasciicircum}~d~\textbf{.}~e~\$) c)}. This expression matches any tree satisfying the following constraints: +We encode structural patterns in ASTs using tree regular expressions (TREs). In this work we consider (only) patterns describing child and sibling relations in an AST. We write them as S-expressions, such as \pattern{(a (b \textbf{\textasciicircum}~d~\textbf{.}~e~\$) c)}. This expression matches any tree satisfying the following constraints: \begin{itemize} \item the root \pattern{a} has at least two children, \pattern{b} and \pattern{c}, adjacent and in that order; and @@ -71,9 +58,9 @@ We encode structural patterns in ASTs using tree regular expressions (TREs). In \end{itemize} \noindent -As in ordinary regular expressions, caret (\texttt{\textbf{\textasciicircum}}) and dollar sign (\texttt{\$}) anchor a node to be respectively the first or last child, and a period (\texttt{\textbf{.}}) matches any node. +As in ordinary regular expressions, caret (\texttt{\textbf{\textasciicircum}}) and dollar sign (\texttt{\$}) anchor a node to be respectively the first or last sibling, and a period (\texttt{\textbf{.}}) matches any node. -With TREs we encode interesting patterns in a program while disregarding irrelevant parts. Take for example the following, nearly correct Python function that prints the divisors of its argument $n$: +Using TREs we can encode interesting patterns in a program while disregarding irrelevant parts. Take for example the following, nearly correct Python function that prints the divisors of its argument $n$: \begin{Verbatim} \red{\dashuline{\textbf{def}}} divisors(\blue{\underline{\textbf{n}}}): @@ -82,7 +69,8 @@ With TREs we encode interesting patterns in a program while disregarding irrelev print(d) \end{Verbatim} -The highlighted fragments correspond to the following two patterns: +\noindent +The highlighted fragments correspond to the following patterns: \begin{enumerate} \item \pattern{(Function (body (For (body If))))} and @@ -90,29 +78,30 @@ The highlighted fragments correspond to the following two patterns: \item[] ~~~~\pattern{(body (For (iter (Call (func range) (args \textbf{\textasciicircum}~\textbf{.} Var \$))))))}. \end{enumerate} -The first TRE encodes a single path in the AST and describes the program’s structure: \textsf{Function}$\boldsymbol{-}$\textsf{For}$\boldsymbol{-}$\textsf{If}. The second TRE relates the argument in the definition of \code{divisors} to the last argument to \code{range}. -The second pattern shows a common mistake for this problem: \code{range(1,n)} will only generate values up to \code{n-1}, so \code{n} will not be printed as its own divisor. A correct pattern would include the binary operator \code{+} on the path to \code{n}, indicating a call to \code{range(1,n+1)}. +The first TRE encodes a single path in the AST and describes the program’s control-flow structure: \textsf{Function}$\boldsymbol{-}$\textsf{For}$\boldsymbol{-}$\textsf{If}~\cite{hovemeyer2016control}. The second TRE relates the argument in the definition of \code{divisors} to the last argument to \code{range}. +This pattern shows a common mistake for this problem: \code{range(1,n)} will only generate values up to \code{n-1}, so \code{n} will not be printed as its own divisor. A correct pattern would include the operator \code{+} on the AST path to \code{n}, indicating a call to \code{range(1,n+1)}. -Patterns are extracted automatically from student programs. We first canonicalize each program~\cite{rivers2015data-driven} using code from ITAP\footnote{Available at \url{https://github.com/krivers/ITAP-django}.}. For each pattern we select a subset of nodes in the AST, then walk the tree from each selected node to the root and construct the TRE by including all nodes along those paths. +Patterns are extracted automatically from student programs. We first canonicalize~\cite{rivers2015data-driven} each program using code from ITAP\footnote{Available at \url{https://github.com/krivers/ITAP-django}.}. For each pattern we select a subset of nodes in the AST, then construct the TRE by walking the tree from each selected node to the root. While pattern extraction is completely automated, we have manually defined the kinds of node subsets that are selected. After analyzing solutions to several programming problems, we decided to use the following kinds of patterns: \begin{enumerate} \item - We select each pair of leaf nodes referring to the same variable. + We select each pair of leaf nodes $\{a,b\}$ that refer to the same variable. \item For each control-flow node $n$ we construct a pattern from the set $\{n\}$; we do the same for each \textsf{Call} node representing a function call. \item For each expression -- such as \code{(F-32)*5/9} -- we select the different combinations of literal and variable nodes in the expression. In these patterns we include at most one node referring to a variable. \end{enumerate} -Note that in every constructed pattern, all \textsf{Var} nodes refer to the same variable. We found that patterns constructed from such nodesets are useful for discriminating between programs. As we show in the next section, they are also easily interpreted in terms of bugs and strategies for a given problem. +We found that patterns constructed from such node subsets are useful for discriminating between programs. Note that every constructed pattern refers to at most one variable. We used patterns to induce classification rules for predicting program correctness. The next section demonstrates that these rules are easy to interpret in terms of bugs and strategies for a given problem. -\section{Interpreting learned rules: a case study} +\section{Rules: a case study} \label{sec:rules} -The goal of learning rules in this paper is to discover and explain common approaches and mistakes in student programs. We use a similar rule learner to the one described in~\cite{lazar2017automatic}, implemented within the Orange data mining library~\cite{demsar2013orange}. Each program is represented in the feature space of AST patterns described in the previous section. Based on test results each program is classified either as \emph{correct} or \emph{incorrect}. Rules are then learned to explain why a program is correct or incorrect. Rules for incorrect programs are called \emph{n-rules} and rules for correct programs are called \emph{p-rules}. The patterns mentioned within the condition part of a rule can be used to analyze student programing. For example, the patterns from a rule for incorrect programs are more often present in incorrect than in correct programs, therefore they likely correspond to an erroneous pattern. -This section describes several rules induced for the \textsf{Fahrenheit to Celsius} Python exercise, which reads a value from standard input and calculates the result. The program that solves the \textsf{Fahrenheit to Celsius} exercise should ask the user to input a temperature, and print the result. A sample correct program is: +The goal of learning rules in this paper is to discover and explain common approaches and mistakes in student programs. We use a similar rule learner to the one described in~\cite{lazar2017automatic}, implemented within the Orange data mining library~\cite{demsar2013orange}. Each program is represented in the feature space of AST patterns described in the previous section. Based on test results each program is classified either as \emph{correct} or \emph{incorrect}. Rules are then learned to explain why a program is correct or incorrect. Rules for incorrect programs are called \emph{n-rules} and rules for correct programs are called \emph{p-rules}. Patterns mentioned within the condition part of a rule can be used to analyze student programming. For example, patterns from a rule for incorrect programs are more often present in incorrect than in correct programs, therefore they likely correspond to errors. + +This section describes several rules induced for the \textsf{Fahrenheit to Celsius} Python exercise, which reads a value from standard input and calculates the result. To solve this exercise, a student must ask the user to input a temperature, and print the result. A sample correct program is: \begin{Verbatim} F = float(input("Fahrenheit: ")) @@ -120,8 +109,7 @@ C = 5 / 9 * (F - 32) print("Celsius: ", C) \end{Verbatim} -In the CodeQ environment, students have submitted 1177 programs for this problem, with 495 correct and 682 incorrect programs. Our systems extracted 891 relevant AST patterns, which were used as attributes in rule learning. The rule learner induced 24 n-rules and 16 p-rules. - +Students have submitted 1177 programs for this problem, with 495 correct and 682 incorrect programs. Our system extracted 891 relevant AST patterns, which were used as attributes in rule learning. The rule learner induced 24 n-rules and 16 p-rules. Two examples of highly accurate n-rules were: \begin{Verbatim}[fontfamily=sf] @@ -144,7 +132,7 @@ g1 = \blue{\underline{int}}(g2) g1 = (\red{\dashuline{(g2 - 32 print(((g1-32)*(5/9))) print(g2, 'F equals', g1, 'C') \end{Verbatim} -These rules describe two common student errors. The left program is incorrect, since it fails when the user inputs a decimal. The right program is incorrect because the input string must be cast to a number. Not casting it (pattern \textsf{P5}) and then using it in an expression (pattern \textsf{P35}) will raise an exception. +These rules describe two common student errors. The left program fails when the user inputs a decimal. The right program is incorrect because the input string must be cast to a number. Not casting it (pattern \textsf{P5}) and then using it in an expression (pattern \textsf{P35}) will raise an exception. In some cases, n-rules imply a missing pattern. For example: \begin{Verbatim}[fontfamily=sf] @@ -162,7 +150,7 @@ P2 ∧ P8 ⇒ correct [1, 200] \noindent Patterns in the condition of the rule, \textsf{P2} and \textsf{P8}, correspond respectively to expressions of the form \texttt{float(input(?))} and \texttt{print((?-32)*?)}. Programs matching both patterns wrap the function \texttt{float} around \texttt{input}, and have an expression that subtracts 32 and then uses multiplication within the \texttt{print}. -This rule demonstrates an important property of p-rules: although patterns \textsf{P2} and \textsf{P8} are in general not sufficient for a correct program (it is trivial to implement a matching but incorrect program), only one out of 201 student submissions matching these patterns was incorrect. This suggests that the conditions of p-rules represent the critical elements of the solution. Once students have figured out these patterns, they are almost certain to have a correct solution. A sample program matching the first rule is: +This rule demonstrates an important property of p-rules: although patterns \textsf{P2} and \textsf{P8} are in general not sufficient for a correct program (it is trivial to implement a matching but incorrect program), only one out of 201 student submissions matching these patterns was incorrect. This suggests that the conditions of p-rules represent critical elements of the solution. Once students have figured out these patterns, they are almost certain to have a correct solution. A sample program matching the first rule is: \begin{Verbatim} g1 = \blue{\underline{float(input(}}'Temperature [F]: ')) @@ -170,9 +158,12 @@ print(((g1 \red{\dashuline{- 32) *}} (5 / 9))) \end{Verbatim} \section{Discussion and further work} -Our primary interest in this paper is to help manual analysis of student submissions. We proposed to first represent submitted programs with patterns extracted from abstract syntax trees and then learn classification rules that distnguish between correct and incorrect programs. We showed that both rules and patterns are easy to interpret and can be used to explain typical mistakes and approaches. The accuracy of automatic classification plays a secondary role, however it is a good measure to estimate the expressiveness of patterns. Over 12 exercises a random forest model achieved for about 17\% overall higher accuracy than the majority classifier. This result indicates that a significant amount of information can be gleaned from simple syntax-oriented analysis. To further improve the quality of patterns, we intend to analyze misclassified programs in exercises and derive new formats of patterns, which should enable better learning. -We have demonstrated how AST patterns can be described with TREs, and how such patterns can be combined to discover important concepts and errors in student programs. Currently, analyzing patterns and rules is quite cumbersome. We plan on developing a tool to allow teachers to easily construct and refine patterns and rules based on example programs. Ideally we would integrate our approach into an existing analysis tool such as OverCode~\cite{glassman2015overcode}. +Our primary interest in this paper is to help manual analysis of student submissions. We proposed to first represent submitted programs with patterns extracted from abstract syntax trees and then learn classification rules that distinguish between correct and incorrect programs. We showed that both rules and patterns are easy to interpret and can be used to explain typical mistakes and approaches. + +The accuracy of automatic classification plays a secondary role, but it is a good measure to estimate the expressiveness of patterns. Over 12 exercises a random forest model achieved about 17\% overall higher accuracy than the majority classifier. This result indicates that a significant amount of information can be gleaned from simple syntax-oriented analysis. To further improve the quality of patterns, we intend to analyze misclassified programs in exercises and derive new formats of patterns, which should enable better learning. + +We have demonstrated how AST patterns can be encoded with TREs, and how patterns can be combined to discover important concepts and errors in student programs. Currently, analyzing patterns and rules is quite cumbersome. We plan on developing a tool to allow teachers to easily construct and refine patterns based on example programs. Ideally we would integrate our approach into an existing analysis tool such as OverCode~\cite{glassman2015overcode}. \bibliographystyle{splncs} \bibliography{aied2018} -- cgit v1.2.1