summaryrefslogtreecommitdiff
path: root/aied2018
diff options
context:
space:
mode:
authorMartin Mozina <martin.mozina@fri.uni-lj.si>2018-04-04 10:14:41 +0200
committerMartin Mozina <martin.mozina@fri.uni-lj.si>2018-04-04 10:14:41 +0200
commitad23c3405e0026933d8a8f81ccaafb1f649f1fdf (patch)
tree9fa5041e2c6ffe805a988f3a4ec2d2aa91f37b8c /aied2018
parent3dda20d23d5ea18f74568ea99ad6a27ce66c5d91 (diff)
Shorter rules section, better transition from section 2 to 3.
Diffstat (limited to 'aied2018')
-rw-r--r--aied2018/aied2018_short.tex26
1 files changed, 7 insertions, 19 deletions
diff --git a/aied2018/aied2018_short.tex b/aied2018/aied2018_short.tex
index 1a371fa..21a5c24 100644
--- a/aied2018/aied2018_short.tex
+++ b/aied2018/aied2018_short.tex
@@ -110,12 +110,9 @@ Note that in every constructed pattern, all \textsf{Var} nodes refer to the same
\section{Interpreting learned rules: a case study}
\label{sec:rules}
+The goal of learning rules in this paper is to discover and explain common approaches and mistakes in student programs. We use a similar rule learner to the one described in~\cite{lazar2017automatic}, implemented within the Orange data mining library~\cite{demsar2013orange}. Each program is represented in the feature space of AST patterns described in the previous section. Based on test results each program is classified either as \emph{correct} or \emph{incorrect}. Rules are then learned to explain why a program is correct or incorrect. Rules for incorrect programs are called \emph{n-rules} and rules for correct programs are called \emph{p-rules}. The patterns mentioned within the condition part of a rule can be used to analyze student programing. For example, the patterns from a rule for incorrect programs are more often present in incorrect than in correct programs, therefore they likely correspond to an erroneous pattern.
-Learned rules can be used to analyze student programing. This section describes several rules induced for the \textsf{Fahrenheit to Celsius} Python exercise, which reads a value from standard input and calculates the result.
-
-We used a similar rule learner to the one described in~\cite{lazar2017automatic}, implemented within the Orange data mining library~\cite{demsar2013orange}. Each program is represented in the feature space of AST patterns described in the previous section. Based on test results each program is classified either as \emph{correct} or \emph{incorrect}. Rules explaining incorrect programs are called \emph{n-rules} and rules explaining correct programs are called \emph{p-rules}.
-
-The program that solves the \textsf{Fahrenheit to Celsius} exercise should ask the user to input a temperature, and print the result. A sample correct program is:
+This section describes several rules induced for the \textsf{Fahrenheit to Celsius} Python exercise, which reads a value from standard input and calculates the result. The program that solves the \textsf{Fahrenheit to Celsius} exercise should ask the user to input a temperature, and print the result. A sample correct program is:
\begin{Verbatim}
F = float(input("Fahrenheit: "))
@@ -125,7 +122,7 @@ print("Celsius: ", C)
In the CodeQ environment, students have submitted 1177 programs for this problem, with 495 correct and 682 incorrect programs. Our systems extracted 891 relevant AST patterns, which were used as attributes in rule learning. The rule learner induced 24 n-rules and 16 p-rules.
-Two examples of highly accurate n-rules are:
+Two examples of highly accurate n-rules were:
\begin{Verbatim}[fontfamily=sf]
P20 ⇒ incorrect [208, 1]
@@ -157,32 +154,23 @@ In some cases, n-rules imply a missing pattern. For example:
\noindent
Pattern \textsf{P0} matches programs with a call to function \texttt{print}. A program without a \texttt{print} is always incorrect, since it will not output anything.
-Let us now examine the other type of rules. Two induced p-rules were:
+Let us now examine the other type of rules. A sample p-rule was:
\begin{Verbatim}[fontfamily=sf]
P2 ∧ P8 ⇒ correct [1, 200]
-P80 ⇒ correct [0, 38]
\end{Verbatim}
\noindent
-Patterns in the condition of the first rule, \textsf{P2} and \textsf{P8}, correspond respectively to expressions of the form \texttt{float(input(?))} and \texttt{print((?-32)*?)}. Programs matching both patterns wrap the function \texttt{float} around \texttt{input}, and have an expression that subtracts 32 and then uses multiplication within the \texttt{print}.
+Patterns in the condition of the rule, \textsf{P2} and \textsf{P8}, correspond respectively to expressions of the form \texttt{float(input(?))} and \texttt{print((?-32)*?)}. Programs matching both patterns wrap the function \texttt{float} around \texttt{input}, and have an expression that subtracts 32 and then uses multiplication within the \texttt{print}.
-This first rule demonstrates an important property of p-rules: although patterns \textsf{P2} and \textsf{P8} are in general not sufficient for a correct program (it is trivial to implement a matching but incorrect program), only one out of 201 student submissions matching these patterns was incorrect. This suggests that the conditions of p-rules represent the critical elements of the solution. Once students have figured out these patterns, they are almost certain to have a correct solution. A sample program matching the first rule is:
+This rule demonstrates an important property of p-rules: although patterns \textsf{P2} and \textsf{P8} are in general not sufficient for a correct program (it is trivial to implement a matching but incorrect program), only one out of 201 student submissions matching these patterns was incorrect. This suggests that the conditions of p-rules represent the critical elements of the solution. Once students have figured out these patterns, they are almost certain to have a correct solution. A sample program matching the first rule is:
\begin{Verbatim}
g1 = \blue{\underline{float(input(}}'Temperature [F]: '))
print(((g1 \red{\dashuline{- 32) *}} (5 / 9)))
\end{Verbatim}
-\noindent The second rule describes programs that subtract 32 from a variable cast to float. The following program matches pattern \textsf{P80}:
-
-\begin{Verbatim}
-g1 = input('Fahrenheit?')
-g0 = (\blue{\underline{(float(g1) - 32)}} * (5 / 9))
-print(g0)
-\end{Verbatim}
-
\section{Discussion and further work}
-Our primary interest in this paper is to help manual analysis of student submissions. We proposed to first represent submitted programs with patterns extracted from abstract syntax trees and then learn classification rules that distnguish between correct and incorrect programs. We showed that both rules and patterns are easy to interpret and can be used to explain typical mistakes and approaches. The accuracy of automatic classification plays a secondary role, however it is a good measure to estimate the expressiveness of patterns. Over 12 exercises our model achieved for about 17\% overall higher accuracy than the majority classifier. This result indicates that a significant amount of information can be gleaned from simple syntax-oriented analysis. To further improve the quality of patterns, we intend to analyze misclassified programs in exercises and derive new formats of patterns, which should enable better learning.
+Our primary interest in this paper is to help manual analysis of student submissions. We proposed to first represent submitted programs with patterns extracted from abstract syntax trees and then learn classification rules that distnguish between correct and incorrect programs. We showed that both rules and patterns are easy to interpret and can be used to explain typical mistakes and approaches. The accuracy of automatic classification plays a secondary role, however it is a good measure to estimate the expressiveness of patterns. Over 12 exercises a random forest model achieved for about 17\% overall higher accuracy than the majority classifier. This result indicates that a significant amount of information can be gleaned from simple syntax-oriented analysis. To further improve the quality of patterns, we intend to analyze misclassified programs in exercises and derive new formats of patterns, which should enable better learning.
We have demonstrated how AST patterns can be described with TREs, and how such patterns can be combined to discover important concepts and errors in student programs. Currently, analyzing patterns and rules is quite cumbersome. We plan on developing a tool to allow teachers to easily construct and refine patterns and rules based on example programs. Ideally we would integrate our approach into an existing analysis tool such as OverCode~\cite{glassman2015overcode}.