diff options
Diffstat (limited to 'aied2018')
-rw-r--r-- | aied2018/aied2018.tex | 4 | ||||
-rw-r--r-- | aied2018/rules.tex | 76 |
2 files changed, 75 insertions, 5 deletions
diff --git a/aied2018/aied2018.tex b/aied2018/aied2018.tex index 64f66a8..cbaae94 100644 --- a/aied2018/aied2018.tex +++ b/aied2018/aied2018.tex @@ -52,10 +52,10 @@ We propose using tree regular expressions to encode common patterns in programs. %\input{introduction} %\input{background} \input{patterns} -%\input{rules} +\input{rules} %\input{evaluation} %\input{conclusion} \bibliographystyle{splncs} -%\bibliography{aied2018} +\bibliography{aied2018} \end{document} diff --git a/aied2018/rules.tex b/aied2018/rules.tex index fc69828..7821ec4 100644 --- a/aied2018/rules.tex +++ b/aied2018/rules.tex @@ -1,7 +1,7 @@ \section{Rules} \subsection{The learning algorithm} -The goal of learning rules in this paper is to extract and explain common approaches and mistakes in student programs. We use a rule learner called ABCN2e implemented within the Orange data mining library~\cite{demsar2013orange}. ABCN2e is an improvement of the classical CN2 algorithm~\cite{clarkECML1991} for learning unordered rules. The differences between CN2 and ABCN2e are described in a technical report found at \url{https://ailab.si/abml}.} +The goal of learning rules in this paper is to extract and explain common approaches and mistakes in student programs. We use a rule learner called ABCN2e implemented within the Orange data mining library~\cite{demsar2013orange}. ABCN2e is an improvement of the classical CN2 algorithm~\cite{clarkECML1991} for learning unordered rules. The differences between CN2 and ABCN2e are described in a technical report found at \url{https://ailab.si/abml.} General rule-learning algorithms, such as CN2, tend to generate large amounts of specific rules, which leads to more accurate results, however this makes them less appropriate for explaining. We will now describe a problem specific configuration of the rule-learning algorithm that extracts relevant and explainable patterns from student programs. @@ -11,8 +11,9 @@ Both reasons can be expressed with classification rules. In the case of buggy pa To learn explainable, meaningful and non-redundant rules, we need to impose the following additional constraints on the rule learner: \begin{itemize} - \item classification accuracy of each learned rule must exceed 90\%, because we accept a 10\% false-positive error as acceptable. - \item each conjunct in the condition of a rule must be significant according to the likelihood test, meaning that each pattern in the condition part is indeed relevant (we set the significance threshold to p=0.05) + \item classification accuracy of each learned rule must exceed 90\%, because we accept a 10\% false-positive error as acceptable, + \item each conjunct in the condition of a rule must be significant according to the likelihood test, meaning that each pattern in the condition part is indeed relevant (we set the significance threshold to p=0.05), + \item a condition can only have at most 3 patterns, and \item each rule should cover at least 5 unique programs to prevent learning redundant rules that would represent the same error with a different combination of patterns. \end{itemize} @@ -21,7 +22,76 @@ Different approaches can be represented with rules explaining correct programs. We use the same constraints as in the case of n-rules and learn rules for correct programs called \emph{p-rules}. In this case, we always require that conditions mention the presence of patterns, since it is easier to explain different approaches of students with something they have written and not with something they have not. To account for possible buggy patterns, the requirement to achieve 90\% classification accuracy was not evaluated on full data, but only on data not covered by n-rules. Hence, a rule can cover an example with a specific approach even though if it contains a buggy pattern. \section{Interpretation of rules} +Learned rules can be used to analyse student programing. For demonstration we will describe and interpret several learned rules from three Python exercises. The first exercise is \textit{Fahrenheit to Celsius}, which requires a simple user-computer interaction and a single expression. The second is \textit{Buy Five}, where students use loops for the first time. The third exercise is \textit{Greatest Absolutist}, which is one of the introductory Python exercises for functions. + +\subsection{Fahrenheit to Celsius} +The first problem in CodeQ Python class is to implement a program that converts from degrees Fahrenheit to degrees Celsius. The program should ask the user to input temperature in Fahrenheit degrees, and then output the temperature in Celsius. A sample correct program is: +\begin{Verbatim}[fontfamily=tt] +F = float(input("Fahrenheit: ")) +C = 5 / 9 * (F - 32) +print("Celsius: ", C) +\end{Verbatim} + +Students have so far submitted 1177 programs for this problem, 495 of them were correct and 682 incorrect. Our systems extracted 891 relevant AST patterns, which were used as attributes in rule learning. The rule learner induced 24 n-rules, 14 of those mention only presence of patterns, and 16 p-rules. + +We shall first examine n-rules that mention only presence of patterns in their conditions. The most accurate rule according to the rule learner was: +\begin{Verbatim}[fontfamily=tt] +IF regex-20==T THEN correct=F [208, 1] +\end{Verbatim} +This rule covers programs where \texttt{regex-20} is present, implies an incorrect program (\texttt{correct=F}) and covers 208 incorrect programs and 1 correct. Regex-20 is the AST pattern describing a call to the \texttt{int} function: +\begin{Verbatim}[fontfamily=tt] +(Module (body (Assign (value (Call (func +(Name (id int) (ctx Load)))))))) +\end{Verbatim} +The second best n-rule that covers 72 incorrect and 0 correct programs was: +\begin{Verbatim}[fontfamily=tt] +IF regex-5==T AND regex-35==T THEN correct=F [72 0]. +\end{Verbatim} +Regex-5 describes a pattern in programs, where the result of function \texttt{input} is not casted to \texttt{float}; in fact, it is not casted to at all, students merely store the returned string. Regex-32 pattern describes the substraction of value 32 from a variable in an expression. Two sample program that match the first rule (left) and the second rule(right) are (matching parts of code are underlined): +\begin{Verbatim}[fontfamily=tt] +g2 = input() g2 \underline{= input(}'Temperature [F]? ') +g1 \underline{= int(}g2) g1 \underline{= ((g2 - 32) *} (5 / 9)) +print(((g1-32)*(5/9))) print(g2, 'F equals', g1, 'C') +\end{Verbatim} +These two rules describe two (out of many) types of errors made by students. The left program is incorrect, since it doesnt work when a user inputs a floating value. The right program is incorrect, because the input string must be casted to float. Not casting it (regex-5) and then using it in an expression (regex-35) will result in a Python exception stating that it can not subtract a number from a string. + +The next two rules are the most accurate two n-rules that also contain missing patterns in their conditions: +\begin{Verbatim}[fontfamily=tt] +IF regex-0==F THEN correct=F [106 0] +IF regex-1==F AND regex-16==T THEN correct=F [100 0] +\end{Verbatim} +The pattern regex-0 matches programs with a call to function \texttt{print}. A program without a \texttt{print} is always incorrect, since it will not output anything. + +The second rule covers programs with missing regex-1 and present regex-16. Regex-16 matches programs with a call to the \texttt{print} function, where the argument contains a formula which subtracts 32 from a variable and then further multiplies the result. Regex-1 describes a call to the function \texttt{float} as the first item in an expression, i.e. \texttt{= float(...)}. This rule therefore represents programs that have the formula in the \texttt{print} function (regex-16 is present), however fail to cast input from string to float (regex-1 is missing). + +Finally, let us examine best four p-rules: +\begin{Verbatim}[fontfamily=tt] +IF regex-2!=F AND regex-8!=F THEN correct=T [ 1 200] +IF regex-1!=F AND regex-42!=F THEN correct=T [ 0 68] +IF regex-1!=F AND regex-8!=F THEN correct=T [ 3 217] +IF regex-80!=F THEN correct=T [ 0 38] +\end{Verbatim} +The patterns in the condition of the first rule, regex-2 and regex-8, correspond to a call to the function \texttt{input} within the function \texttt{float}, i.e.\texttt{float(input())}, and a call to the function \texttt{print}, which contains the pattern \texttt{-32)*} in the first argument, respectively. Programs matching both patterns wrap the function \texttt{float} around \texttt{input}, and have an expression that subtracts 32 and then uses multiplication within the \texttt{print}. + +This first rule demonstrates an important property of p-rules: although the patterns regex-2 and regex-8 are not sufficient for a program to be correct, as it is trivial to implement an incorrect program containing both patterns, only one out of 201 submissions macthing these two patterns was incorrect. This suggests that the conditions of p-rules represent the critical elements of the solution. When a students figures these out, implementing the rest of the program shold be straightforward. A sample program matching the first rule is: +\begin{Verbatim}[fontfamily=tt] +g1 = \underline{float(input(}'Temperature [F]: ')) +print(((g1 \underline{- 32) *} (5 / 9))) +\end{Verbatim} +The second and the third p-rules are variations of the first. For example, the second rule describes programs that have the formula in the second argument of the \texttt{print} function. The fourth rule, however, is different. The pattern regex-80 describes programs that subtract 32 from a variable, which is casted to float. The following program matches regex-80:: +\begin{Verbatim}[fontfamily=tt] +g1 = input('Fahrenheit?') +g0 = (\underline{(float(g1) - 32)} * (5 / 9)) +print(g0) +\end{Verbatim} + +\subsection{Buy Five} + +\subsection{Greatest Absolutist} + \section{Evaluation of rules} + + |