author Timotej Lazar <timotej.lazar@fri.uni-lj.si> 2018-02-05 19:08:53 +0100
committer Timotej Lazar <timotej.lazar@fri.uni-lj.si> 2018-02-05 19:08:53 +0100
commit 75b0b7de8ef57f02d35b589fdba7fa42252cce82
tree 7df7c2fd7685c7ed31ccc696b08d96574de8518a
parent e4d5f477ca121532a7ca568543da11166065c5dd
Compactify sections 3 and 4, prettify rules
-rw-r--r-- aied2018/aied2018.tex 1
-rw-r--r-- aied2018/rules.tex 170
2 files changed, 103 insertions, 68 deletions
diff --git a/aied2018/aied2018.tex b/aied2018/aied2018.tex
index 514b7e7..17f0e7a 100644
--- a/aied2018/aied2018.tex
+++ b/aied2018/aied2018.tex
@@ -3,6 +3,7 @@
\usepackage[utf8]{inputenc}
\usepackage{newunicodechar}
\newunicodechar{∧}{\ensuremath{\land}}
+\newunicodechar{¬}{\ensuremath{\lnot}}
\newunicodechar{⇒}{\ensuremath{\Rightarrow}}
\newunicodechar{→}{\ensuremath{\rightarrow}}
\newunicodechar{⋯}{\ensuremath{\cdots}}
diff --git a/aied2018/rules.tex b/aied2018/rules.tex
index 55a9a10..855b875 100644
--- a/aied2018/rules.tex
+++ b/aied2018/rules.tex
@@ -1,85 +1,107 @@
-\section{Rules}
+\section{Learning rules}
\label{sec:rules}
-\subsection{The learning algorithm}
-The goal of learning rules in this paper is to extract and explain common approaches and mistakes in student programs. We use a rule learner called ABCN2e implemented within the Orange data mining library~\cite{demsar2013orange}. ABCN2e is a variant of the classical CN2 algorithm~\cite{clarkECML1991} for learning unordered rules. The differences between CN2 and ABCN2e are described in a technical report found at \url{https://ailab.si/abml.}
+The goal of learning rules in this paper is to discover and explain common approaches and mistakes in student programs. We use a rule learner called ABCN2e, implemented within the Orange data mining library~\cite{demsar2013orange}. ABCN2e modifies the original CN2 algorithm~\cite{clarkECML1991} to learn unordered rules; modifications are described in a technical report at \url{https://ailab.si/abml}.
-General rule-learning algorithms, such as CN2, tend to generate large amounts of specific rules, which leads to more accurate results, however this makes them less appropriate for explaining. We will now describe a problem specific configuration of the rule-learning algorithm that extracts relevant and explainable patterns from student programs.
+General rule-learning algorithms, such as CN2, tend to generate many specific rules. This produces more accurate results but makes rules harder to explain. This section describes the problem-specific configuration of the rule-learning algorithm for extracting relevant and explainable patterns from student programs.
-Each student's program is represented in the feature space of AST patterns described in the previous section. Each program is classified either as \emph{correct} or \emph{incorrect}. The reasons for a program to be incorrect are either: a) it contains some incorrect pattern (a buggy pattern), which needs to be removed or modified, or b) is missing one or several programing constructs (patterns) that should be included before the program can be correct.
+Each program is represented in the feature space of AST patterns described in the previous section. Based on test results, each program is classified either as \emph{correct} or \emph{incorrect}. A program can be incorrect for one of two reasons: either a) it contains some incorrect pattern (a buggy pattern) that should be removed or modified, or b) it is missing one or more programming constructs (patterns) that should be present for the program to be correct.
-Both reasons can be expressed with classification rules. In the case of buggy patterns, we learn classification rules for incorrect programs, where each condition in the rule must express the presence of a pattern. The condition of such a rule therefore contains a set of patterns that imply a bug in the program. In the case of missing patterns, we learn another set of rules, which cover programs not covered by above rules and may contain missing patterns within their conditions. These rules describe which missing concepts in a program still have to be implemented. All rules explaining incorrect programs are called \emph{n-rules}.
+Classification rules can express both reasons. For buggy patterns, we learn rules for incorrect programs, where each condition in the rule must express the presence of a pattern. The condition of such a rule therefore contains a set of patterns that imply a bug in the program. For missing patterns, we learn another set of rules covering programs that are not covered by the above rules. These rules may contain missing patterns within their conditions, and describe the missing constructs in a program that have to be implemented. All rules explaining incorrect programs are called \emph{n-rules}.
+
+To learn explainable, meaningful and non-redundant rules, we impose the following additional constraints on the rule learner:
-To learn explainable, meaningful and non-redundant rules, we need to impose the following additional constraints on the rule learner:
\begin{itemize}
- \item classification accuracy of each learned rule must exceed 90\%, because we accept a 10\% false-positive error as acceptable,
- \item each conjunct in the condition of a rule must be significant according to the likelihood test, meaning that each pattern in the condition part is indeed relevant (we set the significance threshold to p=0.05),
- \item a condition can only have at most 3 patterns, and
- \item each rule should cover at least 5 unique programs to prevent learning redundant rules that would represent the same error with a different combination of patterns.
+ \item classification accuracy of each rule must exceed 90\%, because we consider a 10\% false-positive rate acceptable;
+ \item each conjunct in the condition of a rule must be significant according to the likelihood test, meaning that each pattern in the condition part is indeed relevant (we set the significance threshold to p=0.05);
+ \item a condition can have at most 3 patterns; and
+ \item each rule must cover at least 5 distinct programs -- this avoids redundant rules that represent the same error with a different combination of patterns.
\end{itemize}
-Different approaches can be represented with rules explaining correct programs. A program is correct when all the necessary patterns are implemented and none of the buggy patterns are present. For each exercise, there are different possible sets of necessary patterns, each set corresponding to a different approach to solving the exercise.
+Different approaches can be represented with rules explaining correct programs. A program is correct when it implements all required patterns and no buggy patterns. There may be several possible sets of required patterns for each exercise, with each set corresponding to a different approach to solving it.
+
+We use the same constraints as in the case of n-rules and learn rules for correct programs called \emph{p-rules}. In this case, we always require that conditions mention the presence of patterns, since it is easier to explain different approaches of students with something they have written and not with something they have not. To account for possible buggy patterns, the requirement to achieve 90\% classification accuracy is evaluated not on the full data, but only on data not covered by n-rules. Hence, a rule can cover an example with a specific approach even if it contains a buggy pattern.
+
-We use the same constraints as in the case of n-rules and learn rules for correct programs called \emph{p-rules}. In this case, we always require that conditions mention the presence of patterns, since it is easier to explain different approaches of students with something they have written and not with something they have not. To account for possible buggy patterns, the requirement to achieve 90\% classification accuracy was not evaluated on full data, but only on data not covered by n-rules. Hence, a rule can cover an example with a specific approach even though if it contains a buggy pattern.
+\section{Interpreting rules}
-\section{Interpretation of rules}
-Learned rules can be used to analyze student programing. We will describe and interpret several learned rules from two Python exercises. The first exercise is \textit{Fahrenheit to Celsius}, which requires a simple user-computer interaction and a single expression. The second exercise is \textit{Greatest Absolutist}, one of the introductory Python exercises for functions.
+Learned rules can be used to analyze student programming. This section describes several rules induced for two Python exercises: \textsf{Fahrenheit to Celsius}, which reads a value from standard input and calculates the result, and \textsf{Greatest Absolutist}, one of the introductory exercises for functions.
\subsection{Fahrenheit to Celsius}
-The first problem in CodeQ Python class is to implement a program that converts from degrees Fahrenheit to degrees Celsius. The program should ask the user to input temperature in Fahrenheit degrees, and then output the temperature in Celsius. A sample correct program is:
+
+The first problem in the CodeQ Python course is to write a program converting from degrees Fahrenheit to degrees Celsius. The program should ask the user to input a temperature, and print the result. A sample correct program is:
+
\begin{Verbatim}
F = float(input("Fahrenheit: "))
C = 5 / 9 * (F - 32)
print("Celsius: ", C)
\end{Verbatim}
-Students have so far submitted 1177 programs for this problem, 495 of them were correct and 682 incorrect. Our systems extracted 891 relevant AST patterns, which were used as attributes in rule learning. The rule learner induced 24 n-rules, 14 of those mention only presence of patterns, and 16 p-rules.
+Students have submitted 1177 programs for this problem, with 495 correct and 682 incorrect programs. Our systems extracted 891 relevant AST patterns, which were used as attributes in rule learning. The rule learner induced 24 n-rules, 14 of which mention only the presence of patterns, and 16 p-rules.
-We will first take a look at n-rules that mention only presence of patterns in their conditions. The most accurate rule according to the rule learner was:
-\begin{Verbatim}
-IF regex-20==T THEN correct=F [208, 1]
+We first take a look at n-rules that mention only the presence of patterns in their conditions. The most accurate rule according to the rule learner was:
+
+\begin{Verbatim}[fontfamily=sf]
+P20 ⇒ incorrect [208, 1]
\end{Verbatim}
-This rule covers programs where \texttt{regex-20} is present, implies an incorrect program (\texttt{correct=F}) and covers 208 incorrect programs and 1 correct. Regex-20 is the AST pattern describing a call to the \texttt{int} function:
-\begin{Verbatim}
-(Module (body (Assign (value (Call (func
-(Name (id int) (ctx Load))))))))
+
+\noindent
+This rule covers programs where the pattern \textsf{P20} is present. It implies an incorrect program, and covers 208 incorrect and one correct program. \textsf{P20} is the AST pattern describing a call to the \texttt{int} function:
+
+\begin{Verbatim}[fontfamily=sf]
+(Module (body (Assign (value (Call (func (Name (id int) (ctx Load))))))))
\end{Verbatim}
-The second best n-rule that covers 72 incorrect and 0 correct programs was:
-\begin{Verbatim}
-IF regex-5==T AND regex-35==T THEN correct=F [72 0].
+
+The second best n-rule covers 72 incorrect and no correct programs:
+
+\begin{Verbatim}[fontfamily=sf]
+P5 ∧ P35 ⇒ incorrect [72, 0]
\end{Verbatim}
-Regex-5 describes the pattern in programs, where the result of the \texttt{input} function is not casted to \texttt{float}; in fact, it is not casted to at all, students merely store the returned string. Regex-35 pattern describes a substraction of value 32 from a variable in an expression. Two sample program that match the first rule (left) and the second rule(right) are (code that matches patterns is highlighted):
+
+\noindent
+Pattern \textsf{P5} matches programs where the result of the \texttt{input} call is not cast to \texttt{float} but stored as a string. Pattern \textsf{P35} matches programs where the value 32 is subtracted from a variable on the left-hand side of a multiplication. Sample programs matching the first rule (left) and the second rule (right) are:
+
\begin{Verbatim}
g2 = input() g2 = \blue{input}('Temperature [F]? ')
g1 = \blue{int}(g2) g1 = (\red{(g2 - 32) *} (5 / 9))
print(((g1-32)*(5/9))) print(g2, 'F equals', g1, 'C')
\end{Verbatim}
-These two rules describe two (out of many) errors made by students. The left program is incorrect, since it doesnt work when a user inputs a floating value. The right program is incorrect, because the input string must be casted to float. Not casting it (regex-5) and then using it in an expression (regex-35) will result in a Python exception stating that it can not subtract a number from a string.
-The next two rules are the most accurate two n-rules that also contain missing patterns in their conditions:
-\begin{Verbatim}
-IF regex-0==F THEN correct=F [106 0]
-IF regex-1==F AND regex-16==T THEN correct=F [100 0]
+These rules describe two common student errors. The left program is incorrect, since it fails when the user inputs a decimal. The right program is incorrect because the input string must be cast to a number. Not casting it (pattern \textsf{P5}) and then using it in an expression (pattern \textsf{P35}) will raise an exception.
+
+The two most accurate n-rules with missing patterns in their conditions are:
+
+\begin{Verbatim}[fontfamily=sf]
+¬P0 ⇒ incorrect [106, 0]
+¬P1 ∧ P16 ⇒ incorrect [100, 0]
\end{Verbatim}
-The pattern regex-0 matches programs with a call to function \texttt{print}. A program without a \texttt{print} is always incorrect, since it will not output anything.
-The second rule covers programs with missing regex-1 and present regex-16. Regex-16 matches programs with a call to the \texttt{print} function, where the argument contains a formula which subtracts 32 from a variable and then further multiplies the result. Regex-1 describes a call to the function \texttt{float} as the first item in an expression, i.e. \texttt{= float(...)}. This rule therefore represents programs that have the formula in the \texttt{print} function (regex-16 is present), however fail to cast input from string to float (regex-1 is missing).
+\noindent
+Pattern \textsf{P0} matches programs with a call to function \texttt{print}. A program without a \texttt{print} is always incorrect, since it will not output anything.
+
+The second rule covers programs with \textsf{P1} missing and \textsf{P16} present. \textsf{P16} matches programs with a call to the \texttt{print} function, where the argument contains a formula which subtracts 32 from a variable and then further multiplies the result. \textsf{P1} describes a call to the function \texttt{float} as the first item in an expression, i.e. \texttt{= float(...)}. This rule therefore represents programs that have the formula in the \texttt{print} function (\textsf{P16} is present), but fail to cast input from string to float (\textsf{P1} is missing).
Let us now examine the other type of rules. The best four p-rules are:
-\begin{Verbatim}
-IF regex-2!=F AND regex-8!=F THEN correct=T [ 1 200]
-IF regex-1!=F AND regex-42!=F THEN correct=T [ 0 68]
-IF regex-1!=F AND regex-8!=F THEN correct=T [ 3 217]
-IF regex-80!=F THEN correct=T [ 0 38]
+\begin{Verbatim}[fontfamily=sf]
+P2 ∧ P8 ⇒ correct [1, 200]
+P1 ∧ P42 ⇒ correct [0, 68]
+P1 ∧ P8 ⇒ correct [3, 217]
+P80 ⇒ correct [0, 38]
\end{Verbatim}
-The patterns in the condition of the first rule, regex-2 and regex-8, correspond to a call to the function \texttt{input} within the function \texttt{float}, i.e.\texttt{float(input())}, and a call to the function \texttt{print}, which contains the pattern \texttt{-32)*} in the first argument, respectively. Programs matching both patterns wrap the function \texttt{float} around \texttt{input}, and have an expression that subtracts 32 and then uses multiplication within the \texttt{print}.
-This first rule demonstrates an important property of p-rules: although the patterns regex-2 and regex-8 are not sufficient for a program to be correct (it is trivial to implement an incorrect program containing both patterns), only one out of 201 submissions macthing these two patterns was incorrect. This suggests that the conditions of p-rules represent the critical elements of the solution. When a students figures these two out, implementing the rest of the program should be straightforward. A sample program matching the first rule is:
+\noindent
+Patterns in the condition of the first rule, \textsf{P2} and \textsf{P8}, correspond respectively to expressions of the form \texttt{float(input(?))} and \texttt{print((?-32)*?)}. Programs matching both patterns wrap the function \texttt{float} around \texttt{input}, and have an expression that subtracts 32 and then uses multiplication within the \texttt{print}.
+
+This first rule demonstrates an important property of p-rules: although patterns \textsf{P2} and \textsf{P8} are in general not sufficient for a correct program (it is trivial to implement a matching but incorrect program), only one out of 201 student submissions matching these patterns was incorrect. This suggests that the conditions of p-rules represent the critical elements of the solution. Once a student has figured out these patterns, they are almost certain to have a correct solution. A sample program matching the first rule is:
+
\begin{Verbatim}
g1 = \blue{float(input(}'Temperature [F]: '))
print(((g1 \red{- 32) *} (5 / 9)))
\end{Verbatim}
-The second and the third p-rules are variations of the first. For example, the second rule describes programs that have the formula in the second argument of the \texttt{print} function. The fourth rule, however, is different. The pattern regex-80 describes programs that subtract 32 from a variable, which is casted to float. The following program matches regex-80:
+
+\noindent
+The second and third p-rules are variations of the first. For instance, the second rule describes programs that have the formula in the argument to the \texttt{print} function. The fourth rule, however, is different. \textsf{P80} describes programs that subtract 32 from a variable cast to float. The following program matches \textsf{P80}:
+
\begin{Verbatim}
g1 = input('Fahrenheit?')
g0 = (\blue{(float(g1) - 32)} * (5 / 9))
@@ -87,7 +109,9 @@ print(g0)
\end{Verbatim}
\subsection{Greatest Absolutist}
-In this more complex exercise a student has to implement a function named \texttt{max\_abs} that accepts a list of numbers as an argument and returns the number with the largest absolute value. A sample correct solution is:
+
+In this exercise students must implement a function that accepts a list of numbers and returns the element with the largest absolute value. One solution is:
+
\begin{Verbatim}[fontfamily=tt]
def max_abs(l):
vmax = l[0]
@@ -99,10 +123,11 @@ def max_abs(l):
We have received 155 submissions (57 correct, 98 incorrect) for this exercise. Due to its higher complexity and since the solutions are much more diverse, we obtained 8298 patterns to be used as attributes in learning. A high number of patterns together with a low number of learning examples could present a problem for rule learning: since the space of possible rules is large, some of the learned rules might be a result of statistical anomalies. One needs to apply a certain amount of caution when interpreting these rules.
-The rule-learning algorithm learned 15 n-rules (7 mentioning only presence of patterns) and 6 p-rules. Bellow we can see the two best n-rules referring to the presence of patterns and two programs; the left one is covered by the first rule, and the right one by the second rule:
-\begin{Verbatim}
-IF regex-64==T THEN correct=F [22 0]
-IF regex-2==T AND regex-70==T THEN correct=F [17 0]
+The rule-learning algorithm learned 15 n-rules (7 mentioning only presence of patterns) and 6 p-rules. Below we can see the two best n-rules referring to the presence of patterns and two programs; the left one is covered by the first rule, and the right one by the second rule:
+
+\begin{Verbatim}[fontfamily=sf]
+P64 ⇒ incorrect [22, 0]
+P2 ∧ P70 ⇒ incorrect [17, 0]
\end{Verbatim}
\begin{Verbatim}
@@ -114,34 +139,44 @@ def max_abs(l): def max_abs(l):
return \blue{vmax} return \red{vmax}
\end{Verbatim}
-The pattern from the first rule, regex-64, matches programs, where a) a variable is used in the condition of an if clause without an application of another function, such as \texttt{abs}, and b) the function returns this variable. The left program demonstrates this pattern, where the value \texttt{vmax} is compared in the if clause and then returned. According to the teachers of the Python class, this error is a common, because students forget that they need to compare the absolute value of \texttt{vmax}.
+The pattern from the first rule, \textsf{P64}, matches functions returning the variable that is used in the condition of an if clause without an application of another function (such as \texttt{abs}). The left program demonstrates this pattern, where the value \texttt{vmax} is compared in the if clause and then returned. According to the teachers of the Python class, this error is common, because students forget that they need to compare the absolute value of \texttt{vmax}.
-The second rule contains two patterns. Regex-70 (blue) matches programs that contain function \texttt{abs} in an assignment statement within a function, for loop and an if clause. Regex-2 (red) describes the pattern, where a variable is used in an assignment statement within a for-if block, and the same variable is returned by the function. Such programs are incorrect, because they do not store the original value of the variable. For example, if -7 has the largest absolute value in the list, then the function should return -7 and not 7.
+The second rule contains two patterns. \textsf{P70} (blue) matches functions containing the call to \texttt{abs} in an assignment statement nested within a for loop and an if clause. \textsf{P2} (red) matches functions that return the variable used in an assignment statement within a for-if block. Such programs are incorrect because they do not store the original list element. For example, if -7 has the largest absolute value in the list, then the function should return -7 and not 7.
The best two n-rules with absent patterns in their conditions are:
-\begin{Verbatim}
-IF regex-1==T AND regex-11==F AND regex-131==F AND THEN correct=F [34 0]
-IF regex-36==T AND regex-162==F THEN correct=F [26 0]
+
+\begin{Verbatim}[fontfamily=sf]
+P1 ∧ ¬P11 ∧ ¬P131 ⇒ incorrect [34, 0]
+P36 ∧ ¬P162 ⇒ incorrect [26, 0]
\end{Verbatim}
-The first rule covers programs that match regex-1 (checks whether a function is defined within the program), but miss regex-11 (if the variable from a for loop is directly used in an assignment statement within an if clause) and regex-131 (whether the return statement uses indexing, e.g. .\texttt{return l[i]}). Such a program is, for example, the above right program: it has a function defined, it does not directly use variable \texttt{t} in the assignment, but uses its absolute value, and does not use indexing in the return statement.
+
+\noindent
+The first rule covers programs matching \textsf{P1} (checks for a function definition in the program) but missing \textsf{P11} (if the iteration variable in a for loop is directly assigned to another variable within an if clause) and \textsf{P131} (whether the return statement uses indexing, i.e. \texttt{return l[?]}). One such example is the above right program: it contains a function definition, it does not directly assign the value of \texttt{v} but uses its absolute value, and does not use indexing in the return statement.
This rule specifies two missing patterns, which makes it quite difficult to understand. It does not directly state the issue with a given program: if one of the two missing patterns were implemented, the rule would not cover this program any more. Therefore, the question is, which of these two patterns is really missing? Different missing patterns could be understood as different options to finalize the program.
-The second rule identifies only one missing pattern. The Regex-36 pattern describes the use of \texttt{max} function in the return statement, whereas the regex-162 pattern describes programs that call \texttt{max} function using the input list as the argument. For example, the following program is covered by the second rule,
+The second rule identifies only one missing pattern. \textsf{P36} matches a call to \texttt{max} in the return statement, whereas \textsf{P162} matches a call to \texttt{max} with the given list as the argument. This rule covers, for example, the following program:
+
\begin{Verbatim}
def max_abs(l):
return max(abs(l))
\end{Verbatim}
-since it uses \texttt{max} in the return statement, however \texttt{max} is not applied directly to the input list \texttt{l}. Note that Python would fail to run the above program, as function \texttt{abs} does not accept a list as the argument.
+
+\noindent
+It uses \texttt{max} in the return statement, but does not apply it directly to the input list \texttt{l}. Note that this program would fail because the function \texttt{abs} does not accept a list argument.
The four most accurate p-rules induced by our rule learner were:
-\begin{Verbatim}[fontfamily=tt]
-IF regex-11==T AND regex-17==T AND regex-35==T THEN correct=T [ 0 20]
-IF regex-11==T AND regex-27==T AND regex-3==T THEN correct=T [ 2 34]
-IF regex-519==T THEN correct=T [0 9]
-IF regex-27==T THEN correct=T [ 6 38]
+
+\begin{Verbatim}[fontfamily=sf]
+P11 ∧ P17 ∧ P35 ⇒ correct [0, 20]
+P11 ∧ P27 ∧ P3 ⇒ correct [2, 34]
+P519 ⇒ correct [0, 9]
+P27 ⇒ correct [6, 38]
\end{Verbatim}
-A sample program (left) covered by the first rule and a sample program (right) covered by the second rule:
+
+\noindent
+Sample programs covered by the first and second rules are:
+
\begin{Verbatim}[fontfamily=tt]
def max_abs(l): def max_abs(\green{l}):
\green{vmax = 0} vmax = l[0]
@@ -150,19 +185,18 @@ def max_abs(l): def max_abs(\green{l}):
\red{vmax} = \blue{v} vmax = \blue{v}
return vmax return \red{vmax}
\end{Verbatim}
-The first two rules and the above programs are similar. Both rules share a common reason, regex-11 (blue in both programs), describing a pattern, where the variable from the for loop is used in the right side of an assignment within the if clause. Regex-17 and regex-27 are also similar (red in both programs). The former links the occurrence of a variable within the \texttt{abs} function in an if condition with the variable from an assignment, whereas the latter links the same variable from an if condition with the variable from the return statement. Regex-35 matches variable assignments to 0, hence the first rule covers solutions initializing \texttt{vmax} to zero. Regex-3 matches for-looping over the input list.
-After inspecting all covered examples of the first and the second rule, we found out that the first rule is only a more strict version of the second rule, since all examples covered by the first rule are also covered by the second rule. These two rules therefore do not describe two different approaches, but two different representations of the same approach. Similarly, the fourth rule is a generalization of the first two rules, containing only regex-27 within conditions. This pattern seem to be particularly important. Of 44 programs, where students used the absolute value of \texttt{vmax} in comparison and returned \texttt{vmax} at the end, 38 were evaluated as correct.
+\noindent
+The first two rules and the above programs are similar. Both rules share a common condition, \textsf{P11} (blue in both programs), describing a pattern where the variable from the for loop is used on the right-hand side of an assignment within the if clause. \textsf{P17} and \textsf{P27} are also similar (red in both programs). The former links the occurrence of a variable within the \texttt{abs} function in an if condition with the variable from an assignment, whereas the latter links the same variable from an if condition with the variable from the return statement. \textsf{P35} matches variable assignments to 0, hence the first rule covers solutions initializing \texttt{vmax} to zero. \textsf{P3} matches for-looping over the input list.
+
+After inspecting all covered examples of the first and the second rule, we found that the first rule is only a stricter version of the second rule, since all examples covered by the first rule are also covered by the second rule. These two rules therefore do not describe two different approaches, but two different representations of the same approach. Similarly, the fourth rule is a generalization of the first two rules, containing only \textsf{P27} within conditions. This pattern seems to be particularly important. Of 44 programs where students used the absolute value of \texttt{vmax} in comparison and returned \texttt{vmax} at the end, 38 were evaluated as correct.
The third rule describes a different pattern. It covers programs that define a list containing values 2, 1, and -6. Defining such a list is evidently not necessary for the solution of this exercise. Why would it then correlate with the correctness of the solution?
To explain this rule we first have to describe how students test their programs. One option is to simply use the \textit{Test} button, which submits the program to a server, where it is tested against a predefined set of test cases. The other option is to click the \textit{Run} button, which runs the program and outputs the results. Those students who defined a list with values 2, 1, and -6 in their programs are most likely using the second option. They create their own test cases and then submit a program only when they are certain that it is correct. Since the description of the exercise includes a single test case with values 2, 1, and -6, most students use this list as the testing case.
-On the other hand, given that the rule covers only 9 programs, the probability that the rule is a statistical artifact is not negligible.
+On the other hand, given that the rule covers only nine programs, the probability that the rule is a statistical artifact is not negligible.
\section{Evaluation and discussion}
-
-
-
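The rule notation introduced in this commit (for example \textsf{P20 ⇒ incorrect [208, 1]}) and three of the four learner constraints can be illustrated with a small Python sketch. It is not the ABCN2e implementation: the names \texttt{Program}, \texttt{covers}, \texttt{class\_distribution} and \texttt{satisfies\_constraints} are hypothetical, the likelihood-based significance test is omitted, coverage is counted over all covered programs rather than over programs unique to the rule, and the special handling of p-rules (accuracy measured only on data not covered by n-rules) is left out. The toy data merely reproduce the counts quoted for \textsf{P20}.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    patterns: frozenset  # IDs of AST patterns present in the program
    correct: bool        # True if the program passed all tests


def covers(rule, program):
    """A rule is a list of (pattern_id, present) literals; all must hold."""
    return all((pid in program.patterns) == present for pid, present in rule)


def class_distribution(rule, programs):
    """Return [incorrect, correct] counts of covered programs."""
    covered = [p for p in programs if covers(rule, p)]
    n_correct = sum(p.correct for p in covered)
    return [len(covered) - n_correct, n_correct]


def satisfies_constraints(rule, programs, target_correct,
                          min_accuracy=0.9, max_patterns=3, min_coverage=5):
    """Check rule length, coverage and accuracy (significance test omitted)."""
    n_incorrect, n_correct = class_distribution(rule, programs)
    covered = n_incorrect + n_correct
    if covered == 0 or covered < min_coverage or len(rule) > max_patterns:
        return False
    hits = n_correct if target_correct else n_incorrect
    return hits / covered > min_accuracy


# Toy data (assumption): 208 incorrect and 1 correct program containing P20,
# plus 200 correct programs containing P2 and P8.
programs = (
    [Program(frozenset({"P20"}), False)] * 208
    + [Program(frozenset({"P20"}), True)]
    + [Program(frozenset({"P2", "P8"}), True)] * 200
)

n_rule = [("P20", True)]  # P20 => incorrect
print(class_distribution(n_rule, programs))                           # [208, 1]
print(satisfies_constraints(n_rule, programs, target_correct=False))  # True
```

A negated literal such as \textsf{¬P0} would be written as \texttt{("P0", False)} in this representation, so n-rules with missing patterns use the same functions.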