\section{Method}
\subsection{Learning rules for correct and incorrect programs}
\begin{figure}[t]
\centering
\begin{enumerate}
\item Let $P$ be the data of all student programs, each described with a set of AST patterns
and classified as correct (it passes unit tests) or incorrect.
\item Let $learn\_rules(target, P, P_1, sig, acc)$ be a method that learns
a set of rules for class $target$ from data $P$, subject to two additional
constraints: every attribute-value pair in the condition part of a rule
must be significant with respect to the likelihood-ratio test ($p<sig$),
and the classification accuracy of each rule on data $P_1$ must be at least $acc$.
\item Let $I\mbox{-}rules = learn\_rules(incorrect, P, P, 0.05, 0.9)$
\item Let $P_c$ be data $P$ without the programs that are already covered by $I\mbox{-}rules$
\item Let $C\mbox{-}rules = learn\_rules(correct, P, P_c, 0.05, 0.9)$
\item Return $I\mbox{-}rules$ and $C\mbox{-}rules$
\end{enumerate}
\caption{An outline of the algorithm for learning rules. The method $learn\_rules$,
which induces rules for a specific class, is a variant of the
CN2 algorithm~\cite{YYY} implemented within the Orange data-mining suite~\cite{XXX}.
In all our experiments, $sig$ was set to 0.05 and $acc$ was set to 0.9. }
\label{figure:algorithm}
\end{figure}
As described in the previous section, the feature space consists of all frequent AST patterns.
Submitted programs are represented in this feature space and classified as either
correct, if they pass all prespecified unit tests, or incorrect otherwise. These data
serve as the training set for machine learning.
However, since we can always validate a program with unit tests,
the goal of machine learning is not to classify new programs, but to discover patterns
that are correlated with the correctness of programs. This approach to machine learning is
referred to as descriptive induction: the automatic discovery of patterns describing
regularities in data. We use rule learning for this task because rule-based models are easy to comprehend.
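
To make this representation concrete, the following Python sketch encodes
programs as binary pattern vectors with a correctness label. It is an
illustration only: the helper functions \texttt{ast\_patterns} and
\texttt{passes\_unit\_tests} are hypothetical placeholders, not part of our
implementation.
\begin{verbatim}
def encode(programs, frequent_patterns, ast_patterns, passes_unit_tests):
    """Represent each program as a binary vector over the frequent AST
    patterns, labelled by the outcome of its unit tests."""
    data = []
    for prog in programs:
        patterns = ast_patterns(prog)  # set of AST patterns in prog
        features = {p: p in patterns for p in frequent_patterns}
        label = "correct" if passes_unit_tests(prog) else "incorrect"
        data.append((features, label))
    return data
\end{verbatim}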
Before explaining the algorithm, we need to discuss the reasons why a program
can be incorrect. In our pedagogical experience, a student program is
incorrect 1) if it contains an incorrect pattern, which needs to be removed, or 2) if it
lacks a certain relation (pattern) that must be added before the program can be correct.
We now explain how both types of errors can be identified with rules.
To discover patterns related to the first case, the algorithm first learns rules describing
incorrect programs. The conditions of these rules contain frequent patterns that are symptomatic of
incorrect programs. Since rules are used to generate hints, and since hints should not be
presented to students unless they are likely to be correct, we require that each learned rule's
classification accuracy exceed a certain threshold (90\% in our experiments), that
each conjunct in a condition be significant with respect to the likelihood-ratio test (with $p=0.05$ in our experiments),
and that a conjunct may only specify the presence of a pattern. The first two constraints are needed
to induce good rules with significant patterns, while the third ensures that rules mention
only the presence (and not the absence) of patterns as reasons for a program to be incorrect.
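
The significance constraint can be checked with the standard likelihood-ratio
statistic used in CN2-style rule learners, which compares the class
distribution covered by a rule (or by a rule extended with a new conjunct)
against the prior class distribution. The following Python sketch illustrates
the test; it is a minimal illustration under these assumptions, not our exact
implementation.
\begin{verbatim}
from math import log
from scipy.stats import chi2

def significant(covered, priors, sig=0.05):
    # covered: class counts among examples covered by the rule,
    #          e.g. {"incorrect": 27, "correct": 1}
    # priors:  class counts in the whole training data
    n = sum(covered.values())
    total = sum(priors.values())
    lrs = 0.0
    for cls, c in covered.items():
        if c > 0:
            expected = n * priors[cls] / total
            lrs += 2.0 * c * log(c / expected)
    # two classes give one degree of freedom
    return chi2.sf(lrs, len(priors) - 1) < sig
\end{verbatim}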
For the second type of error, we could try the same approach and learn rules for the class of
correct programs. The condition parts of accurate rules for correct programs would
define sufficient combinations of patterns that render a program correct. However, it turns out
to be difficult to learn accurate rules for correct programs, since such rules would have to require all relevant patterns
and exclude all incorrect patterns, yet a conjunct can only
specify the presence of a pattern. Even if specifying the absence of patterns were allowed in rule conditions,
the learning problem would remain difficult, since there are usually many incorrect patterns.
A possible solution is to learn from a data set that excludes the programs covered by the rules for the incorrect class.
In this way, programs with known incorrect patterns are removed from the data, so these patterns are no longer needed in rule conditions.
However, removing incorrect programs also removes the need for relevant patterns: if all incorrect programs were removed,
the rule ``IF True THEN class=correct'' would suffice.
Such a rule contains no relevant patterns and could not be used to generate hints.
We achieved the best results by learning from both data sets: the original data set (with all programs)
is used to learn rules, while the filtered data set is used to test whether a rule achieves the
required classification accuracy (90\%).
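
The following Python sketch summarizes this two-stage procedure. It assumes a
hypothetical \texttt{learn\_rules} function with the signature from
Figure~\ref{figure:algorithm} and rule objects with a \texttt{covers} method;
both are placeholders for our Orange-based implementation.
\begin{verbatim}
def learn_all_rules(P, learn_rules):
    # Rules describing incorrect programs, learned and validated
    # on the full data set P.
    i_rules = learn_rules("incorrect", P, P, sig=0.05, acc=0.9)
    # Remove programs covered by I-rules; their known incorrect
    # patterns should no longer count against candidate C-rules.
    P_c = [ex for ex in P if not any(r.covers(ex) for r in i_rules)]
    # Search for C-rules on the full data P (so trivial rules score
    # poorly), but check the 90% accuracy threshold on P_c.
    c_rules = learn_rules("correct", P, P_c, sig=0.05, acc=0.9)
    return i_rules, c_rules
\end{verbatim}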
Figure~\ref{figure:algorithm} outlines the algorithm. The rules describing
incorrect programs are called $I\mbox{-}rules$ and the rules describing correct programs are called $C\mbox{-}rules$.
Even though our main interest is the discovery of patterns, we can still use the induced rules to classify
new programs, for example to evaluate the quality of the rules. The classification procedure has three steps:
first, check whether an $I\mbox{-}rule$ covers the program to be classified, and if one
does, classify the program as incorrect. Second, check whether a $C\mbox{-}rule$ covers the program, and classify
it as correct if one does. Otherwise, if no induced rule covers the program, classify it as
incorrect.
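
A minimal Python sketch of this procedure, again assuming rule objects with a
hypothetical \texttt{covers} method:
\begin{verbatim}
def classify(program, i_rules, c_rules):
    # Step 1: any covering I-rule classifies the program as incorrect.
    if any(rule.covers(program) for rule in i_rules):
        return "incorrect"
    # Step 2: otherwise a covering C-rule classifies it as correct.
    if any(rule.covers(program) for rule in c_rules):
        return "correct"
    # Step 3: programs covered by no rule default to incorrect.
    return "incorrect"
\end{verbatim}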
\subsection{Generating hints from rules}