Diffstat (limited to 'aied2017')
-rw-r--r-- | aied2017/method.tex | 60
1 file changed, 33 insertions(+), 27 deletions(-)
diff --git a/aied2017/method.tex b/aied2017/method.tex
index 57643db..4df3989 100644
--- a/aied2017/method.tex
+++ b/aied2017/method.tex
@@ -22,39 +22,45 @@ Patterns constructed in this way form the set of features for rule learning. To
 \subsection{Learning rules}
 
-\begin{figure}[t]
-\centering
- \begin{enumerate}
-  \item Let $P$ be the data of all student programs, each described with a set of AST patterns
-   and classified as correct (it passes unit tests) or incorrect.
-  \item Let method $learn\_rules(target, P, P_1, sig, acc)$ be a method that learns
-   a set of rules for class $target$ from data $P$. The method needs to consider
-   two additional constraints: the significance of every attribute-value pair
-   in the condition part of the rule needs to be significant with respect to the likelihood-ratio
-   test ($p<sig$) and classification accuracy of each rule on data $P_1$ should be at least $acc$.
-  \item Let $I-rules = learn\_rules(incorrect, P, P, 0.05, 0.9)$
-  \item Let $P_c$ be data $P$ without programs that are already covered by $I-rules$
-  \item Let $C-rules = learn\_rules(correct, P, P_c, 0.05, 0.9)$
-  \item Return $I-rules$ and $C-rules$
- \end{enumerate}
- \caption{An outline of the algorithm for learning rules. The method $learn\_rules$, which induces rules for a specific class, is a variant of the CN2 algorithm~\cite{x} implemented within the Orange data-mining suite~\cite{demsar2013orange}. In all our experiments, $sig$ was set to 0.05 and $acc$ was set to 0.9.}
- \label{figure:algorithm}
-\end{figure}
-
-Submitted programs are represented in the feature space of AST patterns described above. We classify programs as correct or incorrect based on predefined unit tests for each problem, and use these labels for machine learning.
-
-Since programs can already be validated with appropriate unit tests, our goal is not classifying new submissions, but rather to discover patterns correlated with program correctness. This machine-learning approach is called \emph{descriptive induction} -- the automatic discovery of patterns describing regularities in data. We use rule learning for this task because rule-based models are easily comprehensible.
+%\begin{figure}[t]
+%\noindent \textsc{Procedure $learn\_rules(D, sig, acc)$}
+%\\
+%\\
+%\noindent Given: data \emph{D}, threshold of significance \emph{sig}, threshold of accuracy \emph{acc}, and a
+%procedure for learning rules $learn\_target\_rules(target, D, D_{acc}, sig, acc)$ that learns a set of rules for
+%class \emph{target} from data \emph{D}, where: a) each attribute-value pair in the condition part
+%needs to be significant using the likelihood-ratio test with significance level $p<sig$,
+%b) classification accuracy of rules on data $D_{acc}$ should be at least $acc$, and c) only positive values of
+%attributes are mentioned in conditions.
+%\begin{enumerate}
+%  \item Let \emph{I-rules} = $learn\_target\_rules(incorrect, D, D, sig, acc)$
+%  \item Let $D_c$ be data $D$ without programs covered by \emph{I-rules}
+%  \item Let \emph{C-rules} = $learn\_target\_rules(correct, D, D_c, sig, acc)$
+%  \item Return \emph{I-rules} and \emph{C-rules}
+% \end{enumerate}
+% \caption{An outline of the algorithm for learning rules. The method \emph{learn\_rules}, which induces rules for a specific class, is a variant of the CN2 algorithm~\cite{cn21991} implemented within the Orange data-mining suite~\cite{demsar2013orange}. In all our experiments, \emph{sig} was set to 0.05 and \emph{acc} was set to 0.9.}
+% \label{figure:algorithm}
+%\end{figure}
+
+We represent submitted programs in the feature space of AST patterns described above. Each pattern corresponds to one boolean feature whose value is $true$ when the pattern is present in a program and $false$ when it is absent. We classify programs as correct or incorrect based on predefined unit tests for each problem, and use these labels for machine learning.
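The boolean feature representation added in this hunk can be sketched as follows (an illustrative Python fragment; the pattern names and per-program pattern sets are hypothetical placeholders — in the paper, patterns are extracted from each program's AST):

```python
# Sketch: encode each program as a boolean feature vector over AST patterns.
# Pattern names below are invented for illustration.

def to_feature_vector(program_patterns, all_patterns):
    """True if the pattern occurs in the program, False otherwise."""
    return {p: (p in program_patterns) for p in all_patterns}

all_patterns = ["pattern_a", "pattern_b", "pattern_c"]
submission = {"pattern_a", "pattern_c"}  # patterns found in one submission

features = to_feature_vector(submission, all_patterns)
# features == {"pattern_a": True, "pattern_b": False, "pattern_c": True}
```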
+
+Since programs can be validated with appropriate unit tests, our goal is not to classify new submissions, but to discover patterns related to program correctness. This machine-learning approach is called \emph{descriptive induction} -- the automatic discovery of patterns describing regularities in data. We use rule learning for this task because the conditions of rules can be easily translated to hints.
 
 Before explaining the algorithm, let us discuss the reasons why a program can be incorrect. Our teaching experience indicates that bugs in student programs can often be described by 1) some incorrect or \emph{buggy} pattern, which needs to be removed, or 2) some missing relation (pattern) that should be included before the program can be correct. We shall now explain how both types of errors can be identified with rules.
 
-To discover buggy patterns, the algorithm first learns rules that describe incorrect programs. The conditions of these rules contain frequent patterns symptomatic of incorrect programs. Since rules are used to generate hints, and since hints should not be presented to students unless they are probably correct, we require that each learned rule's classification accuracy exceeds a certain threshold (in our experiments we used 90\%), each conjunct in a condition is significant with respect to the likelihood-ratio test (with $p=0.05$ in our experiments), and a conjunct can only specify the presence of a pattern. The former two constraints are needed to induce good rules with significant patterns, while the latter constraint assures that rules mention only presence (and not absence) of patterns as reasons for a program to be incorrect.
+To discover buggy patterns, the algorithm first learns rules that describe incorrect programs (I-rules). We use a variant of the CN2 algorithm~\cite{cn21991} implemented within the Orange data-mining toolbox~\cite{demsar2013orange}. Since we use rules to generate hints, and since hints should not be presented to students unless they are likely to be correct, we impose additional constraints on the rule learner:
+\begin{enumerate}
+ \item The classification accuracy of each learned rule must exceed a certain threshold (in our experiments we used 90\% as the threshold).
+ \item Each conjunct in a condition must be significant with respect to the likelihood-ratio test (in our experiments the significance threshold was set to $p=0.05$).
+ \item A conjunct can only specify the presence of a pattern: we allow only feature-value pairs with $true$ as the value.
+\end{enumerate}
+\noindent The former two constraints are needed to induce good rules with significant patterns, while the latter ensures that rules mention only the presence (and not the absence) of patterns as reasons for a program to be incorrect. This is important, since these conditions ought to contain frequent patterns symptomatic of incorrect programs that the hint generation procedure can use.
 
-With respect to the second type of error, we could try the same approach and learn rules for the class of correct programs. Having accurate rules for correct programs, the conditional part of these rules would define sufficient groups of patterns that render a program correct. However, it turns out that it is difficult to learn accurate rules for correct programs, since these rules should contain all relevant patterns and prevent incorrect patterns, yet a conjunct can only specify the presence of a pattern. If specifying absence of patterns was allowed in rules' condition, the learning problem would still be difficult, since usually there are many incorrect patterns. A possible way to solve this problem is to learn from data set not containing programs that are covered by rules for incorrect class. This way all known incorrect patterns are removed from the data and are not needed anymore in conditions of rules. However, removing incorrect patterns also removes the need for relevant patterns. For example, if all incorrect programs were removed, the rule ``$\mathsf{true} \Rightarrow \mathsf{correct}$'' would suffice. Such rule does not contain any relevant patterns and could not be used to generate hints. We achieved the best results by learning from both data sets: the original data set (with all programs) is used to learn rules, while only programs without buggy patterns are used to test whether a rule achieves the required classification accuracy (90\%).
+With respect to the second type of error, we could try the same approach and learn rules with the above algorithm for the class of correct programs (C-rules). Given accurate rules for correct programs, their conditions would define sufficient combinations of patterns that render a program correct. However, it turns out that it is difficult to learn accurate rules for correct programs, since such rules would have to contain all patterns that need to be in the program and would have to specify the absence of all incorrect patterns, yet a conjunct can only specify the presence of a pattern. If specifying the absence of patterns were allowed in rule conditions, the learning problem would still be difficult, since there can be many incorrect patterns, resulting in many C-rules.
 
-Figure~\ref{figure:algorithm} contains an outline of the algorithm. The rules describing
-incorrect programs are called I-rules and the rules for correct programs are called C-rules.
+A possible way to solve this problem is to remove the programs that are covered by rules for the incorrect class. This way all known buggy patterns are removed from the data and will therefore not be included in C-rules. However, removing incorrect patterns also removes the need for relevant patterns. For example, if all incorrect programs were removed, a single C-rule ``$\mathsf{true} \Rightarrow \mathsf{correct}$'' would suffice, which cannot be used to generate hints. We achieved the best results by learning from both data sets: we use the original data set (with all programs) to learn rules, and the data set without buggy patterns to test whether a rule achieves the required classification accuracy (90\%).
 
-Even though our main interest is discovery of patterns, we can still use induced rules to classify new programs, for example to evaluate the quality of rules. The classification procedure has three steps: if an I-rule covers the program, classify it as incorrect; if a C-rule covers the program, classify it as correct; otherwise, if no rule covers the program, classify it as incorrect.
+Even though our main interest is the discovery of patterns, we can still use induced rules to classify new programs, for example to evaluate the quality of rules. The classification procedure has three steps: 1) if an I-rule covers the program, classify it as incorrect; 2) else if a C-rule covers the program, classify it as correct; 3) otherwise, if no rule covers the program, classify it as incorrect.
 
 \subsection{Generating hints}
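The three-step classification procedure introduced in this hunk can be sketched as follows (a minimal illustration: rules are modelled here as sets of patterns that must all be present, and the concrete rules are invented — this is not the CN2 implementation the paper uses):

```python
# Sketch: classify a program from induced I-rules and C-rules.
# Hypothetical rule representation: a rule covers a program when every
# pattern in its condition is present in the program's feature vector.

def rule_covers(rule, features):
    return all(features.get(pattern, False) for pattern in rule)

def classify(features, i_rules, c_rules):
    if any(rule_covers(r, features) for r in i_rules):
        return "incorrect"   # 1) an I-rule covers the program
    if any(rule_covers(r, features) for r in c_rules):
        return "correct"     # 2) else a C-rule covers the program
    return "incorrect"       # 3) no rule covers the program

i_rules = [{"buggy_pattern"}]            # invented for illustration
c_rules = [{"pattern_a", "pattern_b"}]

assert classify({"buggy_pattern": True}, i_rules, c_rules) == "incorrect"
assert classify({"pattern_a": True, "pattern_b": True}, i_rules, c_rules) == "correct"
assert classify({"pattern_a": True}, i_rules, c_rules) == "incorrect"
```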