-rw-r--r-- | paper/evaluation.tex | 2 |
1 file changed, 1 insertion, 1 deletion
diff --git a/paper/evaluation.tex b/paper/evaluation.tex
index 077b684..8e768b1 100644
--- a/paper/evaluation.tex
+++ b/paper/evaluation.tex
@@ -59,7 +59,7 @@ when compared to buggy hints: in the case of problem \code{sister} 84 out of 127
 The last column shows the number of submissions where no hints could be generated. This value is relatively high for the \code{is\_sorted} problem, because the algorithm could not learn any positive rules and thus no intent hints were generated.
-To sum up, buggy hints seem to be good and reliable, since they are always implemented when presented, even when we tested them on past data -- the decisions of students were not influenced by these hints. The percentage of implemented intent hints is, on average, lower (56\%), which is still not a bad result, providing that it is difficult to determine the programmer’s intent. In 12\% (244 out 2057) of generated intent hints, students implemented an alternative hint that was identified by our algorithm. Overall we were able to generate hints for 84.5\% of incorrect submissions. Of those hints, 86\% were implemented (73\% of all incorrect submissions).
+To sum up, buggy hints seem to be good and reliable, since they are always implemented when presented, even when we tested them on past data -- the decisions of students were not influenced by these hints. The percentage of implemented intent hints is, on average, lower (56\%), which is still not a bad result, providing that it is difficult to determine the programmer’s intent. In 12\% (244 out of 2057) of generated intent hints, students implemented an alternative hint that was identified by our algorithm. Overall we were able to generate hints for 84.5\% of incorrect submissions. Of those hints, 86\% were implemented (73\% of all incorrect submissions).
 High classification accuracies for many problems imply that it is possible to determine program correctness simply by checking for the presence of a small number of patterns.
 Our hypothesis is that for each program certain crucial patterns exist that students have difficulties with. When they figure out these patterns, implementing the rest of the program is usually straightforward.