aied2018/introduction.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

\section{Introduction}

Providing feedback to students is among the most time-consuming tasks when teaching programming. In large courses with hundreds of students, feedback is therefore often limited to automated program testing. While test cases can reliably determine whether a program is correct or not, they cannot easily be associated with specific errors in the code.

Prompt, specific feedback could however greatly benefit learning. Furthermore, analyzing the problems students have with programming exercises can allow teachers to improve the course. The main obstacle to such analysis is the large variability of student submissions: even for the simplest tasks, students can submit tens of thousands of different programs~\cite{huang2013syntactic,piech2015autonomously}.

Several attempts have been made to automatically discover commonalities in a set of programs~\cite{jin2012program,rivers2015data-driven,nguyen2014codewebs,hovemeyer2016control}. This would allow a teacher to annotate a representative subset of submissions with feedback messages, which could then be automatically propagated to similar programs. These techniques are used for instance by the OverCode tool to visualize variations in student programs~\cite{glassman2015overcode}.

This paper presents a new language for describing patterns in student code. Our approach is based on \emph{tree regular expressions} (TREs) used in natural language processing~\cite{levy2006tregex}. TREs are similar to ordinary regular expressions: they allow us to specify important patterns in a program’s abstract syntax tree (AST) while disregarding irrelevant parts. We found that TREs are sufficiently expressive to represent various concepts and errors in novice programs.

We have previously demonstrated this approach with Prolog programs~\cite{lazar2017automatic}. Here we refine the definition of AST patterns, and show that they can be applied to Python -- representing a different programming paradigm -- with only a few language-specific modifications. We also demonstrate that rules learned from such patterns can be easily interpreted.

In CodeWebs, Nguyen et al. look for functionally equivalent AST subtrees that perform the same transformation from inputs to outputs~\cite{nguyen2014codewebs}. Piech et al. learn program embeddings to a similar end~\cite{piech2015learning}. Our work differs from theirs mainly in that TREs allow us to describe relations in (potentially disjoint) subparts of the program. We induce rules based on binary correct/incorrect labels, and do not have to execute any submission more than once.

Jin et al. use linkage graphs to represent dependencies between individual program statements based on the variables they contain~\cite{jin2012program}. This allows them to form equivalence classes of programs by ignoring unrelated statements. While TREs can encode such attributes, we use smaller patterns that encode dependencies between pairs of variables. Hovemeyer et al. focus on control structures such as loops and conditionals~\cite{hovemeyer2016control}. Our AST patterns include TREs that describe control flow, expressing the same features.

When comparing programs we must account for superficial differences such as different naming schemes. Rivers et al. canonicalize student programs using equivalency-preserving transformations (renaming variables, reordering binary expressions, inlining functions and so on)~\cite{rivers2015data-driven}. We use their canonicalization as a preprocessing step before extracting patterns.

The next section describes TREs and AST patterns, and gives a brief evaluation to show that patterns can discriminate between correct and incorrect programs. Section~\ref{sec:rules} describes our modified version of the CN2 rule-learning algorithm, and analyzes student solutions to two programming exercises in terms of such rules. The last section outlines the directions for our future work.