* Introduction (1) :Timotej: ** Generics ITSs are useful, but expensive to create. Internet increases the reach of ITSs (think MOOC) and allows us to gather a lot of data. The challenge is to extract useful information from this data to improve feedback to students. - koedinger2013new ** Programming ITS A particularly challenging domain: - no meaningful “solutions steps” - no easily discernible and independent “program features” ** Problem Find good features to describe student programs that: - are automatically learnable - can serve as a basis for generating hints ** Contributions A pattern language for describing program features. Rules for classifying correct/incorrect Prolog programs. * Background (2) :Timotej: Model-tracing is not well-suited for programming tutors. - Hint Factory (Barnes & Stamper) for programming (Jin, Rivers) - student traces are basically insane anyway, so learning from them is not a good idea (Piech 2015) Instead we focus on analyzing individual submissions - CBM (maybe just /like/ CBM?). - why: human tutors are usually able to comment without referring to previous program versions Analyzing program based on formal descriptions (usually defined manually): - plans (Johnson) - techniques (Brna/Huang/Le) - strategies (Gerdes) Comparing programs directly / after canonicalization (Rivers 2015) - not easy to determine which differences are crucial Defining “programming attributes/features”: - AST subtrees (Nguyen 2014) - linkage graphs (Jin 2012) - n-grams or other purely textual attributes - mostly used for plagiarism (MOSS) / code clone detection Tgrep / trx for structural patterns in NLP - use the same for describing code! * Dataset (3) :Timotej: Solution traces ** Motivating example: ancestor/2 - show and explain correct / buggy program - examples of patterns & rules - buggy rules ≈ error classes - correct rules ≈ program strategies or “plans” → determine intent - how correct / buggy rules can be used to automatically determine intent and generate feedback - using rules to assist authoring tutors ** Patterns Programming is all about manipulating data (variables/literals). Patterns encode (only) relations between data nodes in an AST, and are more granular than subtrees. Patterns abstract away some information Algorithm for extracting patterns. - which patterns are considered * Method (2) :Martin: Rule learning. - first learn correct, then buggy rules Automatic hint generation. * Evaluation (3) :Martin:Timotej: Example: describe patterns/rules for ancestor/2 (and/or another domain). - example trace CA/AUC for rules/RF/…. Using e.g. RF with patterns for attributes achieves a high classification accuracy for correct/incorrect programs. Rules are less accurate, but can be used for hints (how can we improve this)? Simulated hints: for how many submissions would we suggest a hint that was ultimately implemented in the final solution? Hints from buggy rules are almost always implemented; correct rules are less certain. * Conclusions / future work (1) - syntactic variations (e.g. using = instead of implicit matching) - combine with canonicalization - ABML for rule learning - integrate into a tutor (CodeQ / CloudCoder) to test in situ * Misc Problems: - predicates with similar clauses (e.g. union/3)