TITLE: On the Virtues of a Small Source Language in the Compiler Course AUTHOR: Eugene Wallingford DATE: May 05, 2012 11:53 AM DESC: ----- BODY: I have not finished grading my students' compilers yet. I haven't even looked at their public comments about the project. (Anonymous feedback comes later in the summer when course assessment data arrives.) Still, one lesson has risen to the surface: Keep the source language small. No, really. I long ago learned the importance of assigning a source language small enough to be scanned, parsed, and translated completely in a single semester. Over the years, I had pared the languages I assigned down to the bare essentials. That leaves a small language, one that creates some fun programming challenges. But it's a language that students can master in fifteen weeks. My students this term were all pretty good programmers, and I am a weak man. So I gave in to the temptation to add just a few more features to the language, to make it a bit more interesting for my students: variables, an assignment statement, a sequence construct, and a single loop form. It was as if I had learned nothing from all my years teaching this course. The effect of processing a larger language manifested itself in an expected way: the more students have to do, the more likely it is that they won't get it all done. This affected a couple of the teams more than the others, but it wasn't so bad. It meant that some teams didn't get as far along with function calls and with recursion as we had hoped. Getting a decent subset of such a language running is still an accomplishment for students. But the effect of processing a larger language manifested itself in a way I did not expect, too, one more detrimental to student progress: a "hit or miss" quality to the correctness of their implementations. One team had function calls mostly working, but not recursion. Another team had tail recursion mostly working(!), but ordinary function calls didn't work well.
One team had local variables working fine but not global variables, while most teams knocked out globals early and, if they struggled at all, it was with locals. The extra syntactic complexity in the language created a different sort of problem for the teams. A single new language feature doesn't seem like too much in isolation, but it interacts with all the existing features and all the other new features to create a much more complex language for the students to understand and for the parser to recognize and get right. Sure, our language had regular tokens and a context-free grammar, which localizes the information the scanner and parser need to see in order to do their jobs. Like all of us, though, students make errors when writing their code. In the more complex space, it is harder to track down the root cause of an error, especially when multiple errors are present and complicate the search. (Or should I say complect?) This is an important lesson in language design more generally, especially for a language aimed at beginners. But it also stands out when a compiler for the language is being written by beginning compiler writers. I am chastened and will return to the True Path of Small Language the next time I teach this course. -----