TITLE: SIGCSE Day 2 -- Al Aho on Teaching Compiler Construction AUTHOR: Eugene Wallingford DATE: March 24, 2010 7:42 PM DESC: ----- BODY:

[A transcript of the SIGCSE 2010 conference: Table of Contents]

Early last year, I wrote a blog entry about using idea's from Al Aho's article, Teaching the Compilers Course, in the most recent offering of my course. When I saw that Aho was speaking at SIGCSE, I knew I had to go. As Rich Pattis told me in the hallway after the talk, when you get a chance to hear certain people speak, you do. Aho is one of those guys. (For me, so is Pattis.) The talk was originally scheduled for Thursday, but persistent fog over southeast Wisconsin kept several people from arriving at the conference on time, including Aho. So the talk was rescheduled for Fri. I still had to see it, of course, so I skipped the attention-grabbing "If you ___, you might be a computational thinker". Aho's talk covered much of the same ground as his inroads paper, which gave me the luxury of being able to listen more closely to his stories and elaborations than to the details. The talk did a nice job of putting the compiler course into its historical context and tried to explain why we might well teach a course very different -- yet in many ways similar -- to the course we taught forty, twenty-five, or even ten years ago. He opened with lists of the top ten programming languages in 1970 and 2010. There was no overlap, which introduced Aho's first big point: the landscape of programming languages has changes in a big way since the beginning of our discipline, and there have been corresponding changes in the landscape of compilers. The dimensions of change are many: the number of languages, the diversity of languages, the number and kinds of applications we write. The growth in number and diversity applies not only to the programming languages we use, which are the source language to a compiler, but also to the target machines and the target languages produced by compilers. From Aho's perspective, one of the most consequential changes in compiler construction has been the rise of massive compiler collections such as gcc and LLVM. In most environments, writing a compiler is no longer a matter of "writing a program" as much a software engineering exercise: work with a large existing system, and add a new front end or back end. So, what should we teach? Syntax and semantics are fairly well settled as matter of theory. We can thus devote time to the less mathematical parts of the job, such as the art of writing grammars. Aho noted that in the 2000s, parsing natural languages is mostly a statistical process, not a grammatical one, thanks to massive databases of text and easy search. I wonder if parsing programming languages will ever move in this direction... What would that mean in terms of freer grammar, greater productivity, or confusion? With the availability of modern tools, Aho advocates an agile "grow a language" approach. Using lex and yacc, students can quickly produce a compiler in approximately 20 lines of code. Due to the nature of syntax-directed translation, which is closely related to structural recursion, we can add new productions to a grammar with relative ease. This enables us to start small, to experiment with different ideas. The Dragon book circa 2010 adds many new topics to its previous editions. It just keeps getting thicker! It covers much more material, both breadth and depth, than can be covered in the typical course, even with graduate students. This gives instructors lots of leeway in selecting a subset around which to build a course. The second edition already covers too much material for my undergrad course, and without enough of the examples that many students need these day. We end up selecting such a small subset of the material that the price of the book is too high for the number of pages we actually used. The meat of the talk matched the meat of his paper: the compiler course he teaches these days. Here are a few tidbits. On the Design of the Course On System Development On Language Design I am still thinking about how to allow students to design their own language and still have the time and energy to produce a working system in one semester. Perhaps I could become more involved early in the design process, something Aho and his suite of teaching assistants can do, or even lead the design conversation. On Project Management ~~~~ Aho peppered his talk with several reminiscences. He told a short story about lex and how it was extended with regular expressions from egrep by Eric Schmidt, Google CEO. Schmidt worked for Aho as a summer intern. "He was the best intern I ever had." Another interesting tale recounted one of his doctoral student's effort to build a compiler for a quantum computer. It was interesting, yes, but I need to learn more about quantum computing to really appreciate it! My favorite story of the day was about awk, one of Unix's great little languages. Aho and his colleagues Weinberger and Kernighan wrote awk for their own simple data manipulation tasks. They figured they'd use it to write throwaway programs of 2-5 lines each. In that context, you can build a certain kind of language and be happy. But as Aho said, "A lot of the world is data processing." One day, a colleague came in to his office, quite perturbed at a bug he had found. This colleague had written a 10,000-line awk program to do computer-aided design. (If you have written any awk, you know just how fantabulous this feat is.) In a context where 10K-line programs are conceivable, you want a very different sort of language! The awk team fixed the bug, but this time they "did it right". First, they built a regression test suite. (Agile Sighting 1: continuous testing.) Second, they created a new rule. To propose a new language feature for awk, you had to produce regression tests for it first. (Agile Sighting 2: test-first development.) Aho has built this lesson into his compiler course. Students must write their compiler test-first and instrument their build environments to ensure that the tests are run "all of the time". (Agile Sighting 3: continuous integration.) An added feature of Aho's talk over his paper was three short presentations from members of a student team that produced PixelPower, a language which extends C to work with a particular graphics library. They shared some of the valuable insights from their project experience: One final note to close this long report. Aho had this to say about the success of his course:
If you make something a little better each semester, after a while it is pretty good. Through no fault of my own this course is very good now.
I think Aho's course is good precisely because he adopted this attitude about its design and implementation. This attitude serves us well when designing and implementing software, too: Many iterations. Lots of feedback. Collective ownership of the work product. An hour well spent. -----