TITLE: SIGCSE Day 2 -- Al Aho on Teaching Compiler Construction
AUTHOR: Eugene Wallingford
DATE: March 24, 2010 7:42 PM
DESC:
-----
BODY:
[A transcript of the
SIGCSE 2010
conference:
Table of Contents]
Early last year, I wrote a
blog entry
about using idea's from Al Aho's article,
Teaching the Compilers Course,
in the
most recent offering
of my course. When I saw that Aho was speaking at SIGCSE,
I knew I had to go. As
Rich Pattis
told me in the hallway after the talk, when you get a chance
to hear certain people speak, you do. Aho is one of those
guys. (For me, so is Pattis.)
The talk was originally scheduled for Thursday, but persistent
fog over southeast Wisconsin kept several people from arriving
at the conference on time, including Aho. So the talk was
rescheduled for Fri. I still had to see it, of course, so I
skipped the attention-grabbing "If you ___, you might be a
computational thinker".
Aho's talk covered much of the same ground as his inroads
paper, which gave me the luxury of being able to listen more
closely to his stories and elaborations than to the details.
The talk did a nice job of putting the compiler course into
its historical context and tried to explain why we might well
teach a course very different -- yet in many ways similar --
to the course we taught forty, twenty-five, or even ten years
ago.
He opened with lists of the top ten programming languages in
1970 and 2010. There was no overlap, which introduced Aho's
first big point: the landscape of programming languages has
changes in a big way since the beginning of our discipline,
and there have been corresponding changes in the landscape of
compilers. The dimensions of change are many: the number of
languages, the diversity of languages, the number and kinds
of applications we write. The growth in number and diversity
applies not only to the programming languages we use, which
are the source language to a compiler, but also to the target
machines and the target languages produced by compilers.
From Aho's perspective, one of the most consequential changes
in compiler construction has been the rise of massive compiler
collections such as
gcc
and
LLVM.
In most environments, writing a compiler is no longer a matter
of "writing a program" as much a software engineering exercise:
work with a large existing system, and add a new front end or
back end.
So, what should we teach? Syntax and semantics are fairly
well settled as matter of theory. We can thus devote time to
the less mathematical parts of the job, such as the art of
writing grammars. Aho noted that in the 2000s, parsing
natural languages is mostly a statistical process, not a
grammatical one, thanks to massive databases of text and easy
search. I wonder if parsing programming languages will ever
move in this direction... What would that mean in terms of
freer grammar, greater productivity, or confusion?
With the availability of modern tools, Aho advocates an agile
"grow a language" approach. Using lex and yacc, students can
quickly produce a compiler in approximately 20 lines of code.
Due to the nature of syntax-directed translation, which is
closely related to
structural recursion,
we can add new productions to a grammar with relative ease.
This enables us to start small, to experiment with different
ideas.
The
Dragon book
circa 2010 adds many new topics to its previous editions.
It just keeps getting thicker! It covers much more material,
both breadth and depth, than can be covered in the typical
course, even with graduate students. This gives instructors
lots of leeway in selecting a subset around which to build a
course. The second edition already covers too much material
for my undergrad course, and without enough of the examples
that many students need these day. We end up selecting such
a small subset of the material that the price of the book is
too high for the number of pages we actually used.
The meat of the talk matched the meat of his paper: the
compiler course he teaches these days. Here are a few
tidbits.
On the Design of the Course
- Aho claims that, through all the years, every team
has delivered a working system. He attributes this
to experience teaching the course and the support they
provide students.
- Each semester, he brings in at least one language
designer in as a guest speaker, someone like Stroustrup
or Gosling. I's love to do this but don't have quite
the pull, connections, or geographical advantage of Aho.
I'll have to be creative, as I was the last time I taught
agile software development and arranged a phone conference
with
Ken Auer.
- Students in the course become experts in one language:
the one they create. They become much more knowledgable
in several others: the languages they to to write, build,
and test their compiler.
On System Development
- Aho sizes each student project at 3,000-6,000 LOC. He
uses Boehm's model to derive a team size of 5, which fits
nicely with his belief that 5 is the ideal team size.
- Every team member must produce at least 500 lines of code
on the project. I have never had an explicit rule about
this in the past, but experience in my last two courses
with team projects tells me that I should.
- Aho lets teams choose their own technology, so that they
can in the way that makes them most comfortable. One
serendipitous side effect of this choice is that requires
him to stay current with what's going on in the world.
- He also allows teams to build interpreters for complex
languages, rather than full-blown compilers. He feels that
the details of assembly language get in the way of other
important lessons. (I have not made that leap yet.)
On Language Design
- One technique he uses to scope the project is to require
students to identify an essential core of their language
along with a list of extra features that they will implement
if time permits. In 15 years, he says, no team has ever
delivered an extra feature. That surprises me.
- In order to get students past the utopian dream of a perfect
language, he requires each team to write two or three
programs in their language to solve representative problems
in the language's domain. This makes me think of test-first
design -- but of the language, not the program!
- Aho believes that students come to appreciate our current
languages more after designing a language and grappling
with the friction between dreams and reality. I think this
lesson generalizes to most forms of design and creation.
I am still thinking about how to allow students to design their
own language and still have the time and energy to produce a
working system in one semester. Perhaps I could become more
involved early in the design process, something Aho and his suite
of teaching assistants can do, or even lead the design conversation.
On Project Management
- "A little bit of process goes a long way" toward successful
delivery and robust software. The key is finding the proper
balance between too much process, which stifles developers,
and too little, which paralyzes them.
- Aho has experimented with different mechanisms for organizing
teams and selecting team leaders. Over time, he has found it
best to let teams self-organize. This matches my experience
as well, as long as I keep an eye out for obviously bad
configurations.
- Aho devotes one lecture to project management, which I need
to do again myself.
Covering more content
is a siren that scuttles more student learning than it buoys.
~~~~
Aho peppered his talk with several reminiscences. He told a short
story about lex and how it was extended with regular expressions
from egrep by Eric Schmidt, Google CEO. Schmidt worked for Aho as
a summer intern. "He was the best intern I ever had." Another
interesting tale recounted one of his doctoral student's effort
to build a compiler for a quantum computer. It was interesting,
yes, but I need to learn more about quantum computing to really
appreciate it!
My favorite story of the day was about
awk,
one of Unix's great little languages. Aho and his colleagues
Weinberger and Kernighan wrote awk for their own simple data
manipulation tasks. They figured they'd use it to write
throwaway programs of 2-5 lines each. In that context, you can
build a certain kind of language and be happy. But as Aho said,
"A lot of the world is data processing." One day, a
colleague came in to his office, quite perturbed at a bug he
had found. This colleague had written a 10,000-line
awk program to do computer-aided design. (If you have written
any awk, you know just how fantabulous this feat is.) In a
context where 10K-line programs are conceivable, you want a
very different sort of language!
The awk team fixed the bug, but this time they "did it right".
First, they built a regression test suite. (Agile Sighting 1:
continuous testing.) Second, they created a new rule. To
propose a new language feature for awk, you had to produce
regression tests for it first. (Agile Sighting 2: test-first
development.) Aho has built this lesson into his compiler
course. Students must write their compiler test-first and
instrument their build environments to ensure that the tests
are run "all of the time". (Agile Sighting 3: continuous
integration.)
An added feature of Aho's talk over his paper was three short
presentations from members of a student team that produced
PixelPower, a language which extends C to work with a
particular graphics library. They shared some of the valuable
insights from their project experience:
- They designed their language to have big overlap with C.
This way, they had an existing compiler that they
understood well and could extend.
- The team leader decided to focus the team, not try to make
everyone happy. This is a huge lesson to learn as soon as
you can, one the students in my last compiler course learned
perhaps a bit too late. "Getting things done," Aho's
students said, "is more important than getting along."
- The team kept detailed notes of all their discussions and
all their decisions. Documentation of process is in many
ways much more important than documentation of code, which
should be able to speak for itself. My latest team used
a wiki for this purpose, which was a good idea they had
early in the semester. If anything, they learned that they
should have used it more frequently and more extensively.
One final note to close this long report. Aho had this to say
about the success of his course:
If you make something a little better each semester, after a
while it is pretty good. Through no fault of my own this course
is very good now.
I think Aho's course is good precisely because he adopted this
attitude about its design and implementation. This attitude
serves us well when designing and implementing software, too:
Many iterations. Lots of feedback. Collective ownership of
the work product.
An hour well spent.
-----