TITLE: A New Demo Compiler for My Course
AUTHOR: Eugene Wallingford
DATE: June 29, 2017 4:03 PM
DESC:
-----
BODY:
Over the last couple of weeks, I have spent a few minutes most
afternoons writing a little code. It's been the best part of
my work day. The project was a little compiler.
One of the first things we do in my compiler course is to study
a small compiler for a couple of the days. This is a nice way
to introduce the stages of a compiler and to raise some of the
questions that we'll be answering over the course of the
semester. It also gives students a chance to see the insides
of a working compiler before they write their own. I hope that
this demystifies the process a little: "Hey, a compiler really
is just a program. Maybe I can write one myself."
For the last decade or so, I have used a compiler called
acdc for this demo, based on Chapter 2 of
Crafting A Compiler
by Fischer, Cytron, and LeBlanc. ac is a small
arithmetic language with two types of numbers, sequences of
assignment statements, and print statements.
dc
is a stack-based desk calculator that comes as part of many
Unix installations. I whipped up a acdc compiler in
Java about a decade ago and have used it ever since. Both
languages have enough features to be useful as a demo but not
enough to overwhelm. My hacked-up compiler is also open to
improvements as we learn techniques throughout the course,
giving us a chance to use them in the small before students
applied them to their own project.
I've been growing dissatisfied with this demo for a while now.
My Java program feels heavy, with too many small pieces to be
simple enough for a quick read. It requires two full class
sessions to really understand it well, and I've been hoping to
shorten the intro to my course. ac is good, but it
doesn't have any flow control other than sequencing, which means
that it does not give us a way to look at assembly language
generation with jumps and backpatching. On top of all that, I
was bored with acdc; ten years is a long time to spend
with one program.
This spring I stumbled on a possible replacement in
The Fastest FizzBuzz in the West.
It defines a simple source language for writing
FizzBuzz
programs declaratively. For example:
1...150
fizz=3
buzz=5
woof=7
produces the output of a much larger program in other languages.
Imagine being able to pull this language during your next
interview for a software dev position!
This language is really simple, which means that a
compiler for it can be written in relatively few lines of code.
However, it also requires generating code with a loop and
if-statements, which requires thinking about branching
patterns in assembly language.
The "Fastest FizzBuzz" article uses a Python parser generator to
create its compiler. For my course, I want something that my
students can read with only their knowledge coming into the course,
and I want the program to be transparent enough so that they can
see directly how each stage works and how it interacts with other
parts of the compiler.
I was also itching to write a program, so I did.
I wrote my compiler in Python. It performs a simple scan of the
source program, taking as much advantage of the small set of simple
tokens as possible. The parser works by recursive descent, which
also takes advantage of the language's simple structure. The type
checker makes sure the numbers all make sense and that the words
are unique. Finally, to make things even simpler, the code
generator produces an executable Python 3.4 program.
I'm quite hopeful about this compiler's use as a class demo. It is
simple enough to read in one sitting, even by students who enter the
course with weaker programming skills. Even so, the language can
also be used to demonstrate the more sophisticated techniques we
learn throughout the course. Consider:
- Adding comments to the source language overwhelms the
ad hoc approach I use in the scanner, motivating
the use of a state machine.
- While the parser is easily handled by recursive descent,
the language is quite amenable to a simple table-driven
approach, too. The table-driven parser will be simple
enough that students can get the hang of the technique
with few unnecessary details.
- The type checker demonstrates walking an abstract syntax
tree without worrying about too many type rules. We can
focus our attention on type systems when dealing with the
more interesting source language of their project.
- The code generator has to deal with flow of control, which
enables us to learn assembly language generation on a
smaller scale without fully implementing code to handle
function calls.
So this compiler can be a demo in the first week of the course
and also serve as a running example throughout.
We'll see how well this plays in class in a couple of months. In
any case, I had great fun ending my days the last two weeks by
firing up emacs or IDLE and writing code. As a bonus, I used this
exercise to improve my git skills, taking them beyond the small
set of commands I have used on my academic projects in the past.
(git rebase -i is almost my friend now.) I also wrote
more pyunit tests than I have written in a long, long
time, which reminded me of some of the challenges students face
when trying to test their code. That should serve me well in the
fall, too.
I do like writing code.
-----