TITLE: Intermediate Representations and Life Beyond the Compiler
AUTHOR: Eugene Wallingford
DATE: April 03, 2012 4:00 PM
DESC:
-----
BODY:
In the simplest cases, a compiler can generate target code
directly from the abstract syntax tree.
In many cases, though, there are good reasons why we don't
want to generate code for the target machine immediately.
One is modularity. A big part of code generation for a
particular target machine is machine-dependent. If we
write a monolithic code generator, then we will have to
reimplement the machine-independent parts every time we
want to target a new machine.
Even if we stick with one back-end architecture, modularity
helps us. Not all of the machine-dependent elements of
code generation depend in the same way on the machine. If
we write a monolithic code generator, then any small change
in the target machine -- perhaps even a small upgrade in the
processor -- can cause changes throughout our program. If
instead we write a modular code generator, with modules that
reflect particular shear layers in the generation process,
à la
How Buildings Learn,
then we may be able to contain changes in target machine
specification to an easily identified subset of
modules.
So, more generally, we think of code generation in two parts:
- one or more machine-independent transformations from an
abstract syntax tree to intermediate representations of
the program, followed by
- one or more machine-dependent transformations from the
final intermediate representation to machine code.
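To make the two-part split concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny `Num`/`BinOp` syntax tree, the three-address tuples used as the intermediate representation, and the `ADD`/`MUL` mnemonics of an imaginary target machine. Only `emit` knows anything about the target.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Num:
    """Leaf node of a hypothetical AST: an integer literal."""
    value: int


@dataclass
class BinOp:
    """Interior node of a hypothetical AST: a binary operation."""
    op: str
    left: object
    right: object


def lower(ast):
    """Machine-independent step: AST -> three-address code.

    Each instruction is a tuple (dest, op, a, b), where a and b are
    either integer constants or names of earlier temporaries.
    """
    code = []
    temps = count(1)

    def walk(node):
        if isinstance(node, Num):
            return node.value
        a = walk(node.left)
        b = walk(node.right)
        dest = f"t{next(temps)}"
        code.append((dest, node.op, a, b))
        return dest

    walk(ast)
    return code


# Mnemonics for an imaginary target machine (an assumption, not a real ISA).
MNEMONICS = {"+": "ADD", "*": "MUL"}


def operand(x):
    """Format an IR operand: constants become immediates, temps stay names."""
    return f"#{x}" if isinstance(x, int) else x


def emit(ir):
    """Machine-dependent step: three-address code -> assembly text."""
    return [f"{MNEMONICS[op]} {dest}, {operand(a)}, {operand(b)}"
            for dest, op, a, b in ir]


ast = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
ir = lower(ast)
asm = emit(ir)
```

Retargeting this toy compiler would mean writing a new `emit` and a new mnemonic table; `lower` and the tree classes would not change.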
Intermediate representations between the abstract syntax tree
and assembly code have other advantages, too. In particular,
they enable us to optimize code in machine-independent ways,
without having to manipulate a complex target language.
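As a small illustration of a machine-independent optimization, the sketch below folds constants in a list of three-address instructions. The tuple format `(dest, op, a, b)` and the assumption that each temporary is assigned exactly once are ours, chosen for illustration; the pass never mentions registers or opcodes of any target machine.

```python
import operator

# Operators this toy pass knows how to evaluate at compile time.
FOLDABLE = {"+": operator.add, "*": operator.mul}


def fold_constants(ir):
    """Evaluate instructions whose operands are all known constants.

    Returns the remaining instructions plus a map from each folded
    temporary to its value. Assumes each dest is assigned only once.
    """
    known = {}  # temporary name -> constant value
    out = []
    for dest, op, a, b in ir:
        # Substitute values of temporaries that were folded earlier.
        a = known.get(a, a)
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int) and op in FOLDABLE:
            known[dest] = FOLDABLE[op](a, b)
        else:
            out.append((dest, op, a, b))
    return out, known


# x is a program variable whose value is unknown at compile time.
ir = [("t1", "*", 2, 3), ("t2", "+", "x", "t1")]
optimized, constants = fold_constants(ir)
```

Here the multiplication disappears and the addition uses the constant 6 directly. The same pass would serve every back end that consumes this representation.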
In practice, an intermediate representation sometimes outlives
the compiler for which it was created. Chris Clark described
an example of this phenomenon in
Build a Tree--Save a Parse:
Sometimes the intermediate language (IL) takes on a life of
its own. Several systems that have multiple parsers,
sophisticated optimizers, and multiple code generators have
been developed and marketed commercially. Each of these
systems has its own common virtual assembly language used by
the various parsers and code generators. These intermediate
languages all began connecting just one parser to one code
generator.
P-code is an example IL that took on a life of its own.
It was invented by Niklaus Wirth as the IL for the ETH Pascal
compiler. Many variants of that compiler arose [Nel79],
including the UCSD Pascal compiler that was used at Stanford
to define an optimizer [Cho83]. Chow's compiler evolved into
the MIPS compiler suite, which was the basis for one of the
DEC C compilers -- acc. That compiler did not parse the same
language nor use any code from the ETH compiler, but the IL
survived.
Good language design usually pays off, sometimes in unexpected
ways.
(If you like creating languages and writing language processors,
Clark's paper is worth a read!)
-----