TITLE: Intermediate Representations and Life Beyond the Compiler AUTHOR: Eugene Wallingford DATE: April 03, 2012 4:00 PM DESC: ----- BODY: In the simplest cases, a compiler can generate target code directly from the abstract syntax tree: generating target code directly from the abstract syntax tree

generating target code directly from the abstract syntax tree

In many cases, though, there are good reasons why we don't want to generate code for the target machine immediately. One is modularity. A big part of code generation for a particular target machine is machine-dependent. If we write a monolithic code generator, then we will have to reimplement the machine-independent parts every time we want to target a new machine. Even if we stick with one back-end architecture, modularity helps us. Not all of the machine-dependent elements of code generation depend in the same way on the machine. If we write a monolithic code generator, then any small change in the target machine -- perhaps even a small upgrade in the processor -- can cause changes throughout our program. If instead we write a modular code generator, with modules that reflect particular shear layers in the generation process, a lá How Buildings Learn, then we may be able to contain changes in target machine specification to an easily identified subset of modules. So, more generally we think of code generation in two parts:

one or more machine-independent transformations from an abstract syntax tree to intermediate representations of the program, followed by

one or more machine-dependent transformations from the final intermediate representation to machine code.

Intermediate representations between the abstract syntax tree and assembly code have other advantages, too. In particular, they enable us to optimize code in machine-independent ways, without having to manipulate a complex target language. In practice, an intermediate representation sometimes outlives the compiler for which it was created. Chris Clark described an example of this phenomenon in Build a Tree--Save a Parse:

Sometimes the intermediate language (IL) takes on a life of its own. Several systems that have multiple parsers, sophisticated optimizers, and multiple code generators have been developed and marketed commercially. Each of these systems has its own common virtual assembly language used by the various parsers and code generators. These intermediate languages all began connecting just one parser to one code generator. P-code is an example IL that took on a life of its own. It was invented by Nicklaus Wirth as the IL for the ETH Pascal compiler. Many variants of that compiler arose [Ne179], including the USCD Pascal compiler that was used at Stanford to define an optimizer [Cho83]. Chow's compiler evolved into the MIPS compiler suite, which was the basis for one of the DEC C compilers -- acc. That compiler did not parse the same language nor use any code from the ETH compiler, but the IL survived.

Good language design usually pays off, sometimes in unexpected ways. (If you like creating languages and writing language processors, Clark's paper is worth a read!) -----