TITLE: Thinking Out Loud about the Compiler in a Pure OO World AUTHOR: Eugene Wallingford DATE: March 18, 2012 5:20 PM DESC: ----- BODY: John Cook pokes fun at OO practice in his blog post today. The "Obviously a vast improvement." comment deftly ignores the potential source of OOP's benefits, but then that's the key to the joke. A commenter points to a blog entry by Smalltalk veteran Travis Griggs. I agree with Griggs's recommendation to avoid using verbs-turned-into-nouns as objects, especially lame placeholder words such as "manager" and "loader". As he says, they usually denote objects that fiddle with the private parts of other objects. Those other objects should be providing the services we need. Griggs allows reasonable linguistic exceptions to the advice. But he also acknowledges the pull of pop culture which, given my teaching assignment this semester, jumped out at me:

There are many 'er' words that despite their focus on what they do, have become so commonplace, that we're best to just stick with them, at least in part. Parser. Compiler. Browser.

I've thought about this break in my own OO discipline before, and now I'm thinking about it again. What would it be like to write compiles without creating parsers and code generators -- and compilers themselves -- as objects? We could ask a program to compile itself:

     program.compileTo( targetMachine )

But is the program a program, or does it start life as a text file? If the program starts as a text file, perhaps we say

  program.parse()

to create an abstract syntax tree, which we then could ask

  ast.compileTo( targetMachine )

(Instead of sending a parse() message, we might send an asAbstractSyntax() message. There may be no functional difference, but I think the two names indicate subtle differences in mindset.) When my students write their compilers in Java or another OO language, we discuss in class whether abstract syntax trees ought to be able to generate target code for themselves. The danger lies in binding the AST class to the details of a specific target machine. We can separate the details of the target machine for the AST by passing an argument with the compileTo() message, but what? Given all the other things we have to learn in the course, my students usually end up following Griggs's advice and doing the traditional thing: pass the AST as an argument to a CodeGenerator object. If we had more time, or a more intensive OO design course prior to the compilers course, we could look at techniques that enable a more OO approach without making things worse in the process. Looking back farther to the parse behavior, would it ever make sense to send an argument with the parse() message? Perhaps a parse table for an LL(1) or LR(1) algorithm? Or the parsing algorithm itself, as a strategy object? We quickly run the risk of taking steps in the direction that Cook joshes about in his post. Or perhaps parsing is a natural result of creating a Program object from a piece of text. In that approach, when we say

     Program source = new Program( textfile );

the internal state of source is an abstract syntax tree. This may sound strange at first, but a program isn't really a piece of text. It's just that we are conditioned to think that way by the languages and tools most of us learn first and use most heavily. Smalltalk taught us long ago that this viewpoint is not necessary. (Lisp, too, though in a different way.) These are just some disconnected thoughts on a Sunday afternoon. There is plenty more to say, and plenty of prior art. I think I'll fire up a Squeak image sometime soon and spend some time reacquainting myself with its take on parsing and code generation, in particular the way it compiles its core out to C. I like doing this kind of "what if?" thinking. It's fun to explore along the boundary between the land where OOP works naturally and the murky territory where it doesn't seem to fit so well. That's a great place to learn new stuff. -----