TITLE: Code Signatures in Lisp AUTHOR: Eugene Wallingford DATE: May 10, 2012 12:18 PM DESC: ----- BODY: Recently, @fogus tweeted:
I wonder if McCarthy had to deal with complaints of parentheses count in the earliest Lisps?
For me, this tweet immediately brought to mind Ward Cunningham's experiment with file signatures as an aid in browsing unfamiliar code, which he presented at a workshop on "software archeology" at OOPSLA 2001. In his experiment, Ward collapsed each file in the Java 1.3 source code distribution into a single line consisting of only braces, quotes, and semicolons. For example, the AWT peer interface java.awt.peer.ComponentPeer looked like this:
    ;;;;;;;;{;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;} 
while java.awt.print.PageFormat looked like this:
    {;{;;}{;;};}{;;{;}{;};}{;;{;}{;};}{;{;;;;;;"";};}{;{;;;;;;"";};}{;{;}{;};}{;{;}{;};}{}{}{;}{;}{{;}{;}}{;}{;;;;;;}{}{;{;;;;;;;;;;;;;;;;;;;;;;};}}
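Ward's tool isn't shown here, so what follows is only a minimal Python sketch of the idea under assumed rules:

- keep every {, }, and ;
- reduce each string literal to a bare "" (as the "" pairs in the PageFormat signature suggest)
- drop comments and character literals, so punctuation inside them doesn't leak into the signature

The function name java_signature is hypothetical.

```python
def java_signature(source: str) -> str:
    """Collapse Java source into a one-line signature of { } ; and "".

    Hypothetical helper. Assumed rules (Ward's exact rules may differ):
    comments and char literals are dropped, and each string literal
    becomes a bare "".
    """
    out = []
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        if source.startswith("//", i):          # line comment: jump to end of line
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif source.startswith("/*", i):        # block comment: jump past */
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif c in "\"'":                        # string or char literal
            i += 1
            while i < n and source[i] != c:
                i += 2 if source[i] == "\\" else 1   # step over escapes like \"
            i += 1                              # step past the closing quote
            if c == '"':
                out.append('""')                # keep only the quote pair
        else:
            if c in "{};":
                out.append(c)
            i += 1
    return "".join(out)


example = """
package demo;
import java.util.List;
class A {
    String s = "a;b{";   // a comment; with {braces}
    int f() { return 1; }
}
"""
print(java_signature(example))  # ;;{"";{;}}
```

Dropping string contents and comments is a design choice. Without it, a semicolon inside a string or a brace in a comment would show up as if it were structure.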
As Ward said, it takes some time to get used to the "radical summarization" of files into such punctuation signatures. He was curious about how such a high-level view of a code base might help a programmer understand the regularities and irregularities in the code, via an interactive process of inspection and projection.
Perhaps this came to mind as a result of experiences I had when I was first learning to program in Scheme. Coming from other languages with more syntax, I developed a bad habit of writing code like this:
    (define factorial
      (lambda (n)
        (if (zero? n)
            1
            (* n (factorial (- n 1)))
        )))
When real Scheme and Lisp programmers saw my code, they suggested that I put those closing parens at the end of the multiplication line. They were even more insistent when I dropped closing parens onto separate lines in the middle of a larger piece of code, say, inside a let expression that binds several complex values. I objected that the line breaks helped me to see the structure of my code better. They told me to trust them; after I had more experience, I wouldn't need the help, and my code would be cleaner and more idiomatic.
They were right. Eventually, I learned to read Scheme code more like real Schemers do. I found myself drawn to the densest parts of the code, in which those closing parens often played a role, and learned to see that that's where the action usually was.
I think it was the connection between counting parentheses and the structure of code that brought to mind Ward's work. Then I wondered what it would be like to take the signature of Lisp or Scheme code in terms of its maligned primary punctuation, the parentheses. In a few spare minutes, I fiddled with the idea. As an example, consider the following Lisp function, which is part of an implementation of CLOS written by Patrick Henry Winston and Berthold Horn to support their AI and Lisp textbooks:
    (defun call-next-method ()
      (if *around-methods*
          (apply (pop *around-methods*) *args*)
        (progn
          (do () ((endp *before-methods*))
            (apply (pop *before-methods*) *args*))
          (multiple-value-prog1
              (if *primary-methods*
                  (apply (pop *primary-methods*) *args*)
                (error "Oops, no applicable primary method!")) 
            (do () ((endp *after-methods*))
              (apply (pop *after-methods*) *args*))))))
Collapsing this function into a single line of parentheses results in:
    (()((())((()(())(()))(((())())(()(())(()))))))
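The post doesn't show the function that produced this line. Here is a hypothetical Python sketch; flat_signature is an invented name, and Python is used only for illustration. It assumes that parentheses inside string literals and ; comments should not count, since they aren't part of the code's structure. Lisp character literals are not handled.

```python
def flat_signature(source: str) -> str:
    """Collapse Lisp source into a single line of its parentheses.

    Hypothetical helper. Assumed rules: parentheses inside string literals
    and ; comments are ignored; Lisp character literals are not handled.
    """
    out = []
    in_string = in_comment = escaped = False
    for c in source:
        if in_comment:
            in_comment = c != "\n"        # a ; comment runs to the end of the line
        elif in_string:
            if escaped:
                escaped = False           # this character was escaped, e.g. \"
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_string = False         # closing quote
        elif c == ";":
            in_comment = True
        elif c == '"':
            in_string = True
        elif c in "()":
            out.append(c)
    return "".join(out)


print(flat_signature('(defun f (x) (g x "(not code)")) ; (comment)'))  # (()())
```

Run on the call-next-method source, a scanner like this reproduces the line of parentheses above, because the function's only string literal contains no parentheses.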
The semicolons in a Java signature give the reader a sense of the code's length, because so many Java statements end in one. Lisp has no such terminator, so collapsing it this way erases every trace of the line breaks. I wrote another function to insert a | where each line break had been, which results in:
    (()|(|(())|(|(()(())|(()))|(|(|(())|())|(()(())|(()))))))
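A variant of the same kind of scanner can emit a separator wherever the source had a line break. Again, this is a Python sketch under assumed rules, not the original function:

- piped_signature and its sep parameter are invented names
- string contents and ; comments are skipped
- a line break inside a string literal is not marked

```python
def piped_signature(source: str, sep: str = "|") -> str:
    """Parentheses of Lisp source on one line, with sep at each line break.

    Hypothetical helper. Assumed rules: string contents and ; comments are
    skipped, and line breaks inside string literals are not marked.
    """
    out = []
    in_string = in_comment = escaped = False
    for c in source.strip("\n"):
        if c == "\n" and not in_string:
            out.append(sep)              # mark the original line break
            in_comment = False           # a ; comment ends at the newline
        elif in_comment:
            continue                     # drop comment text
        elif in_string:
            if escaped:
                escaped = False          # this character was escaped, e.g. \"
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_string = False        # closing quote
        elif c == ";":
            in_comment = True
        elif c == '"':
            in_string = True
        elif c in "()":
            out.append(c)
    return "".join(out)


print(piped_signature("(defun f (x)\n  (g x))"))  # (()|())
```

Making the separator a parameter is a small convenience: the same sketch can produce other one-line views, for instance with a space between lines instead of a bar.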
The |-separated signature gives a better idea of how the code plays out on the page, but it still discards the indentation that makes the code's structure visible at a glance, and that structure is so important when reading Lisp. So I implemented a third signature, one that surrenders the benefit of a single line in exchange for a better sense of structure. This signature preserves leading whitespace and line breaks but otherwise gives us just the parentheses:
    (()
      (
          (())
        (
          (()(())
            (()))
          (
              (
                  (())
                ())
            (()(())
              (()))))))
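The structured signature can be sketched the same way. In this hypothetical Python version, structured_signature copies the spaces at the start of each line and then keeps only the parentheses. One assumption deserves mention: tabs are expanded to 8 columns first, so a line indented with a tab still lines up with its neighbors. As in the other sketches, string contents and ; comments are dropped.

```python
def structured_signature(source: str, tabsize: int = 8) -> str:
    """Keep each line's leading whitespace, then only its parentheses.

    Hypothetical helper. Assumed rules: tabs are expanded first so columns
    line up, string contents and ; comments are dropped, a line break inside
    a string literal is not reproduced, and trailing spaces are trimmed.
    """
    out = []
    at_line_start = True
    in_string = in_comment = escaped = False
    text = source.expandtabs(tabsize).lstrip("\n").rstrip()
    for c in text:
        if c == "\n" and not in_string:
            out.append("\n")                    # keep the original line break
            at_line_start, in_comment = True, False
        elif in_comment:
            continue                            # drop comment text
        elif in_string:
            if escaped:
                escaped = False                 # this character was escaped
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_string = False               # closing quote
        elif at_line_start and c == " ":
            out.append(c)                       # leading indentation survives
        else:
            at_line_start = False               # first non-blank char on the line
            if c == ";":
                in_comment = True
            elif c == '"':
                in_string = True
            elif c in "()":
                out.append(c)
    lines = "".join(out).split("\n")
    return "\n".join(line.rstrip() for line in lines)


print(structured_signature("(defun f (x)\n  (g x))"))
# (()
#   ())
```

Because the indentation comes straight from the source, this signature is only as regular as the editor's indentation. A misindented line passes through unchanged.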
Interesting. It's almost art.
I think there is a lot of room left to explore here in terms of punctuation. To capture the nature of Scheme and Lisp programs, we would probably want to include other characters, such as the hash, the comma, quotes, and backquotes. These would expose macro-related expressions to the human reader. To expand the experiment to include Clojure, we would of course want to include [] and {} in the signatures.
I'm not an everyday Schemer, so I am not sure how much either the flat signatures or the structured signatures would help seasoned Lisp or Scheme programmers develop an intuitive sense of a function's size, complexity, and patterns. As Ward's experiment showed, the real value comes when signing entire files, and for that task flat signatures may have more appeal. It would be neat to apply this idea to a Lisp distribution of non-trivial size -- say, the full distribution of Racket or Clojure -- and see what might be learned. -----