TITLE: Basic Concepts: The Unity of Data and Program
AUTHOR: Eugene Wallingford
DATE: February 06, 2007 10:31 PM
DESC:
-----
BODY:
I remember vividly a particular moment of understanding that I experienced in graduate school. As I mentioned last time, I was studying knowledge-based systems, and one of the classic papers we read was William Clancey's Heuristic Classification. This paper described an abstract decomposition for classification programs, the basis of diagnostic systems, that was what we today would call a pattern. It gave us the prototype against which we could pattern our own analysis of problem-solving types.

In this paper, Clancey discussed how configuration and planning are two points of view on the same problem, design. A configuration system produces as output an artifact capable of producing state changes in some context; a planning system takes such an artifact as input. A configuration system takes as input a sequence of desired state changes, to be produced by the configured system; a planning system produces a sequence of operations that produces desired state changes in the given artifact. Thus, the same kind of system could produce a thing, an artifact, or a process that creates an artifact. In a certain sense, things and processes were the same kind of entity.

Wow. Design and planning systems could be characterized by similar software patterns. I felt like I'd been shown a new world.

Later I learned that this interplay between thing and process ran much deeper. Consider this Lisp (or Scheme) "thing", a data value known as a list:
(a 1 2)
If I replace the symbol "a" with the symbol "+", I also have a Lisp list of size 3:
(+ 1 2)
But this Lisp list is also the Lisp program for computing the sum of 1 and 2! If I give this program to a Lisp interpreter, I will see the result:
> (+ 1 2)
3
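To see that nothing here depends on Lisp's syntax, here is a minimal sketch in Python. It stores the same expression as an ordinary nested list and walks it with a tiny evaluator. The names `PRIMITIVES` and `evaluate` are invented for this illustration, and the evaluator handles only numbers and three arithmetic operators; it is not McCarthy's Lisp-in-Lisp or anyone's real interpreter.

```python
from functools import reduce
import operator

# Hypothetical table of primitive operators, invented for this sketch.
PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested-list expression such as ["+", 1, 2]."""
    if isinstance(expr, (int, float)):
        return expr  # a number evaluates to itself
    if (
        isinstance(expr, list)
        and len(expr) >= 2
        and isinstance(expr[0], str)
        and expr[0] in PRIMITIVES
    ):
        op = PRIMITIVES[expr[0]]
        # Evaluate the operands first, then fold the operator over them.
        args = [evaluate(arg) for arg in expr[1:]]
        return reduce(op, args)
    raise ValueError(f"cannot evaluate {expr!r}")


data = ["a", 1, 2]          # a three-element list: plain data
program = ["+"] + data[1:]  # swap the head symbol, as in the text

print(len(program))       # prints 3: still a list of size 3
print(evaluate(program))  # prints 3: the same list, run as a program
```

The list `program` is inspected with `len` like any other value and then handed to `evaluate` unchanged. Passing `data` to `evaluate` raises `ValueError`, because "a" is not an operator this sketch knows about.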
In Lisp, there is no distinction between data and program. Indeed, this is true for C, Java, or any other programming language. But the syntax of Lisp (and especially Scheme) is so simple and uniform that the unity of data and program stands out starkly. It also makes Scheme a natural choice of language for a course on the principles of programming languages. The syntax and semantics of Lisp programs are so uniform that one can write a Lisp interpreter in about a page of Lisp code. (If you'd like, take a look at my implementation of John McCarthy's Lisp-in-Lisp, in Scheme, based on Paul Graham's essay The Roots of Lisp. If you haven't read that paper, please do soon.)

There is no distinction between data and program. This is one of the truly beautiful ideas in computer science. It runs through everything that we do, from von Neumann's stored-program computer itself to the implementation of a virtual machine for Java to run inside a web browser.

A related idea is the notion that programs can exist at arbitrary levels of abstraction. For each level at which a program is data to another program, there is yet another program whose behavior is to produce that data. An assembler, for example, produces machine language from assembly language. One of the lessons of computer science is that "machine" is an abstract idea. Everything can be interpreted by someone -- or something -- else.

I don't know enough of the history of mathematics or natural philosophy to say to what extent these ideas are computer science's contributions to our body of knowledge. On the one hand, I'm sure that deep thinkers throughout history at least had reason and resource to make some of the connections between thing and process, between design and planning.
On the other, I imagine that before we had digital computers at our disposal, we probably didn't have sufficient vocabulary or the circumstances needed to explore issues of natural language to the level of program versus data, or of languages being processed from abstract to concrete, down to the details of a particular piece of hardware.

Church. Turing. Chomsky. McCarthy. These are the men who discovered the fundamental truths of language, data, and program, and who laid the foundations of our discipline.

At first, I wondered why I hadn't learned this small set of ideas as an undergraduate student. In retrospect, I'm not surprised. My alma mater's CS program was aimed at applications programming, taught a standard survey-style programming languages course, and didn't offer a compilers course.

Whatever our students here learn about the practical skills of building software, I hope that they also have the chance to learn about some of the beautiful ideas that make computer science an essential discipline in the science of the 21st century.
-----