TITLE: Studying Code Is More Like Natural Science Than Reading AUTHOR: Eugene Wallingford DATE: March 16, 2017 8:50 AM DESC: ----- BODY: A key passage from Peter Seibel's 2014 essay, Code Is Not Literature:

But then it hit me. Code is not literature and we are not readers. Rather, interesting pieces of code are specimens and we are naturalists. So instead of trying to pick out a piece of code and reading it and then discussing it like a bunch of Comp Lit. grad students, I think a better model is for one of us to play the role of a 19th century naturalist returning from a trip to some exotic island to present to the local scientific society a discussion of the crazy beetles they found: "Look at the antenna on this monster! They look incredibly ungainly but the male of the species can use these to kill small frogs in whose carcass the females lay their eggs."

The point of such a presentation is to take a piece of code that the presenter has understood deeply and for them to help the audience understand the core ideas by pointing them out amidst the layers of evolutionary detritus (a.k.a. kludges) that are also part of almost all code. One reasonable approach might be to show the real code and then to show a stripped down reimplementation of just the key bits, kind of like a biologist staining a specimen to make various features easier to discern.

My scientist friends often like to joke that CS isn't science, even as they admire the work that computer scientists and programmers do. I think Seibel's essay expresses nicely one way in which studying software really is like what natural scientists do. True, programs are created by people; they don't exist in the world as we find it. (At least programs in the sense of code written by humans to run on a computer.) But they are created under conditions that look a lot more like biological evolution than, say, civil engineering. As Hal Abelson says in the essay, most real programs end up containing a lot of stuff just to make it work in the complex environments in which they operate. The extraneous stuff enables the program to connect to external APIs and plug into existing frameworks and function properly in various settings. But the extraneous stuff is not the core idea of the program. When we study code, we have to slash our way through the brush to find this core. When dealing with complex programs, this is not easy. The evidence of adaptation and accretion obscures everything we see. Many people do what Seibel does when they approach a new, hairy piece of code: they refactor it, decoding the meaning of the program and re-coding it in a structure that communicates their understanding in terms that express how they understand it. Who knows; the original program may well have looked like this simple core once, before it evolved strange appendages in order to adapt to the condition in which it needed to live. The folks who helped to build the software patterns community recognized this. They accepted that every big program "in the wild" is complex and full of cruft. But they also asserted that we can study such programs and identify the recurring structures that enable complex software both to function as intended and to be open to change and growth at the hands of programmers. One of the holy grails of software engineering is to find a way to express the core of a system in a clear way, segregating the extraneous stuff into modules that capture the key roles that each piece of cruft plays. Alas, our programs usually end up more like the human body: a mass of kludges that intertwine to do some very cool stuff just well enough to succeed in a harsh world. And so: when we read code, we really do need to bring the mindset and skills of a scientist to our task. It's not like reading People magazine. -----