TITLE: Computer Science as Science AUTHOR: Eugene Wallingford DATE: February 08, 2007 6:23 PM DESC: ----- BODY:
If you're in computer science, you've heard the derision from others on campus. "Any discipline with 'science' in the name isn't." Or maybe you've heard, "What we really need on campus is a department of toaster oven science." These comments reflect a deep misunderstanding of what computing is and what computer scientists do. We haven't always done a very good job of telling our story. And a big part of what happens in the name of CS really isn't science -- it's engineering, or (sadly) technical training.

But does that mean that no part of computer science is 'science'? Not at all. Computer science is grounded in a deep sense of empiricism, and the scientific method plays an essential role in doing computer science. It's just that the entities and systems we study don't always spring from the "natural world" without man's intervention. But they are complex systems that we don't yet understand thoroughly -- systems created by technological processes and by social processes.

I mentioned an example of this sort of research from my own backyard back in December 2004. A master's student of mine, Nate Labelle, studied relationships among open-source software packages. He was interested in seeing to what extent the network of 'imports' relationships bore any resemblance to the social and physical networks that have been studied deeply by folks such as Barabasi and Watts. Nate conducted empirical research: he mapped the network of relationships in several different flavors of Linux as well as a few smaller software packages, and then determined the mathematical properties of the networks. He presented this work in a few places, including a paper at the 6th International Conference on Complex Systems. He concluded that:
... despite diversity in development groups and computer system architecture, resource coupling at the inter-package level creates small-world and scale-free networks with a giant component containing recurring motifs; which makes package networks similar to other natural and engineered systems.
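The kind of measurement behind such a conclusion can be sketched in miniature. The toy dependency graph below is invented for illustration (the package names are hypothetical, and this is not Nate's code or data); it computes each package's in-degree -- how many other packages depend on it. In a scale-free network, a few "hub" packages accumulate most of the incoming links, just as the low-level library does here:

```python
from collections import Counter

# Toy "imports" network: each package maps to the packages it depends on.
# Package names are illustrative, not drawn from the actual study.
deps = {
    "editor":   ["libgui", "libc"],
    "browser":  ["libgui", "libssl", "libc"],
    "server":   ["libssl", "libc"],
    "libgui":   ["libc"],
    "libssl":   ["libc"],
    "compiler": ["libc"],
}

# In-degree of a package = number of packages that depend on it.
in_degree = Counter(target for targets in deps.values() for target in targets)

for pkg, k in in_degree.most_common():
    print(f"{pkg}: depended on by {k} packages")
```

Even in this six-package toy, the degree distribution is skewed: one hub (`libc`) collects most of the links. Over thousands of real packages, it was this kind of heavy-tailed distribution that marked the networks as scale-free.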
This scientific result has implications for how we structure package repositories, how we increase software robustness and security, and perhaps how we guide the software engineering process.

Studying such large systems is of great value. Humans have begun to build remarkably complex systems that we currently understand only at the surface, if at all. That is often a surprise to non-computer scientists and non-technology people: the ability to build a (useful) system does not imply that we understand the system. Indeed, it is relatively easy for me to build systems whose gross-level behavior I don't yet understand well (say, using a neural network or genetic algorithm), or to build systems that perform a task so much better than I do that they seem to operate on a completely different level. A lot of chess and other game-playing programs can easily beat the people who wrote them!

We can also apply this scientific method to study processes that occur naturally in the world, using a computer. Computational science is, at its base, a blend of computer science and another scientific domain. The best computational scientists are able to "think computationally" in a way that qualitatively changes their domain science.

As Bertrand Russell wrote a century ago, science is about description, not logic or causation or any other naive notion we have about necessity. Scientists describe things. This being the case, computer science is in many ways the ultimate scientific endeavor -- or at least a foundational one -- because computer science is the science of description. In computing we learn how to describe things and processes better, more accurately and more usefully. Some of our findings have been surprising, like the unity of data and program. We've learned that process descriptions whose behavior we can observe teach us more than static descriptions of the same processes left to the limited interpretive powers of the human mind.
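The unity of data and program can be seen in a few lines. In this sketch (the mini-language and function names are invented for illustration), a program is an ordinary nested data structure; one function runs it as a process, while another inspects it as static data:

```python
# A program represented as data: nested tuples of (operator, operand, operand).
expr = ("add", ("mul", 2, 3), 4)   # means 2*3 + 4

def evaluate(e):
    """Treat the description as a process: run it."""
    if isinstance(e, (int, float)):
        return e
    op, left, right = e
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "add" else a * b

def count_ops(e):
    """Treat the same description as data: count its operators."""
    if isinstance(e, (int, float)):
        return 0
    _, left, right = e
    return 1 + count_ops(left) + count_ops(right)

print(evaluate(expr))    # 10
print(count_ops(expr))   # 2
```

The same value `expr` serves as input to both functions: executed by one, analyzed by the other. Nothing about the tuple marks it as "code" rather than "data" -- that distinction lives entirely in how we process it.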
The study of how to write descriptions -- programs -- has taught us more about language, expressiveness, and complexity than our previous mathematics ever could. And we've only begun to scratch the surface.

For those of you who are still skeptical, I can recommend a couple of books. Actually, I recommend them to any computer scientist who would like to reach a deeper understanding of this idea. The first is Herb Simon's seminal book The Sciences of the Artificial, which explains why the term "science of the artificial" isn't an oxymoron -- and why thinking it is rests on a misconception of how science works. The second is Paul Cohen's methodological text Empirical Methods for Artificial Intelligence, which teaches computer scientists -- especially AI students -- how to do empirical research and, along the way, demonstrates these techniques on real CS problems. I seem to recall examples from machine learning, which is particularly empirical in its approach. -----