TITLE: Data-Intensive Computing and CS Education
AUTHOR: Eugene Wallingford
DATE: March 26, 2008 7:27 AM
DESC:
-----
BODY:
An article in the March 2008 issue of Computing Research News describes a relatively new partnership among NSF, Google, and IBM to help the academic computing community "explore innovative research and education ideas in data-intensive computing". They define data-intensive computing as a new paradigm in which the size of the data dominates all other performance considerations. Google's database of the web is one example, but so are the terabytes and petabytes of scientific data collected from satellites and earth-bound sensors.

On the hardware side of the equation, we need to understand better how to assemble clusters of computers to operate on the data and how to network them effectively. Just as important is the need to develop programming abstractions, languages, and tools powerful enough that we mortals can grasp and solve problems at this massive scale. Google's Map-Reduce algorithm (an idea adapted from the functional programming world) is just a start in this direction. (A small sketch of the map-and-reduce pattern appears at the end of this post.)

This notion of data-intensive computing came up in two of the plenary addresses at the recent SIGCSE conference. Not surprisingly, one was the talk by Google's Marissa Mayer, who encouraged CS educators to think about how we can help our students prepare to work within this paradigm. The second was the banquet address by Ed Lazowska, the chair of Washington's Computer Science department. Lazowska's focus was more on the need for research into the hardware and software issues that undergird computing on massive data sets. (My notes on Lazowska's talk are still in the works.)

This recurring theme is one of the reasons that our Hot Topic group at ChiliPLoP began its work on the assembly and preparation of large data sets for use in early programming courses. What counts as "large" for a freshman surely differs from what counts as "large" for Google, but we can certainly begin to develop a sense of scale in our students' minds as they write code and see the consequences of their algorithms and implementations. Students already experience large data in their lives, with 160 GB video iPods in their pockets. Having them compute on such large data sets should be a natural step.

The Computing Research News also has an announcement of a meeting of the Big-Data Computing Study Group, which is holding a one-day Data-Intensive Computing Symposium today in Sunnyvale. I don't know how much of this symposium will report new research results and how much will share background among the players in order to forge working relationships. I hope that someone writes up the results of the symposium for the rest of us...

Though our ChiliPLoP group ended up working on a different project this year, I expect that several of us will continue with the idea, and it may even be a theme for us at a future ChiliPLoP. The project that we worked on instead -- designing a radically different undergraduate computer science degree program -- has some currency of its own, though. In this same issue of the CRN, CRA board chair Dan Reed talks about the importance of innovation in computing and computing education:
As we debate the possible effects of an economic downturn, it is even more important that we articulate -- clearly and forcefully -- the importance of computing innovation and education as economic engines. [... T]he CRA has created a new computing education committee ... whose charge is to think broadly about the future of computing education. We cannot continue the infinite addition of layers to the computing curriculum onion that was defined in the 1970s. I believe we need to rethink some of our fundamental assumptions about computing education approaches and content.
Rethinking fundamental assumptions and starting from a fresh point of view is just what we proposed. We'll have to share our work with Reed and the CRA.
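Postscript: for readers who have not seen Map-Reduce, here is a minimal sketch, in Python, of the map-and-reduce pattern as it lives in the functional programming world. This is a toy, not Google's system; the function names and the tiny word-count data are my own, chosen only to show the shape of the computation.

    # A toy version of the map-and-reduce pattern: a "map" step turns each
    # record into (key, value) pairs, and a "reduce" step folds together all
    # of the values emitted for a key. The classic example is word counting.
    from collections import defaultdict

    def map_words(document):
        # Emit a (word, 1) pair for every word in one document.
        return [(word, 1) for word in document.split()]

    def reduce_counts(pairs):
        # Sum the emitted counts for each distinct word.
        totals = defaultdict(int)
        for word, count in pairs:
            totals[word] += count
        return dict(totals)

    documents = ["the cat sat", "the dog sat on the cat"]
    pairs = [pair for doc in documents for pair in map_words(doc)]
    print(reduce_counts(pairs))
    # {'the': 3, 'cat': 2, 'sat': 2, 'dog': 1, 'on': 1}

The point of the real thing, of course, is that the map calls and the per-key reductions are independent of one another, so a framework can spread them across a cluster of machines and a massive data set without the programmer managing the distribution by hand.
-----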