Knowing and Doing: March 2011 Archives


March 31, 2011 8:06 PM

My Erdos Number

Back in the early days of my blog, I wrote about the H number as a measure of a researcher's contribution to the scientific community. In that article, the mathematician Paul Erdos makes a guest appearance in a quoted discussion about the trade-off between a small number of highly influential articles and a large number of articles having smaller effect.

Erdos is perhaps the best example of the latter. By most accounts, he published more papers than any other mathematician in history, usually detailing what he called "little theorems". He is also widely known for the number of different coauthors with whom he published, so much so that one's Erdos number is a badge of honor among mathematicians and computer scientists. The shorter the path between a researcher and Erdos in the collaboration graph of authors and co-authors, the more impressive.

Kevlin Henney recently pointed me in the direction of Microsoft's VisualExplorer, which finds the shortest paths between any author and Erdos. Now I know that my Erdos number is 3. To be honest, I was surprised to find that my number was so small. There are many paths of lengths four and five connecting me to Erdos, courtesy of several of my good buddies and co-authors who started their professional lives in mathematics. (Hey to Owen and Joe.)

But thanks to Dave West, I have a path of length 3 to Erdos. I have worked with Dave at OOPSLA and at ChiliPLoP on a new vision for computer science, software development, and university education. Like me, Dave has not published a huge number of papers, but he has an eclectic set of interests and collaborators. One of his co-authors published with Erdos. 1-2-3!
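For readers who like to see the graph theory spelled out: an Erdos number is the length of a shortest path in the collaboration graph, and breadth-first search finds it. The sketch below is only an illustration. The function name and the tiny graph are made up for the example, the author names are placeholders rather than real collaboration data, and this is not a description of how VisualExplorer works.

```python
from collections import deque


def collaboration_distance(coauthors, start, target):
    """Return the number of co-author links on a shortest path, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        author, dist = queue.popleft()
        for other in coauthors.get(author, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None  # no chain of co-authors connects the two


# Hypothetical collaboration graph: each author maps to a set of co-authors.
coauthors = {
    "me": {"colleague_a", "colleague_b"},
    "colleague_a": {"me", "author_x"},
    "colleague_b": {"me", "author_y"},
    "author_x": {"colleague_a", "erdos"},
    "author_y": {"colleague_b", "author_z"},
    "author_z": {"author_y", "erdos"},
    "erdos": {"author_x", "author_z"},
}

print(collaboration_distance(coauthors, "me", "erdos"))
```

In this toy graph there is a path of length four through colleague_b, but breadth-first search reports the shorter one through colleague_a, just as a longer path through one set of co-authors can hide a shorter one through another.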

In the world of agile software development, we have our own graph-theoretic badge of honor, the Ward number. If you have pair-programmed with Ward Cunningham, your Ward number is 1... and so on. My Ward number is 2, via the same Joe in my Erdos network, Bergin.

Back in even earlier days of my blog, I wrote an entry connected to Erdos, via his idea of Proofs from THE BOOK. Erdos was a colorful character!

Yes, computer scientists and mathematicians like to have fun, even if their fun involves graphs and path-finding algorithms.

Posted by Eugene Wallingford | Permalink | Categories: General

March 29, 2011 8:13 PM

Global Variables Considered

Last week, a student stopped in to ask a question. He had written a program for one of his courses of which he was especially proud. It consisted in large part of two local functions, and used recursion in a way that created an elegant, clear solution.

Yet his professor dinged his grade severely. The student had used a global variable.

That's when the question for me arrived.

But why are we taught not to use global variables?

First, let me say that this is a strong student. He is not the sort to beg for points, and he wasn't asking me this question as a snark or a complaint. He really wanted to know the answer.

My first response was cynical and at least partly tongue-in-cheek. We teach you not to use global variables because we were taught not to use global variables.

My second response was to point out that "global" is a relative term, not an absolute one. In OO languages, we write classes that contain instance variables and methods that operate on them. The instance variables are global to the class's methods and local to the class's clients. The programming world seems to like such "globals" just fine.

That is the beginning of my trouble trying to create an argument that supports the way in which his program was graded. In his program, written in a traditional procedural language, the offending variable was local to one procedure but global to two nested procedures. That sounds awfully similar to an ordinary Java class's instance variables!
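The similarity is easy to see side by side. The student's program is not reproduced here, so what follows is an invented example in Python rather than his procedural language: a procedure with one variable shared by two nested procedures, and next to it a small class with one instance variable shared by two methods. All of the names (count_leaves, LeafCounter, and so on) are hypothetical.

```python
def count_leaves(tree):
    """A procedure whose variable is shared by two nested procedures."""
    total = 0  # local to count_leaves, "global" to record and visit

    def record():
        nonlocal total
        total += 1

    def visit(node):
        # A list is an interior node; anything else is a leaf.
        if isinstance(node, list):
            for child in node:
                visit(child)
        else:
            record()

    visit(tree)
    return total


class LeafCounter:
    """The same shape as a small class: one instance variable, two methods."""

    def __init__(self):
        self._total = 0  # hidden from clients, shared by the methods

    def _record(self):
        self._total += 1

    def _visit(self, node):
        if isinstance(node, list):
            for child in node:
                self._visit(child)
        else:
            self._record()

    def count(self, tree):
        self._total = 0
        self._visit(tree)
        return self._total


tree = [1, [2, 3], [[4]]]
print(count_leaves(tree), LeafCounter().count(tree))
```

In both versions the shared variable is invisible to callers and sits within a few lines of every piece of code that touches it.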

On the extreme end of the global/local continuum we have a language like Cobol. All data is declared at the top of a program in an elaborate Data Division, and the "paragraphs" of the Procedure Division refer back to it. Not many computer scientists spend much time defending Cobol, but its design and organization make perfectly good sense in context, and programmers are able to write understandable, large programs.

As the student and I talked, I explained two primary reasons for the historical bias against globals:

Readability. When a variable lives outside the code that manipulates it, there is a chance that it can become separated in space from that code. As a large program evolves over time, it seems that the chance the variable will become separated from the related code approaches 1. That makes the code hard to understand. When the reader encounters a variable, she may have a hard time knowing what it means without seeing the code that uses it. When she encounters a procedure with a reference to a faraway variable, she may have a hard time knowing what the code does without easy reference to the variable and any other code that uses it.

This force is counteracted effectively in some circumstances. In OOP, we try not to write classes that are too long, which means that the instance vars and the methods will be relatively close to one another in the file or on the printed page. Furthermore, there is a convention that the vars will be declared at the top or bottom of the class, so the reader can always find them easily enough. That's part of what makes Cobol's layout work: readers study the Data Division first and then read the Procedure Division with an eye to the top of the file.

My student's program had a structure that mirrored a small class: a procedure with a variable and two local procedures of reasonable size. I can imagine endorsing the relatively global variable because it was part of an understandable, elegant program.

Not-Obvious Dependencies. When two or more procedures operate on the same variable that lives outside all of them, there is a risk that the lack of readability rises to something worse: an inability to divine how the program works. The two procedures exert influence over each other's behavior through the values stored in the shared variable. In OO programs, this interaction is an expected part of how objects behave, and we try to keep methods and the class as a whole small enough to counteract the problem.

In the general case, though, we can end up with variables and procedures scattered throughout a program and interacting in non-obvious ways. A change to one procedure might affect another. Adding another procedure that refers to or changes the variable complicates matters for all existing procedures in the co-dependent relationship. Hidden dependencies are the worst kind of all.

This is what really makes global variables bad for us. Unless we can counteract this force effectively, we really don't want to use them.
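A small, made-up example shows the kind of hidden dependency described above. The variable and function names are hypothetical and exist only for illustration. Nothing in the signature of price_with_discount says that its answer depends on which other procedure ran most recently.

```python
discount_percent = 0  # module-level state shared by every function below


def apply_member_discount():
    global discount_percent
    discount_percent = 10


def start_clearance_sale():
    # Added later by someone who never read price_with_discount.
    global discount_percent
    discount_percent = 50


def price_with_discount(price):
    # The result silently depends on whichever function last set the global.
    return price * (100 - discount_percent) // 100


apply_member_discount()
print(price_with_discount(200))
start_clearance_sale()
print(price_with_discount(200))
```

The two calls to price_with_discount receive the same argument and return different results. In a program where these three functions live in different files, that difference is hard to trace.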

These are two simple technical reasons that programmers prefer not to use global variables. CS professors tend to simplify them into the dictum, "no global variables allowed", and make it a hard and fast rule for beginners. Unfortunately, sometimes we forget to take the blinders off after our students -- or we ourselves! -- become more accomplished programmers. The dictum becomes dogma, and a substitute for professional judgment.

I have what I regard as a healthy attitude about global variables. But I admitted to my student that I have my own weakness in the dictum-turned-dogma arena. When I teach OOP to beginners, we follow the rule All instance variables are private. I'm a reasonable guy and so am willing to talk to students who want to violate the rule, but it's pretty much non-negotiable. I've never had a first- or second-year OOP student convince me that one of his IVs should be protected or -- heaven forbid! -- public. But even in my own code, after a couple of decades of doing OOP, I rarely violate this rule. I say "rarely" only in the interest of being conservative in my assessment. I can't remember the last time I wrote a class with a public instance variable.

Not all teachers are good at giving up their dogma. Some professors don't even realize that what they believe is best thought of as an oversimplification for the purposes of helping novices develop good habits of thought.

Ironically, last semester I ran across the paper Global Variable Considered Harmful, by Bill Wulf and Mary Shaw. (If you can't get through the ACM paywall, you can also find the paper here.) This is, I think, the first published attempt to explain why the global variable is a bad idea. Read it -- it's a nice treatment of the issues as they existed back in 1973. Nearly forty years later, I am comfortable using variables that are relatively global to one or more procedures under controlled conditions. At this point in my programming career, I am willing to use my professional judgment to help me make good programs, not just programs that follow the rules.

I shared the Wulf and Shaw paper with my student. I hope he got a kick out of it, and I hope he used it to inform his already reliable professional judgment. The paper might even launch him ahead of the profs who teach the prohibition on global variables as if it were revealed truth.

Posted by Eugene Wallingford | Permalink | Categories: Software Development

March 28, 2011 8:14 PM

A Well-Meaning Headline Sends an Unfortunate Signal

Last week, the local newspaper ran an above-the-fold front-page story about the regional Physics Olympics competition. This is a wonderful public-service piece. It extols young local students who spend their extracurricular time doing math and physics, and it includes a color photo showing two students who are having fun. If you would like to see the profile of science and math raised among the general public, you could hardly ask for more.

Unless you read the headline:

Young Einsteins

I don't want to disparage the newspaper's effort to help the STEM cause, but the article's headline undermines the very message it is trying to send. Science isn't fun; it isn't for everyone; it is for brains. We're looking for smart kids. Regular people need not apply.

Am I being too sensitive? No. The headline sends a subtle message to students and parents. It sends an especially dangerous signal to young women and minorities. When they see a message that says, "Science kids are brainiacs", they are more likely than other kids to think, "They don't mean me. I don't belong."

I don't want anyone to mislead people about the study of science, math, and CS. They are not the easiest subjects to study. Most of us can't sleep through class, skip homework, and succeed in these courses. But discipline and persistence are more important ingredients to success than native intelligence, especially over the long term. Sometimes, when science and math come too easily to students early in their studies, they encounter difficulties later. Some come to count on "getting it" quickly and, when it no longer comes easily, they lose heart or interest. Others skate by for a while because they don't have to practice and, when it no longer comes easily, they haven't developed the work habits needed to get over the hump.

If you like science and math enough to work at them, you will succeed, whether you are an Einstein or not. You might even do work that is important enough to earn a Nobel Prize.

Posted by Eugene Wallingford | Permalink | Categories: General

March 26, 2011 12:08 PM

Narrow Caution and Noble Issue

A cautionary note from John Ruskin, in The Stones of Venice:

We are to take care how we check, by severe requirement or narrow caution, efforts which might otherwise lead to a noble issue; and, still more, how we withhold our admiration from great excellencies, because they are mingled with rough faults.

Ruskin was a permission giver.

I found this passage in The Seduction, an essay by Paula Marantz Cohen. Earlier in the piece, she related that many of her students were "delighted" by Ruskin's idea that "the best things shall be seldomest seen in their best form". The students...

... felt they were expected to be perfect in whatever it was they undertook seriously (which might be why they resisted undertaking much seriously).

In the agile software development world, we recognize that fear even short of perfectionism can paralyze developers, and we take steps to overcome the danger (small steps, tests first, pair programming). We teachers need to remember that our high school and college students feel the same way -- and that their feelings are often made even more formidable by the severe requirement and narrow caution by which we check their efforts.

Cohen closes her essay by anticipating that other professors might not like her new approach to teaching, because it "dumbs things down" with shorter reading assignments, shorter writing assignments, and classroom discussion that allows personal feelings. It seems to me, though, that getting students to connect with literature, philosophy, and ideas bigger than themselves is an important win. One advantage of shorter writing assignments was that she was able to give feedback more frequently and thus focus more directly on specific issues of structure and style. This is a positive trade-off.

In the end she noted that, despite working from a much squishier syllabus and with a changing reading list, students did not complain about grades. Her conclusion:

I suspect that students focus on grades when they believe that this is all they can get out of a course. When they feel they have learned something, the grade becomes less important.

I have felt this, both as student and as teacher. When most of the students in one of my classes are absorbed in their grade, it usually means that I am doing something wrong with the class.

Go forth this week and show admiration for the great excellencies in your students, your children, and your colleagues, not only despite the excellencies being mingled with rough faults, but because they are so.

Posted by Eugene Wallingford | Permalink | Categories: Managing and Leading, Teaching and Learning

March 25, 2011 4:40 PM

Another Conference Changes Its Name

Ralph Johnson reports that "the conference on aspect oriented software development is renaming itself to 'Modularity'", much as OOPSLA has become SPLASH.

For the last couple of decades, computer science research has been focusing on more and more specific domains. The area of artificial intelligence, for example, soon spawned journals and conferences devoted specifically to sub-areas such as machine learning, expert systems, computer vision, and many others. It's interesting for me to see conferences such as AOSD and OOPSLA going the other direction, moving from the technology that spawned the conference in the first place to the more general idea or goal that motivates its community.

Of course, these conferences aren't purely academic; they have always had a strong alliance between industry and academia. Perhaps that is one of the reasons they are willing to rebrand themselves. Certainly, the changing economic model that drives this sort of conference is playing a big role.

By the way, Ralph's entry isn't really about the changing conference name. He merely uses that as a launching point for something more interesting: a first cut at cataloging different types of modularity. That is the best reason to read it!

Posted by Eugene Wallingford | Permalink | Categories: Software Development

March 24, 2011 10:23 PM

Teachers and Programming Languages as Permission Givers

Over spring break, I read another of William Zinsser's essays at The American Scholar, called Permission Givers. Zinsser talks about the importance of people who give others permission to do, to grow, and to explore, especially in a world that offers so many freedoms but is populated with people and systems that erect barriers at every turn.

My first reaction to the paper was as a father. I have recognized our elementary and high schools as permission-denying places in a way I didn't experience them as a student myself, and I've watched running the gauntlet of college admissions cause a bright, eager, curious child to wonder whether she is good enough after all. But my rawest emotions were fear and hope -- fear that I had denied my children permission too often, and hope that on the whole I had given them permission to do what they wanted to do and become who they can be. I'm not talking about basic rules; some of those are an essential part of learning discipline and even cultivating creativity. I mean encouraging the sense of curiosity and eagerness that happy, productive people carry through life.

The best teachers are permission givers. They show students some of what is possible and then create conditions in which students can run with ideas, put them together and take them apart, and explore the boundaries of their knowledge and their selves. I marvel when I see students creating things of beauty and imagination; often, there is a good teacher to be found there as well. I'm sad whenever I see teachers who care deeply about students and learning but who sabotage their students' experience by creating "a long trail of don'ts and can'ts and shouldn'ts", by putting subtle roadblocks along the path of advancement.

I don't think that by nature I am a permission giver, but over my career as a teacher I think I've gotten better. At least now I am more often aware of when I'm saying 'no' in subtle and damaging ways, so that I can change my behavior, and I am more often aware of the moments when the right words can help a student create something that matters to them.

In the time since I read the essay, another strange connection formed in my mind: Some programming languages are permission givers. Some are not.

Python is a permission giver. It doesn't erect many barriers that get in the way of the novice, or even the expert, as she explores ideas. Ruby is a permission giver, too, but not to the extent that Python is. It's enough more complex syntactically and semantically that things don't always work the way one first suspects. As a programmer, I prefer Ruby for the expressiveness it affords me, but I think that Python is the more empowering language for novices.

Simplicity and consistency seem to be important features of permission-giving languages, but they are probably not sufficient. Another of my favorite languages, Scheme, is simple and offers a consistent model of programming and computation, but I don't think of it as a permission giver. Likewise Haskell.

I don't think that the tired argument between static typing and dynamic typing is at play here. Pascal had types but it was a permission giver. Its descendant Ada, not so much.

I know many aficionados of other languages often feel differently. Haskell programmers will tell me that their language makes them so productive. Ada programmers will tell me how their language helps them build reliable software. I'm sure they are right, but it seems to me there is a longer learning curve before some languages feel like permission givers to most people.

I'm not talking about type safety, power, or even productivity. I'm talking about the feeling people have when they are deep in the flow of programming and reach out for something they want but can't quite name... and there it is. I admit, too, that I also have beginners in mind. Students who are learning to program, more than experts, need to be given permission to experiment and persevere.

I also admit that this idea is still new in my mind and is almost surely colored heavily by my own personal experiences. Still, I can't shake the feeling that there is something valuable in this notion of language as permission giver.

~~~~

If nothing else, Zinsser's essay pointed me toward a book I'd not heard of, Michelle Feynman's Reasonable Deviations from the Beaten Track, a collection of the personal and professional letters written by her Nobel Prize-winning father. Even in the most mundane personal correspondence, Richard Feynman tells stories that entertain and illuminate. I've only begun reading and am already enjoying it.

Posted by Eugene Wallingford | Permalink | Categories: Computing, Teaching and Learning

March 23, 2011 8:13 PM

SPLASH 2011 and the Educators' Symposium

I have been meaning to write about SPLASH 2011 and especially the Educators' Symposium for months, and now I find that Mark Guzdial has beaten me to the punch -- with my own words, no less! Thanks to Mark for spreading the news. Go ahead and read his post if you'd like to see the message I sent to the SIGCSE membership calling for their submissions. Or visit the call for participation straightaway and see what the program committee has in mind. Proposals are due on April 8, only a few weeks hence. Dream big -- we are.

For now, though, I will write the entry I've been intending all these months:

The Next Ten Years of Software Education

By the early 2000s, I had become an annual attendee of OOPSLA and had served on a few Educators' Symposium program committees. Out of the blue, John Vlissides asked me to chair the 2004 symposium. I was honored and excited. I eventually got all crazy and cold-called Alan Kay and asked him to deliver our keynote address. He inspired us with a vision and an ambitious charge, which we haven't been able to live up to yet.

When I was asked to chair again in 2005, we asked Ward Cunningham to deliver our keynote address. He inspired us with his suggestions for nurturing simple ideas and practices. It was a very good talk. The symposium as a whole, though, was less successful at shaking things up than in 2004. That was likely my fault.

I have been less involved in the Educators' Symposium since 2006 or 2007, and even less involved in OOPSLA more broadly. Being department head keeps me busy. I have missed the conference.

Fast-forward to 2010. OOPSLA has become SPLASH, or perhaps more accurately been moved under the umbrella of SPLASH. This is something that we had talked about for years. 2011 conference chair Crista Lopes was looking for an Educators' Symposium chair and asked me for any names I might suggest. I admitted to her that I would love to get involved again, and she asked me to chair. I'm back!

OOPSLA was OO, or at least that's what its name said. It had always been about more, but the name brand was of little value in a corporate world in which OOP is mainstream and perhaps even passe. Teaching OOP in the university and in industry has changed a lot over the last ten years, too. Some think it's a solved problem. I think that's not true at all, but certainly many people have stopped thinking very hard about it.

In any case, conference organizers have taken the plunge. SPLASH != OOPSLA and is now explicitly not just about OO. The new conference acknowledges itself to be about programming more generally. That makes the Educators' Symposium something new, too, something more general. This creates new opportunities for the program committee, and new challenges.

We have decided to build the symposium around a theme of "The Next Ten Years". What ideas, problems, and technologies should university educators and industry trainers be thinking about? The list of possibilities is long and daunting: big data, concurrency, functional programming, software at Internet scale... and even our original focus, object-oriented programming. Our goal for the end of the symposium is to be able to write a report outlining a vision for software development education over the next ten years. I don't expect that we will have many answers, if any, but I do expect that we can at least begin to ask the right questions.

And now here's your chance to help us chart a course into the future, whether or not you plan to submit a paper or proposal to the symposium:

Who would be a killer keynote speaker?

What person could inspire us with a vision for computer science and software, or could ask us the questions we need to be asking ourselves?

Finding the right keynote speaker is one of the big questions I'm thinking about these days. Do you have any ideas? Let me know.

(And yes, I realize that Alan Kay may well still be one of the right answers!)

In closing, let me say that whenever I say "we" above, I am not speaking royally. I mean the symposium committee that has graciously offered their time and energy to designing and implementing this challenge: Curt Clifton, Danny Dig, Joe Bergin, Owen Astrachan, and Rick Mercer. There are also a handful of people who have been helping informally. I welcome you to join us.

Posted by Eugene Wallingford | Permalink | Categories: Software Development, Teaching and Learning

March 22, 2011 4:45 PM

Encounters with Large Numbers and the Limits of Programs

Yesterday, this xkcd illustration of radiation doses made the rounds of Twitter. My first thought was computational: this graph is a great way to help students see what "order of magnitude" means and how the idea matters to our understanding of a real-world phenomenon.

Late yesterday afternoon, one of my colleagues stopped by to describe a Facebook conversation he had been having with a few of our students, and in particular one of our better students. This student announced that he was going to write a program to generate all possible brackets for the men's NCAA basketball tournament. My colleague said, "Um, there are 2 to the 67th power brackets", to which the student responded, "Yeah, I know, that's why I'm going to write a program. There are too many to do by hand." From this followed a discussion of just how big 2**67 is and how long it would take a program to generate all the brackets. Even using a few heuristics to trim the problem down, such as always picking a 1-seed to beat a 16-seed, the number is astronomically large. (Or, as Richard Feynman suggests, "economically large".)
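A few lines of Python make the size of the problem concrete. This is a back-of-the-envelope sketch, not a record of the actual conversation. It treats each of the 67 games as one two-way choice, and the rate of one billion brackets per second is an assumption picked only for illustration. The heuristic is modeled by fixing the four 1-seed versus 16-seed games, which leaves 63 free choices.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def bracket_count(games):
    """Each game has two possible winners, so n games give 2**n brackets."""
    return 2 ** games


def years_to_enumerate(games, brackets_per_second):
    """Whole years needed to generate every bracket at a fixed rate."""
    seconds = bracket_count(games) // brackets_per_second
    return seconds // SECONDS_PER_YEAR


RATE = 10 ** 9  # assumed: one billion brackets generated per second

print(bracket_count(67))
print(years_to_enumerate(67, RATE))
# Always picking the 1-seed over the 16-seed fixes four games.
print(years_to_enumerate(67 - 4, RATE))
```

Even at that generous rate, enumerating every bracket takes thousands of years, and the heuristic only brings it down to centuries.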

Sometimes, even good students can gain a better understanding of a concept by encountering it in the wild. This is perhaps even more often true when the idea is unintuitive or beyond our usual experience.

Posted by Eugene Wallingford | Permalink | Categories: Computing

March 10, 2011 9:21 PM

SIGCSE Day 2 -- Limited Exposure

For a variety of reasons, I am scheduled for only two days at SIGCSE this year. I did not realize just how little time that is until I arrived and started trying to work in all the things I wanted to do: visit the exhibits, attend a few sessions and learn a new thing or two, and -- most important -- catch up with several good friends.

It turns out that's hard to do in a little more than a day. Throw in a bout of laryngitis in the aftermath of a flu-riddled week, and the day passed even more quickly. Here are a few ideas that stood out from sessions on either end of the day.

Opening Keynote Address

Last March I blogged about Matthias Felleisen winning ACM's Outstanding Educator Award. This morning, Felleisen gave the opening address for the conference, tracing the evolution of his team's work over the last fifteen years in a smooth, well-designed talk. One two-part idea stood out for me: design a smooth progression of teaching languages that are neither subset nor superset of any particular industrial-strength language, then implement them, so that your tools can support student learning as well as possible.

Matthias's emphasis on the smooth progression reminds me of Alan Kay's frequent references to the fact that English-speaking children learn the same language used by Shakespeare to write our greatest literature, growing into it over time. One of his goals for Smalltalk, or whatever replaces it, is a language that allows children to learn programming and grow smoothly into more powerful modes of expression as their experience and cognitive skills grow.

Two Stories from Scratch

At the end of the day, I listened in on a birds-of-a-feather session about Scratch, mostly in K-12 classrooms. One HS teacher described how his students learn to program in Scratch and then move on to a "real language". As they learn concepts and vocabulary in the new language, he connects the new terms back to their concrete experiences in Scratch. This reminded me of a story in one of Richard Feynman's books, in which he outlines his father's method of teaching young Richard science. He didn't put much stock in learning the proper names of things at first, instead helping his son to learn about how things work and how they relate to one another. The names come later, after understanding. One of the advantages of a clean language such as Scratch (or one of Felleisen's teaching languages) is that it enables students to learn powerful ideas by using them, not by memorizing their names in some taxonomy.

Later in the session, Brian Harvey told the story of a Logo project conducted back in the 1970s, in which each 5th-grader in a class was asked to write a Logo program to teach a 3rd-grader something about fractions. An assignment so wide open gave every student a chance to do something interesting, whatever they themselves knew about fractions. I need to pull this trick out of my teaching toolbox a little more often.

(If you know of a paper about this project, please send me a pointer. Thanks.)

~~~~

There is one unexpected benefit of a short stay: I am not likely to leave any dynamite blog posts sitting in the queue to be written, unlike last year and 2008. Limited exposure also limits the source of triggers!

Posted by Eugene Wallingford | Permalink | Categories: Computing, Teaching and Learning

March 09, 2011 11:31 PM

SIGCSE Day 1 -- Innovative Approaches for Introducing CS

SIGCSE 2011 in Dallas, Texas

I'm in Dallas for a couple of days for SIGCSE 2011. I owe my presence to Jeff Forbes and Owen Astrachan, who organized a pre-conference workshop on innovative approaches for introducing computer science and provided support for its participants, courtesy of their NSF projects.

The Sheraton Dallas is a big place, and I managed to get lost on the way to the workshop this morning. As I entered the room fifteen minutes late, Owen was just finishing up talking about something called the Jinghui Rule. I still don't know what it is, but I assume it had something to do with us not being able to use our laptops during much of the day. This saves you from reading a super-long breakdown of the day, which is just as well. The group will produce a report soon, and I'm sure Jeff and Owen will do a more complete job than I might -- not least because we all produced summaries of our discussion throughout the day, presented them to the group as a whole, and submitted them to our leaders for their use.

The topics we discussed were familiar ones, including problems, interdisciplinary approaches, integrative approaches, motivating students, and pedagogical issues. Even still, the discussions were often fresh, as most everyone in the room wrestles with these topics in the trenches and is constantly trying new things.

I did take a few notes the old-fashioned way about some things that stood out to me:

Owen captured the distinction between "interdisciplinary" and "integrative" well; here is my take. Interdisciplinary approaches pull ideas from other areas of study into our CS courses as a way to illustrate or motivate ideas. Integrative approaches push CS techniques out into courses in other areas of study where they become a native part of how people in those disciplines work.
Several times during the day people mentioned the need to "document best practices" of various sorts. Joe Bergin was surely weeping gently somewhere. We need more than disconnected best practices; we need a pattern language or two for designing certain kinds of courses and learning experiences.
Several times during the day talk turned to what one participant termed student-driven discovery learning. Alan Kay's dream of an Exploratorium never strays far from my mind, especially when we talk about problem-driven learning. We seem to know what we need to do!
A group of us discussed problems and big data in a "blue sky" session, but the talk was decidedly down-to-earth: the need to format, sanitize, and package data sets for use in the classroom.
One of the biggest challenges we face is the invisibility of computing today. Most everyone at the workshop today views computing's ubiquity as a great opportunity, and I often feel the same way. But I fear the reality is that, for most everyone else, computing has disappeared into the background noise of life. Convincing them that it is cool to understand how, say, Facebook works may be a tougher task than we realize.

Finally, Ge Wang demoed some of the cool things you can do with an iPhone using apps like those from Smule. Wow. That was cool.

Posted by Eugene Wallingford | Permalink | Categories: Computing, Teaching and Learning

March 04, 2011 5:25 PM

The Growing Buzz around Empirical Analysis of Repositories

This has turned into a recurring theme, due to a hopeful trend out in industry.

Last semester, I wrote a bit about studying program repositories as a way to understand how programmers work. Then last month, I wrote about simple empirical analysis of code, referring to Michael Feathers's article on how we can learn a lot about our program's design by looking at our commit log. Feathers went on to write a short note about getting empirical about refactoring, in which he expanded on the idea of looking at our code to understand its design better.

Now we have Turbulence, a package for pulling useful metrics about our code out of a git repository. The package began its life when Feathers and Corey Haines wrote a script to plot code churn versus its complexity. Haines has written a bit about the Turbulence project.
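To give a feel for the churn half of such a plot, here is a minimal sketch that totals lines added and deleted per file. It is not how Turbulence itself is implemented. It assumes the input is text of the kind printed by `git log --numstat --pretty=format:`, where each line holds an added count, a deleted count, and a path separated by tabs. The function name and the sample data are invented for the example, and measuring complexity is left out entirely.

```python
from collections import Counter


def churn_by_file(numstat_text):
    """Sum lines added plus lines deleted for each path in numstat output."""
    churn = Counter()
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, deleted, path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary files are reported as "-" instead of counts
        churn[path] += int(added) + int(deleted)
    return churn


# Hypothetical output covering two commits.
sample = (
    "3\t1\tlib/parser.rb\n"
    "2\t2\tREADME\n"
    "\n"
    "10\t0\tlib/parser.rb\n"
    "-\t-\tlogo.png\n"
)

for path, lines_changed in churn_by_file(sample).most_common():
    print(path, lines_changed)
```

Files that rank high on a list like this and also score high on a complexity measure are the ones a churn-versus-complexity plot is meant to draw attention to.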

It doesn't end there. Developers are using Turbulence and adding to its code base. Feathers has called for a renewed focus on design in the wild using the data we have at our fingertips. The physicians have begun to heal themselves, and they are leading the way for the rest of us.

One nice side effect of this trend is making available to a wider audience some of the academic research that has been done in this vein, such as Nagappan and Ball's paper on code churn and defect density. (I had the pleasure of meeting Ball when we served on a panel at OOPSLA several years ago.)

As many people are saying, we swim in data. We just have to find ways to use it well. I remain ever amazed at what our tools enable us to do.

All this talk about git has me resolved to go all the way and make a full switch to it. I've dabbled with git a bit and consumed a lot of software off GitHub, but now it's time to do all my development in it. Fortunately, there are a few excellent resources to help me, including the often-lauded Git Immersion guided tour by Jim Weirich and crew, and Scott Chacon's visually engaging Getting Git slide deck. My trip to SIGCSE and the spring break that follows immediately after can't come too soon!

Posted by Eugene Wallingford | Permalink | Categories: Software Development