TITLE: Methods, Names, and Assumptions in Adding New Code to a Program
AUTHOR: Eugene Wallingford
DATE: August 11, 2011 7:59 PM
DESC:
-----
BODY:
Since the mid-1990s, there has been a healthy conversation around refactoring, the restructuring of code to improve its internal structure without changing its external behavior. Thanks to Martin Fowler, we have a catalog of techniques for refactoring that help us restructure code safely and reliably. It is a wonderful tool for learners and practitioners alike.

When it comes to writing new code, we are not so lucky. Most of us learn to program by learning to write new code, yet we rarely learn techniques for adding code to a program that are as safe, reliable, and effective as the refactorings we know and love.

You might think that adding code would be relatively simple, at least compared to restructuring a large, interconnected web of components. But how can we move with the same confidence when adding code as we do when we follow a meticulous refactoring recipe under the protection of good unit tests? Test-driven design is a help, but I have never felt like I had the same sort of support writing new code as when I refactor.

So I was quite happy a couple of months ago to run across J.B. Rainsberger's "Adding Behavior with Confidence". Very, very nice! I only wish I had read it a couple of months ago when I first saw the link. Don't make the same mistake; read it now.

Rainsberger gives a four-step process that works well for him:
  1. Identify an assumption that the new behavior needs to break.
  2. Find the code that implements that assumption.
  3. Extract that code into a method whose name represents the generalisation you're about to make.
  4. Enhance the extracted method to include the generalisation.
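
The four steps can be sketched in a few lines of Python. This is only an illustration under assumed details: the shipping scenario, the function names, and the amounts are hypothetical and do not come from Rainsberger's article. The assumption to break (Step 1) is "every order pays flat shipping", and the code that implements it (Step 2) is the literal addition in the original function.

```python
# Hypothetical example: every name and number here is an illustration,
# not taken from Rainsberger's article. Amounts are integer cents.

FLAT_SHIPPING_CENTS = 500
FREE_SHIPPING_THRESHOLD_CENTS = 5000


def order_total_original(item_prices_cents):
    """Starting point. Steps 1-2: the assumption "every order pays flat
    shipping" lives in the addition on the next line."""
    return sum(item_prices_cents) + FLAT_SHIPPING_CENTS


def shipping_cost_step3(subtotal_cents):
    """Step 3: the extracted code, named for the generalisation to come.
    It ignores its argument for now, so behavior is unchanged."""
    return FLAT_SHIPPING_CENTS


def order_total_step3(item_prices_cents):
    subtotal = sum(item_prices_cents)
    return subtotal + shipping_cost_step3(subtotal)


def shipping_cost_step4(subtotal_cents):
    """Step 4: enhance the extracted function with the new behavior
    (free shipping at or above a threshold)."""
    if subtotal_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return FLAT_SHIPPING_CENTS


def order_total_step4(item_prices_cents):
    subtotal = sum(item_prices_cents)
    return subtotal + shipping_cost_step4(subtotal)
```

In this sketch, Step 3 is a pure refactoring, so the existing tests for the original function should still pass against the Step 3 version. Only Step 4 needs a new test.
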
I was first drawn to the idea that a key step in adding new behavior is to make a new method, procedure, or function. This is one of the basic skills of computer programming. It is one of the earliest topics covered in many CS1 courses, and it should be taught sooner in many others. Even still, most beginners seem to fear creating new methods. Even more advanced students will regress a bit when learning a new language, especially one that works differently than the languages they know well. A function call introduces a new point of failure: parameter passing. When worried about succeeding, students generally try to minimize the number of potential points of failure.

Notice, though, that Rainsberger starts not with a brand new method, empty except for the new code to be written. This technique asks us first to factor out existing code into a new method. This breaks the job of writing the new code into two smaller steps. First, refactor, relying on a well-known technique and the existing tests to provide safety. Second, add the new code. (These are Steps 3 and 4 in Rainsberger's technique.)

That isn't what really grabbed my attention first, however. The real beauty for me is that extracting a method forces us to give it a name. I think that naming gives us great power, and not just in programming.

A lot of times, CS textbooks make a big deal about procedures as a form of abstraction, and they are. But that often feels so abstract... For programmers, especially beginners, we might better focus on the fact that procedures help us to name things in our programs. Names, we get.

By naming a procedure that contains a few lines of code, we get to say what the code does. Even the best factored code that uses good variable names tends to say how something is done, not what it is doing. Creating and calling a method separates the two: the client says what is done, and the server implements how it is done.
This separation gives us new power: to refactor the code in other ways, certainly. Rainsberger reminds us that it also gives us power to add code more reliably!

"How can I add code to a program? Write a new function." This is an unsurprising, unhelpful answer most of the time, especially for novices who just see this as begging the question. "Okay, but what do I do then?" Rainsberger makes it a helpful answer, if a bit surprising. But he also puts it in a context with more support: what to do before we start writing the new code.

Creating and naming procedures was the strongest take-home point for me when I first read this article. As the ideas steeped in my mind for a few days, I began to have a greater appreciation for Rainsberger's focus on assumptions.

Novice thinkers have trouble with assumptions. This is true whether they are learning to program, learning to read and analyze literature, or learning to understand and argue public policy issues. They have a hard time seeing assumptions, both the ones they make and the ones made by other writers. When the assumptions are pointed out, they are often unsure what to do with them, and are tempted to skip right over them. Assumptions are easy to ignore sometimes, because they are implicit and thus easy to lose track of when deep in an argument. Learning to understand and reason about assumptions is another important step on the way to mature thinking.

In CS courses, we often introduce the idea of preconditions and postconditions in Data Structures. (Students also see them in a discrete structures course, but there they tend to be presented as mathematical tools. Many students dismiss their value out of hand.) Writing pre- and postconditions for a method is a way to make assumptions in your program explicit. Unfortunately, most beginning students don't yet see the value in writing them. They feel like an extra, unnecessary step in a process dominated by the uncertainty they feel about their code.
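
As a small illustration of how pre- and postconditions make assumptions explicit, here is a minimal Python sketch. The function and its name are hypothetical, chosen only for the example, and plain `assert` statements stand in for a real contract mechanism (Python skips them when run with `-O`).

```python
def index_of_largest(values):
    """Return the position of the largest item in a non-empty list."""
    # Precondition: the caller must supply at least one value.
    assert len(values) > 0, "precondition: values must not be empty"

    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i

    # Postcondition: no item in the list exceeds the one we found.
    assert all(v <= values[best] for v in values), "postcondition violated"
    return best
```

The empty-list case is an assumption that would otherwise stay hidden in the loop's starting index; stating it as a precondition puts it where a reader can see it.
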
Assuring them that these invariants help is usually like pushing a rock up a hill. Tomorrow, you get to do it again.

One thing I like about Rainsberger's article is that it puts assumptions into the context of a larger process aimed at helping us write code more safely. Mathematical reasoning about code does that, too, but again, students often see it as something apart from the task of programming. Rainsberger's approach is undeniably about code. This technique may encourage programmers to begin thinking about assumptions sooner, more often, and more seriously.

As I said, I haven't seen many articles or books talk about adding code to a program in quite this way. Back in January, "Uncle Bob" Martin wrote an article in the same spirit as this, called "The Transformation Priority Premise". It offers a grander vision, a speculative framework for all additions to code. If you know Uncle Bob's teachings about TDD, this article will seem familiar; it fits quite nicely with the mentality he encourages when using tests to drive the growth of a program. While his article is more speculative, it seems worthy of more thought. It encourages the tiniest of steps as each new test provokes new code in our program. Unfortunately, it takes such small steps that I fear I'd have trouble getting my students, especially the novices, to give it a fair try. I have a hard enough time getting most students to grok the value of TDD, even my seniors!

I have similar concerns about Rainsberger's technique, but his pragmatism and unabashed focus on code give me hope that it may be useful for teaching students how to add functionality to their programs.
-----