TITLE: Code Duplication as a Hint to Think Differently AUTHOR: Eugene Wallingford DATE: February 18, 2013 12:59 PM DESC: ----- BODY: Last week, one of my Programming Languages students sent me a note saying that his homework solution worked correctly but that he was bothered by some duplicated code. I was so happy. Any student who has me for class for very long hears a lot about the dangers of duplication for maintaining code, and also that duplication is often a sign of poor design. Whenever I teach OOP or functional programming, we learn ways to design code that satisfy the DRY principle and ways to eliminate it via refactoring when it does sneak in. I sent the student an answer, along with hearty congratulations for recognizing the duplication and wanting to eliminate it. My advice When I sat down to blog the solution, I had a sense of deja vu... Hadn't I written this up before? Indeed I had, a couple of years ago: Increasing Duplication to Eliminate Duplication. Even in the small world of my own teaching, it seems there is nothing new under the sun. Still, there was a slightly different feel to the way I talked about this in class later that day. The question had come earlier in the semester this time, so the code involved was even simpler. Instead of processing a vector or a nested list of symbols, we were processing with a flat list of symbols. And, instead of applying an arbitrary test to the list items, we were simply counting occurrences of a particular symbol, s. The duplication occurred in the recursive case, where the procedure handles a pair:

    (if (eq? s (car los))
        (+ 1 (count s (cdr los)))      ; <---
        (count s (cdr los)))           ; <---

Then we make the two sub-cases more parallel:

    (if (eq? s (car los))
        (+ 1 (count s (cdr los)))      ; <---
        (+ 0 (count s (cdr los))))     ; <---

And then use distributivity to push the choice down a level:

    (+ (if (eq? s (car los)) 1 0)
       (count s (cdr los)))            ; <--- just once!

This time, I made a point of showing the students that not only does this solution eliminate the duplication, it more closely follows the command to follow the shape of the data:

When defining a program to process an inductively-defined data type, the structure of the program should follow the structure of the data.

This guideline helps many programmers begin to write recursive programs in a functional style, rather than an imperative style. Note that in the first code snippet above, the if expression is choosing among two different solutions, depending on whether we see the symbol s in the first part of the pair or not. That's imperative thinking. But look at the list-of-symbols data type:

    <list-of-symbols> ::= ()
                        | (<symbol> . <list-of-symbols>)

How many occurrences of s are in a pair? Obviously, the number of s's found in the car of the list plus the number of s's found in the cdr of the list. If we design our solution to match the code to the data type, then the addition operation should be at the top to begin:

    (+ ; number of s's found in the car
       ; number of s's found in the cdr )

If we define the answer for the problem in terms of the data type, we never create the duplication-by-if in the first place. We think about solving the subproblems for the car and the cdr, fill in the blanks, and arrive immediately at the refactored code snippet above. I have been trying to help my students begin to "think functionally" sooner this semester. There is a lot or room for improvement yet in my approach. I'm glad this student asked his question so early in the semester, as it gave me another chance to model "follow the data" thinking. In any case, his thinking was on the right track. -----