TITLE: Increasing Duplication to Eliminate Duplication AUTHOR: Eugene Wallingford DATE: February 27, 2010 9:40 AM DESC: ----- BODY: In a recent entry, I discussed how Kent Beck' design advice "Exploit Symmetries" improves our ability to refactor code. When we take two things that are similar and separate them into parts that are either identical or different, we maximize the repetition in our code. This enables us to factor the repetition in the sharpest way. Here is a simple example from Scheme. Suppose we are writing a procedure to walk down a vector and count how many of items satisfy a particular condition. Along the way, we might produce code something like this:

  (define count-occurrences-of-test-at
    (lambda (test? von position)
      (if (>= position (vector-length von))
          0
          (if (test? (vector-ref von position))
              (+ 1 (count-occurrences-of-test-at test? von (+ position 1)))
              (count-occurrences-of-test-at test? von (+ position 1))))))

Our procedure duplicates code, but it may not be obvious at first how to factor it away. The problem is that the duplication occurs nested in a larger expression at two different levels: one is a consequent of the if expression, and the other is part of the computation that is the other consequent. As a first step, we can increase the symmetry in our code by rewriting the else clause as a similar computation:

  (define count-occurrences-of-test-at
    (lambda (test? von position)
      (if (>= position (vector-length von))
          0
          (if (test? (vector-ref von position))
              (+ 1 (count-occurrences-of-test-at test? von (+ position 1)))
              (+ 0 (count-occurrences-of-test-at test? von (+ position 1)))))))

Now we see that the duplicated code is always part of the value of expression, and the if expression itself is about choosing whether to add 1 or 0 to the value of the recursive call. We can use one of the distributive laws of code to factor out the repetition:

  (define count-occurrences-of-test-at
    (lambda (test? von position)
      (if (>= position (vector-length von))
          0
          (+ (if (test? (vector-ref von position)) 1 0)
             (count-occurrences-of-test-at test? von (+ position 1))))))

Voilá! No more duplication. By increasing the duplication in our code, we create a more symmetric relation, and the symmetry enables us to eliminate the duplication entirely. I have never thought of myself as thinking in terms of symmetry when I write code, but I do think in terms of regularity. My mind prefers code with regular form, both on the surface and in the programming structures I use. Often times, my thorniest refactoring problems arise when I let irregular structure sneak into my code. When some duplication or complexity make me uneasy, I find that taking the preparatory step of increasing regularity can help me see a way to simpler code. Of course, we might approach this problem differently altogether, from a functional point of view, and write a different sort of solution:

  (define count-occurrences-of-test
    (lambda (test? von)
      (apply + (vector-map test? von))))

This eliminates another form of duplication that we find across many procedures that operate on vectors: the common structure of the code simulates a loop over a vector. That is yet another form of regularity that we can exploit, once we begin to recognize it. Then, when we write new code, we can look for ways to express the solution in terms of the functionally mapping pattern, so that we don't have to roll our own loop by hand. When imperative programmers begin to see this form of symmetry, they are on their way to becoming functional programmers. (It is also the kind of symmetry at the heart of MapReduce.) -----