Programming Languages and Paradigms

Some students do not read documentation or instructions. (Present company excluded, of course.) As a result, they write functions that already exist, sometimes under a different name. That's not even always a bad thing... Perhaps it's better to use your time practicing than poking around documentation.


Write a Racket function **(nth lst n)** that returns the item at position `n` of the list `lst`. For example:

    > (nth '(a b c) 2)
    'c
    > (nth '(d c b a z) 1)
    'c
    > (nth (range 1 1000) 41)
    42
    > (nth '(d c b a z) 10)
    nth: no such position

Make the position 0-based. If `n` is too big for the
list, return an error message as a string.

Check out a candidate solution in this source file. This is a fine example of using structural recursion to process an inductively-defined datatype. As we have seen, a Racket list can be defined inductively as:

    <list> ::= ()
             | (<any> . <list>)

Racket even has a built-in predicate for the "any" datatype, named

Note a few things... First, it is probably simpler if we
think of the inductive datatype being processed in this
problem as the list, not the number. That way, the number
comes along for the ride, getting decremented on each call.
If we think of processing the number, we end up nesting the
second `if` expression in an odd way.

Second, we do not need
an interface procedure
to solve this problem. Unlike `annotate` last time,
this function requires that users pass an integer argument.
We can use it to simulate a loop by counting down to zero in
parallel with walking down the list.

Third, if the second argument were a symbol and we needed to return the position of the first occurrence, then we would need an interface procedure. That is Problem 5 on Homework 4!

`nth` is an implementation of Racket's primitive
`list-ref`
function. You can find
many other functions for working with lists
in the Racket docs. Most of them are also great exercises
for practicing your recursive programming skills!

My solution uses Racket's primitive
`error` function.
Feel free to use it whenever a function has an error case.
(Notice that we can test our error cases, using another test
feature of Rackunit.)
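
The linked source file is not reproduced in these notes, so here is a minimal sketch of what such a solution could look like. It is an assumption based on the observations above, not necessarily the candidate solution itself: it treats the list as the inductive datatype, lets `n` come along for the ride, and signals the error case with `error`. The Rackunit checks at the end are likewise only illustrative.

```racket
#lang racket
(require rackunit)

;; nth : list, non-negative integer -> any
;; Walks down the list, decrementing n on each recursive call.
(define nth
  (lambda (lst n)
    (if (null? lst)
        (error 'nth "no such position")   ; ran off the end of the list
        (if (zero? n)
            (first lst)
            (nth (rest lst) (sub1 n))))))

(check-equal? (nth '(a b c) 2) 'c)
(check-equal? (nth (range 1 1000) 41) 42)
;; check-exn tests an error case: the thunk must raise an exception
(check-exn exn:fail? (lambda () (nth '(d c b a z) 10)))
```

Because the empty-list test comes first, the nested `if` only ever asks about `n` when there is a `first` item to return.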


Writing recursive functions well and confidently requires you to know several techniques, just as writing loops does. Last session, we explored the basic technique for writing recursive programs, the technique on which we base all of our recursive functions: structural recursion. With this technique, we mimic the structure of an inductive data specification in the code that processes the data.

We then introduced a second technique, the interface procedure, that hides implementation detail and allows us to preserve the argument signature specified for a function. Interface procedures are necessary whenever we find that we need a piece of data on each recursive call that is not provided by the original caller of the function.

This is a common occurrence in Racket. We encounter it frequently when processing vectors, which require positional access to values.
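
As a small illustration of the vector case, here is a sketch of an interface procedure. The names `vector-sum` and `vector-sum-from` are made up for this example; the helper needs a position on every call, but the caller should only have to pass the vector.

```racket
#lang racket

;; Helper (hypothetical name): i is the position to process next.
;; The original caller never supplies i, so it cannot be part of
;; the public signature.
(define vector-sum-from
  (lambda (vec i)
    (if (= i (vector-length vec))
        0
        (+ (vector-ref vec i)
           (vector-sum-from vec (add1 i))))))

;; Interface procedure: preserves the one-argument signature and
;; supplies the starting position on the caller's behalf.
(define vector-sum
  (lambda (vec)
    (vector-sum-from vec 0)))

(vector-sum (vector 3 4 5))   ; 12
```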

This session continues our discussion of recursive programming
by introducing two new techniques: **mutual recursion** and
**program derivation**. These techniques also help us to do
structural recursion in the face of specific circumstances that
we commonly encounter.

Last session, we defined two functions,
`remove-first`
and
`remove`,
over lists of symbols. *List of symbols* is the data type
on which they operate, and we had an inductive definition for the
type that guided our work. Today, let's consider a more complex
data structure, one that will be of great use to us when we write
programs to process languages: the **s-list**. The difference
between a list of symbols and an s-list is that the elements of
the list can themselves be s-lists.

Here is the BNF notation for an s-list:

    <s-list>            ::= ()
                          | (<symbol-expression> . <s-list>)
    <symbol-expression> ::= <symbol>
                          | <s-list>

And here are some examples:

    ()                   (())
    (a)                  ((a) b)
    (a b c)              (a (b) c)
    (a b c d)            (if (zero? n) zero (/ total n))
    (a b c d e f g h)    (cons (foo (car x)) (foo-cdr (cdr x)))

The items on the left are lists of symbols, but they are also
s-lists. A `symbol-expression` can be a symbol, or an
s-list.

Let's define a function, `subst`, that substitutes one
symbol for another anywhere in an s-list. You can think of
this as a global "search and replace" operation. For
example, when applied to a program, `subst` can serve
as the foundation of an operation for renaming variables -- a
common refactoring that all programmers do.

This function takes three arguments: the new symbol, the old symbol, and an s-list to operate on:

    > (subst 'd 'b '(a b c a b c d))
    (a d c a d c d)
    > (subst 'a 'b '((a b) (((b g r) (f r)) c (d e)) b))
    ((a a) (((a g r) (f r)) c (d e)) a)

Following the principle of structural recursion, the structure
of `subst` should follow the structure of the BNF
specification for an s-list:

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            ;; handle the empty list
            ;; handle a pair containing a symbol-expression
            )))

If `slist` is empty, there are no occurrences of
`old` to substitute, and the answer is the empty list.

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            ;; handle a pair containing a symbol-expression
            )))

The second arm of our BNF definition defines a case where
`slist` is a pair with the form
`(<symbol-expression> . <s-list>)`. The
result of `subst` will be the result of substituting
`new` for `old` in both parts, the
`<symbol-expression>` and the
`<s-list>`.

The first element of the pair is a `symbol-expression`.
Note, however, that `symbol-expression` is also defined
in terms of a choice. Our natural inclination might be to
implement this choice with a conditional expression. There
are two alternatives: the first element is a symbol, or it is
an s-list.

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (if (symbol? (first slist))
                ;; handle a symbol in the first, then the slist in the rest
                ;; handle an slist in the first, then the slist in the rest
                ))))

We have to return a list with the same structure as our input, so
in both cases we `cons` the result from the `first`
into the result from the `rest`.

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (if (symbol? (first slist))
                (cons ;; handle symbol in first
                      ;; handle slist in rest
                      )
                (cons ;; handle slist in first
                      ;; handle slist in rest
                      )
                ))))

If it is a symbol, we must determine whether or not to replace
it with `new`. We replace the symbol if it is equal to
`old`, and otherwise we leave it alone:

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (if (symbol? (first slist))
                (if (eq? (first slist) old)
                    (cons new
                          ;; handle the slist in the rest
                          )
                    (cons (first slist)
                          ;; handle the slist in the rest
                          ))
                ;; handle an slist in the first, then the slist in the rest
                ))))

In both of these cases, we need to substitute `new` for
`old` in the `rest` of the list. The
`rest` is an s-list, so we can use `subst` to
compute that part of our result.

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (if (symbol? (first slist))
                (if (eq? (first slist) old)
                    (cons new (subst new old (rest slist)))
                    (cons (first slist) (subst new old (rest slist))))
                ;; handle an slist in the first, then the slist in the rest
                ))))

The only thing left to do is to decide what to do when the
first member of `slist` is not a symbol. In that case,
it is an s-list. We are in luck. We already have a function
for substituting symbols in s-lists. It is the function that
we are writing, `subst`! So, we can:

- substitute `new` for `old` in the `first` of the list,
- substitute `new` for `old` in the `rest` of the list, and
- construct a new pair consisting of these two results.

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (if (symbol? (first slist))
                (if (eq? (first slist) old)
                    (cons new (subst new old (rest slist)))
                    (cons (first slist) (subst new old (rest slist))))
                (cons (subst new old (first slist))
                      (subst new old (rest slist)))))))

And we are done, or at least we have a working solution. Our basic structural recursion technique has served us well.

Programming Aside: Notice how the indentation of this code makes the control structures we are using as clear as possible. Whenever you write a program -- especially in a language like Racket (with a uniform syntax (and so (many) function calls!)) -- you should strive to write code that tells us how to read itself.

Our function works but, if we are honest with ourselves, we must admit that it has a couple of weaknesses.

First, we have repeated the expression
`(subst new old (rest slist))` three times, including
*twice* in the same arm of the main `if` expression.
We know that repeated code can cause all sorts of problems in
maintenance. But having to write these expressions separately
also makes it hard for us to write the function in the first
place. A mistake, even a typo, in any of the expressions will
break our function. Besides, all the repetition makes the code
harder to read.

Second, it is not really faithful to the structure suggested by the BNF. Look at the definition of an s-list again:

    <s-list>            ::= ()
                          | (<symbol-expression> . <s-list>)
    <symbol-expression> ::= <symbol>
                          | <s-list>

Structural recursion tells us that the structure of our code
should reflect the structure of the data. *Our code does
not*. There are two BNF expressions in the data
definition, but we have written only one function!

The second weakness causes the first. By not following the data structure, we have created extra cases to solve, which requires us to duplicate code.

If you look back at the step-by-step evolution of our function, you will see a clue hinting at this second weakness. We had to leave ourselves detailed notes using comments so that we did not lose our place as we solved small parts of the problem. Those comments are a sign that we are managing a lot of complexity in our heads. But the data type we are processing is not that complex!

My running commentary does more than give us a clue about when we went off track. It tells us exactly where:

The first element of the pair is a `symbol-expression`. Note, however, that `symbol-expression` is also defined in terms of a choice. *Our natural inclination might be to implement this choice with a conditional expression.* ...

A better way to reflect the choice between kinds of symbol
expression would be to
follow the data definition.
An s-list is defined in terms of symbol expression, and a symbol
expression is defined in terms of s-list. We say that such data
types are **mutually inductive**. We'd like for our code to
show this relationship, too.

Patterns that show up in data should probably show up in the code that processes the data. (And in the languages we use to write the code...)

For our program structure to follow the pattern of the BNF,
we must define a function for substituting symbols in
s-lists, called `subst`, *and* a function for
substituting symbols in symbol expressions, called, say,
`subst-symbol-expr`. Because each data type is
defined in terms of the other, these functions will call
one another. This technique is called **mutual recursion**,
because the recursion involves two functions that call one
another, working together to create a solution.

To begin, let's suppose that `subst-symbol-expr`
exists and works. The "else" clause of our main decision
in `subst` becomes quite easy to write:

- substitute `new` for `old` in the `first` of the s-list using `subst-symbol-expr`,
- substitute `new` for `old` in the `rest` of the s-list using `subst`, and
- make a new pair from the results using `cons`.

The definition of `subst` becomes:

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (cons (subst-symbol-expr new old (first slist))
                  (subst new old (rest slist))))))

Isn't that *much* clearer?

Now we have to write `subst-symbol-expr`. Using
structural recursion, the definition of this function follows
the BNF definition of the data type it processes, a symbol
expression. The BNF lists two alternatives for a symbol
expression: it is either a symbol, or it is an s-list. So:

    (define subst-symbol-expr
      (lambda (new old symexp)
        (if (symbol? symexp)
            ;; handle a symbol
            ;; handle an slist
            )))

If the symbol expression is a symbol, then we decide whether to replace it with the new symbol:

    (define subst-symbol-expr
      (lambda (new old symexp)
        (if (symbol? symexp)
            (if (eq? symexp old) new symexp)
            ;; handle an slist
            )))

If not, then it is an s-list. But we have already written a
function that can make substitutions in an s-list:
`subst`! Call it:

    (define subst-symbol-expr
      (lambda (new old symexp)
        (if (symbol? symexp)
            (if (eq? symexp old) new symexp)
            (subst new old symexp))))

That's pretty clear, too.

Our solution now consists of two relatively small, relatively simple functions that work together to solve the problem.
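
To see the pair working together, here is a sketch that collects both definitions in one module and checks them against the two examples given earlier for `subst`. The Rackunit checks are an addition for illustration; the function bodies are the ones developed above.

```racket
#lang racket
(require rackunit)

;; subst : symbol, symbol, s-list -> s-list
(define subst
  (lambda (new old slist)
    (if (null? slist)
        '()
        (cons (subst-symbol-expr new old (first slist))
              (subst new old (rest slist))))))

;; subst-symbol-expr : symbol, symbol, symbol-expression -> symbol-expression
(define subst-symbol-expr
  (lambda (new old symexp)
    (if (symbol? symexp)
        (if (eq? symexp old) new symexp)
        (subst new old symexp))))

(check-equal? (subst 'd 'b '(a b c a b c d))
              '(a d c a d c d))
(check-equal? (subst 'a 'b '((a b) (((b g r) (f r)) c (d e)) b))
              '((a a) (((a g r) (f r)) c (d e)) a))
```

Each function refers to the other before both are defined, which is fine at module level: the bodies are not evaluated until the first call.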

What are the advantages of our new program?

- It now follows the BNF definition of the s-list data type more closely, which defines two kinds of expression. This makes the code easier to read and modify, because readers can easily find the parts of the program they care about from the parts of the data definition.
- We have simplified the definition considerably. Nested `if`s can be hard to understand and trace, even for experienced programmers using good programming style. We now have only one nested `if`, and it is simpler than the nesting we did in our first function.
- We don't repeat code, in particular the three uses of `subst` on the `rest` of the s-list, or multiple calls to `first` and `rest`.
- Defining separate functions for each non-terminal in the BNF breaks the *programming* process into manageable parts and allows us to concentrate our efforts on one thing at a time.

Mutual recursion will be our technique of choice whenever we
have a multiple-part data definition.

Use mutual recursion to implement `(count-occurrences s slist)`, which counts how many times the symbol `s` occurs in `slist`.

For example:

    > (count-occurrences 'a '(a b c))
    1
    > (count-occurrences 'a '(((a be) a ((si be a) be (a be))) (be g (a si be))))
    5

The first step is to examine the BNF:

    <s-list>            ::= ()
                          | (<symbol-expression> . <s-list>)
    <symbol-expression> ::= <symbol>
                          | <s-list>

From the BNF, we expect to write two functions, one that counts the symbol in an s-list and one that counts the symbol in a symbol expression.

We start with the pattern suggested by the BNF for s-list:

    (define count-occurrences
      (lambda (s slist)
        (if (null? slist)
            ...   ;; slist is empty
            ...   ;; slist is a pair
            )))

We can conclude without much effort that a symbol occurs in
an empty list 0 times. In a non-empty list, the number of
times it occurs is equal to the number of times it occurs
in the `first` of the list **plus** the number of
times it occurs in the `rest` of the list.

Because this is a mutually recursive specification and
function, we will assume that a function named
`count-occurrences-sym-expr` exists, so that we can
use it to count the number of occurrences in the `first`
of the pair:

    (define count-occurrences
      (lambda (s slist)
        (if (null? slist)
            0
            (+ (count-occurrences-sym-expr s (first slist))
               (count-occurrences s (rest slist))))))

Now, we define `count-occurrences-sym-expr`. The BNF
description for symbol expressions suggests the following
pattern:

    (define count-occurrences-sym-expr
      (lambda (s sym-expr)
        (if (symbol? sym-expr)
            ...   ;; sym-expr is a symbol
            ...   ;; sym-expr is an slist
            )))

If the symbol expression is a symbol, then we need to determine
whether it is the symbol we're counting or not and return the
appropriate value, 0 or 1. If it is an s-list, then we have a
function for counting occurrences -- `count-occurrences`:

    (define count-occurrences-sym-expr
      (lambda (s sym-expr)
        (if (symbol? sym-expr)
            (if (eq? s sym-expr) 1 0)            ;; sym-expr is a symbol
            (count-occurrences s sym-expr))))    ;; sym-expr is an slist

And we are done!
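
As a quick check, here is a sketch that puts the two functions in one module and runs them on the examples from the problem statement. The Rackunit checks are an illustrative addition.

```racket
#lang racket
(require rackunit)

;; count-occurrences : symbol, s-list -> number
(define count-occurrences
  (lambda (s slist)
    (if (null? slist)
        0
        (+ (count-occurrences-sym-expr s (first slist))
           (count-occurrences s (rest slist))))))

;; count-occurrences-sym-expr : symbol, symbol-expression -> number
(define count-occurrences-sym-expr
  (lambda (s sym-expr)
    (if (symbol? sym-expr)
        (if (eq? s sym-expr) 1 0)
        (count-occurrences s sym-expr))))

(check-equal? (count-occurrences 'a '(a b c)) 1)
(check-equal? (count-occurrences
               'a
               '(((a be) a ((si be a) be (a be))) (be g (a si be))))
              5)
```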

Our original definition of `subst` was somewhat confusing
-- to read *and* to write. We just saw that following
the BNF can make the program easier to program and easier to
understand. This ease comes, however, at the cost of extra
function calls.

How so? Notice that we now make two function calls each time
the `first` of the s-list contains an s-list: one to
`subst-symbol-expr`, and then a return call to
`subst`. Such "double dispatch" can be expensive on a
large dataset.
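
To make the extra calls visible, here is a sketch that instruments the two functions with counters. The counters `subst-calls` and `sym-expr-calls` are assumptions added purely for illustration; they are not part of the solution.

```racket
#lang racket

(define subst-calls 0)       ; number of calls to subst
(define sym-expr-calls 0)    ; number of calls to subst-symbol-expr

(define subst
  (lambda (new old slist)
    (set! subst-calls (add1 subst-calls))
    (if (null? slist)
        '()
        (cons (subst-symbol-expr new old (first slist))
              (subst new old (rest slist))))))

(define subst-symbol-expr
  (lambda (new old symexp)
    (set! sym-expr-calls (add1 sym-expr-calls))
    (if (symbol? symexp)
        (if (eq? symexp old) new symexp)
        (subst new old symexp))))

(subst 'x 'b '((a b) c))             ; '((a x) c)
(list subst-calls sym-expr-calls)    ; '(6 4)
```

Every element of every list costs one call to `subst-symbol-expr`, and each nested list then costs a further call back into `subst`.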

Sometimes, the run-time costs introduced by mutual recursion outweigh the program-time and read-time benefits of the separate functions. Can we modify our definition without losing too many of its benefits?

We can use Racket's substitution model to get back to a single function. Our solution currently looks like this:

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (cons (subst-symbol-expr new old (first slist))
                  (subst new old (rest slist))))))

    (define subst-symbol-expr
      (lambda (new old symexp)
        (if (symbol? symexp)
            (if (eq? symexp old) new symexp)
            (subst new old symexp))))

We can substitute the definition of `subst-symbol-expr`
into `subst`, using the standard rules from the
substitution model. This is exactly what the Racket
interpreter will do at run-time. First, we substitute the
`lambda` in place of the name:

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (cons ((lambda (new old symexp)           ;;
                     (if (symbol? symexp)             ;; Here
                         (if (eq? symexp old)         ;; is
                             new                      ;; the
                             symexp)                  ;; first
                         (subst new old symexp)))     ;; substitution.
                   new old (first slist))
                  (subst new old (rest slist))))))

Next, we replace the application of the `lambda` with
the body of the `lambda`, substituting the arguments
for the corresponding formal parameters: `new` for
`new`, `old` for `old`, and
`(first slist)` for `symexp`:

    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (cons (if (symbol? (first slist))            ;;
                      (if (eq? (first slist) old)        ;; Here is
                          new                            ;; the second
                          (first slist))                 ;; substitution.
                      (subst new old (first slist)))     ;;
                  (subst new old (rest slist))))))

The result is a single function that behaves exactly like the two original functions. After all, all we did was to derive by hand the same result that the Racket evaluator will produce. So, provided that we made no errors in our derivation, the resulting function has the same functionality. However, the new version is more efficient, because it eliminates the extra function calls. We hope that it is nearly as readable as the two-function version.
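
One way to guard against errors in a hand derivation is to run both versions side by side. The sketch below does that; the name `subst-2` is made up here so that the two-function version and the derived version can live in the same module, and the sample inputs are only illustrative.

```racket
#lang racket
(require rackunit)

;; Two-function version, renamed subst-2 for this comparison only.
(define subst-2
  (lambda (new old slist)
    (if (null? slist)
        '()
        (cons (subst-symbol-expr new old (first slist))
              (subst-2 new old (rest slist))))))

(define subst-symbol-expr
  (lambda (new old symexp)
    (if (symbol? symexp)
        (if (eq? symexp old) new symexp)
        (subst-2 new old symexp))))

;; Derived single-function version.
(define subst
  (lambda (new old slist)
    (if (null? slist)
        '()
        (cons (if (symbol? (first slist))
                  (if (eq? (first slist) old)
                      new
                      (first slist))
                  (subst new old (first slist)))
              (subst new old (rest slist))))))

;; The two versions should agree on every input we try.
(for ([ex (list '()
                '(a b c a b c d)
                '((a b) (((b g r) (f r)) c (d e)) b))])
  (check-equal? (subst 'z 'b ex) (subst-2 'z 'b ex)))
```

Agreement on a few samples is not a proof, but it catches the most likely slips, such as substituting the wrong argument for a formal parameter.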

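Take a closer look. The derived function is *not like*
the single-function solution we wrote earlier.
That function repeated the expression
`(subst new old (rest slist))` three times. The derived
function contains it only once, because the conditional on the
`first` of the list appears as an argument to `cons`.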

We can do this in Racket because the `if` construct is
an expression that returns a value, not a statement. In many
languages, `if` is a statement and returns no value. A
few, including Java and C++, have a "computed if" expression
that may let us do something like this. In Java, a "computed
if" is written as

<test> ? <then-value> : <else-value>
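
For comparison, here is a small Racket sketch of `if` used in argument position, the same move the derived `subst` makes inside `cons`. The function name `parity-label` is made up for illustration.

```racket
#lang racket

;; `if` returns a value, so it can appear wherever a value is expected,
;; here as the last argument to `list`.
(define parity-label
  (lambda (n)
    (list n 'is (if (even? n) 'even 'odd))))

(parity-label 4)   ; '(4 is even)
(parity-label 7)   ; '(7 is odd)
```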

C++ has a concept that is similar to program derivation, the
**in-lining** of member functions. The difference, though,
is that inlining is performed by the compiler. When we declare
a class member function `inline`, the compiler tries
to replace all calls to the function with equivalent code
from the body of the function.

For example, we may well use an accessor method `x()`
frequently when interacting with an object that has an
x-coordinate. If we declare the `x()` method as
`inline`, the compiler will try to replace each call to it with
the equivalent code from the body of the function.

This enables the programmer to eliminate the overhead of extra
function calls *at run time*, without obscuring the
readability and design of our class. Program derivation works
like inlining, but it is a technique used by *programmers*
to modify their code. (I can certainly imagine having a
Racket compiler implementing program derivation automatically,
thus saving the programmer the effort and risk of error!)

We will use the program derivation technique occasionally to simplify the result of mutual recursion, and any other technique that introduces unwanted function calls that create undesirable inefficiency at run-time -- but only when the cost of the extra function calls outweighs the benefits of separate functions.

Use program derivation to eliminate the
`count-occurrences-sym-expr` function. Do you like
the result?

- Reading -- Review the lecture notes. Try to write the code
  again from scratch. Review the code. Pay special attention
  to the section named
  Increasing Efficiency Through Program Derivation,
  which we will cover next time.

  Read through Chapter 5 of *The Little Schemer*.
- Homework 4 is available and is due next session. Homework 5 will be available then.