### Session 9: Writing Recursive Programs

#### Opening Exercise

Suppose that I have a list of course enrollment numbers and need to know if any of the sections is too big for the classrooms available to us.

After looking at the box-and-pointer diagram of an actual list and thinking through the task in terms of a list as pairs... map a function over the list to get enrollment numbers.

Write a function `(any-bigger-than? n lon)`, where `n` is a number and `lon` is a list of numbers. `any-bigger-than?` returns true if any number in `lon` is bigger than `n`, and false otherwise.

For example:

```
> (any-bigger-than? 2 '(2 1 3))
#t
> (any-bigger-than? 40 '(26 37 41 25 12))
#t
> (any-bigger-than? 40 '(26 37 14 25 12))
#f
```

If you feel stumped: How would you solve this problem in Python?

How would you approach this task in a loopy language? You might write a loop that compares `n` to each successive element in `lon`. As soon as you find a larger list item you can return true. How do you know there are no numbers in `lon` larger than `n`? Only by examining every item in `lon` and never finding one.

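```
def any_bigger_than(n, lon):
    for i in lon:
        if i > n:
            return True    # found a larger item: stop immediately
    return False           # examined every item and found none
```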

This is the same kind of thinking you need to write a recursive function in Racket. We don't need a loop, because we can walk down the list recursively. If your curiosity gets the best of you, peek ahead to a solution...

Why not use MapReduce to solve this problem? We don't need to process the entire list! As soon as we find one value that meets the criterion, the function can return its answer. Continuing down the rest of the list is unnecessary.
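To see the difference concretely, here is a small sketch that instruments a fold-based version with a counter. The names `visits` and `any-bigger-than/fold?` are hypothetical, introduced only for this illustration. The fold reaches the right answer, but it still examines every item.

```racket
#lang racket

;; Hypothetical instrumentation: counts how many items the fold examines.
(define visits 0)

;; A fold-based version (an illustration, not the solution we are after).
(define any-bigger-than/fold?
  (lambda (n lon)
    (foldl (lambda (x found?)
             (set! visits (add1 visits))
             (or found? (> x n)))
           #f
           lon)))

(any-bigger-than/fold? 2 '(3 1 2 0 1))   ; => #t, and the first item already settles it
visits                                   ; => 5, because the fold examined every item
```

The counter uses `set!` only to make the wasted work visible; it is not something we would keep in a real solution.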

Don't feel bad if this problem seems like a big challenge at this point. Most things are difficult when we lack the knowledge we need to solve them. Sometimes, we have the knowledge but don't have a clear plan for which knowledge to use when, or why.

Over the next few sessions, we will learn some techniques that help us think about problems and recursive solutions in a new way. These techniques will be quite useful when we move on to processing languages.

#### A Quick Tour of Recursive Programs

This section reviews your reading assignment for today. If you haven't read it yet, please do! It contains more detail than this summary.

To recur is to go back. Recursion is a technique for writing programs that goes back to the same function for help.

Loops go back, too. What is different about functions?

As you saw in the "99 Bottles of Beer on the Wall" example, iteration and recursion are pretty similar. We make the same decisions and solve the same sub-problems no matter which one we are using. The footnote on that example went even further: recursion is more powerful than a loop, because it gives us more convenient control over the order in which we do things!

Be not afraid. If you can write a loop, then you can write recursive functions, too.

Recursive programs pair up nicely with inductive specifications:

• A recursive program returns an answer for one or more simple members of the set.
• It computes answers for more complex problems in terms of the answers to simpler problems. It can call itself to solve some of the simpler problems.

More formally, we will say that every recursive program consists of:

• one or more base cases that terminate computation in a pre-defined answer, and
• one or more recursive cases that compute solutions in terms of simpler problems.

The "limit" of the smaller problems is one of the base cases.

The base cases are often pretty straightforward, though we still have to think. They encode our understanding of the problem by recording the answer to the smallest problem(s) in the domain.

Each recursive case consists of three steps:

1. Split the data into smaller pieces.
2. Solve the pieces.
3. Combine the solutions for the parts into a single solution for the original argument.

This is usually where the descriptions of recursion end in our textbooks. "Okay," you might say, "great. But how do I do that?" The goal of the next few weeks is to help you feel this in your bones: Recursion doesn't have to be scary. Sometimes, it's all about the data.

In particular, there are rules that help us to think about each of these steps:

1. We split the data into sub-problems based on the type of the argument our function receives.
2. We will often define our data so that the "big" sub-problem is topologically similar to the original problem. We take advantage of this similarity by solving the "big" sub-problem with a recursive call.
3. We combine the solutions for the sub-parts based on the type of value returned by the function.

The driver for this process is the type of the arguments that our function processes. This is where inductive specifications can help.
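As a rough illustration of the three steps, here is a minimal sketch of a function that adds up a list of numbers. The name `sum-of` is hypothetical and does not come from the reading; the comments mark where each step shows up.

```racket
#lang racket

;; sum-of is a hypothetical example, not a function from the reading.
(define sum-of
  (lambda (lon)
    (if (null? lon)
        0                             ; base case: a pre-defined answer
        (+ (first lon)                ; step 1: split the list into first and rest
           (sum-of (rest lon))))))    ; step 2: solve the rest with a recursive call
                                      ; step 3: combine with +, since the result is a number

(sum-of '())                          ; => 0
(sum-of '(26 37 41 25 12))            ; => 141
```

The split follows the type of the argument (a list), and the combining operation follows the type of the result (a number).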

#### Writing Recursive Programs for Inductively-Specified Data

In our last session, we saw that we can use inductive definitions to specify data types. An inductive definition is one that:

• lists one or more specific members of the type, and
• describes how to construct more complex members from simpler ones.

Inductive specifications have essentially the same structure as recursive programs. For this reason, inductive data specs — especially ones formalized in a BNF description — can serve as a powerful guide for writing recursive programs that operate on the data.

In fact, this guidance is so useful that I offer you a Little Schemer-style commandment based on it:

When defining a program to process an inductively-defined data type,
the structure of the program should follow
the structure of the data.

To see how this works, let's write a function `list-length` that takes a list of numbers as an argument and returns the length of the list.

You may recall the definition of the data type `<list-of-numbers>` from last session:

```
<list-of-numbers> ::= ()
                    | (<number> . <list-of-numbers>)
```

This BNF definition can serve as a pattern for defining any program that operates on lists of numbers. A function that operates on a `<list-of-numbers>` will receive one of two things as an argument:

• the empty list, `()`, or
• a pair whose `car` is a `<number>` and whose `cdr` is a `<list-of-numbers>`.

According to the data definition, these are the only possibilities! There are no other cases to worry about.
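One way to make the two alternatives tangible is a predicate that asks whether a value belongs to the type. This is only a sketch: `list-of-numbers?` is a hypothetical helper, not part of the course code. Its arms mirror the BNF, plus one extra arm for values that are not in the type at all.

```racket
#lang racket

;; list-of-numbers? is a hypothetical helper that mirrors the two BNF alternatives.
(define list-of-numbers?
  (lambda (v)
    (cond
      [(null? v) #t]                                ; the empty list
      [(pair? v) (and (number? (car v))             ; a <number> ...
                      (list-of-numbers? (cdr v)))]  ; ... paired with a <list-of-numbers>
      [else #f])))                                  ; anything else is not in the type

(list-of-numbers? '())                 ; => #t
(list-of-numbers? '(26 37 41 25 12))   ; => #t
(list-of-numbers? '(26 b 41))          ; => #f
(list-of-numbers? 26)                  ; => #f
```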

Consider an example list, `(26 37 41 25 12)`. From the perspective of our function, this is the pair `(26 . <list-of-numbers>)`. You may feel that "hiding" the rest of the numbers will somehow make our job harder, but in fact it is the source of our power! We can now focus on solving a single task: combining the results.

The definition of a `<list-of-numbers>` consists of a choice. A function that operates on a `<list-of-numbers>` will have to make the same choice: is the argument an empty list or a pair? For lists, we use `null?` to make this choice. This boolean condition serves as the selector in an `if` or `cond` expression that defines actions to take for each arm.

We can start writing `(list-length lon)` with the pattern for a function that we have used many times before:

```
(define list-length
  (lambda (lon)
    ...
    ))
```

Following the rule above, our program's structure should mimic the structure of the BNF specification for the data type. A list of numbers is either an empty list or a pair. So, we start with the code for a choice:

```
(define list-length
  (lambda (lon)
    (if (null? lon)
        ;; then: handle an empty list
        ;; else: handle a pair
        )))
```

Now we can write code to handle the two cases in either order. Often, the base case has a simple answer, so we usually write this case first. How should our function act when the list is empty? The length of the empty list is 0, so return 0:

```
(define list-length
  (lambda (lon)
    (if (null? lon)
        0
        ;; else: handle a pair
        )))
```

Now, we handle the second part of the specification. What if `lon` is not empty? The BNF for this element states that such a list of numbers consists of a number followed by a list of numbers:

```
<list-of-numbers> ::= (<number> . <list-of-numbers>)
```

This tells us that we can decompose our problem into two subproblems:

• the length of the first part of the pair, and
• the length of the second part of the pair.

What is the length of `first`? What is the length of the `rest`? How do we combine these answers?

The `first` of the list is a number, not a list. It contributes one item to the length of the overall list.

The `rest` of the list is the rest of the list. (Ha!) It, too, is a `<list-of-numbers>` — the same data type as the argument to `list-length`. How can we find its length? Call `list-length`!

How do we combine two numbers that are counting the parts of a whole list? We add them together.

So, the pair has a length of 1, for the number in the cons cell we are processing, plus the result of `(list-length (rest lon))`:

```
(define list-length
  (lambda (lon)
    (if (null? lon)
        0
        (+ 1 (list-length (rest lon))))))    ; can use add1
```

Let's try it out:

```
> (list-length '())
0
> (list-length '(42))
1
> (list-length '(1 10 100 1000 10000 100000 2 4 6 8))
10
```

And our definition is complete!

Another way to think about the recursive case is this: Split the list into its `first` and its `rest`, which is also a `<list-of-numbers>`. Assume that we already know the answer for the `rest`. How can we solve the `first`, and how do we assemble the two answers into our final answer? The recursive call is our assumption.

As we take successive `rest`s of the list, we will eventually encounter the empty list, which is our base case. But we don't have to think about that now. We received either an empty list or a pair.
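If it helps to see the successive `rest`s reach the base case, here is a hand trace written as comments, using the `list-length` definition from above. The trace is a sketch of the substitution steps, not output from any tool.

```racket
#lang racket

(define list-length
  (lambda (lon)
    (if (null? lon)
        0
        (+ 1 (list-length (rest lon))))))

;; Each line is the same computation, one step further along:
;;   (list-length '(26 37 41))
;;   (+ 1 (list-length '(37 41)))
;;   (+ 1 (+ 1 (list-length '(41))))
;;   (+ 1 (+ 1 (+ 1 (list-length '()))))
;;   (+ 1 (+ 1 (+ 1 0)))
;;   3
(list-length '(26 37 41))   ; => 3
```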

Notice: We do not add an explicit guard to our code so that we don't try to take the `rest` of a non-list. Our code cannot make this error! The function takes the `rest` of its argument only after it knows the argument is not the empty list. But then the only alternative is a pair, which always has a `cdr` that is a list.

We did assume that the original argument received by `list-length` is, in fact, a `<list-of-numbers>`. The specification for the function states as much. This precondition makes it the responsibility of the caller of the function to provide a suitable argument. If the caller doesn't, then our function is not responsible for the error. The same is true in a statically-typed language, though in that case we usually have the compiler to catch the error for us.
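If we ever want Racket to check the precondition for us at run time, one option is the contract system. The sketch below assumes `#lang racket`, which provides `define/contract`, `listof`, and `exact-nonnegative-integer?`. The name `list-length/checked` is hypothetical.

```racket
#lang racket

;; A sketch using Racket's contract system; the name list-length/checked is hypothetical.
(define/contract list-length/checked
  (-> (listof number?) exact-nonnegative-integer?)
  (lambda (lon)
    (if (null? lon)
        0
        (+ 1 (list-length/checked (rest lon))))))

(list-length/checked '(1 2 3))    ; => 3
;; (list-length/checked 'oops)    ; raises a contract violation instead of running the body
```

The body is unchanged from `list-length`. The contract only states, and enforces at the boundary, the assumption that the function already made.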

Optional Digression: Can we implement `list-length` without recursion, using only higher-order functions? We can! If you'd like, give it a try, and then check out this code file to see a solution, as well as a discussion of `list-member?` and `any-bigger-than?`.
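For readers who want something to compare against after trying it themselves, here is one possible sketch using `foldl`. The name `list-length/fold` is hypothetical, and the linked code file may well take a different approach.

```racket
#lang racket

;; list-length/fold is a hypothetical name; this is one possible approach among several.
(define list-length/fold
  (lambda (lon)
    (foldl (lambda (item count) (add1 count))   ; ignore the item, bump the count
           0
           lon)))

(list-length/fold '())            ; => 0
(list-length/fold '(42))          ; => 1
(list-length/fold '(1 10 100))    ; => 3
```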

#### A Solution for `any-bigger-than?`

We can use the same technique to implement `any-bigger-than?`, from our warm-up exercise. `any-bigger-than?` returns true if `n` is smaller than any member of `lon`, and false otherwise.

We now know to pattern our solution on the BNF definition of `<list-of-numbers>`. So:

```
(define any-bigger-than?
  (lambda (n lon)
    (if (null? lon)
        ;; then: handle an empty list
        ;; else: handle a pair
        )))
```

There are no numbers in an empty list, so we know that there are no numbers bigger than `n`. In the base case, then, we return false.

```
(define any-bigger-than?
  (lambda (n lon)
    (if (null? lon)
        #f
        ;; else: handle a pair
        )))
```

In the recursive case, `n` is smaller than a member of `lon` either if it is smaller than the `first` or if it is smaller than a member of the `rest`. The `first` of the list is a number, so we can check directly whether `n` is less than it. The `rest` of the list is a list of numbers, so we let our function solve that case for us.

Racket provides us with built-in functions for expressing both the comparison, `<`, and the disjunction, `or`, so:

```
(define any-bigger-than?
  (lambda (n lon)
    (if (null? lon)
        #f
        (or (< n (first lon))
            (any-bigger-than? n (rest lon))))))
```

If you wrote a complete solution to the exercise, it may have differed from this one: you may have used another `if` in the recursive case. The version here is more faithful to the BNF for our data type specification and to how we define the answer, so most functional programmers prefer it. However, the two solutions compute the same value. The most important thing is that you develop a habit for writing recursive functions by thinking in this way.
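For comparison, here is a sketch of what such a two-`if` version might look like, next to the `or` version from above. The name `any-bigger-than/if?` is hypothetical; your own solution may be shaped differently.

```racket
#lang racket

;; The version from the text, using or:
(define any-bigger-than?
  (lambda (n lon)
    (if (null? lon)
        #f
        (or (< n (first lon))
            (any-bigger-than? n (rest lon))))))

;; A variant with a second if (hypothetical name, for comparison only):
(define any-bigger-than/if?
  (lambda (n lon)
    (if (null? lon)
        #f
        (if (< n (first lon))
            #t
            (any-bigger-than/if? n (rest lon))))))

(any-bigger-than?    40 '(26 37 41 25 12))   ; => #t
(any-bigger-than/if? 40 '(26 37 41 25 12))   ; => #t
(any-bigger-than?    40 '(26 37 14 25 12))   ; => #f
(any-bigger-than/if? 40 '(26 37 14 25 12))   ; => #f
```

Both versions stop at the first larger number, because `or` evaluates its second argument only when the first is false.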

Challenge Problem: Can we eliminate the remaining `if` expression, too?

When you are first writing functions of this type, you may well feel uncomfortable trusting that your solution works in the recursive case, because that means relying on the function that you are writing. The only way to overcome this discomfort is to get lots of practice writing recursive functions. This will create a comfort level that the techniques you are using really do work. Of course, we will also do thorough testing of our functions. Trust only goes so far.
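As one way to back up that trust, here is a minimal test sketch. It assumes the `rackunit` library that ships with the standard Racket distribution and repeats the definition so the file stands alone; the particular test cases are only suggestions.

```racket
#lang racket
(require rackunit)

(define any-bigger-than?
  (lambda (n lon)
    (if (null? lon)
        #f
        (or (< n (first lon))
            (any-bigger-than? n (rest lon))))))

;; One test per arm of the BNF, plus the examples from the exercise.
(check-false (any-bigger-than? 5 '()))                   ; base case: empty list
(check-true  (any-bigger-than? 2 '(2 1 3)))              ; larger item at the end
(check-true  (any-bigger-than? 40 '(26 37 41 25 12)))    ; larger item in the middle
(check-false (any-bigger-than? 40 '(26 37 14 25 12)))    ; no larger item
(check-false (any-bigger-than? 3 '(3 3 3)))              ; equal is not bigger
```

When every check passes, `rackunit` prints nothing; a failing check reports the expression and its location.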

#### Manipulating Lists of Symbols

In order for us to gain strength as recursive programmers, let's practice on some less intuitive problems. That will force us to think more about both the problem and the solution.

These problems are important for two reasons. First, we will use the functions we write later in the course and in future homework assignments. But if that were the only reason they were important, we would need to understand only what they do, but not how they do it.

The second reason that they are important, though, is that they illustrate several common patterns in recursive programs and how to implement them. So it will be worth our effort to study in detail how they do what they do.

The rest of our examples today operate on values of the `<list-of-symbols>` data type. As its name suggests, `<list-of-symbols>` is quite similar to `<list-of-numbers>`. We can specify this data type inductively as:

```
<list-of-symbols> ::= ()
                    | (<symbol> . <list-of-symbols>)
```

#### The `remove-upto` Function

`remove-upto` takes two arguments, a symbol `s` and a list of symbols `los`. It returns a list just like `los`, minus all the symbols before the first occurrence of `s`. For example:

```
> (remove-upto 'b '(a b c))
(b c)

> (remove-upto 'a '(b d f g a a a a a a))
(a a a a a a)
```

Note that `remove-upto` does not modify the original `los`. In functional programming, our functions almost never modify their arguments; instead, they compute a new value for us.

```
(define remove-upto
  (lambda (s los)
    (if (null? los)
        ; then: handle an empty list
        ; else: handle a pair
        )))
```

In the base case, `los` is empty, so there are no symbols to remove. So, we return `()`.

```
(define remove-upto
  (lambda (s los)
    (if (null? los)
        '()
        ; else: handle a pair
        )))
```

What if `los` is not empty? Then it is a pair of the form `(<symbol> . <list-of-symbols>)`. Our answer depends on `(first los)`. If it is `s`, then our answer is `los`. If not, then we drop it and all of the symbols up to `s` in `(rest los)`.

We can write that choice with another `if` expression:

```
(define remove-upto
  (lambda (s los)
    (if (null? los)
        '()
        (if (eq? (first los) s)
            los
            ; else remove up to s from the rest of los
            )
        )))
```

How can we remove all of the symbols up to the first `s` in `(rest los)`? The rest of the list is a list of symbols, so `remove-upto` can do that for us!

```
(define remove-upto
  (lambda (s los)
    (if (null? los)
        '()
        (if (eq? (first los) s)
            los
            (remove-upto s (rest los))))))
```

... Show examples of removing up to `b` from `(a b c d)` and `(e d c b)` and `(c d e)` ... using the definition!
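Here is one way those three examples might be worked through by hand, with the steps written as comments. This is a sketch that follows the definition above, not a transcript from class.

```racket
#lang racket

(define remove-upto
  (lambda (s los)
    (if (null? los)
        '()
        (if (eq? (first los) s)
            los
            (remove-upto s (rest los))))))

;; (remove-upto 'b '(a b c d))
;;   a is not b, so recur on the rest:    (remove-upto 'b '(b c d))
;;   b is b, so the answer is that list:  '(b c d)
(remove-upto 'b '(a b c d))   ; => '(b c d)

;; (remove-upto 'b '(e d c b))
;;   e, d, and c are each dropped in turn: (remove-upto 'b '(b))
;;   b is b, so the answer is that list:   '(b)
(remove-upto 'b '(e d c b))   ; => '(b)

;; (remove-upto 'b '(c d e))
;;   c, d, and e are each dropped in turn: (remove-upto 'b '())
;;   the list is empty, so the base case answers: '()
(remove-upto 'b '(c d e))     ; => '()
```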

We are done! Let's test our function:

```
> (remove-upto 'a '(a b c))
'(a b c)
> (remove-upto 'b '(a b c))
'(b c)
> (remove-upto 'd '(a b c))
'()
> (remove-upto 'a '())
'()
> (remove-upto 'a '(b d f g a a a a a a))
'(a a a a a a)
```

Our understanding of the `<list-of-symbols>` data structure, and especially of its BNF description, guided us well in writing this function. We still have to think, of course. But the structure helps us know what to think about.

#### The `remove-first` Function

Now let's try a similar but slightly trickier problem.

`remove-first` takes two arguments, a symbol `s` and a list of symbols `los`. It returns a list just like `los` minus only the first occurrence of `s`. For example:

```
> (remove-first 'b '(a b c))
(a c)
```

Note again that `remove-first` does not modify the original `los`. It will build a new list.

```
(define remove-first
  (lambda (s los)
    (if (null? los)
        ; then: handle an empty list
        ; else: handle a pair
        )))
```

In the base case, `los` is empty, so there is no first occurrence of `s` to remove. So, return `()`.

```
(define remove-first
  (lambda (s los)
    (if (null? los)
        '()
        ; else: handle a pair
        )))
```

What if `los` is not empty? Then we need to remove the first occurrence of `s` from a pair, if there is one. There are two cases. Either the first element in `los` is the symbol we want to remove, or it is not.

```
(define remove-first
  (lambda (s los)
    (if (null? los)
        '()
        (if (eq? (first los) s)
            ; then remove s from the first of los
            ; else remove s from the rest of los
            )
        )))
```

If `s` is the first element in `los`, what is the answer returned by `remove-first`? The rest of the list:

```
(define remove-first
  (lambda (s los)
    (if (null? los)
        '()
        (if (eq? (first los) s)
            (rest los)
            ; else remove s from the rest of los
            )
        )))
```

Now comes the tough case... If the first element of `los` is not the symbol we want to remove, then we need to remove the first occurrence of that symbol from the rest of the list. What is the answer to be returned by `remove-first` in this case? We need a list whose first item is the `first` of `los` and whose `rest` is the list we get by removing `s` from the rest of `los`:

... Show examples of removing `b` from `(a b c d)` and `(e d c b)` and `(c d e)` ...
... Draw pictures of lists that show the result is making a list from a head element and a tail list ... `cons`!
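In place of pictures, here is a small sketch of the same idea in code: each expected answer is built by hand from a head element and a tail list. The expressions are illustrations of the desired results only; they do not yet use `remove-first`.

```racket
#lang racket

;; Removing b from (a b c d): keep the head a and attach it to the tail (c d).
(cons 'a '(c d))                      ; => '(a c d)

;; Removing b from (e d c b): each kept head is attached to the answer for the rest.
(cons 'e (cons 'd (cons 'c '())))     ; => '(e d c)

;; Removing b from (c d e): nothing is removed, so we rebuild an equal list.
(cons 'c (cons 'd (cons 'e '())))     ; => '(c d e)
```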

We reassemble a list from a `first` and a `rest` using `cons`. With which list do we `cons` the `first` item of `los`? The result of removing the first occurrence of `s` from the `rest` of `los` — which `remove-first` can compute for us!

```
(define remove-first
  (lambda (s los)
    (if (null? los)
        '()
        (if (eq? (first los) s)
            (rest los)
            (cons (first los)
                  (remove-first s (rest los)))))))
```

And we are done! Let's test our function:

```
> (remove-first 'a '(a b c))
'(b c)

> (remove-first 'b '(a b c))
'(a c)

> (remove-first 'd '(a b c))
'(a b c)

> (remove-first 'a '())
'()

> (remove-first 'a '(a a a a a a a a a a))     ; count 'em up!
'(a a a a a a a a a)
```

Once again, our understanding of the structure of a list of symbols guided us well. Following the BNF description helps us know what to think about as we consider the specific details of the task.
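To watch the `cons` calls pile up and then resolve, here is a hand trace of one call, written as comments. It also checks that the original list is left alone. The variable name `original` is hypothetical, introduced only for this sketch.

```racket
#lang racket

(define remove-first
  (lambda (s los)
    (if (null? los)
        '()
        (if (eq? (first los) s)
            (rest los)
            (cons (first los)
                  (remove-first s (rest los)))))))

;; Each line is the same computation, one step further along:
;;   (remove-first 'c '(a b c d))
;;   (cons 'a (remove-first 'c '(b c d)))
;;   (cons 'a (cons 'b (remove-first 'c '(c d))))
;;   (cons 'a (cons 'b '(d)))
;;   '(a b d)
(remove-first 'c '(a b c d))    ; => '(a b d)

;; The function builds a new list; its argument is unchanged.
(define original '(a b c d))
(remove-first 'c original)      ; => '(a b d)
original                        ; => '(a b c d)
```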

For a little more practice, check out the `remove` function.