Session 7

Thinking Functionally:
Programming with Higher-Order Functions

CS 3540
Programming Languages and Paradigms

Opening Exercise

Last time, we had a file listing the masses of any number of modules, one per line:

    > (file->lines "modules.txt")
    '("12" "14" "1969" "100756")

... and wrote a function to compute the total amount of fuel needed to send all of the modules into space:

    (define total-fuel
      (lambda (filename)
        (apply +
          (map mass->fuel
            (map string->number
               (file->lines filename))))))

This solution walks the list of modules *four* times! If the database is large, this might be prohibitively expensive. We can't do anything about loading the file or adding up the answers just yet, but...

YOUR CHALLENGE: call map only once!

    > (define modules (file->lines "modules.txt"))
    > (map (lambda (module) ?????) modules)
    '(2 2 654 33583)


A Solution

Starting with one call to map as our foundation, we can focus on the rest of the task. modules is a list of strings. map will pass our lambda expression one string at a time. Our lambda should return the amount of fuel needed.

Given a module string, we want to convert it to a number (mass) and ask mass->fuel to convert it to a fuel total. We can do those tasks together with a nested call:

    (map (lambda (module)
           (mass->fuel (string->number module)))
         modules)

If we prefer, we can name the lambda...

    (define module->fuel
      (lambda (module)
        (mass->fuel (string->number module))))

and pass the named function to map:

    (map module->fuel modules)

That's not so bad, is it?

Now we can add them up to compute the total fuel needed:

    (apply + (map module->fuel modules))

or the minimum fuel needed:

    (apply min (map module->fuel modules))

or the average fuel needed:

    (apply average (map module->fuel modules))

or name it and use it later when we know what we want to do:

    (define fuel-needs
      (map module->fuel modules))

We can also put all the parts together to solve the original problem:

    (define total-fuel
      (lambda (filename)
        (apply +
          (map module->fuel
               (file->lines filename)))))

A sequence of maps can usually be collapsed into a single map with a more powerful helper function. Consider Problem 5 on Homework 3... after a quick review of the programming patterns.
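
As a rough illustration in Python (the language this session uses later for comparison), the two-map and one-map versions can be placed side by side. The formula inside mass_to_fuel is an assumption inferred from the sample output above; these notes do not show the definition of mass->fuel.

```python
# Hypothetical stand-in for mass->fuel. The formula is assumed from the
# sample output (12 -> 2, 1969 -> 654); it is not taken from the notes.
def mass_to_fuel(mass):
    return mass // 3 - 2

# The sample contents of modules.txt, already read into a list of strings.
modules = ["12", "14", "1969", "100756"]

# Two maps: one step converts strings, the next computes fuel.
two_maps = list(map(mass_to_fuel, map(int, modules)))

# One map: a more powerful helper does both steps for a single module.
def module_to_fuel(module):
    return mass_to_fuel(int(module))

one_map = list(map(module_to_fuel, modules))

print(one_map)        # [2, 2, 654, 33583]
print(sum(one_map))   # 34241
```

Both versions produce the same list; the second simply moves the composition of the two steps into the helper function.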

Recap: Programming with Higher-Order Functions

For the last couple of sessions, we have been trying out a new way to write programs: asking functions to do more work for us. We found that apply and map will do a lot of work for us, if only we supply them with a helper function.

The purpose of map is to process all the items in a list in the same way:

[Figure: the map pattern]

The purpose of apply is to combine all the items in a list into a single value:

[Figure: the apply pattern]

Together, they form a handy design pattern for solving a certain class of problems:

    (apply reducer
      (map item-function
           list-of-items))

Why is programming this way such a challenge for you? You are used to thinking about problems in a different way. When you encounter a problem to solve, you start thinking about it -- breaking it down into parts, solving the parts, putting the parts back together -- in a particular way. These are habits you have learned and practiced for at least a couple of semesters.

Changing your mindset to a functional approach requires you to establish new habits and to break old ones. Creating new habits is a challenge, even when we want to change.

I know that many of you are not asking to change habits, or develop a new programming style. But it will make you a better programmer, and it will prepare you for something that is happening in industry right now. Give it a try, and you will be surprised.

One thing we can do when we are trying to break old habits and create new ones is to watch for the triggers that cause us to fall back into an old habit and have a plan for what to do instead. Let's see if we can identify triggers for some common procedural habits and match them up with alternative courses of action in functional programming.

An Exercise to Set the Stage

Suppose that we had the data for the total error problem in a Python list:

    >>> games = [ [2, -7], [-4, -20], [ 7, 8], [-13, 2],\
                  [6, -5], [-10, -1], [-2, 4], [  7, 2] ]

Write Python code to solve the problem:

Write a function named total-error that takes one argument, a list of this form. The function returns the total of all the differences in the list.

Python has an abs function, too. (Python identifiers can't contain hyphens, so name your function total_error.) But this is Python, so you can use assignment statements and a loop!

If you don't know Python, imagine that your data is in a language you do know -- a Java arraylist, a C array, ... -- and write your code in that language.

A Python Solution

Unshackled from the chains of Racket and functional programming, we might produce code that looks like this:

    def total_error(games):
        total_error = 0
        for p in games:
            difference = abs(p[0]-p[1])
            total_error += difference
        return total_error

That's a pretty typical way for a procedural programmer to tackle the problem in many different languages. Notice that we never index into the outer list: the loop hands us one game at a time, and within the loop we access the two elements of that game.

Now let's try to diagnose how we wrote the code and how we might think differently.

Growing a Solution

Let's look closer at our loop:

    total_error = 0
    for p in games:
        difference = abs(p[0]-p[1])    # 1. do something with p
        total_error += difference      # 2. accumulate result from #1

This loop does two kinds of thing: process the elements of the list and process those results. In functional programming, we usually try to separate different tasks into different pieces of code. Each function should have one responsibility.

Trigger: A loop does different kinds of action.
Action: Decompose the loop into separate loops, each with a single responsibility.

Let's start with the "do something with p" part of the loop. Instead of processing the results immediately, we can save them to be processed later.

    results = []
    for p in games:
        difference = abs(p[0]-p[1])    # 1. do something with p
        results.append(difference)     # 2. record result from #1

In functional programming, we like to let functions help us solve problems. Let's factor the "do something with p" action out into its own function:

    def error_for(two_list):
        return abs(two_list[0]-two_list[1])

... and then use the new function in our solution:

    results = []
    for p in games:
        difference = error_for(p)      # 1. do something with p -- in a fxn
        results.append(difference)     # 2. record result from #1

Trigger: A loop that does something with every item in a collection.
Action: Map a function over the list.

map implements the entire loop. We just have to give it the error_for() function to apply to each item:

    results = map(error_for, games)

Yes, that is Python! The Python map function produces a "map object" that we can loop over, not a list, but the idea is similar.
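
A small sketch of what "map object" means in practice, using the same data and helper as this section. The variable names first_pass and second_pass are introduced here only for illustration. A map object computes its items lazily and can be consumed only once:

```python
def error_for(two_list):
    return abs(two_list[0] - two_list[1])

games = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

results = map(error_for, games)   # a lazy map object; nothing is computed yet

first_pass = list(results)        # walking the map object computes the items
second_pass = list(results)       # the map object is now exhausted

print(first_pass)    # [9, 16, 1, 15, 11, 9, 6, 5]
print(second_pass)   # []
```

If the results are needed more than once, converting to a list first is the usual approach.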

We have made progress toward our solution:

    games   = [ [2, -7], [-4, -20], [ 7, 8], [-13, 2],\
                [6, -5], [-10, -1], [-2, 4], [  7, 2] ]

    results = map(error_for, games)
    # [9, 16, 1, 15, 11, 9, 6, 5]

Now, let's implement the second part of our original loop: add up the results:

    total_error = 0
    for r in results:
        total_error += r               # 1. accumulate sum from item r

Trigger: A loop that combines the value for every item into a single answer.
Action: Use a reducing function.

Python's functools module provides a general reduce function, but for addition we don't even need it: the built-in sum function accepts any iterable, including a map object. So we can replace the entire total_error loop with:

    total_error = sum(results)

We can even get rid of the temporary variable results by substituting the expression that computes it in place of the name:

    total_error = sum(map(error_for, games))

In Racket, we have been using apply when we reduce lists. apply implements the entire loop. We just have to give it a reducing function, such as + or average.

(The last line of Python above is not the way most Python programmers would solve this problem. The Pythonic way is to use a list comprehension or a generator expression.)
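
For comparison, here is a sketch of the comprehension style mentioned above, assuming the same games data. The names first and second are placeholders for the two numbers in each pair.

```python
games = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

# A list comprehension plays the role of map: one result per pair.
errors = [abs(first - second) for first, second in games]

# sum plays the role of the reducer.
total_error = sum(errors)

# A generator expression does the same without building the list.
total_error_lazy = sum(abs(first - second) for first, second in games)

print(errors)        # [9, 16, 1, 15, 11, 9, 6, 5]
print(total_error)   # 72
```

The shape is still "do something with each item, then combine the results"; only the notation differs.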

Growing a Solution in Racket

Now that we know the triggers, we can think about implementing our solution functionally in Racket. First, let's port our data back to a Racket list...

    (define games
      '((2 -7) (-4 -20) (7 8) (-13 2) (6 -5) (-10 -1) (-2 4) (7 2)))

... and our error_for() function to Racket...

    (define error_for
      (lambda (two_list)
        (abs (- (first two_list)
                (second two_list)))))

Now we can map error_for() over the list...

    (define results (map error_for games))

... use apply to total up our results...

    (define total_error (apply + results))

Of course, we can do this without a temporary variable in Racket, too:

    (apply + (map error_for games))

And that is the body of the function we need to write:

    (define total-error
      (lambda (games)
        (apply + (map error_for games))))

The function we pass to apply can be any reducing function. Operators such as + and min are built-in functions that apply can use to combine values. average is a custom function we wrote for apply to use when combining values.
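
A rough Python analogue of swapping reducers, for readers following the Python thread. Argument unpacking with * is the closest Python counterpart to Racket's apply. The average function here is a hypothetical stand-in for the custom Racket function, which these notes do not show.

```python
def error_for(two_list):
    return abs(two_list[0] - two_list[1])

# Hypothetical stand-in for the custom average reducer; like a function
# used with Racket's apply, it receives the items as separate arguments.
def average(*numbers):
    return sum(numbers) / len(numbers)

games = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

errors = list(map(error_for, games))

print(sum(errors))        # 72   (like apply +; sum takes the list itself)
print(min(*errors))       # 1    (like apply min)
print(average(*errors))   # 9.0  (like apply average)
```

The mapped list stays the same in all three cases; only the reducer changes.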

A Style of Programming

Over the last few sessions and on Homework 3, we have been using a common programming pattern: map a function over a list, then apply a reducer to turn map's result into a single answer.

    (apply reducer
      (map item-function
           list-of-items))

Our solution to Session 6's opening exercise does that:

    (apply string
           (map first-char
                list-of-strings))

It processes a list of strings to create a list of characters and then reduces that list into a single string. Solutions to Problems 3 through 5 on the homework do something similar.

There are plenty of slight variations on this pattern. The two most common choices we face are:

But there are others. Problem 5 required that we pre-process the list by dropping a header row with rest. On that problem, some of you found it convenient to do multiple map steps rather than write a more complex item function.

We are not limited by the pattern. It simply gives us a way to think about a problem and to structure our solution.

On first exposure, you might imagine that you'll never use functions such as map and apply after you finish this course, but you might be wrong... In order to do distributed computing on large data sets across clusters of computers, programmers at Google developed a technique called MapReduce. The "map" in MapReduce is essentially the same map we learned about last session. The "reduce" is a general name for the idea of combining a set of partial results into a single final answer. In our Racket code, apply plays that role: it applies a reducer.
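
To make "combining partial results" concrete, here is a minimal single-machine sketch using Python's functools.reduce. It only illustrates the idea; it is not how Google's MapReduce or Hadoop is implemented. Splitting the data into two halves is an arbitrary choice made for the example.

```python
from functools import reduce
from operator import add

def error_for(two_list):
    return abs(two_list[0] - two_list[1])

games = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

# Map and reduce over the whole list at once.
total = reduce(add, map(error_for, games), 0)

# Map and reduce over two halves separately, as two workers might,
# then combine the partial results with the same reducer.
left = reduce(add, map(error_for, games[:4]), 0)
right = reduce(add, map(error_for, games[4:]), 0)
combined = add(left, right)

print(total)               # 72
print(left, right)         # 41 31
print(combined == total)   # True
```

Because addition gives the same answer no matter how the items are grouped, the halves can be processed independently and combined afterward.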

O(n) and parallelism.

Many of the functions we have been writing implement a simple form of MapReduce, using Racket's primitive functions. Next week, we will begin to learn techniques for writing other kinds of mappers and reducers.

MapReduce is now available as open-source software in packages such as Hadoop, which many people use to process large data sets.

... a visit to The Principal.

Status Check

Let's write another map-reduce function. Suppose that we have lists of strings of this sort:

    (define names
      '("Johnny" "christine" "FRANK" "Juliette" "JOANNA" "eugene"))

Your task:

Write a Racket function average-length that returns the average length of the strings in a list of strings.

For example:

    > (average-length '("hi" "lois"))
    3
    > (average-length names)
    6 2/3

Racket has a primitive function named string-length that returns the length of a string.

Here's one possible solution.
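
For comparison, the same map-then-reduce shape can be sketched in Python. This is only an analogue, not the Racket solution the exercise asks for. Using Fraction is a choice made here so that the result stays exact, much as Racket's rational numbers do.

```python
from fractions import Fraction

names = ["Johnny", "christine", "FRANK", "Juliette", "JOANNA", "eugene"]

def average_length(strings):
    lengths = list(map(len, strings))              # map: one length per string
    return Fraction(sum(lengths), len(lengths))    # reduce: total over count

print(average_length(["hi", "lois"]))   # 3
print(average_length(names))            # 20/3
```

The value 20/3 is the same number the Racket REPL displays as 6 2/3.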

Another Pattern: Filtering a List

Here's our game prediction data again:

    (define games
      '((2 -7) (-4 -20) (7 8) (-13 2) (6 -5) (-10 -1) (-2 4) (7 2)))

Let's solve a different kind of problem:

Find the games in which the home team was picked to win.

How might you solve this in Python? Here's an imperative solution:

    results = []
    for p in games:
        if p[0] > 0:                   # 1. if p meets a condition
            results.append(p)          # 2. record it in our result

In a functional style, we would move the operation into a function:

    def home_team_expected(two_list):
        return two_list[0] > 0

And use the function:

    results = []
    for p in games:
        if home_team_expected(p):      # 1. if p meets a condition, in a fxn
            results.append(p)          # 2. record it in our result

This looks a lot like the map steps in all of our previous solutions, but with an important twist: We don't compute an item to put in the list... We decide whether we want to put the original item in our result!

Trigger: A loop with an if finding values that meet a condition.
Action: use a filter.

filter is a function like map: it implements an entire loop. Instead of applying a function to every item and returning a list of results, it returns only the items that "pass the test" posed by its function argument.

To implement our solution, we can call filter and supply the test function, home_team_expected():

    results = filter(home_team_expected, games)

As with map, Python's filter produces a "filter object" that we can loop over.
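
A short sketch of what the filter object yields for our data, along with the equivalent comprehension. The names picked and same_games are introduced here only for illustration.

```python
def home_team_expected(two_list):
    return two_list[0] > 0

games = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

results = filter(home_team_expected, games)   # a lazy filter object
picked = list(results)                        # the items that pass the test

# The same selection, written as a list comprehension.
same_games = [g for g in games if home_team_expected(g)]

print(picked)   # [[2, -7], [7, 8], [6, -5], [7, 2]]
```

Note that the items in the result are the original pairs, unchanged; filter only decides which ones to keep.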

We can do all of this directly in Racket, too:

    (define home-team-expected
      (lambda (two-list)
        (positive? (first two-list))))

    (filter home-team-expected games)

As expected, Racket's filter returns a list.

We can also use the resulting list to compute other results, such as the total error in games the home team was expected to win -- by passing it to total-error!
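
Continuing the Python thread, here is a minimal sketch of feeding a filtered collection into the total-error computation. The Python total_error below is written in the map-and-sum style and is an assumption about how one might port the Racket function.

```python
def error_for(two_list):
    return abs(two_list[0] - two_list[1])

def home_team_expected(two_list):
    return two_list[0] > 0

# Assumed Python port of total-error: map, then reduce with sum.
def total_error(games):
    return sum(map(error_for, games))

games = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

print(total_error(games))                               # 72
print(total_error(filter(home_team_expected, games)))   # 26
```

Because total_error accepts any collection of pairs, it does not need to know that its input was filtered first.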

Putting It All Together

map, filter, and apply are useful separately, but their real power comes when we use them together.

Recall our list of strings:

    (define names
      '("Johnny" "christine" "FRANK" "Juliette" "JOANNA" "eugene"))

Our task:

Write a Racket function total-starting-with, which returns the total number of characters in the names that start with a given letter.

For example:

    > (total-starting-with "j" names)
    20
    > (total-starting-with "e" names)
    6
    > (total-starting-with "a" names)
    0

When we process strings in this way, we usually have to convert the strings to a canonical form before processing. We can do that here with the function string-downcase, which lowercases a single string argument.

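Another useful string function for this task is (string-prefix? s char), which returns true if the string s starts with the string char, and false otherwise. Note that char here is a one-character string such as "j", not a Racket character.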

If you need any other primitive function, ask!

Evolving a Solution

    ; (map string-downcase names)
    ; (filter (lambda (s)
    ;           (string-prefix? s "j"))
    ;         (map string-downcase names))
    ; (map string-length
    ;      (filter (lambda (s)
    ;                (string-prefix? s "j"))
    ;              (map string-downcase names)))
    ; (apply + (map string-length
    ;               (filter (lambda (s)
    ;                         (string-prefix? s "j"))
    ;                       (map string-downcase names))))

    (define total-starting-with
      (lambda (char list-of-strings)
        (apply + (map string-length
                      (filter (lambda (s)
                                (string-prefix? s char))
                              (map string-downcase list-of-strings))))))
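
The same pipeline can be sketched in Python, one named step per stage, which may make the flow of data easier to follow. The parameter name prefix and the intermediate names lowered and matches are choices made for this sketch.

```python
names = ["Johnny", "christine", "FRANK", "Juliette", "JOANNA", "eugene"]

def total_starting_with(prefix, strings):
    lowered = map(str.lower, strings)                            # map: canonical form
    matches = filter(lambda s: s.startswith(prefix), lowered)    # filter: keep matches
    return sum(map(len, matches))                                # map, then reduce

print(total_starting_with("j", names))   # 20
print(total_starting_with("e", names))   # 6
print(total_starting_with("a", names))   # 0
```

When nothing passes the filter, the sum of an empty collection is 0, just as (apply + '()) is 0 in Racket.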

Thinking Functionally

The patterns of data in our solutions look something like this:

    MAP      from  ((a b ...) (c d ...) (e f ...) ...)
               to  (   d1        d2        d3     ...)

    FILTER   from  (d1 d2 d3 d4 d5 d6 ...)
               to  (d1    d3       d6 ...)

    APPLY    from  (d1 d2 d3  ...)
               to  n

You can create new habits, with attention and practice. Take baby steps. Use the REPL to help you build code you trust.

Wrap Up

Eugene Wallingford ..... ..... February 7, 2023