Session 7

Thinking Functionally:
Programming with Higher-Order Functions


CS 3540
Programming Languages and Paradigms


Opening Exercise

Last time, we had a list of patients:

    (define patients
      '( (aa (date 1960  1  2) 66 144 DATA-aa)
         (bx (date 1997  8 11) 66 124 DATA-bx)
         ; ...
         (zy (date 1979 11  8) 66 171 DATA-zy) ))

... and saw a way to compute the Body Mass Index for every patient:

    (map body-mass-index
         (map third patients)
         (map fourth patients))

But this solution walks the patient database *three* times! If the database is large, this might be expensive.

YOUR CHALLENGE: call map only once!

    (map (lambda (patient) ?????)
         patients)

Hint:

What do the values passed to your lambda expression look like?
How can we use the structure of that list to express an answer?



A Solution

Starting with one call to map as our foundation, we can focus on the rest of the task. patients is a list of patient records. map will pass to our lambda expression one patient record at a time.

Each patient record looks like this:

    (bx (date 1997  8 11) 66 124 DATA-bx)
     -- ----------------- -- --- -------
     ID     a date        ht  wt ...more

Given a patient, we want to pass the third and fourth items to body-mass-index. We are ready to fill in the blank!

    (map (lambda (patient)
           (body-mass-index (third patient)
                            (fourth patient)))
         patients)

If we prefer, we can name the lambda...

    (define bmi-for-patient
      (lambda (patient)
        (body-mass-index (third patient) (fourth patient))))

... and pass the named function to map:

    (map bmi-for-patient patients)

That's not so bad, is it?

Of course, we can use this expression to solve other problems, such as finding the average BMI:

    (apply average
           (map bmi-for-patient patients))

... or the maximum:

    (apply max
           (map bmi-for-patient patients))

... or even name it and use it later:

    (define patient-bmis
      (map bmi-for-patient patients))

We won't do that a lot this semester, but it's a useful part of larger programs. (Our interpreter at the end of the semester is large enough that we will compute and name some data objects.)
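
For readers more comfortable in Python, here is a minimal sketch of the same idea. The tuple layout mirrors the Racket records above, using the three records that are shown. body_mass_index is a hypothetical stand-in that assumes heights in inches and weights in pounds; the course's own body-mass-index function is not shown in these notes.

```python
# Hypothetical Python port of the patient records: (id, date, height, weight, data).
patients = [
    ("aa", ("date", 1960, 1, 2), 66, 144, "DATA-aa"),
    ("bx", ("date", 1997, 8, 11), 66, 124, "DATA-bx"),
    ("zy", ("date", 1979, 11, 8), 66, 171, "DATA-zy"),
]

def body_mass_index(height_in, weight_lb):
    # Assumed formula: the standard imperial BMI, 703 * lb / in^2.
    return 703 * weight_lb / height_in ** 2

def bmi_for_patient(patient):
    # Height is the third field, weight the fourth, as in the Racket version.
    return body_mass_index(patient[2], patient[3])

# Three calls to map, as in the earlier solution.
heights = map(lambda p: p[2], patients)
weights = map(lambda p: p[3], patients)
three_maps = list(map(body_mass_index, heights, weights))

# One call to map: each record is handed to bmi_for_patient once.
one_map = list(map(bmi_for_patient, patients))

# The same list feeds any reducer we like.
highest = max(one_map)
average = sum(one_map) / len(one_map)
```

Both versions produce the same list of BMIs; the one-map version simply asks for each record once and pulls both fields out of it.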



Programming with Higher-Order Functions

For the last couple of sessions, we have been trying out a new way to write programs: asking functions to do more work for us. We found that apply and map will do a lot of work for us, if only we supply them with a helper function.

Why is this such a challenge? You are used to thinking about problems in a different way. When you encounter a problem to solve, you start thinking about it -- breaking it down into parts, solving the parts, putting the parts back together -- in a particular way. These are habits you have learned and practiced for at least a couple of semesters.

Changing your mindset to a functional approach requires you to establish new habits and to break old ones. Creating new habits is a challenge, even when we want to change.

I know that many of you are not asking to change habits, or develop a new programming style. But it will make you a better programmer, and it will prepare you for something that is happening in industry right now. Give it a try, and you will be surprised.

One thing we can do when we are trying to break old habits and create new ones is to watch for the triggers that cause us to fall back into an old habit and have a plan for what to do instead. Let's see if we can identify triggers for some common procedural habits and match them up with alternative courses of action in functional programming.



An Exercise to Set the Stage

Suppose that we had the data for the total error problem in a Python list:

     >>> pairs = [ [2, -7], [-4, -20], [ 7, 8], [-13, 2],\
                   [6, -5], [-10, -1], [-2, 4], [  7, 2] ]
     >>>

Write Python code to solve the problem:

Write a function named total_error that takes one argument, a list of this form. The function returns the sum of the absolute differences between the two numbers in each pair.
Python has an abs function, too. But this is Python, so you can use assignment statements and a loop!

If you don't know Python, imagine that your data is in a language you do know -- a Java ArrayList, a C array, ... -- and write your code in that language.



A Python Solution

Unshackled from the chains of Racket and functional programming, we might produce code that looks like this:

    def total_error(pairs):
        total_error = 0                        # local running total
        for p in pairs:
            difference = abs(p[0] - p[1])
            total_error += difference
        return total_error

That's a pretty typical way for a procedural programmer to tackle the problem in many different languages. Now let's try to diagnose how we wrote the code and how we might think differently.



Growing a Solution

Let's look closer at our loop:

    total_error = 0
    for p in pairs:
        difference = abs(p[0]-p[1])    # 1. do something with p
        total_error += difference      # 2. do something with result from #1

This loop does two kinds of things: it processes the elements of the list, and it processes the results. In functional programming, we generally try to tease different tasks apart into different code. Each function should have one responsibility.

Trigger: A loop does different kinds of action.
Action: Decompose the loop into separate loops, each with a single responsibility.

Let's start with the "do something with result" part of the loop. Instead of processing the results immediately, we can save them to be processed later.

    results = []
    for p in pairs:
        difference = abs(p[0]-p[1])    # 1. do something with p
        results.append(difference)     # 2. record result from #1

In functional programming, we like to let functions help us solve problems. Let's factor the "do something with p" action out into its own function:

    def error_for(two_list):
        return abs(two_list[0]-two_list[1])

... and then use the new function in our solution:

    results = []
    for p in pairs:
        difference = error_for(p)      # 1. do something with p -- in a fxn
        results.append(difference)     # 2. record result from #1

Trigger: A loop that does something with every item in a collection.
Action: Map a function over the list.

map implements the entire loop. We just have to give it the error_for() function to apply to each item:

    results = map(error_for, pairs)

Yes, that is Python! The Python map function produces a "map object" that we can loop over, not a list, but the idea is similar.
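
One practical consequence of that difference, sketched below with this session's data: a map object is a one-shot iterator, so looping over it a second time yields nothing. Wrapping it in list() gives a reusable list, which is closer to what Racket's map returns.

```python
pairs = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

def error_for(two_list):
    return abs(two_list[0] - two_list[1])

results = map(error_for, pairs)   # nothing has been computed yet
first_pass = list(results)        # consumes the iterator
second_pass = list(results)       # the iterator is now exhausted

print(first_pass)    # [9, 16, 1, 15, 11, 9, 6, 5]
print(second_pass)   # []
```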

We have made progress toward our solution:

    pairs   = [ [2, -7], [-4, -20], [ 7, 8], [-13, 2],\
                [6, -5], [-10, -1], [-2, 4], [  7, 2] ]

    results = map(error_for, pairs)
    # [9, 16, 1, 15, 11, 9, 6, 5]

Now, let's implement the second part of our original loop: add up the results:

    total_error = 0
    for r in results:
        total_error += r               # 1. accumulate sum from item r

Trigger: A loop that combines the value for every item into a single answer.
Action: Use a reducing function.

Python's built-in sum function accepts any iterable, including a map object, so no conversion to a list is needed. (For reducers other than addition, Python offers functools.reduce.) We can replace the entire total_error loop with:

    total_error = sum(results)

We can even get rid of the temporary variable results by substituting the expression that computes it in place of the name:

    total_error = sum(map(error_for, pairs))

In Racket, we have been using apply to reduce lists. apply implements the entire loop. We just have to give it a reducing function, such as + or average.
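
Python has rough counterparts to both ideas. The sketch below repeats the pairs data and error_for function so that it runs on its own. functools.reduce folds a two-argument function across a sequence, and the * operator spreads a list into separate arguments, which is close to what Racket's apply does.

```python
from functools import reduce
import operator

pairs = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

def error_for(two_list):
    return abs(two_list[0] - two_list[1])

errors = list(map(error_for, pairs))

total_error = reduce(operator.add, errors, 0)   # folds + across the list
largest_error = max(*errors)                    # like (apply max errors)

print(total_error)     # 72
print(largest_error)   # 16
```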



Growing a Solution in Racket

Now that we know the triggers, we can think about implementing our solution functionally in Racket. First, let's port our data back to a Racket list...

    (define pairs
      '((2 -7) (-4 -20) (7 8) (-13 2) (6 -5) (-10 -1) (-2 4) (7 2)))

... and our error_for() function to Racket...

    (define error_for
      (lambda (two_list)
        (abs (- (first two_list)
                (second two_list)))))

Now we can map error_for() over the list...

    (define results (map error_for pairs))

... use apply to total up our results...

    (define total_error (apply + results))

Of course, we can do this without a temporary variable in Racket, too:

    (apply + (map error_for pairs))

And that is the body of the function we need to write:

    (define total-error
      (lambda (list_of_pairs)
        (apply + (map error_for list_of_pairs))))

Used this way, apply acts as a simple reducer. Functions like + and average are the operators that apply uses to combine the values.



A Style of Programming

From the last few sessions and Homework 3, you may have noticed a common programming pattern: map a function over a list, then apply a reducer to turn map's result into a single answer. Our solution to Session 6's opening exercise does that:

    (apply string
           (map first-char
                list-of-strings))

It processes a list of strings to create a list of characters and then reduces that list into a single string. Solutions to Problems 3 through 5 on the homework do something similar.

On first exposure, you might imagine that you'll never use functions such as map and apply after you finish this course, but you might be wrong... In order to do distributed computing on large data sets across clusters of computers, programmers at Google developed a technique called MapReduce. The "map" in MapReduce is essentially the same map we learned about last session. The "reduce" is a general name for the idea of combining a set of partial results into a single final answer. apply is a reducer.

O(n) and parallelism.
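
A toy, single-machine sketch of why this shape parallelizes: the map step still touches each of the n items once, but because addition is associative, the list can be split into chunks, each chunk mapped and reduced on its own (potentially on a different worker), and the partial sums combined at the end. The chunk helper and the chunk size of 3 are illustrative assumptions; this models the idea only and is not how Hadoop or Google's MapReduce is actually programmed.

```python
pairs = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

def error_for(two_list):
    return abs(two_list[0] - two_list[1])

def chunk(items, size):
    # Split a list into consecutive slices of at most `size` items.
    return [items[i:i + size] for i in range(0, len(items), size)]

def partial_total(chunk_of_pairs):
    # Map and reduce one chunk; each call is independent of the others.
    return sum(map(error_for, chunk_of_pairs))

partials = list(map(partial_total, chunk(pairs, 3)))   # one result per chunk
total_error = sum(partials)                            # final reduce

print(partials)      # [26, 35, 11]
print(total_error)   # 72
```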

Many of the functions we have been writing implement a simple form of MapReduce, using Racket's primitive functions. Next week, we will begin to learn techniques for writing other kinds of mappers and reducers.

MapReduce is now available as open-source software in packages such as Hadoop, which many people use to process large data sets.

The Principal.



Status Check

Let's write another map-reduce function. Suppose that we have lists of strings of this sort:

    (define names
      '("Johnny" "christine" "FRANK" "juliette" "Joanna" "eugene"))

Your task:

Write a Racket function average-length that returns the average length of the strings in a list of strings.

For example:

    > (average-length '("hi" "lois"))
    3

    > (average-length names)
    6 2/3

Here's one possible solution.
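
For comparison, here is a sketch of the same map-then-reduce shape in Python (this is not the course's Racket solution). It uses fractions.Fraction so that the result stays an exact rational, as in the Racket transcript above; the name average_length simply mirrors the exercise.

```python
from fractions import Fraction

names = ["Johnny", "christine", "FRANK", "juliette", "Joanna", "eugene"]

def average_length(strings):
    lengths = list(map(len, strings))             # map: string -> length
    return Fraction(sum(lengths), len(lengths))   # reduce: lengths -> average

print(average_length(["hi", "lois"]))   # 3
print(average_length(names))            # 20/3, i.e. 6 2/3
```

Like a plain division, this sketch raises ZeroDivisionError when given an empty list.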



Another Pattern: Filtering a List

Now consider a new problem: find the games in which the home team was picked to win, that is, the pairs whose first number is positive.

     >>> pairs = [ [2, -7], [-4, -20], [ 7, 8], [-13, 2],\
                   [6, -5], [-10, -1], [-2, 4], [  7, 2] ]
     >>>

Let's work through the same kind of evolution in Python, this time watching for a loop-and-if trigger.

An initial imperative solution:

    results = []
    for p in pairs:
        if p[0] > 0:                   # 1. if p meets a condition
            results.append(p)          # 2. record it in our result

As before, move the operation into a function:

    def home_team_expected(two_list):
        return two_list[0] > 0

And use the function:

    results = []
    for p in pairs:
        if home_team_expected(p):      # 1. if p meets a condition, in a fxn
            results.append(p)          # 2. record it in our result

Trigger: A loop with a bare if, selecting pairs that meet a condition.
Action: Use a filter.

filter is a function like map: it implements an entire loop. Instead of applying a function to every item and returning a list of results, it returns only the items that "pass the test" posed by its function argument.

To implement our solution, we can call filter and supply the test function, home_team_expected():

    results = filter(home_team_expected, pairs)

As with map, Python's filter produces a "filter object" that we can loop over.
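
A short, self-contained sketch of what that looks like on this session's data. Wrapping the filter object in list() shows the selected pairs; the list comprehension is an equivalent form that many Python programmers reach for instead.

```python
pairs = [[2, -7], [-4, -20], [7, 8], [-13, 2],
         [6, -5], [-10, -1], [-2, 4], [7, 2]]

def home_team_expected(two_list):
    return two_list[0] > 0

picked = list(filter(home_team_expected, pairs))
also_picked = [p for p in pairs if home_team_expected(p)]   # comprehension form

print(picked)   # [[2, -7], [7, 8], [6, -5], [7, 2]]
```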

We can do all of this directly in Racket, too:

    (define home_team_expected
      (lambda (two-list)
        (positive? (first two-list))))

    (filter home_team_expected pairs)

As expected, Racket's filter returns a list.



Putting It All Together

map, filter, and apply are useful separately, but their real power comes when we use them together.

Recall our list of strings:

    (define names
      '("Johnny" "christine" "FRANK" "juliette" "Joanna" "eugene"))

Your task:

Write a Racket function total-starting-with, which returns the total number of characters in the names that start with a given letter.

For example:

    > (total-starting-with "j" names)
    20
    > (total-starting-with "e" names)
    6
    > (total-starting-with "a" names)
    0

Convert all the strings to a canonical form (lowercase) before processing.

If you need a primitive function, ask!



Evolving a Solution

    ; (map string-downcase names)
    ;
    ; (filter (lambda (s)
    ;           (string-prefix? s "j"))
    ;         (map string-downcase names))
    ;
    ; (map string-length
    ;      (filter (lambda (s)
    ;                (string-prefix? s "j"))
    ;              (map string-downcase names)))
    ;
    ; (apply + (map string-length
    ;               (filter (lambda (s)
    ;                         (string-prefix? s "j"))
    ;                       (map string-downcase names))))

    (define total-starting-with
      (lambda (char list-of-strings)
        (apply + (map string-length
                      (filter (lambda (s)
                                (string-prefix? s char))
                              (map string-downcase list-of-strings))))))
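
The same map, filter, map, reduce pipeline can be sketched in Python, one stage per line. The function name total_starting_with mirrors the exercise; the intermediate names are illustrative.

```python
names = ["Johnny", "christine", "FRANK", "juliette", "Joanna", "eugene"]

def total_starting_with(letter, strings):
    lowered = map(str.lower, strings)                            # canonical form
    matching = filter(lambda s: s.startswith(letter), lowered)   # keep matches
    return sum(map(len, matching))                               # lengths -> total

print(total_starting_with("j", names))   # 20
print(total_starting_with("e", names))   # 6
print(total_starting_with("a", names))   # 0
```

Because map and filter are lazy in Python, the three stages run interleaved, but the result is the same as building each intermediate list in turn.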


Thinking Functionally

The patterns of data in our solutions look something like this:

    MAP      from  ((a b ...) (c d ...) (e f ...) ...)
               to  (   d1        d2        d3     ...)

    FILTER   from  (d1 d2 d3 d4 d5 d6 ...)
               to  (d1    d3       d6 ...)

    APPLY    from  (d1 d2 d3  ...)
               to  n

You can create new habits, with attention and practice. Take baby steps. Use the REPL to help you build code you trust.



Wrap Up



Eugene Wallingford ..... wallingf@cs.uni.edu ..... February 7, 2019