CS 1510 Session 28

Dictionaries, Design, and Parameters

CS 1510
Introduction to Computing

Opening Exercise

Back in Session 20, we built a function called multi_find(), which we used on Homework 9. Let's refresh our memory on how it works.

    def multi_find(source, target, start, end):
        result = ''
        pos = start

        while pos < end:
            next_pos = source.find(target, pos, end)
            # 1
            if next_pos == -1:
                break
            result += (str(next_pos) + ',')
            # 2
            pos = next_pos + 1

        if result == '':
            return result
        return result[:-1]

Trace the code for the call

    multi_find('abcdabccabacdeacbe', 'ab', 2, 12)

Write down the value of pos and next_pos every time you reach #1.
Write down the value of result every time you reach #2.

~~~~

Let's run the code and find out... Now we can see that find() lets the code speed through the source string, focusing on the matches. Try changing the 12 to 100!

Several students have told me "I don't really understand what this function does..." What can you do when you find yourself in this position?

First, trace it by hand for actual inputs. Live inside of it while it runs. This can help you get a feel for what happens each pass through a loop, for example.

Next, insert print statements at key locations and run the code on more, perhaps larger, test cases. This can help you answer unresolved questions from your trace.
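Here is one way the print-statement step might look. This is only a sketch: the name multi_find_traced() is made up for this example, and the two print calls sit exactly where the #1 and #2 markers appear in the original function.

```python
def multi_find_traced(source, target, start, end):
    """Same logic as multi_find(), plus print statements at #1 and #2."""
    result = ''
    pos = start

    while pos < end:
        next_pos = source.find(target, pos, end)
        # 1: show where we are looking from and what find() returned
        print('#1  pos =', pos, ' next_pos =', next_pos)
        if next_pos == -1:
            break
        result += (str(next_pos) + ',')
        # 2: show the answer string as it grows
        print('#2  result =', repr(result))
        pos = next_pos + 1

    if result == '':
        return result
    return result[:-1]


answer = multi_find_traced('abcdabccabacdeacbe', 'ab', 2, 12)
print('returned', repr(answer))
```

Compare the printed lines to your hand trace, then try other arguments to test any guesses you still have.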

Finally, give the function a better name. If multi_find() doesn't put a clear image of what the function does in your head, what does? Perhaps find_all_occurrences_of_a_substring_in_a_range() does? That's probably too long for a name, but it would make a great comment at the top of the function.

All programmers occasionally run into code that baffles them. We all use techniques like this to get out of the dark. You, too, can be the source of your own enlightenment.

Making `multi_find()` More Pythonic

When we wrote multi_find(), strings and files were the only collections we knew about, and we had only begun to write our own functions. Now that we know lists and understand functions pretty well, we can make the function more Python-like and have it work more like its inspiration, the string method find().

Return Type. multi_find() returns a string, but that was a product of our limited Python knowledge. A list is a much more useful return type. That's an easy improvement to make, affecting only three points in the code:

- creating the initial empty answer
- adding a new item to the answer
- returning the answer with no special case

The result is a straightforward function that's easier for client code to use.

    >>> multi_find('abcdabccabacdeacbe', 'ab', 2, 100)
    [4, 8]
    >>> multi_find('abcdabccabacdeacbe', 'b', 2, 100)
    [5, 9, 16]
    >>> multi_find('abcdabccabacdeacbe', 'a', 0, 100)
    [0, 4, 8, 10, 14]
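A version with the three changes described above might look like the following. Treat it as a sketch that starts from the string-returning function in the opening exercise; the numbered comments mark the three points that change.

```python
def multi_find(source, target, start, end):
    result = []                     # 1. the initial empty answer is a list
    pos = start

    while pos < end:
        next_pos = source.find(target, pos, end)
        if next_pos == -1:
            break
        result.append(next_pos)     # 2. add the position itself, not a string
        pos = next_pos + 1

    return result                   # 3. no special case for an empty answer


print(multi_find('abcdabccabacdeacbe', 'ab', 2, 100))
```

Client code can now loop over the positions or take len() of the answer without splitting a string first.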

Optional Arguments. If we want to search all the way to the end of a string, the find() method allows us to leave off the last argument. The default value is the length of the string.

    >>> len('abcdabccabacdeacbe')
    18
    >>> 'abcdabccabacdeacbe'.find('a', 6, 18)
    8
    >>> 'abcdabccabacdeacbe'.find('a', 6)
    8

If we want to search from the beginning of the string, we can even leave off the second argument. The method uses 0 as the default value.

    >>> 'abcdabccabacdeacbe'.find('a', 9)
    10
    >>> 'abcdabccabacdeacbe'.find('a', 0)
    0
    >>> 'abcdabccabacdeacbe'.find('a')
    0

We can do the same thing by giving a parameter a default value in the function header. Here is our current header for multi_find():

    def multi_find(source, target, start, end):

Making start default to 0 is as easy as this:

    def multi_find(source, target, start=0, end):

We also need to give a default value to end. There are two reasons:

- If the user omits only one of the arguments, the find() method assumes that the omitted value is the last argument, the end of the range. We want our function to work that way, too.
- The Python interpreter would not accept the header above. As we learned back in Chapter 6, Python assigns arguments to the parameters in the order they are listed, so a third argument would go to start and leave end unassigned. To rule that out, Python requires every parameter that follows a default-valued parameter to have a default as well, and reports a SyntaxError otherwise.
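We can check how Python reacts to a header that gives start a default but not end. The sketch below hands the header to the built-in compile() function as a string, so that the rest of the program keeps running when Python rejects it. The variable names header and rejected are made up for this example.

```python
# The header from above, with a placeholder body so it is a complete definition.
header = "def multi_find(source, target, start=0, end):\n    pass\n"

try:
    compile(header, '<example>', 'exec')
    rejected = False
except SyntaxError as err:
    rejected = True
    print('SyntaxError:', err.msg)

print('header rejected?', rejected)
```

The exact wording of the message depends on the Python version, but the header is rejected before any call is ever made.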

... look at "Check Yourself" on Page 369 of the text.

What is the default value for end? It is the length of the string to be searched, len(source). But if we try that...

    def multi_find(source, target, start=0, end=len(source)):

We get an error:


    Traceback (most recent call last):
      File "/Users/wallingf/home/teaching/cs1510/web/sessions/session28/multi_find_v3.py", line 1, in <module>
        def multi_find(source, target, start=0, end=len(source)):
    NameError: name 'source' is not defined

Python evaluates a default value once, when it executes the def statement, and at that point the parameter source does not exist yet. (You will learn why when you take CS 3540 Programming Languages and Paradigms.) So we have to write code to handle that case:

    def multi_find(source, target, start=0, end=None):
        if end is None:
            end = len(source)

This uses None, which we have seen a time or two, as a sentinel value. If end still has that value when the body runs, then we know the user did not pass a fourth argument, and the function should search all the way to the end of source.

This works nicely!

    >>> multi_find('abcdabccabacdeacbe', 'ab', 2, 100)
    [4, 8]
    >>> multi_find('abcdabccabacdeacbe', 'ab', 2)
    [4, 8]
    >>> multi_find('abcdabccabacdeacbe', 'ab', 0)
    [0, 4, 8]
    >>> multi_find('abcdabccabacdeacbe', 'ab')
    [0, 4, 8]
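Putting the pieces together, the complete function might look something like this. It is a sketch that combines the list-returning body with the new header; the docstring is an addition for this example. The last call passes end by name, one of the named parameters covered in the textbook reading.

```python
def multi_find(source, target, start=0, end=None):
    """Return a list of the positions in source where target occurs,
    searching from start up to (but not including) end."""
    if end is None:                 # the caller did not supply an end
        end = len(source)

    result = []
    pos = start
    while pos < end:
        next_pos = source.find(target, pos, end)
        if next_pos == -1:
            break
        result.append(next_pos)
        pos = next_pos + 1
    return result


text = 'abcdabccabacdeacbe'
print(multi_find(text, 'ab', 2))        # start given, end defaults
print(multi_find(text, 'ab'))           # both defaults
print(multi_find(text, 'ab', end=6))    # skip start, name end
```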

I like languages that let me create code that works like the built-in features of the language. Python gives us that freedom on occasion.

Dictionaries in the Lab

In Lab 14, you got your first official experience using dictionaries in a program, using this data file. Let's step through it...

Task 1-2. Build an index from actors to their movies:
```
       KEYS                       VALUES

       'Brad Pitt'        →  [ 'Sleepers', ... ]
       'Anthony Hopkins'  →  [ 'Hannibal', ... ]
       ...
       'Bruce Willis'     →  [ 'Die Hard', ... ]
       'Kevin Bacon'      →  [ 'A Few Good Men', ... ]
```
Standard running total loop, where we build a dictionary instead of a number, string, or list. That is the first step in breaking the task down into manageable steps...

One wrinkle: a space at the front of each movie name. Why? How to fix? (Write a strip_all(lst) function?)
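Here is a rough sketch of the loop, including a strip_all() helper. It rests on assumptions, not on the actual lab file: each line is taken to hold an actor's name followed by movie titles, all separated by commas, with one line per actor. The sample lines are made up for the example, and build_actor_index() is a hypothetical name.

```python
def strip_all(lst):
    """Return a new list with leading and trailing whitespace
    removed from each string in lst."""
    result = []
    for item in lst:
        result.append(item.strip())
    return result


def build_actor_index(lines):
    """Build a dictionary that maps each actor to a list of movies."""
    index = {}                              # the running "total"
    for line in lines:
        # Splitting 'A, B, C' on ',' leaves a space in front of B and C,
        # which is where the stray space comes from in this assumed format.
        fields = strip_all(line.split(','))
        actor = fields[0]
        index[actor] = fields[1:]
    return index


# Made-up sample lines standing in for the lines of the data file.
sample_lines = [
    'Brad Pitt, Sleepers, Troy\n',
    'Kevin Bacon, Sleepers, A Few Good Men\n',
]
actor_index = build_actor_index(sample_lines)
print(actor_index['Brad Pitt'])
```

If the file is laid out differently, only the line that splits each record should need to change.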

This gives us a dictionary that contains the information shown in the KEYS/VALUES diagram above.

Task 1-3. Build an index from movies to their actors:
```
       KEYS                       VALUES

       'Sleepers'         → [ 'Brad Pitt', ... ]
       'Hannibal'         → [ 'Anthony Hopkins', ... ]
       ...
       'Die Hard'         → [ 'Bruce Willis', ... ]
       'A Few Good Men'   → [ 'Kevin Bacon', ... ]
```
Standard running total loop to build another dictionary. The main wrinkle is that we encounter movies multiple times in the actor database. So:
- If the movie is not already in the dictionary, add it as a key, with a value that is a list containing the actor's name.
- If the movie is already in the dictionary, add the actor's name to the existing list of actors.
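The two cases above can be sketched as follows. The function name build_movie_index() and the small actor dictionary are made up for this example; the real dictionary would come from the lab's data file.

```python
def build_movie_index(actor_index):
    """Build a dictionary that maps each movie to a list of its actors."""
    movie_index = {}
    for actor in actor_index:
        for movie in actor_index[actor]:
            if movie not in movie_index:
                # First time we see this movie: start a new list.
                movie_index[movie] = [actor]
            else:
                # Seen before: add to the existing list.
                movie_index[movie].append(actor)
    return movie_index


# Made-up sample data for illustration.
actor_index = {
    'Brad Pitt': ['Sleepers', 'Troy'],
    'Kevin Bacon': ['Sleepers', 'A Few Good Men'],
}
movie_index = build_movie_index(actor_index)
print(movie_index['Sleepers'])
```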

This gives us a dictionary that contains the information shown in the KEYS/VALUES diagram above.

Together, the two dictionaries tell us a lot.

Task 2.2. Make our lists act like sets. Why do I create new lists in each function, rather than append to the existing lists? Lists are mutable!
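To see why mutability matters, here is a sketch of two set-like helpers that build new lists. The name union comes from the lab discussion; intersection and the exact bodies are assumptions for this example, not necessarily the lab solution.

```python
def union(list1, list2):
    """Return a new list with every item that is in either list, once each."""
    result = []
    for item in list1 + list2:      # list1 + list2 is itself a new list
        if item not in result:
            result.append(item)
    return result


def intersection(list1, list2):
    """Return a new list with the items that are in both lists, once each."""
    result = []
    for item in list1:
        if item in list2 and item not in result:
            result.append(item)
    return result


pitt = ['Sleepers', 'Troy']
bacon = ['Sleepers', 'A Few Good Men']

print(union(pitt, bacon))
print(intersection(pitt, bacon))
print(pitt)     # unchanged, because the helpers never append to their arguments
```

If union() appended to list1 instead, every call would quietly change a list that is still stored as a value in one of our dictionaries.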

Task 2.1 and Task 3.2. Our helper functions do most of the work.

Task 3.1. Our union helper function is useful again, but we need to use it several times, once for each movie the actor is in. And we have to take the actor out of the result, because he or she is not part of the answer!
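Assuming Task 3.1 asks for everyone who has appeared in a movie with a given actor, the idea might be sketched like this. The function name co_actors() and the sample dictionaries are made up for the example, and union() is repeated so that the code runs on its own.

```python
def union(list1, list2):
    """Return a new list with every item that is in either list, once each."""
    result = []
    for item in list1 + list2:
        if item not in result:
            result.append(item)
    return result


def co_actors(actor, actor_index, movie_index):
    """Return a list of the actors who share at least one movie with actor."""
    everyone = []
    for movie in actor_index[actor]:            # once for each movie
        everyone = union(everyone, movie_index[movie])

    answer = []                                 # a new list, without the actor
    for name in everyone:
        if name != actor:
            answer.append(name)
    return answer


# Made-up sample data for illustration.
actor_index = {
    'Brad Pitt': ['Sleepers', 'Troy'],
    'Kevin Bacon': ['Sleepers', 'A Few Good Men'],
    'Tom Cruise': ['A Few Good Men'],
}
movie_index = {
    'Sleepers': ['Brad Pitt', 'Kevin Bacon'],
    'Troy': ['Brad Pitt'],
    'A Few Good Men': ['Kevin Bacon', 'Tom Cruise'],
}
print(co_actors('Kevin Bacon', actor_index, movie_index))
```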

With just these two dictionaries, we can begin to answer many interesting questions. Some are just for fun, such as The Six Degrees of Kevin Bacon, a movie watcher's pastime from a few years ago. The Oracle of Bacon implements a simple look-up. Can you use your two dictionaries to implement something similar?

Wrap Up

Code -- today's code file

Reading -- Read more about default values and named parameters in Section 8.3 in your textbook, pages 365-369. The rest of Chapter 8 discusses a few other features of Python functions, if you are interested. We won't be looking at them in class.

Homework -- Homework 11 is available and due Friday.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... December 4, 2014