Session 11

Processing Strings


CS 1510
Introduction to Computing


Exercise 1: Code Trace

Consider this program:

    user_str = input('Enter a string: ')
    result   = ''

    location = 0
    while location < (len(user_str) - 1):
        if user_str[location] > user_str[location+1]:
            result += user_str[location]
        else:
            result = result * 2
        location += 1

    print(result)

What is the program's output if the user enters edcba?

What is the program's output if the user enters abcde?

What is the program's output if the user enters eadbc?

What happens if we delete the - 1 from the while loop line?
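
Once you have traced the program by hand, you may want to check your answers. One way, sketched here, is to wrap the same loop in a function so that it can be called on each input without retyping. The function name trace is my own choice for illustration and is not part of the exercise.

```python
def trace(user_str):
    """Run the Exercise 1 loop on user_str and return the result string."""
    result = ''
    location = 0
    # Same loop as in the exercise, with the input passed in as a parameter.
    while location < (len(user_str) - 1):
        if user_str[location] > user_str[location+1]:
            result += user_str[location]
        else:
            result = result * 2
        location += 1
    return result

# Trace each input by hand first, then compare with what the function returns.
for text in ['edcba', 'abcde', 'eadbc']:
    print(repr(text), '->', repr(trace(text)))
```

Printing with repr makes an empty result visible as '' instead of a blank line.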



String Interlude: Immutable Collection

A string is a collection of characters. It consists of a set of characters in a particular order. Collections, and the ways we process them, are a fundamental part of computing. We will study several other kinds of Python collections later this semester, and the entire next course in your major is about collections more generally.

We can access individual characters of a Python string using [], but we cannot assign a value to a location accessed in this way.

    >>> name = 'Eugene'
    >>> name[1]
    'u'
    >>> name[1] = 'w'
    Traceback (most recent call last):
      File "<pyshell#26>", line 1, in <module>
        name[1] = 'w'
    TypeError: 'str' object does not support item assignment
    >>> name
    'Eugene'

In computing, we say that a Python string is immutable. You cannot change its value. (You can, of course, change the value bound to the variable name.)
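
If we cannot change a character in place, how do we get a string with a different character in it? One approach, shown here as a small sketch, is to build a new string out of pieces of the old one and then rebind the variable. The variable new_name is introduced only for this illustration.

```python
name = 'Eugene'

# Slicing and concatenation build a brand-new string; 'Eugene' itself is untouched.
new_name = name[:1] + 'w' + name[2:]

print(new_name)   # Ewgene
print(name)       # Eugene

# Rebinding the variable makes name refer to the new string.
name = new_name
print(name)       # Ewgene
```

The original string never changes. Only the binding between the variable and a value does.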

Python string access is powerful. At this point, it offers more details than we can use effectively while learning how to program. For now, you can ignore the idea of 'extended slicing' that appears in Section 4.1.5.



Exercise 2: Find All

Let's start with an easy one...

Write a program that takes a string and a character as input and prints every position at which the character appears in the string.

For example:

    Enter a string: Mississippi
    Enter a character: i
    1
    4
    7
    10

    Enter a string: Mississippi
    Enter a character: s
    2
    3
    5
    6

How about this program?

    index = 0
    while index < len(user_str):
        if user_str[index] == user_char:
            print(index)
        index += 1

This is an example of linear search, one of the most common patterns of collection processing.
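
Here is a sketch of the same linear search packaged as a function, which makes it easy to try on several inputs. The function name find_all is an assumption of mine, and it collects the positions in a list, a kind of collection we have not studied yet. Treat it as a preview, not as the required solution.

```python
def find_all(user_str, user_char):
    """Return a list of every index at which user_char appears in user_str."""
    positions = []
    index = 0
    # Visit every position, from 0 up to but not including len(user_str).
    while index < len(user_str):
        if user_str[index] == user_char:
            positions.append(index)
        index += 1
    return positions

for position in find_all('Mississippi', 'i'):
    print(position)
```

If the character never appears, the loop still looks at every position and the function returns an empty list.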



Exercise 3: Let's Split

People send me spreadsheets. Lots of spreadsheets. But I find spreadsheet programs such as Numbers and Excel to be a lot less flexible than a language such as Python. So I often export a spreadsheet to plaintext in a format called comma-separated values (CSV). In CSV, a row such as:

       A    B        C         D     E     ...
    1 305 Eugene Wallingford 3-5919 ...

is saved as a string in a text file:

    305,Eugene,Wallingford,3-5919,...

Now, I can write code to process the string.

Write a program that takes such a string as input and prints out the individual strings between the commas, one per line.

With my example, the program should print:

    305
    Eugene
    Wallingford
    3-5919
    ...

What do we need to do if we see a comma? If we see a character other than a comma?

How about this program?

    start = 0
    index = 0
    while index < len(user_str):
        if user_str[index] == ',':
            print(user_str[start:index])
            start = index + 1
        index += 1

The advantage of having operators such as [m:n] and range(m,n) not include n becomes apparent with an example like this.

We have one problem, though:

    >>> 
    305
    Eugene
    Wallingford
    3-5919
    >>> 

We lost the string that follows the final comma! How can we fix that?

After the loop ends, we need to print out the last string separately, as in this program:

    start = 0
    index = 0
    while index < len(user_str):
        if user_str[index] == ',':
            print(user_str[start:index])
            start = index + 1
        index += 1

    print(user_str[start:index])

What happens if the string begins or ends with a comma? Is that the behavior we want? It is the behavior that most libraries for processing CSV files give us, so perhaps it is what most people want.
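
To explore those edge cases, here is a sketch that puts the loop in a function and compares its answer with Python's built-in str.split method. The function name split_on_commas is hypothetical, and it returns the pieces in a list instead of printing them so that the two answers can be compared side by side.

```python
def split_on_commas(user_str):
    """Return the strings between commas, using the loop from this exercise."""
    fields = []
    start = 0
    index = 0
    while index < len(user_str):
        if user_str[index] == ',':
            fields.append(user_str[start:index])
            start = index + 1
        index += 1
    # After the loop, pick up the string that follows the final comma.
    fields.append(user_str[start:index])
    return fields

# Show our answer next to str.split for ordinary and edge-case inputs.
for text in ['305,Eugene,Wallingford,3-5919', ',Eugene,', '']:
    print(repr(text), split_on_commas(text), text.split(','))
```

On these inputs the two agree, including the empty strings produced by a leading or trailing comma.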



We reached this point at the end of class. Study the material below on your own. We'll discuss some of it in class next time.

String Interlude: The Many Shades of for

Chapter 4 shows you many ways to process the characters of a string using a for statement:

    for char in a_string:
       {suite}

    for index in range(len(a_string)):
       {suite}

    for index, char in enumerate(a_string):
       {suite}

What does the last of these do? If it did not exist, how could we use one of the other forms to implement it?
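
One possible answer, as a sketch: enumerate hands us each index together with the character at that index. We can get the same pairs from an index loop over range(len(a_string)) by looking up the character ourselves. The two function names below are hypothetical, and each builds its output as a string so that the results are easy to compare.

```python
def with_enumerate(a_string):
    """Build one 'index char' line per character using enumerate."""
    lines = ''
    for index, char in enumerate(a_string):
        lines += str(index) + ' ' + char + '\n'
    return lines

def with_range(a_string):
    """Build the same lines using an index loop and a lookup."""
    lines = ''
    for index in range(len(a_string)):
        char = a_string[index]      # the step that enumerate does for us
        lines += str(index) + ' ' + char + '\n'
    return lines

print(with_enumerate('cat'), end='')
print(with_range('cat') == with_enumerate('cat'))
```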

Don't confuse the in that appears in a for statement with the in operator for strings. The latter creates a boolean expression:

    >>> phone_number = '273-5919'
    >>> '5' in phone_number
    True
    >>> favorite_number = '8'
    >>> favorite_number in phone_number
    False

Python doesn't have many examples of this kind of overlap in keyword or functionality, and in this case we usually don't have any trouble telling them apart:

    digits = '0123456789'

    counter = 0
    phone_number = '273-5919'
    for char in phone_number:
        if char in digits:
            counter += 1

    result = len(phone_number) - counter
    print('There are', result, 'nonnumeric chars.')

How could we write this without subtracting counter from len(phone_number)? (Hint: use not.)
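
Following the hint, here is one way it might look: count the characters that are not digits directly, so that no subtraction is needed. This is a sketch, and the function name count_nondigits is my own.

```python
def count_nondigits(text):
    """Count the characters in text that are not decimal digits."""
    digits = '0123456789'
    counter = 0
    for char in text:                # this 'in' belongs to the for statement
        if char not in digits:       # this one is the boolean string operator
            counter += 1
    return counter

result = count_nondigits('273-5919')
print('There are', result, 'nonnumeric chars.')
```

Writing char not in digits means the same thing as not (char in digits).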



Exercise 4: Largest Character

We can think of characters as being smaller or larger depending on their position in the alphabet.

Write a program that prints the largest character in a given string.

For example:

    Enter a string: eugene
    u
    Enter a string: eugenewallingford
    w

We have been using while loops today, but this problem calls for a for loop.

    for char in user_str:
        if char > max:
            max = char

    print(max)

This looks a lot like the running total pattern we have seen so many times already. But in that pattern, we initialize the variable that keeps track of our answer to an identity value that lets the loop run correctly on the first pass. Here, that variable is max. What identity value do we need?

Whatever value it is, it needs to be smaller than any character in the string, so that user_str[0] > max on the first pass.

Here are a few candidates. What do you think?

The first works in Python, but not many other languages.

The second works as long as the user enters at least one character.

What does the third do? It works as long as the user enters at least one lowercase alphabetic character.

What does the fourth do? Like the second, it works as long as the user enters at least one character.

... discuss: context, specification.

This program has the for loop and the fourth initial value for max.

How should we handle the situation where the user enters no characters at all? Perhaps guard the output statement to ensure it prints only if there was at least one character:

    for char in user_str:
        if char > max:
            max = char

    if len(user_str) > 0:
        print(max)
    else:
        print('... empty string ...')

We see patterns like running total all the time. Even so, in each new case, we need to make adjustments in order to address the details of the new problem.
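
To make those adjustments concrete, here is a complete sketch of the largest-character search as a function. It makes two assumptions of its own: it starts from the first character of the string, checking for the empty string before the loop, and it uses the name largest instead of max so that it does not hide Python's built-in max function. The function name largest_char is also hypothetical.

```python
def largest_char(user_str):
    """Return the largest character in user_str, or '' if it is empty."""
    if len(user_str) == 0:
        return ''                    # nothing to compare
    largest = user_str[0]            # safe: there is at least one character
    for char in user_str:
        if char > largest:
            largest = char
    return largest

print(largest_char('eugene'))
print(largest_char('eugenewallingford'))
```

Python compares characters by their character codes, so every uppercase letter is smaller than every lowercase letter. For example, largest_char('Zebra') is 'r', not 'Z'.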



Wrap Up



Eugene Wallingford ..... wallingf@cs.uni.edu ..... September 30, 2014