Session 12

Using Java I/O to Solve Problems


810:062
Computer Science II
Object-Oriented Programming


On Homework 4

Question 1: Testing Output Files

How can we tell that saveToFileSorted() works?

Interactions with the world outside of our Java program create challenges to writing tests. But we still need a way to know that our code works...

Question 2: Writing loadFromFileSorted()

I gave you an example of a method that sorts chars in an array last time, in the Sign class. It sortes by repeatedly selecting the smallest item from the unsorted array and adding it to the end of the target array. You could have done the same thing here by reading from the file into an array (or Vector), sorting the array, and then adding all the sorted items to the Vector instance variable.

But we can take advantage of the fact that we are reading the items from the file one at a time. Because this is true, we can find the "right spot" for each new item before we add it to the Vector instance variable. This means that, after we process the each new association from the file, the new part of the Vector is always in order.

I gave you a algorithm for doing this kind of 'insertion sort' in an e-mail message yesterday:

    put association at end of vector
    while that association < the one in the slot before it
        swap them

This loop should stop at the first new slot, just after all of the items that were already in the vector.

Or you could find the right slot first, and then insert:

    from the first new slot to the end of the vector
        if the new association < the one in the current slot
           break
    put association at slot where the loop stopped

If you used this last approach, then you could let the Vector insert the new item for you:

From http://java.sun.com/j2se/1.4.2/docs/api/java/util/Vector.html#add(int, java.lang.Object):

add

    public void add(int index, Object element)
Inserts the specified element at the specified position in this Vector. Shifts the element currently at that position (if any) and any subsequent elements to the right (adds one to their indices).

...
Parameters:
index - index at which the specified element is to be inserted.
element - element to be inserted.

...

So now you just need a loop to find the desired index for the new Association:

    private void insertStartingAt( Association assoc, int originalSize )
    {
        for ( int i = originalSize; i < list.size(); i++ )
        {
            Association existingAssoc = (Association) list.elementAt( i );
            if ( assoc.lessThan( existingAssoc ) )
            {
                list.add( i, assoc );
                return;
            }
        }
        list.addElement( assoc );
    }

And so:

    while ( true )
    {
        buffer = inputFile.readLine();
        if ( buffer == null ) break;

        int    split = buffer.indexOf( "]:[" );
        String key   = buffer.substring( 1, split );
        String value = buffer.substring( split+3, buffer.length()-1 );
        Association assoc = new Association( key, value );

        insertStartingAt( assoc, originalSize );
    }

You need time to solve tough problems -- and time to ask questions when you need help. Please start early enough so that you have the time you need!


What Did We Learn Last Time?


Sign and Squish?

Last session, you wrote two programs Sign.java and Squish.java. These small programs give you some praactice with Java I/O and string processing, so they would be worth our doing in any case. They also tell a good story about algorithm design.

I got this idea from Chapter 2 of Jon Bentley's second book of Programming Pearls. Bentley talks about the problem of finding all the anagrams in a file of words. Two words are anagrams if they contain exactly the same characters, just in a different order. Finding anagrams in an efficient way is quite difficult, and programs that can do so are quite complicated.

Let's suppose that we aren't too worried about efficiency, but that we care about the ease of writing the programm. In particular, what if we think about the problem in the "Unix way" of piping together several simple, perhaps built-in, programs?

Think about the programs you just wrote...

We can test this approach using the standard Unix dictionary. On my Mac OS X, that file is /usr/share/dict/words.

    cat /usr/share/dict/words | java Sign | less
    cat /usr/share/dict/words | java Sign | sort | less
    cat /usr/share/dict/words | java Sign | sort | java Squish | less

We can also apply the approach to Hamlet if only we first use our Echo program to convert the play file into a sequence of lines containing one word each:

    java Echo hamlet.txt | java Sign | less
    java Echo hamlet.txt | java Sign | sort | less
    java Echo hamlet.txt | java Sign | sort | java Squish | less

Two small Java programs, Unix's standard sort ... voilá -- anagrams!

[ We have Bentley's book in the UNI library. You can even download Bentley's original code -- a couple of two small C programs -- at http://www.cs.bell-labs.com/cm/cs/pearls/code.html (see Chapter 2). If you want to become a better programmer, you can't go wrong studying Bentley's pearls! ]

Wouldn't it be nice if we could do this sequence of pipes within our Java program? If Echo, Sign, sort, and Squish were objects, we would be one step closer. That's straightforward enough -- we know how to write classes with common interfaces. The second reqauirement is the ability to "pipe" one object's output to the input to another. You'll learn later in this course that Java gives us the tools we need to do this.


A Little Exercise: Process a File

Write the code and command that you need to print out the word that occurs most frequently in lines of text from standard input. Reuse any programs we already have written or learned about, and only write new code that you need. For example:

    mac os x > some command to process hamlet.txt
    the 1091

    mac os x > same command to process Sign.java
    line 12

    mac os x > same command to process Squish.java
    = 8

We have already written a program that converts a file of text into a stream of one-token lines, Echo. Unix provides a standard program for sorting streams, sort. What we need is a program like Squish, but which counts the number of times a word appears and updates a maximum value each time a new word appears.

Here's my program, called MostFrequentWord. It looks a lot like Squish. With this program, I can put together a command line that does the job:

    mac os x > java Echo < ../support/hamlet.txt | sort | java MostFrequentWord
    the 1091

    mac os x > java Echo < Sign.java  | sort | java MostFrequentWord
    line 12

    mac os x > java Echo < Squish.java | sort | java MostFrequentWord
    = 8

Unix is not an object-oriented operating system, but you can learn a lot form The Unix Way: Look for opportunities to put existing parts together when creating a new solution. In Unix, the parts are programs that process standard input and output. In object-oriented programming, the parts will be objects.


What Have We Learned?


Wrap Up


Eugene Wallingford ..... wallingf@cs.uni.edu ..... February 17, 2005