Programming Assignment 3

Using Lists to Create a Text Processing Class

Background: Text Processing

Often, one can tell a lot about a document just by examining it. Take, for example, the problem of determining the author of a play. Some scholars claim that the plays historically attributed to William Shakespeare were actually written by various other people, in particular Edward de Vere, 17th Earl of Oxford. On what evidence do they base their claims?

One of the techniques used to gather evidence involves analysis of the text itself -- the kind of words, average word and sentence length, and the like. Nowadays, one would write a computer program to "read" a text and gather relevant data about it. More sophisticated analyses rely on statistical testing of the results.

At this point, you have enough experience with C++ to implement a simple data-gathering program of this type. You will use the stradt class from Laboratory Exercise 3 and the List<T> template from Laboratory Exercise 4.

1. Implement a WordCount class. Each WordCount object has exactly two pieces of member data: a stradt that is the word and an int that indicates how many instances of the word have been counted. A WordCount object responds to the following access messages:
• initialize(stradt word) initializes the object's word member to word and its count to 0.
• word() returns the object's word.
• count() returns the object's current count.
• increment() adds 1 to the word's count.
• reset() sets the word's count back to 0.
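
As a rough illustration of the interface described above, here is a minimal sketch. It is not the required solution: std::string stands in for the stradt class from Laboratory Exercise 3 (an assumption, since stradt is not shown here), and the default constructor and the wordCountDemo function are hypothetical additions for illustration only.

```cpp
#include <cassert>
#include <string>

// Assumption: std::string stands in for the course's stradt class.
typedef std::string stradt;

class WordCount
{
public:
    // Hypothetical default constructor so the count never starts undefined.
    WordCount() : count_(0) {}

    // Sets the word and starts its count at 0.
    void initialize(stradt w) { word_ = w; count_ = 0; }

    stradt word() const { return word_; }    // the stored word
    int count() const { return count_; }     // the current count

    void increment() { ++count_; }           // one more instance seen
    void reset() { count_ = 0; }             // back to zero, word unchanged

private:
    stradt word_;   // exactly two pieces of member data, as required
    int count_;
};

// Small self-check showing a typical sequence of messages.
void wordCountDemo()
{
    WordCount wc;
    wc.initialize("hamlet");
    wc.increment();
    wc.increment();
    assert(wc.word() == "hamlet");
    assert(wc.count() == 2);
    wc.reset();
    assert(wc.count() == 0);
}
```

Note that reset() leaves the word alone and only clears the count, while initialize() replaces both.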

2. Implement a TextAnalyst class. A TextAnalyst can read a text from an input file and record and count the words that make up the text. In order to store this information, a TextAnalyst contains a List< WordCount > object as its member data.

A TextAnalyst responds to the following queries:

• How many different words occur in the text?
• What is the average word length?
• What is the longest word in the text? The shortest word in the text?
• Which word appears most often? How many times?
• Does the text contain a word longer than twice the average word length?
• Does the text contain a word that begins with 'q', 'x', or 'z'?

Note: A TextAnalyst does not display any of its answers to standard output, or any other stream, for that matter. It always returns an answer in response to a query -- say, a double when asked for the average word length, or true/false when asked if the text contains a word more than twice the average word length.

A TextAnalyst can be created using a constructor that takes one argument, a C++ string that is the name of a file containing the text to be read. At creation time, the constructor reads the text stored in the file with the given name and builds its list of word counts.

A TextAnalyst can also be sent a read(char* file_name) message. In response to this message, the object reads the text in the file with the given name and builds its list of word counts. (Be sure to have the object clear its list before adding words or counts.)
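
One possible shape for such a class is sketched below, under several loudly stated assumptions: std::vector and std::string stand in for the List<T> template and the stradt class, a "word" is taken to be a maximal run of letters folded to lower case (so "don't" counts as "don" and "t"), and the average is taken over every occurrence rather than over distinct words. The query names, the default constructor, and the readFrom helper are hypothetical; the assignment does not prescribe them. The shortest-word query is omitted because it mirrors longestWord.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying out readFrom
#include <string>
#include <vector>

// Simplified stand-in for the WordCount class (std::string replaces stradt).
struct WordCount { std::string word; int count; };

class TextAnalyst
{
public:
    TextAnalyst() {}   // hypothetical: an analyst with no text yet
    explicit TextAnalyst(const std::string& fileName) { read(fileName.c_str()); }

    // const char* rather than char* so that string literals can be passed.
    void read(const char* fileName)
    {
        std::ifstream in(fileName);   // an unopenable file yields an empty list
        readFrom(in);
    }

    // Hypothetical helper: does the real work for any input stream.
    void readFrom(std::istream& in)
    {
        words_.clear();               // discard counts from any earlier text
        std::string current;
        char ch;
        while (in.get(ch)) {
            if (std::isalpha(static_cast<unsigned char>(ch)))
                current += static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
            else
                record(current);
        }
        record(current);              // the text may end in the middle of a word
    }

    int distinctWords() const { return static_cast<int>(words_.size()); }

    // Average over every occurrence, so frequent words weigh more.
    double averageWordLength() const
    {
        long letters = 0, total = 0;
        for (std::size_t i = 0; i < words_.size(); ++i) {
            letters += static_cast<long>(words_[i].word.size()) * words_[i].count;
            total += words_[i].count;
        }
        return total == 0 ? 0.0 : static_cast<double>(letters) / total;
    }

    std::string longestWord() const
    {
        std::string best;
        for (std::size_t i = 0; i < words_.size(); ++i)
            if (words_[i].word.size() > best.size()) best = words_[i].word;
        return best;
    }

    // Most frequent entry; an empty text gives an empty word with count 0.
    WordCount mostFrequent() const
    {
        WordCount best;
        best.count = 0;
        for (std::size_t i = 0; i < words_.size(); ++i)
            if (words_[i].count > best.count) best = words_[i];
        return best;
    }

    bool hasWordLongerThanTwiceAverage() const
    {
        return longestWord().size() > 2.0 * averageWordLength();
    }

    bool hasWordStartingWithQXZ() const
    {
        for (std::size_t i = 0; i < words_.size(); ++i) {
            assert(!words_[i].word.empty());   // record() never stores ""
            char c = words_[i].word[0];
            if (c == 'q' || c == 'x' || c == 'z') return true;
        }
        return false;
    }

private:
    // Counts one finished word, then empties the buffer for the next one.
    void record(std::string& w)
    {
        if (w.empty()) return;
        std::size_t i = 0;
        while (i < words_.size() && words_[i].word != w) ++i;
        if (i == words_.size()) {      // first sighting: add with count 0
            WordCount wc;
            wc.word = w;
            wc.count = 0;
            words_.push_back(wc);
        }
        ++words_[i].count;
        w.clear();
    }

    std::vector<WordCount> words_;
};
```

Every query returns a value and nothing is written to any stream, in keeping with the note above. Routing both the constructor and read through one stream-based helper also makes it easy to try the class on a std::istringstream before pointing it at a full play.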

3. Write a driver program that prints a table of data that compares Shakespeare's Hamlet and Much Ado About Nothing. Your driver should be nearly all display statements; all analysis should be done by sending messages to a TextAnalyst. (You can do simple arithmetic, say, to find the ratio of the average word lengths, but nothing beyond that.)
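
To show what "nearly all display statements" might look like, here is a hedged sketch of the table-printing part of a driver. The function names printRow and printComparison, the query names, and the SampleAnalyst stand-in with its placeholder values are all hypothetical; a real driver would pass two TextAnalyst objects built from the two plays.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for TextAnalyst holding canned placeholder answers.
struct SampleAnalyst
{
    int words;
    double average;
    std::string longest;

    int distinctWords() const { return words; }
    double averageWordLength() const { return average; }
    std::string longestWord() const { return longest; }
};

// Prints one table row: a label followed by one value per play.
template <class Value>
void printRow(std::ostream& out, const std::string& label,
              const Value& hamlet, const Value& muchAdo)
{
    out << std::left  << std::setw(24) << label
        << std::right << std::setw(16) << hamlet
        << std::setw(16) << muchAdo << '\n';
}

// Only display statements here: every figure comes from a query.
template <class Analyst>
void printComparison(std::ostream& out, const Analyst& hamlet, const Analyst& muchAdo)
{
    out << std::fixed << std::setprecision(2);
    printRow(out, "Query", std::string("Hamlet"), std::string("Much Ado"));
    printRow(out, "Different words", hamlet.distinctWords(), muchAdo.distinctWords());
    printRow(out, "Average word length",
             hamlet.averageWordLength(), muchAdo.averageWordLength());
    printRow(out, "Longest word", hamlet.longestWord(), muchAdo.longestWord());
}

// Usage sketch: render the table into a string using placeholder values.
std::string demoTable()
{
    SampleAnalyst first  = { 12, 4.25, "placeholder" };
    SampleAnalyst second = { 9, 3.5, "sample" };
    std::ostringstream out;
    printComparison(out, first, second);
    assert(!out.str().empty());
    return out.str();
}
```

Writing to a std::ostream parameter rather than directly to std::cout keeps the table code testable; the driver's main function would simply pass std::cout.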

Source Files

In order to do your implementation, you may want to download the following source files:

Deliverables

As always, your files must begin with a header block that includes the file name, your name as author, and the creation and modification history of the file. You may use this file as a template. And don't forget to follow the programming style sheet for the course!

Submit to your instructor, by the due time and date, the following:

• five separate e-mail messages, each of which contains only a single C++ file:
    • your driver program
    • the interface for TextAnalyst
    • the implementation for TextAnalyst
    • the interface for WordCount
    • the implementation for WordCount

• a print-out of your files, stapled in the order listed above.

Eugene Wallingford ==== wallingf@cs.uni.edu ==== October 15, 1997