Programming Assignment 3

Using Lists to Create a Text Processing Class


810:052

Data Structures

Fall Semester 1997


Due: by the beginning of your lecture session on October 14, 1997

EXTENDED TO: by the beginning of your lecture session on October 21, 1997


Background: Text Processing

Often, one can tell a lot about a document just by examining it. Take, for example, the problem of determining the author of a play. Some scholars claim that the plays historically attributed to William Shakespeare were actually written by various other people, in particular Edward de Vere, 17th Earl of Oxford. On what evidence do they base their claims?

One of the techniques used to gather evidence involves analysis of the text itself -- the kinds of words used, the average word and sentence lengths, and the like. Nowadays, one would write a computer program to "read" a text and gather relevant data about it. More sophisticated analyses rely on statistical testing of the results.

At this point, you have enough experience with C++ to implement a simple data gathering program of this type. You will use the stradt class from Laboratory Exercise 3 and the List<T> template from Laboratory Exercise 4.


Your Tasks

  1. Implement a WordCount class. Each WordCount object has exactly two pieces of member data: a stradt that is the word and an int that indicates how many instances of the word have been counted. A WordCount object responds to the following access messages:

  2. Implement a TextAnalyst class. A TextAnalyst can read a text from an input file and record and count the words that make up the text. In order to store this information, a TextAnalyst contains a List< WordCount > object as its member data.

    A TextAnalyst responds to the following queries:

    Note: A TextAnalyst does not display any of its answers to standard output, or any other stream, for that matter. It always returns an answer in response to a query -- say, a double when asked for the average word length, or true/false when asked if the text contains a word more than twice the average word length.

    A TextAnalyst can be created using a constructor that takes one argument, a C++ string that is the name of a file containing the text to be read. At creation time, the constructor reads the text stored in the file with the given name and builds its list of word counts.

    A TextAnalyst can also be sent a read(char* file_name) message. In response to this message, the object reads the text in the file with the given name and builds its list of word counts. (Be sure to have the object clear its list before adding words or counts.)

  3. Write a driver program that prints a table of data that compares Shakespeare's Hamlet and Much Ado About Nothing. Your driver should be nearly all display statements; all analysis should be done by sending messages to a TextAnalyst. (You can do simple arithmetic, say, to find the ratio of the average word lengths, but nothing beyond that.)
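The two classes described in the tasks above might be sketched roughly as follows. This is only an illustration under several assumptions, not the required solution: std::string stands in for the course's stradt class, std::vector stands in for the List<T> template, and the member function names (word, count, increment, total_words, distinct_words, count_of, average_word_length, has_word_longer_than_twice_average) are hypothetical, since the assignment's own message names are not listed here. The sketch also assumes a modern C++ compiler and treats a "word" as the letters of a whitespace-separated token, lowercased, which is one simplification among many possible ones.

```cpp
#include <cassert>
#include <cctype>
#include <fstream>
#include <string>
#include <vector>

// Exactly two pieces of member data: the word and its tally.
// ASSUMPTION: std::string replaces the course's stradt class.
class WordCount {
public:
    explicit WordCount(const std::string& w) : word_(w), count_(1) {}
    const std::string& word() const { return word_; }   // hypothetical name
    int count() const { return count_; }                // hypothetical name
    void increment() { ++count_; }                      // hypothetical name
private:
    std::string word_;
    int count_;
};

// ASSUMPTION: std::vector<WordCount> replaces List< WordCount >.
class TextAnalyst {
public:
    TextAnalyst() {}
    explicit TextAnalyst(const char* file_name) { read(file_name); }

    // Clears any earlier counts, then tallies every word in the file.
    // If the file cannot be opened, the list is simply left empty.
    void read(const char* file_name) {
        assert(file_name != nullptr);
        counts_.clear();
        std::ifstream in(file_name);
        std::string token;
        while (in >> token) {
            const std::string word = normalize(token);
            if (!word.empty()) record(word);
        }
    }

    // Queries return answers; nothing is printed from inside the class.
    int distinct_words() const { return static_cast<int>(counts_.size()); }

    int total_words() const {
        int total = 0;
        for (const WordCount& wc : counts_) total += wc.count();
        return total;
    }

    int count_of(const std::string& word) const {
        for (const WordCount& wc : counts_)
            if (wc.word() == word) return wc.count();
        return 0;
    }

    // Average over every occurrence, not just over distinct words.
    double average_word_length() const {
        long letters = 0;
        int words = 0;
        for (const WordCount& wc : counts_) {
            letters += static_cast<long>(wc.word().size()) * wc.count();
            words += wc.count();
        }
        return words == 0 ? 0.0 : static_cast<double>(letters) / words;
    }

    bool has_word_longer_than_twice_average() const {
        const double limit = 2.0 * average_word_length();
        for (const WordCount& wc : counts_)
            if (wc.word().size() > limit) return true;
        return false;
    }

private:
    // Keeps letters only and lowercases them, so "To" and "to," match.
    static std::string normalize(const std::string& token) {
        std::string out;
        for (unsigned char c : token)
            if (std::isalpha(c)) out += static_cast<char>(std::tolower(c));
        return out;
    }

    // Linear search: bump an existing entry or append a new one.
    void record(const std::string& word) {
        for (WordCount& wc : counts_)
            if (wc.word() == word) { wc.increment(); return; }
        counts_.push_back(WordCount(word));
    }

    std::vector<WordCount> counts_;
};
```

With classes along these lines, a driver would create one TextAnalyst per play, send each the same queries, and print the returned values side by side. The linear search in record keeps the sketch short; it is slow for a full play, and whether that matters depends on the operations your List<T> provides.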


Source Files

In order to do your implementation, you may want to download the following source files:


Deliverables

As always, your files must begin with a header block that includes the file name, your name as author, and the creation and modification history of the file. You may use this file as a template. And don't forget to follow the programming style sheet for the course!

Submit to your instructor, by the due time and date, the following:


Eugene Wallingford ==== wallingf@cs.uni.edu ==== October 15, 1997