Let's write the first version of an interactive text analysis program. From Sessions 10 through 12, we know how to process files of text one word at a time. Now let's use those techniques to delve into the characteristics of written documents.
This assignment focuses on just one characteristic: vocabulary. Most authors use a fairly predictable vocabulary; across large documents, the set of words they use will usually be roughly the same. Our text analysis program will let a human user begin to explore the vocabulary used in a document.
Your programming task is defined below as a sequence of small requirements. Write your program by implementing one requirement at a time in order. That is how we will grade your program.
Programming Advice
As always, you should write your program in the style we have used all semester long:
Take small steps, and your tests will give you feedback as soon as possible. Use the GenerateTest program to create your test class.
The class java.lang.Math contains a number of useful utility methods for working with numbers, such as Math.max(), Math.min(), and Math.random(), as well as a method you will use on this assignment, Math.log(). Math.log() computes the natural logarithm of a double value: the power to which the number e must be raised to equal that value. For example, Math.log(1.0) is 0.0, because e raised to the power 0 is 1.
For this assignment, you will also need to be able to sort an array. The code you wrote for Homework 4 is a good starting point for a sorting method, but I do not expect you to write your own sort. Instead, you can use the Arrays.sort() method to do the job. sort() is a class (static) method defined in the utility class java.util.Arrays. Arrays.sort() takes an array as an argument and returns nothing, but it leaves the argument array sorted in ascending order.
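A small demonstration of Math.log() may help. This sketch (the class name LogDemo is just for illustration) prints a few natural logarithms, shows that Math.exp() undoes Math.log() up to rounding, and shows what happens for inputs that have no real logarithm.

```java
public class LogDemo {
    public static void main(String[] args) {
        // Math.log(x) returns the natural logarithm (base e) of x
        System.out.println(Math.log(1.0));      // prints 0.0, since e^0 = 1
        System.out.println(Math.log(Math.E));   // ln e is 1 (up to rounding)
        System.out.println(Math.log(8.0));      // roughly 2.079

        // Math.exp undoes Math.log: e^(ln x) gives back x, up to rounding
        System.out.println(Math.exp(Math.log(5.0)));

        // Zero and negative numbers have no real logarithm
        System.out.println(Math.log(0.0));      // prints -Infinity
        System.out.println(Math.log(-1.0));     // prints NaN
    }
}
```

Note that Math.log() accepts an int argument too; Java widens the int to a double automatically.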
This simple example shows Arrays.sort() in action. Try it out!
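If the linked example is not at hand, a minimal sketch along the same lines (the class name SortDemo is hypothetical) sorts an int array and a double array in place and prints the results with Arrays.toString().

```java
import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        int[] counts = {5, 7, 2, 8, 8, 3, 1, 5};
        Arrays.sort(counts);                     // sorts in place; returns nothing
        System.out.println(Arrays.toString(counts));
        // prints [1, 2, 3, 5, 5, 7, 8, 8]

        double[] values = {2.5, -1.0, 0.0, 3.75};
        Arrays.sort(values);                     // same method, overloaded for double[]
        System.out.println(Arrays.toString(values));
        // prints [-1.0, 0.0, 2.5, 3.75]
    }
}
```

Because sort() works in place, there is no return value to assign; the original array variable now refers to the sorted data.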
Arrays.sort() works on arrays whose values are of the base types, including ints and doubles, with no extra help. If you want to sort an array of Objects, then the class of your objects must:
Feel free to only sort arrays of doubles and ints for now.
If you have any questions about this new Java idea, please ask soon! You do not need to scour the web for more information about these classes and methods beyond the online Java documentation linked above.
Write tests and code for each of the following requirements, in order. The words in bold indicate message names. Whenever a requirement says the user can "ask whether...", the expected answer is boolean. Whenever a requirement speaks of a "particular" item, then that item will be an argument to the method.
For example, if a Bag contains the following words and counts:
(a, 5), (b, 7), (c, 2), (d, 8), (e, 8), (f, 3), (g, 1), (h, 5)
then logDistribution will return this array:
[2.0794415416798357 2.0794415416798357 1.9459101490553132 1.6094379124341003 1.6094379124341003 1.0986122886681096 0.6931471805599453 0.0]
(That's one useful test case, but it's not the simplest. Be sure to test some simpler cases first!)
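Judging from the example, logDistribution takes the natural logarithm of each word's count and returns those values in descending order. The sketch below shows one way to combine Math.log() and Arrays.sort() to get that result. It is an illustration under stated assumptions, not the required design: the class LogDistributionSketch is hypothetical, and a Map from word to count stands in for your Bag class. Since Arrays.sort() sorts in ascending order, the array is reversed afterward.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LogDistributionSketch {
    // Hypothetical helper: in the assignment, logDistribution belongs to a Bag.
    // Here a Map from word to count stands in for the Bag's contents.
    public static double[] logDistribution(Map<String, Integer> counts) {
        double[] logs = new double[counts.size()];
        int i = 0;
        for (int count : counts.values()) {
            logs[i++] = Math.log(count);         // natural log of each word's count
        }
        Arrays.sort(logs);                       // ascending order
        // Reverse in place so the largest value comes first
        for (int lo = 0, hi = logs.length - 1; lo < hi; lo++, hi--) {
            double tmp = logs[lo];
            logs[lo] = logs[hi];
            logs[hi] = tmp;
        }
        return logs;
    }

    public static void main(String[] args) {
        // The same word counts as the example above
        Map<String, Integer> bag = new HashMap<>();
        bag.put("a", 5); bag.put("b", 7); bag.put("c", 2); bag.put("d", 8);
        bag.put("e", 8); bag.put("f", 3); bag.put("g", 1); bag.put("h", 5);
        System.out.println(Arrays.toString(logDistribution(bag)));
    }
}
```

Running main should reproduce the values in the example above (printed with commas by Arrays.toString()). Sorting first and reversing afterward avoids the extra machinery needed to sort a double[] in descending order directly.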
(You can use one of our early I/O examples, say Echo, as a basis for this code.)
By the due date and time, submit the files
Be sure that your submission follows all homework submission requirements.