Write a quick-and-dirty main() program named CharacterCount that displays the number of non-space characters on the command line after the command. For example:
    mac os x > java CharacterCount
    0
    mac os x > java CharacterCount a
    1
    mac os x > java CharacterCount a bc def ghij
    10
Here is the quickie. It's nothing special, really. Remember, args is just an array of Strings. Your code needs to add the length of each String in the array to a running total. This is a standard looping pattern.
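The running-total pattern described above might look like the following. This is only a sketch, not necessarily the course's own file; the helper method countCharacters is a name invented here so that the loop can be tried apart from main(). It assumes that the shell has already split the command line on spaces, so none of the Strings in args contains a space unless you quote it.

```java
// A sketch of the running-total loop; countCharacters is a hypothetical helper.
class CharacterCount
{
    // Adds the length of each String in the array to a running total.
    static int countCharacters( String[] args )
    {
        int total = 0;
        for ( int i = 0; i < args.length; i++ )
            total += args[i].length();
        return total;
    }

    public static void main( String[] args )
    {
        System.out.println( countCharacters( args ) );
    }
}
```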
Study EchoWordsInArgumentV1.java, which takes a single command-line argument and echoes all of the words that it contains, one per line.
    mac os x > java EchoWordsInArgumentV1 "a bc def ghij"
    a
    bc
    def
    ghij
    mac os x > java EchoWordsInArgumentV1 "eugene is a dandy fellow"
    eugene
    is
    a
    dandy
    fellow
Our new tool is the StringTokenizer. To "tokenize" means to break something down into the parts (tokens) that make it up. A StringTokenizer tokenizes a String. (Surprise!) Its interface lets us access the tokens one at a time.
    while ( tokenizer.hasMoreElements() )
    {
        String word = (String) tokenizer.nextElement();
        process word
    }
You can use a loop just like this one whenever you work with a StringTokenizer. Replace "process word" with whatever your program needs to do.
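To see the loop in a complete program, here is a minimal sketch of a class that tokenizes one String on whitespace. The class name EchoWordsSketch and the helper wordsIn are hypothetical, chosen for this example; the actual EchoWordsInArgumentV1.java may be organized differently. Here "process word" has become "add the word to a list".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch: collects the whitespace-separated tokens of a String.
class EchoWordsSketch
{
    static List<String> wordsIn( String text )
    {
        List<String> result = new ArrayList<String>();

        // With no second argument, the tokenizer breaks only on whitespace.
        StringTokenizer tokenizer = new StringTokenizer( text );
        while ( tokenizer.hasMoreElements() )
        {
            String word = (String) tokenizer.nextElement();
            result.add( word );               // this is the "process word" step
        }
        return result;
    }

    public static void main( String[] args )
    {
        // Echo the words of the first command-line argument, one per line.
        for ( String word : wordsIn( args[0] ) )
            System.out.println( word );
    }
}
```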
But a default StringTokenizer "breaks" only on whitespace. That means, whenever it sees whitespace, it thinks it has found a boundary between two tokens. Every other character counts as part of a token. But consider this case:
    mac os x > java EchoWordsInArgumentV1 "Eugene, you are a dandy fellow, and a scholar\!"
    Eugene,
    you
    are
    a
    dandy
    fellow,
    and
    a
    scholar!
We may want to process only the words, not strings that contain punctuation characters, such as "Eugene," and "scholar!".
A StringTokenizer can do this for us if we tell it which characters to treat as delimiters, that is, which characters break the string into words. We pass the delimiters as a second argument when we create the tokenizer. Consider the small change in EchoWordsInArgumentV2.java:
    String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
    StringTokenizer words = new StringTokenizer( args[0], delimiters );
Now:
    mac os x > java EchoWordsInArgumentV2 "Eugene, you are a dandy fellow, and a scholar\!"
    Eugene
    you
    are
    a
    dandy
    fellow
    and
    a
    scholar
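If you would like to experiment with different delimiter sets, a small sketch like the one below may help. The class name DelimiterSketch and the method tokenize are hypothetical names for this example; only the delimiter string comes from the code above. Note that a run of adjacent delimiters, such as a comma followed by a space, produces no empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch: tokenizes a String using a caller-supplied delimiter set.
class DelimiterSketch
{
    // The delimiter string used in the session's example.
    static final String DELIMITERS = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

    static List<String> tokenize( String text, String delimiters )
    {
        List<String> result = new ArrayList<String>();

        // The second argument lists every character that ends a token.
        StringTokenizer words = new StringTokenizer( text, delimiters );
        while ( words.hasMoreElements() )
            result.add( (String) words.nextElement() );
        return result;
    }

    public static void main( String[] args )
    {
        for ( String word : tokenize( args[0], DELIMITERS ) )
            System.out.println( word );
    }
}
```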
One more issue... Sometimes when we are processing a long stream of words, the mixture of upper- and lowercase characters can cause a problem. Suppose that we are creating an index of the words in a file. Then we would need to sort the words we find:
    mac os x > java EchoWordsInArgumentV2 "Eugene, you are a dandy Computer Scientist, and a scholar\!" | sort
    Computer
    Eugene
    Scientist
    a
    a
    and
    are
    dandy
    scholar
    you
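The capitalized words appear out of order because sort is comparing character codes here, and in ASCII every uppercase letter comes before every lowercase letter.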
The StringTokenizer can't do much for us here, but Java Strings can. We can ask a String for an all-lowercase version of itself. Consider the one new line in EchoWordsInArgumentV3.java:
    String word = (String) words.nextElement();
    word = word.toLowerCase();
    System.out.println( word );
And now:
    mac os x > java EchoWordsInArgumentV3 "Eugene, you are a dandy Computer Scientist, and a scholar\!" | sort
    a
    a
    and
    are
    computer
    dandy
    eugene
    scholar
    scientist
    you
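You can watch the same effect without leaving Java. The sketch below is an illustration only: instead of piping to the Unix sort command, it sorts the words inside the program with Collections.sort, which orders Strings by character code just as the transcripts above show. The class name SortedWordsSketch, the method sortedWords, and its lowercase flag are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch: sorts the words of a String, with or without lowercasing.
class SortedWordsSketch
{
    static final String DELIMITERS = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

    static List<String> sortedWords( String text, boolean lowercase )
    {
        List<String> result = new ArrayList<String>();
        StringTokenizer words = new StringTokenizer( text, DELIMITERS );
        while ( words.hasMoreElements() )
        {
            String word = (String) words.nextElement();
            if ( lowercase )
                word = word.toLowerCase();    // the one new line from V3
            result.add( word );
        }

        // Strings compare by character code, so 'C' sorts before 'a'.
        Collections.sort( result );
        return result;
    }

    public static void main( String[] args )
    {
        System.out.println( sortedWords( args[0], false ) );
        System.out.println( sortedWords( args[0], true ) );
    }
}
```

Run on the sentence from the example, the first line printed begins with Computer, Eugene, Scientist, and the second begins with a, a, and.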
The lesson of all this? We can use a StringTokenizer to access individual words from a string. Eventually, we will want to process more than one line of text, or text that already exists in a file. How can we do that?
Study Echo.java, which echoes all the words in one file to an output file, one per line.
    mac os x > less hamlet.txt
    1604
    THE TRAGEDY OF HAMLET, PRINCE OF DENMARK
    by William Shakespeare
    Dramatis Personae
    Claudius, King of Denmark.
    Marcellus, Officer.
    Hamlet, son to the former, and nephew to the present king.
    ...
    mac os x > java Echo hamlet.txt hamlet.out
    mac os x > less hamlet.out
    1604
    the
    tragedy
    of
    hamlet
    prince
    of
    denmark
    by
    william
    shakespeare
    dramatis
    personae
    claudius
    king
    of
    denmark
    marcellus
    ...
Our new tools are Java's reader and writer classes, with which we can do input/output. For now, we will focus on two of them, BufferedReader and PrintWriter, each of which uses a helper object to access a file in your directory.
    while ( true )
    {
        buffer = inputFile.readLine();
        if ( buffer == null ) break;
        process buffer
    }
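Putting the read loop together with the tokenizer, a program in the spirit of Echo might look like the sketch below. It is not the course's Echo.java. The class name EchoSketch and the method echoWords are hypothetical, and the choice of FileReader and FileWriter as the helper objects is an assumption, since the notes do not name them. Here "process buffer" means "tokenize the line and print each word in lowercase".

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.StringTokenizer;

// Hypothetical sketch: echoes the words of one file to another, one per line.
class EchoSketch
{
    static final String DELIMITERS = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

    static void echoWords( BufferedReader inputFile, PrintWriter outputFile )
        throws IOException
    {
        while ( true )
        {
            String buffer = inputFile.readLine();   // null means end of input
            if ( buffer == null ) break;

            StringTokenizer words = new StringTokenizer( buffer, DELIMITERS );
            while ( words.hasMoreElements() )
            {
                String word = (String) words.nextElement();
                outputFile.println( word.toLowerCase() );
            }
        }
    }

    public static void main( String[] args ) throws IOException
    {
        // Assumed helper objects: FileReader and FileWriter open the named files.
        BufferedReader inputFile  = new BufferedReader( new FileReader( args[0] ) );
        PrintWriter    outputFile = new PrintWriter( new FileWriter( args[1] ) );

        echoWords( inputFile, outputFile );

        inputFile.close();
        outputFile.close();     // closing also flushes any buffered output
    }
}
```

Taking the reader and writer as parameters lets echoWords run on any source of text, not just files.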
Now watch this:
    mac os x > cat hamlet.out | sort | less
    1
    1
    1
    1
    1
    1
    1604
    a
    a
    a
    ...
Turn Echo.java into WordCount, which prints the number of lines, words, and characters in a file to standard output. For example:
    mac os x > java WordCount hamlet.txt
    4792 32889 130156
Here is one possible solution. How much about it is different from Echo? How much is the same?
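In case the linked file is not at hand, here is a rough sketch of one way such a program could be written. It is a guess at the general shape, not the course's solution. The class name WordCountSketch and the method count are hypothetical, and the sketch assumes that "characters" means the characters inside the words the tokenizer finds. The course's version may count differently.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

// Hypothetical sketch: counts lines, words, and characters-in-words.
class WordCountSketch
{
    static final String DELIMITERS = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

    // Returns { lines, words, characters } for everything the reader supplies.
    static int[] count( BufferedReader inputFile ) throws IOException
    {
        int lines = 0;
        int words = 0;
        int characters = 0;

        while ( true )
        {
            String buffer = inputFile.readLine();
            if ( buffer == null ) break;
            lines++;

            StringTokenizer tokens = new StringTokenizer( buffer, DELIMITERS );
            while ( tokens.hasMoreElements() )
            {
                String word = (String) tokens.nextElement();
                words++;
                characters += word.length();   // assumption: count only word characters
            }
        }
        return new int[] { lines, words, characters };
    }

    public static void main( String[] args ) throws IOException
    {
        BufferedReader inputFile = new BufferedReader( new FileReader( args[0] ) );
        int[] totals = count( inputFile );
        inputFile.close();
        System.out.println( totals[0] + " " + totals[1] + " " + totals[2] );
    }
}
```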
Let's compare WordCount's output to that of the built-in Unix command wc, which does the same task:
    mac os x > java WordCount hamlet.txt      ;; our Java program
    4792 32889 130156
    mac os x > wc hamlet.txt                  ;; standard Unix command
    4792 31957 196505 hamlet.txt
Quick Exercise: Why do you suppose that our word and character counts don't match, and in opposite directions?