Here is a quickie:
Write a class RemoveReturns whose main() method echoes standard input to standard output, except that it replaces each new line character with a space.
In Java, the new line character is written '\n'; in ASCII, it is character code 10.
For example:
> less test-in.txt
a b c 1234 e 567890 ff ghij kl 12 34 56 78 90 next
> java RemoveReturns test-in.txt
a b c 1234 e 567890 ff ghij kl 12 34 56 78 90 next
> _
This sounds like a perfect job for CensorInputStream! We need to replace each new line character with a single space, so we can write a main() method that reads characters from a CensorInputStream that knows to make that substitution. Then all we have to do is write a loop that echoes the characters it receives to stdout.
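Here is a minimal sketch of the CensorInputStream solution. The CensorInputStream from our earlier sessions isn't reproduced in these notes, so the class below is a hypothetical stand-in: a FilterInputStream subclass whose constructor takes the stream to wrap, the character to censor, and its replacement. Your own class's constructor may look different. The echo() helper is also a hypothetical name; it keeps main() to one line and lets us run the loop on any stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for the course's CensorInputStream: a decorator that
// replaces every occurrence of one (ASCII) character with another as it is read.
class CensorInputStream extends FilterInputStream {
    private final int target;
    private final int replacement;

    CensorInputStream(InputStream in, char target, char replacement) {
        super(in);
        this.target = target;
        this.replacement = replacement;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();                   // -1 at end of stream
        return (c == target) ? replacement : c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);        // bulk reads must censor too
        for (int i = off; i < off + n; i++) {   // skipped entirely when n == -1
            if (b[i] == target) {
                b[i] = (byte) replacement;
            }
        }
        return n;
    }
}

class RemoveReturns {
    // Echo every character from in to out; the decorator does the substitution.
    static void echo(InputStream in, OutputStream out) throws IOException {
        InputStream censored = new CensorInputStream(in, '\n', ' ');
        int c;
        while ((c = censored.read()) != -1) {
            out.write(c);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        echo(System.in, System.out);
    }
}
```

Because this main() reads standard input, the way to feed it a file from the shell is redirection: `java RemoveReturns < test-in.txt`.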
Whenever possible, let other objects help us do our work! Laziness of this sort is a virtue in object-oriented programs, and object-oriented programmers. (Indeed, it is a virtue for all programmers.)
Of course, if you insist on reinventing wheels, you could write a different main() method that reads characters from an ordinary InputStream, prints a space whenever it sees the new line character, and otherwise prints the character itself.
Both of these solutions approach the problem at the level of individual characters. We could instead approach it at the level of lines of text. If we could read whole lines of text, we could echo them to stdout one at a time, each followed by a space. InputStreams don't do that for us, so we will need some new tools.
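For comparison, here is a sketch of the wheel-reinventing version, which does the substitution itself, one character at a time. It assumes Unix-style line endings: on a file with Windows "\r\n" endings, each '\r' passes through unchanged. As before, echo() is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class RemoveReturns {
    // Do the substitution ourselves while copying in to out.
    static void echo(InputStream in, OutputStream out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {     // read() returns -1 at end of stream
            if (c == '\n') {
                out.write(' ');             // a space in place of the new line
            } else {
                out.write(c);               // every other character passes through
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        echo(System.in, System.out);
    }
}
```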
We often want to work with files and other sources of text at levels higher than individual characters, so it's worth learning some new tools now.
By the end of the period, we will be able to write a line-based version of RemoveReturns.
The program Echo.java introduces us to a few new Java classes that allow us to work with files of text. Echo echoes the words from one file to another file, one word per line, in lower case and with punctuation removed. For example:
> less hamlet.txt
1604 THE TRAGEDY OF HAMLET, PRINCE OF DENMARK by William Shakespeare...
> java Echo hamlet.txt hamlet.out
> less hamlet.out
1604
the
tragedy
of
hamlet
prince
of
denmark
by
william
shakespeare
...
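Echo.java itself is not reproduced in these notes, so the following is only a sketch of what such a program might look like, reconstructed from the behavior shown above. The delimiter string DELIMITERS is an assumption; the set that Echo.java actually passes to its tokenizer may differ. The names buffer, inputFile, and outputFile match the lines from Echo.java quoted later in these notes, and the echo() helper is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.StringTokenizer;

class Echo {
    // Assumed delimiter set: whitespace plus common punctuation.
    static final String DELIMITERS = " \t\n\r\f.,;:!?\"'()[]-";

    // Copy every word from inputFile to outputFile, lower-cased, one per line.
    static void echo(BufferedReader inputFile, PrintWriter outputFile) throws IOException {
        String buffer = inputFile.readLine();
        while (buffer != null) {                       // null means end of input
            StringTokenizer tokenizer = new StringTokenizer(buffer, DELIMITERS);
            while (tokenizer.hasMoreElements()) {
                String word = ((String) tokenizer.nextElement()).toLowerCase();
                outputFile.println(word);
            }
            buffer = inputFile.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader inputFile = new BufferedReader(new FileReader(args[0]));
        PrintWriter outputFile = new PrintWriter(new FileWriter(args[1]));
        echo(inputFile, outputFile);
        inputFile.close();
        outputFile.close();                            // also flushes buffered output
    }
}
```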
We can understand many of the new features you see here in terms of concepts you have already learned.
One design pattern, Decorator, can help us understand several new ideas relatively quickly! Then we can focus on a few handy messages, such as BufferedReader.readLine() and PrintWriter.println(), and be relatively productive relatively quickly, too. Note that readLine() returns null (in place of a String) when it reaches the end of the stream, rather than read()'s -1 (in place of a character's integer value).
Another new tool in this code is the StringTokenizer. To "tokenize" means to break something down into the parts (tokens) that make it up. A StringTokenizer tokenizes a String. (Surprise!) Its interface lets us access the tokens one at a time. We generally send a tokenizer two messages: hasMoreElements() and nextElement(). The former returns true or false, depending on whether the tokenizer has a token for us that we haven't seen yet. The latter returns the next token we haven't seen yet, if one exists. It returns that token as an Object, which is why the loop below casts it to String.
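The two end-of-stream conventions side by side, in a small sketch with hypothetical method names: read() signals the end with -1, and readLine() signals it with null. A StringReader stands in for a file, so the example runs without one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class EndOfStream {
    // Character level: read() returns -1 instead of a character's value.
    static int countChars(Reader in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Line level: readLine() returns null instead of a String.
    static int countLines(BufferedReader in) throws IOException {
        int count = 0;
        while (in.readLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "to be\nor not to be\n";
        System.out.println(countChars(new StringReader(text)));                      // 19
        System.out.println(countLines(new BufferedReader(new StringReader(text))));  // 2
    }
}
```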
We use a common loop to process the sequence of Strings that the tokenizer gives to us:
while ( tokenizer.hasMoreElements() )
{
    String word = (String) tokenizer.nextElement();
    // process word
}
You can use a loop just like this one whenever you work with a StringTokenizer. Replace "process word" with whatever your program needs to do.
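Here is the loop in a complete, runnable form. The class TokenLoop and its words() method are hypothetical names for illustration; "process word" becomes adding the word to a list. StringTokenizer also offers hasMoreTokens() and nextToken(), which behave the same way but return a String directly, so no cast is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenLoop {
    // Collect the tokens of a line using the standard tokenizer loop.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);   // default: break on whitespace
        while (tokenizer.hasMoreElements()) {
            String word = (String) tokenizer.nextElement();      // nextElement() returns Object
            result.add(word);                                     // "process word"
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("to be  or\tnot to be"));  // [to, be, or, not, to, be]
    }
}
```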
But a default StringTokenizer "breaks" only on whitespace. That is, whenever it sees whitespace, it assumes it has found a boundary between two tokens. Every other character counts as part of a token. Consider this case:
"Eugene, you are a dandy fellow, and a scholar!"
A default StringTokenizer will return:
Eugene,
you
are
a
dandy
fellow,
and
a
scholar!
We may want to process only the words, not strings that contain punctuation characters, such as the ',' in "Eugene," and the '!' in "scholar!".
A StringTokenizer can do this for us if we tell it which characters to treat as delimiters, by passing a string of delimiter characters to its constructor. In Echo, I pass a whole bunch of characters that I do not want to be treated as part of a word, including some characters that need to be "escaped" in a Java program.
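A small sketch comparing the two constructors on the sentence above. The delimiter string " ,!" is chosen just for this sentence; Echo passes a longer set. Note that the delimiter string replaces the default whitespace set rather than adding to it, so any whitespace we still want to break on must be included. joinTokens() is a hypothetical helper that separates tokens with '|' so they are easy to see.

```java
import java.util.StringTokenizer;

class Delimiters {
    // Join all remaining tokens with '|' between them.
    static String joinTokens(StringTokenizer tokenizer) {
        StringBuilder out = new StringBuilder();
        while (tokenizer.hasMoreElements()) {
            if (out.length() > 0) {
                out.append('|');
            }
            out.append((String) tokenizer.nextElement());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sentence = "Eugene, you are a dandy fellow, and a scholar!";
        // Default delimiters (whitespace): punctuation sticks to the words.
        System.out.println(joinTokens(new StringTokenizer(sentence)));
        // prints Eugene,|you|are|a|dandy|fellow,|and|a|scholar!
        // Space, comma, and '!' as delimiters: only the words remain.
        System.out.println(joinTokens(new StringTokenizer(sentence, " ,!")));
        // prints Eugene|you|are|a|dandy|fellow|and|a|scholar
    }
}
```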
Turn Echo into WordCount, which prints the number of lines, words, and characters in a file to standard output. For example:
> java WordCount hamlet.txt
4463 32885 130145
Here is one possible solution. How much about it is different from Echo? How much is the same?
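Another possible sketch of WordCount, written to make the overlap with Echo visible: the same readLine() loop and the same tokenizer loop, with counting in place of printing. Two choices here are assumptions: it reuses an Echo-style delimiter set, and it counts the characters on each line, excluding line terminators. The solution linked above may choose differently, so its totals need not match this sketch's on the same file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

class WordCount {
    // Assumed delimiter set, the same kind of set Echo uses.
    static final String DELIMITERS = " \t\n\r\f.,;:!?\"'()[]-";

    // Returns {lines, words, characters}; characters excludes line terminators.
    static int[] count(BufferedReader inputFile) throws IOException {
        int lines = 0, words = 0, chars = 0;
        String buffer = inputFile.readLine();
        while (buffer != null) {
            lines++;
            chars += buffer.length();
            StringTokenizer tokenizer = new StringTokenizer(buffer, DELIMITERS);
            while (tokenizer.hasMoreElements()) {
                tokenizer.nextElement();       // count the word instead of printing it
                words++;
            }
            buffer = inputFile.readLine();
        }
        return new int[] { lines, words, chars };
    }

    public static void main(String[] args) throws IOException {
        BufferedReader inputFile = new BufferedReader(new FileReader(args[0]));
        int[] counts = count(inputFile);
        inputFile.close();
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```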
Let's compare WordCount's output to that of the built-in Unix command wc, which does the same task:
> wc hamlet.txt                  // standard Unix command
4463 31956 191734 hamlet.txt
Quick Exercise: Why do you suppose that our word and character counts don't match, and in opposite directions?
Sometimes, we'd like to give the user an option of providing a file name or using standard I/O. Most Unix commands work this way. How can we make our Java programs do the same thing?
The critical lines in Echo.java are these:
buffer = inputFile.readLine();
outputFile.println( word );
Why are they critical? They are the only lines in our processing code that interact with the files. So, if we want to use standard input in place of the input file, we need a way to change the readLine statement; if we want to use standard output in place of the output file, then we need a way to change the println statement.
BufferedReaders and PrintWriters are "virtual": they do not read from or write to a file or device themselves. They rely on other readers and writers, such as a FileReader or a FileWriter, to do the actual input and output. Unfortunately, standard input and output (System.in and System.out) are streams, not readers and writers.
Let's again take advantage of an object-oriented idea: objects that share a common interface, even if they behave differently, ought to be substitutable for one another, so that the new object can fulfill the responsibilities of the one it replaces.
Java gives us the classes we need: InputStreamReader and OutputStreamWriter. They are decorators that convert stream input into reader input and reader output into stream output, respectively.
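To see the two decorators at work in isolation, here is a sketch with two hypothetical helpers. firstLine() wraps any InputStream in an InputStreamReader so that a BufferedReader can read lines from it; writeLine() wraps any OutputStream in an OutputStreamWriter so that a PrintWriter can write to it. Passing System.in and System.out gives us standard I/O; passing in-memory byte streams works just as well.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

class StreamBridges {
    // Read the first line from any byte stream, System.in included.
    static String firstLine(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        return reader.readLine();
    }

    // Write a line of text to any byte stream, System.out included.
    static void writeLine(OutputStream out, String text) {
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(out));
        writer.println(text);
        writer.flush();   // push buffered text through without closing the stream
    }

    public static void main(String[] args) throws IOException {
        String line = firstLine(System.in);
        writeLine(System.out, "You typed: " + line);
    }
}
```

The flush() matters: a PrintWriter built this way does not flush automatically, and we don't want to close System.out just to get our output out of the buffer.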
Take a look at this new, improved version of Echo. It does the same job as the original, but it allows the user to work with standard input and standard output as well as input and output files. The only changes to this file from the original are in these four set-up lines:
BufferedReader inputFile = new BufferedReader( new InputStreamReader( System.in ) );
PrintWriter outputFile = new PrintWriter( new OutputStreamWriter( System.out ) );

if ( args.length > 0 )
    inputFile = new BufferedReader( new FileReader( args[0] ) );
if ( args.length > 1 )
    outputFile = new PrintWriter( new FileWriter( args[1] ) );
By default, the program reads from standard input and writes to standard output. If the user gives one command-line argument, it is the name of an input file. If the user gives more than one command-line argument, the second is the name of an output file. Here are some examples of how the new code works:
> java Echo
foo bar baz big Eugene Wallingford teaches this course.
foo
bar
baz
big
eugene
wallingford
teaches
this
course
> java Echo hamlet.txt | less
1604
the
tragedy
of
hamlet
prince
of
denmark
by
william
shakespeare
...
> java Echo hamlet.txt hamlet.out
mac os x > less hamlet.out
1604
the
tragedy
of
hamlet
prince
of
denmark
by
william
shakespeare
...
> cat hamlet.txt | java Echo | less
1604
the
tragedy
of
...
Whenever you need to read from standard input instead of a file, or write to standard output instead of a file, you can use objects created in this way. If you would like to do something more sophisticated, feel free to look into the details of FileReader, BufferedReader, FileWriter, and PrintWriter, or even InputStreamReader and OutputStreamWriter.
Notice again: the processing code in this program stays exactly the same. This is yet another instance of "say what you mean, and say it once and only once." Objects give us the power to do this in a variety of ways.
Expound with great fervor.
Finally, a closing comment. That is a busy main() method. I'm already eager to find an object in the mess and factor it out of this code into a class, so that I could reuse it in different contexts. We're well on our way. We've managed to separate the creation of the input/output objects from the code that uses them to process a sequence of lines and words. Soon!
After learning about BufferedReader, we are ready to write a third main() method that echoes a file, replacing new line characters with spaces. In some ways, this version is more straightforward than the InputStream versions, and in other ways it is more complex.
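A minimal sketch of the line-based version, reading standard input through the same decorators Echo uses. Since readLine() removes the line terminator, we print each line followed by a space. echo() is again a hypothetical helper name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

class RemoveReturns {
    // Print each line followed by a space; readLine() has already
    // stripped the line terminator for us.
    static void echo(BufferedReader inputFile, PrintWriter outputFile) throws IOException {
        String buffer = inputFile.readLine();
        while (buffer != null) {
            outputFile.print(buffer);
            outputFile.print(' ');
            buffer = inputFile.readLine();
        }
        outputFile.flush();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader inputFile = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter outputFile = new PrintWriter(new OutputStreamWriter(System.out));
        echo(inputFile, outputFile);
    }
}
```

This is where the trade-offs show up. readLine() also recognizes "\r\n" and "\r" as line endings, which the character versions did not. But it hides whether the last line ended with a new line: this version prints a trailing space after a final line that has no '\n', where the character versions would not.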
Whether the latest version is better or worse than the CensorInputStream version depends in part on stylistic preference and in part on the context. That is not unusual. The relative quality of competing designs depends on a lot of factors outside the programs themselves.