Why Worry About Program Style, Part 2

Programming style is much like writing style. Each person tends to have their own. Programming is, however, somewhat different from writing because programs typically do not belong to the individual programmer. Instead, they belong to the company or organization that employs the programmer. Indeed, a single programmer almost never writes a program alone. And the program must be maintained (corrected, modified, etc.) by programmers other than its creators. Companies and organizations often have style guides that their programmers are expected to follow.

Programming style is considered important because good style is supposed to make programs more readable and, therefore, easier to understand. Understanding program code is required if one is to make reasoned changes to it. That understanding is needed in various situations, e.g., when the programmer is finding and correcting errors, when the programmer is seeking help in identifying errors or other problems, and when the program is being modified to accomplish a different task.

When reading a document or program, one's mind immediately interprets the words or code encountered in a certain way. Characteristics of the document or code can make that interpretation easier or harder. Those same characteristics also make the document more or less likely to be misinterpreted. Good programming style is supposed to allow for easier and correct interpretation.

Programming style and the way a programmer approaches the programming process are interrelated. Adhering to a personal style is a habit of mind that indicates a disciplined approach to the work of programming. Programs exhibiting good style are thought to be more likely to be correct due to the thought and effort (presumably) indicated by the good style.

Finally, from a student perspective, it is useful to develop good programming style because teachers are likely to be favorably influenced by code they can readily understand, i.e., by code written in good programming style.

What Is "Good" Programming Style?

There are at least two aspects of program style. One relates only or primarily to the code itself. The other relates primarily to the algorithm underlying the code—the program design. These two aspects are discussed separately below.

Code Style/Readability

Code or coding style is concerned with the content and appearance of the code. Aspects like variable names, code spacing, and documentation are commonly discussed under this topic. The following topics were addressed in an earlier discussion of program readability:

    Naming (Variables)
        Readability of Names
        Uniqueness/Dissimilarity of Names
    Code Formatting
    Documentation

Design Style

Naming (of modules)

We have seen that the naming of variables aids readability and understanding of code. Other programmer-determined names also appear in programming. In particular, program file names provide insight into the program before any code at all is seen. Modules (functions, procedures, etc.) also have names, and those names can help (or fail to help) the code reader understand what is going on.

Function names should follow all the rules for variable names, i.e., the various aspects of readability and dissimilarity discussed in our reading about coding style/readability. And, of course, they should accurately and concisely describe what the function does.

Some people believe it is useful to make the name a noun or a verb based on whether the function/procedure/module/routine returns a value or carries out a task. For example, a module that determines and returns an average might be named averageValue(). Alternatively, if the module determines the average and writes it to the screen or a file, it might be named reportAverageValue(). You will need to decide on your own preference.
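A minimal Python sketch of the two naming styles, reusing the names from the paragraph above. It assumes the values arrive as a non-empty list of numbers; the parameter name values is illustrative, not prescribed.

```python
def averageValue(values):
    """Noun-style name: computes and RETURNS a value (assumes a non-empty list)."""
    return sum(values) / len(values)


def reportAverageValue(values):
    """Verb-style name: carries out a task (writes the average to the screen)."""
    print("Average:", averageValue(values))


reportAverageValue([80, 90, 100])  # prints: Average: 90.0
```

The verb in reportAverageValue signals that calling it does something visible, while averageValue reads like the value it produces.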

One final thought is that some people suggest it is useful to differentiate between module names and variable names. This can be done effectively by capitalizing the first letter of module names (and using a lower-case letter to start all variable names). Again, you will need to decide on your own preference.

Modularization

Most texts and instructors encourage the use of modularity. A module (function, procedure, subprogram, etc.) is a chunk of code that accomplishes a given task and is placed in its own named routine. It is believed that using modularization makes a program more likely to be correct and, at the same time, easier to understand and modify when the need arises.

Modularization Reflects Program Logic/Algorithm

A programmer begins using modularization when first thinking about the problem being worked on. Generally one starts by thinking about the tasks that need to be accomplished by the program, and usually those tasks require several to many statements (lines of code). The sequence of tasks illustrates our thinking about the problem much more clearly than code does, even well-documented code. So, start by writing down the tasks and then fill them out as modules or functions. This is essentially like writing an outline for a paper: the outline will have several large tasks, most of which can (or must) be broken down into smaller tasks before the problem is solved.
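As a sketch of the outline-first approach, the top-level function below reads like the list of tasks, and each task is then filled in as its own module. The function names and the fixed sample scores are hypothetical, chosen only so the sketch runs on its own; a real program might read its data from a file or the user.

```python
# Step 1: the outline. Each task becomes a named function, so the
# top-level code reads like the task list itself.
def main():
    scores = readScores()
    average = computeAverage(scores)
    reportAverage(average)


# Step 2: fill in each task. readScores returns fixed sample data here
# (an assumption for the sketch) instead of reading real input.
def readScores():
    return [80, 90, 85]


def computeAverage(scores):
    return sum(scores) / len(scores)


def reportAverage(average):
    print(f"Average score: {average:.1f}")


if __name__ == "__main__":
    main()  # prints: Average score: 85.0
```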

This process was demonstrated during Fundamentals of Programming and was encouraged but not required. As one learns to program, the problems, and the programs that solve them, get larger, and it becomes much more important to use modularization.

Module Size and Cohesion

We highly recommend that you attack your programming problems by decomposing them into subproblems, which may be further decomposed into smaller problems. Programs should seldom, if ever, have pieces longer than one screenful. If you have a chunk of code larger than a single screenful, you need to reconsider (and perhaps change) your design, probably by replacing a large segment of code with several modules. Another rule of thumb relates to task rather than size: each module should accomplish a single task. That might take one line of code or substantially more. (This is referred to as cohesion. Think of it as in writing prose, where each paragraph should be about a single idea.)

Avoid/Remove Duplicate/Redundant Code

There are multiple reasons for encouraging programmers to avoid duplicated code. Probably the most important is program correctness. If the same code appears multiple times in a program, each copy must be changed when a change is needed. The person making the change may not realize that the same code appears elsewhere in the program, or may know that copies exist but not find them all. Or one copy of the duplicated code may get fixed differently than the others.

Though less important, having a single copy of code typically makes for less work on the programmer's part. A little more time may need to be spent thinking about what the module needs to do (and how it will do it for possibly different contexts), but then the code-writing, testing, and debugging will be reduced because only one piece of code needs to be made to work. Clearly, if some change is needed in the code sometime in the future, it will be less work to understand, change, and test the one module than to find, understand, change, and test all the copies of the code.

So, ... any time you find yourself copying and pasting code or writing nearly identical code, stop! Examine the situation and determine how you can define the task so that a single module can accomplish it everywhere it is needed.

Some Example Contexts

Duplicate code for different values

Sometimes, perhaps rarely, code is exactly duplicated. This might happen in different parts of a program where the same task is needed; searching for a particular value or counting occurrences of a value are examples. In that case, creating a module is very straightforward: write the module and pass the value in question as a parameter. For example, code for counting vowels might start with code to count occurrences of "a", followed by code to count occurrences of "e", followed by code to count occurrences of "i", etc. Alternatively, one might define a module, either a procedure CountVowel( theVowel ) or a function CountOf( theVowel ) that returns a count, and then call it several times, e.g.,

    CountVowel( "a" )
    CountVowel( "e" )
    CountVowel( "i" )
    CountVowel( "o" )
    CountVowel( "u" )

    vowelsCount = 0
    vowelsCount += CountOf( "a" )
    vowelsCount += CountOf( "e" )
    vowelsCount += CountOf( "i" )
    vowelsCount += CountOf( "o" )
    vowelsCount += CountOf( "u" )

A more general module, CountOccurrences( targetValue ), could be used to count occurrences of any character or string of characters. Like the vowel module, it can be written either as a procedure or as a function, as shown above.
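A runnable sketch of CountOccurrences written as a function. The calls above do not say which text is being searched, so this version assumes the text is passed as an extra parameter (the name text is hypothetical). It relies on Python's built-in str.count, which counts non-overlapping occurrences of a character or a longer string.

```python
def CountOccurrences(text, targetValue):
    """Return how many times targetValue (a character or string) occurs in text."""
    return text.count(targetValue)


sentence = "the quick brown fox jumps over the lazy dog"

# One general module, called once per value of interest.
vowelsCount = 0
vowelsCount += CountOccurrences(sentence, "a")
vowelsCount += CountOccurrences(sentence, "e")
vowelsCount += CountOccurrences(sentence, "i")
vowelsCount += CountOccurrences(sentence, "o")
vowelsCount += CountOccurrences(sentence, "u")
print(vowelsCount)  # 11
```

Because targetValue can be any string, the same module also handles multi-character targets, e.g., CountOccurrences("banana", "an") gives 2.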

Use a loop to process different values

We could further shorten the example above (eliminating more duplication) by using a for loop. That would also avoid the chance of an error when we copy and paste the code (e.g., forgetting to change the "a" to "e", "i", ...). That solution might look like:

    vowelsCount = 0
    vowels = "aeiou"
    for letter in vowels:
        vowelsCount += CountOf( letter )

This idea is even more useful if, instead of vowels, we were looking for digits or some even larger set of characters, i.e., when many values have to be examined.
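A sketch of the digit case, assuming the same hypothetical CountOccurrences( text, targetValue ) function as a stand-in for CountOf. The standard library's string.digits is simply the string "0123456789", so the loop examines ten values with no copied lines.

```python
import string


def CountOccurrences(text, targetValue):
    """Return how many times targetValue occurs in text."""
    return text.count(targetValue)


message = "Call 555-0123 before 9:30"

# One loop replaces ten nearly identical lines, one per digit.
digitsCount = 0
for digit in string.digits:
    digitsCount += CountOccurrences(message, digit)
print(digitsCount)  # 10
```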

Remove duplicate code in parts of `if ... else`

Sometimes we don't think carefully enough about selection (if) statements and end up with duplicate code in both the then and else parts. If some code is duplicated in both parts, then either:

A single copy should be placed before the if
A single copy should be placed after the if
Something else must occur before or after the duplicated code in each part, so it needs to remain in both the then and else parts
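A small sketch of the second case, where the duplicated line ends both branches and can be placed once after the if. The shipping rule and function names are hypothetical, made up only for illustration; the two versions return the same results.

```python
def receiptBefore(total):
    # The same line ends both the then and else parts.
    if total >= 50:
        shipping = 0
        line = f"Total with shipping: {total + shipping}"
    else:
        shipping = 5
        line = f"Total with shipping: {total + shipping}"
    return line


def receiptAfter(total):
    # Only the code that differs stays inside the if ... else;
    # a single copy of the shared line is placed after it.
    if total >= 50:
        shipping = 0
    else:
        shipping = 5
    return f"Total with shipping: {total + shipping}"


print(receiptAfter(10))  # Total with shipping: 15
```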

Testing Values in a Continuous Range

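The classic instructional example of one such situation is determining grades. If the grading scale is 90 and above → A, 80 up to (but not including) 90 → B, etc., one can "calculate" grades in various ways, e.g., with a series of independent if statements or with a single connected if ... elif ... else chain: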

    if pointTotal >= 90:
        grade = "A"
    if pointTotal >= 80 and pointTotal <90:
        grade = "B"
    if pointTotal >= 70 and pointTotal <80:
        grade = "C"
    if pointTotal >= 60 and pointTotal <70:
        grade = "D"
    if pointTotal <60:
        grade = "F"

    if pointTotal >= 90:
        grade = "A"
    elif pointTotal >= 80:
        grade = "B"
    elif pointTotal >= 70:
        grade = "C"
    elif pointTotal >= 60:
        grade = "D"
    else:
        grade = "F"

Notice that the connected if ... elif ... else does not require testing both ends of each range. For example, if the first test (>= 90) fails, you already know something about the value when you test >= 80: it is not greater than or equal to 90, i.e., it is less than 90. Note that you can start at either end of the range, but you have to go in order, from lowest to highest or from highest to lowest.
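To illustrate starting from the other end, here is a sketch of the same grading scale tested from lowest to highest, wrapped in a hypothetical function gradeFor. Each test needs only an upper bound, because reaching an elif means every lower test has already failed.

```python
def gradeFor(pointTotal):
    # Reaching each elif means pointTotal is at least the previous bound.
    if pointTotal < 60:
        return "F"
    elif pointTotal < 70:
        return "D"
    elif pointTotal < 80:
        return "C"
    elif pointTotal < 90:
        return "B"
    else:
        return "A"


print(gradeFor(85))  # B
```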

Take Advantage of Program State Knowledge

A somewhat similar context involves nested if statements (as opposed to serially connected ones). Programmers, particularly novice programmers, forget what is already known when they reach the else part of an if statement. I like the example of determining leap year status.

    if year % 400 == 0:
        flag = True
    else:
        if year % 100 == 0 and year % 400 != 0:   # "and year % 400 != 0" is not needed
            flag = False
        else:
            if year % 4 == 0:
                flag = True
            else:
                if year % 4 != 0:                 # this test is not needed
                    flag = False

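In this case, two tests are not needed: the check in the second condition that year is not divisible by 400, and the final check that year is not divisible by 4. If execution reaches either point, an earlier test has already established that fact.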
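A sketch of the same logic with the redundant tests removed, written as a hypothetical function isLeap that returns the result instead of setting flag. The elif chain relies on program state: each branch is reached only when all earlier tests failed. The final loop cross-checks it against the standard library's calendar.isleap.

```python
import calendar


def isLeap(year):
    if year % 400 == 0:
        return True
    elif year % 100 == 0:   # already known: not divisible by 400
        return False
    elif year % 4 == 0:     # already known: not divisible by 100
        return True
    else:                   # already known: not divisible by 4
        return False


# Cross-check against the standard library.
for y in (1900, 2000, 2023, 2024):
    assert isLeap(y) == calendar.isleap(y)
```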

Double-checking Boolean values

(Note that the above title is made-up. It is not terminology for you to remember.) At the end of the above code for determining leap year you might see or use something like the following.

    if flag == True:
        print( str(year) + " is a leap year" )
    else:
        print( str(year) + " is NOT a leap year" )

The code is relatively straightforward and works as is, but "== True" is not needed. flag already has a value of either True or False, so comparing it with True changes nothing.

In such situations you might also point out that while the name flag indicates how we are using the value (as a true/false indicator), it doesn't add much to a reader's understanding of what the value means. A better variable name, such as isLeapYear, makes "if isLeapYear:" read naturally and makes adding "== True" even less tempting.
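A sketch of the reporting code with a descriptive Boolean name and no "== True". For brevity it uses the standard library's calendar.isleap in place of the leap-year tests above, and the function name describe is hypothetical. Formatting with an f-string also avoids adding an integer to a string, which Python rejects.

```python
import calendar


def describe(year):
    isLeapYear = calendar.isleap(year)  # stands in for the earlier tests
    if isLeapYear:  # reads as a sentence; "== True" would add nothing
        return f"{year} is a leap year"
    else:
        return f"{year} is NOT a leap year"


print(describe(2024))  # 2024 is a leap year
print(describe(1900))  # 1900 is NOT a leap year
```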

Be aware that novice programmers may not need to be corrected for all of these coding behaviors when they are first learning to program. At some point, though, they probably should be. That has to be decided by each teacher.

Correctly Functioning, Misplaced Code

The classic instructional example of this situation is probably calculating an average inside the loop that sums and counts the values. The following example illustrates this idea.

    total = 0
    count = 0
    for value in scores:
        total += value
        count += 1
        average = total / count

Again, the code works, but it needlessly recalculates average after each score rather than once, after all the scores have been added to the total. It is better to calculate the average only one time, after the loop, i.e.,

    total = 0
    count = 0
    for value in scores:
        total += value
        count += 1
    average = total / count
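The corrected pattern can be wrapped in a function, sketched below under one added assumption not discussed above: if scores is empty, count stays 0 and total / count would raise ZeroDivisionError, so this hypothetical averageOf returns None in that case. Whether None, 0, or an error is the right behavior is a design choice for the program at hand.

```python
def averageOf(scores):
    total = 0
    count = 0
    for value in scores:
        total += value
        count += 1
    # Calculated once, after the loop. Guard the empty case, where
    # dividing by count (0) would raise ZeroDivisionError.
    if count == 0:
        return None
    return total / count


print(averageOf([80, 90, 100]))  # 90.0
```

Python's built-ins sum() and len() could replace the explicit loop; the loop is kept here to mirror the example above.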