Why Worry About Program Style, Part 1
Programming style is much like writing style: each person tends to have their own. Programming differs from writing, however, because programs typically do not belong to the individual programmer. Instead, they belong to the company or organization that employs the programmer. Indeed, a single programmer almost never writes a program alone, and the program must be maintained (corrected, modified, etc.) by programmers other than its creators. Companies and organizations often have style guides that their programmers are expected to follow.
Programming style is considered important because good style is supposed to make programs more readable and, therefore, more understandable. Understanding program code is required to make reasoned changes to it. Such understanding is needed in various situations, e.g., when a programmer is finding and correcting errors, when a programmer is seeking help in identifying errors or problems, or when the program is being modified to accomplish a different task.
When reading a document or program, one's mind immediately interprets the words or code encountered in a certain way. Characteristics of the document or code can make that interpretation easier or harder. Those same characteristics also make the document more or less likely to be misinterpreted. Good programming style is supposed to allow for easier and correct interpretation.
Programming style and the way a programmer approaches the programming process are interrelated. Adhering to a personal style is a habit of mind that indicates a disciplined approach to the work of programming. Programs exhibiting good style are thought to be more likely to be correct due to the thought and effort (presumably) indicated by the good style.
Finally, from a student perspective, developing good programming style is useful because teachers are likely to be favorably influenced by code they can readily understand, i.e., by code written in good programming style.
What is "Good" Programming Style
There are at least two aspects of program style. One relates only or primarily to the code itself. The other relates primarily to the algorithm underlying the code—the program design. These two aspects are discussed separately below.
Code Style
Code or coding style is concerned with the content and appearance of the code. Aspects like variable names, code spacing, and documentation are commonly discussed under this topic.
Naming (of variables)
Computer programming must make use of variables. Another activity that uses variables is mathematics, where variables are often single characters such as x, y, etc. In computer programming a single letter can be used, but that practice is discouraged; the advice is to use meaningful variable names.
So, what makes a variable name meaningful? Some of the following should probably be considered.
- The value being stored is described
For example, if we have a variable representing a "total", we might use any of the following: t, tot, or total.
- The variable name addresses significant aspects of the value's use or source
A variable representing a "total" might be named total, accumulatedTotal, totalFromUser, sumFromUser, calculatedTotal, or totalHours, depending on the context of the code. In many cases the simplest (total) is more than satisfactory, but sometimes it may not be.
- The variable name is concise
It is useful to supply as much information about the variable as is needed, but no more. Context is important. Programs are written to be reused, and the naming of variables needs to take that reuse into account. Will you, as the programmer, remember the context next week when you need to reuse or revise the program? (Or next month, or next year?) Would another programmer who needs to revise the program be able to easily understand the full meaning of the variables?
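A small Python sketch makes the point concrete. Both function names are hypothetical, made up for illustration: the computer treats the two versions identically, but only the second tells a reader what the values are and how they are used.

```python
# Hypothetical example: one calculation, two naming styles.

def f(t, r):
    # Single letters: the reader must guess what t and r hold.
    return t * r


def calculateWeeklyPay(totalHours, hourlyRate):
    # Descriptive names say what each value is and how it is used.
    return totalHours * hourlyRate


print(f(40, 15.0))                   # 600.0
print(calculateWeeklyPay(40, 15.0))  # 600.0
```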
Readability (of variable names)
In addition to choosing or making up variable names, good coding includes another aspect related to variable names: readability. The discussion above suggests that variable names will often need to be multiple words. When we read English, the words have spaces between them, but in most programming languages a variable name cannot contain spaces. So, we need to help the reader distinguish the individual words. The two most widely accepted means of doing this are camel case and underscores.
Camel case capitalizes the first letter of each word except the first. Thus we have variable names such as accumulatedTotal, totalFromUser, and sumFromUser. Note that, when camel case is used, the programmer can choose words that aid readability: sumFromUser might be considered more readable than totalFromUser because of the difference between the m and the l before the F in From. Of course, the context would also affect the choice of words.
(The first word in multiple-word names is typically not capitalized. Possible reasons include programmer laziness; a convention that variable names start with lower-case letters while other kinds of names start with upper-case letters; or consistency: single-word variables are lower case, so multi-word variables should also start with a lower-case letter.)
Underscores are also commonly used between words. (In some languages one could hyphenate variable names, but other languages would interpret the hyphen as a minus sign and try to do subtraction. So, for generality, we suggest not using hyphens, even where they are legal.) The underscore may make variable names even more readable, e.g., total_from_user vs. totalFromUser. But it lengthens the name and adds a tiny bit of work when typing it.
Choosing to use camel case or underscores is a matter of personal taste but probably one or the other should be used.
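For what it is worth, PEP 8, Python's official style guide, recommends lowercase words separated by underscores for variable and function names, whereas Java code conventionally uses camel case; the examples in this section use camel case. The Python sketch below (the helper name readsAsSubtraction is hypothetical) uses the standard ast module to show how Python reads each style, including why a hyphenated name does not work there.

```python
import ast

# Both conventions produce legal Python names for the same idea.
grossPay = 950.0      # camel case
gross_pay = 950.0     # underscores


def readsAsSubtraction(source):
    """Return True if Python parses `source` as a subtraction rather than a single name."""
    expression = ast.parse(source, mode="eval").body
    return isinstance(expression, ast.BinOp) and isinstance(expression.op, ast.Sub)


print(readsAsSubtraction("grossPay"))   # False: one name
print(readsAsSubtraction("gross_pay"))  # False: one name
print(readsAsSubtraction("gross-pay"))  # True: gross minus pay
```

The built-in str.isidentifier() gives a quicker yes/no check: "gross_pay".isidentifier() is True, while "gross-pay".isidentifier() is False.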
Uniqueness/Dissimilarity (of variable names)
Another consideration when naming variables is avoiding confusion between similar names. Variable names should be unique and readily distinguishable from one another. For example, total and Total are unique (in languages that distinguish upper and lower case) but are not readily distinguishable. Similarly, you would not want to use totalA and totalB. Similar names lead to confusion unless the programmer is very, very attuned to the context of the code.
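Because many languages (Python, Java, and C among them) are case-sensitive, total and Total really are two separate variables, which is exactly what makes them easy to mix up. A minimal Python illustration:

```python
# Python is case-sensitive: these are two separate variables.
total = 10
Total = 25

total = total + 5    # updates only the lower-case variable

print(total)   # 15
print(Total)   # 25
```

A reader skimming the code could easily assume the update on the third statement affected Total as well.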
The one time it is okay to duplicate variable names is when values are passed to a module (function) as parameters. Usually, the parameter has exactly the same meaning as the value passed in, and the code inside the module wants to work on or with its own local version of that value. In many languages, including Python, reassigning a parameter inside the module does not change the caller's variable, and using the same name generally helps understanding and avoids confusion.
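Here is a minimal Python sketch of a parameter sharing its name with the caller's variable. The function addOvertimeBonus and its time-and-a-half rule are hypothetical, chosen to mirror the gross-pay formula used later in this section.

```python
def addOvertimeBonus(hours):
    """Return paid hours, counting each hour over 40 as 1.5 hours."""
    # `hours` is a local parameter; reassigning it here does not change
    # the caller's variable of the same name.
    if hours > 40:
        hours = hours + (hours - 40) * 0.5
    return hours


hours = 45
paidHours = addOvertimeBonus(hours)
print(hours)      # 45 (unchanged)
print(paidHours)  # 47.5
```

One caution: the sketch shows reassignment. In Python, if the parameter refers to a mutable object such as a list and the module changes that object in place (for example, with append), the caller will see the change.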
FYI
Some background/historical information for the curious. Early in computer history, main memory (RAM) was very limited. Programmers were advised to use single letters, or a letter and a digit, as variable names. That was because the program was read into memory and translated from the programming language into machine language. The letters used in variable names increased the size of the program, perhaps to the point that the translation process slowed down, or that the program and the translation program could not both fit in memory at the same time. Also, for your information, in machine language variables are just numbers indicating where values are stored in memory, so the length of variable names makes no difference to the size of the machine language code. Now that RAM is plentiful (for most applications), the number of characters used in variable names no longer matters as far as the computer is concerned. So, variables can be named for the benefit of humans.
Code Format/Layout
Code format refers to the white space used in your program. White space consists of blanks and tabs entered in the code between the various elements of the program—variable names, operators, function names, etc. White space also refers to blank lines (the newline characters used to provide blank lines).
The idea is to insert spaces to make the code more readable. Different people think different aspects of spacing make the code more readable. Some of the possibilities are:
- Horizontal (inline) spacing
Almost everyone agrees that at least one space should separate operators and variables, e.g., grossPay = hours * rate rather than grossPay=hours*rate (even though the computer interprets the no-spaces version correctly). There can be exceptions, but they are fairly rare and depend on the language. For instance, in some languages the for loop specifies the repetition in detail, i.e., for( i=0; i<=10; i=i+2 ). The elements of the loop (initializer, continuation condition, and increment) are spaced apart, but within each element there is no spacing.
Note that there may be spacing next to the parentheses, as in the for loop example above. Sometimes there are no spaces next to parentheses. Sometimes there is a mix of spaces and no spaces when elements are nested within parentheses, e.g., grossPay = ( hours * rate ) + ( (hours - 40) * rate / 2 )
Consistency is important. As people read code, their eyes and minds become used to seeing things a certain way, and slight differences may be overlooked.
Code formatting is left to the programmer. However, there may be a style guide specified by the employer (or the teacher). In these instances, programmers are expected to follow the style guide consistently.
- Vertical (between line) spacing
Spacing in code should be similar to spacing in prose. Statements that go together should be placed together, with no blank lines between them, just as the sentences of a prose paragraph all relate to a single topic. Space typically separates paragraphs, and likewise a blank line should be placed between separate tasks. Note that the individual programmer decides what constitutes a task.
- Horizontal alignment of statements in a block
A block is a collection of statements/instructions that the computer will consider to be a unit. Examples of blocks can be seen in the following code.
    # totalEarned and getLetterGrade() are assumed to be defined elsewhere in the program
    answer = input("How many total points were possible? ")
    totalPossible = float(answer)
    percentage = totalEarned / totalPossible
    grade = getLetterGrade(percentage)
    print("Your final percentage was " + str(percentage * 100))
    print("That means you earned " + grade)

and

    answer = input("How many quizzes did you take? ")
    totalQuizzes = int(answer)
    if totalQuizzes <= 0:
        print("Please enter a positive integer answer.")
    else:
        totalEarned = 0
        for quiz in range(totalQuizzes):
            number = quiz + 1
            answer = input("What did you get on quiz #" + str(number) + "? ")
            totalEarned = totalEarned + float(answer)
Note that the if statement contains two other blocks: the first consists of a single statement, and the second consists of two statements (an assignment and the for statement). The for statement in turn contains a block of three statements.
The languages we have used enforce this aspect of alignment. In Scratch, the left edges of the blocks are aligned vertically; if you put something inside an if or repeat (or other) instruction, it is automatically aligned. In Python, the alignment of the start of each statement determines which block the statement is in, so care must be taken to ensure that each statement is indented appropriately.
In many languages, however, programmers can place statements wherever they wish, including all on one long line. It is suggested that each statement occur on its own line and that its alignment (starting location) be identical to that of the other statements in its block.
- Horizontal alignment of instruction/statement elements
For instructions that can contain blocks of statements (e.g., if, for, or repeat instructions), the languages we have used in our program do not have explicit ending elements. Other languages, however, often have specific ending elements. For example,

    // this is Java code
    if (isMoving) {
        currentSpeed--;
    } else {
        System.err.println("The bicycle has already stopped!");
    }

or, alternatively,

    // this is Java code
    if (isMoving)
    {
        currentSpeed--;
    }
    else
    {
        System.err.println("The bicycle has already stopped!");
    }

and

    ; this is NetLogo code
    to square [sideSize]
      repeat 4
      [
        forward sideSize
        right 90
      ]
    end
In the Java code, blocks are enclosed in curly brackets (braces). The ending elements of the statement align with the start of the statement (the "if"). The two samples show the two commonly accepted alignments. The first saves space (and still shows the statement elements). The second perhaps shows the statement elements more clearly (but takes up more space).
In the NetLogo code, the procedure (module) square has an explicit end indicator, which is aligned with the beginning of the module's definition. This example also shows how the beginning and end of the block within the repeat statement are indicated: with square brackets, which should be aligned. Note that, as with the braces in the Java code, the opening bracket could have been placed at the end of the line starting the statement (after repeat 4).
- Comment alignment
You might have noticed in the code samples above that the comments were separately indented from the code. Thus, the reader can easily concentrate on either the code or the comments as they scan down the page.
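Pulling the layout guidelines together, here is a sketch of a hypothetical Python function (summarizeQuizzes is an illustrative name, not part of the course code) that uses spaces around operators, a blank line between tasks, one indentation level per block, and end-of-line comments aligned in their own column. The spacing inside parentheses copies the gross-pay example above; any consistent choice would do.

```python
def summarizeQuizzes(quizScores, totalPossible):
    # Task 1: add up the points earned.
    totalEarned = 0                              # running total
    for score in quizScores:                     # the loop's block is indented
        totalEarned = totalEarned + score        # one level further

    # Task 2: convert the total to a percentage.
    percentage = ( totalEarned / totalPossible ) * 100

    # Task 3: build the report.
    return "Your final percentage was " + str(percentage)


print(summarizeQuizzes([10, 8, 12], 40))  # Your final percentage was 75.0
```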
Documentation
Documentation is descriptive information about program code. It is typically included in the program using comments that are preceded by or included in special (sets of) characters. Professional/industrial programs often have external documentation that exists outside the program and is typically written in English prose. For our purposes, program documentation is internal to the program and consists of program, module, task, or statement level documentation.
In the ideal situation a program needs no documentation: the code is fully understandable because it has been well designed and the names of variables and modules have been carefully selected. This ideal (fully self-documenting code) may occur for small programs or program segments but typically does not occur otherwise. The discussion below covers documentation from the bottom up.
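As a sketch of what self-documenting code can look like, here are two hypothetical Python versions of the same pay calculation (the overtime rule is assumed, mirroring the gross-pay expression earlier in this section). The first needs a comment to be understood; the second splits the calculation into well-named intermediate statements, so the names carry the explanation.

```python
# Hypothetical example: two versions of the same pay calculation.

def pay(h, r):
    # hours times rate, plus half the rate again for each hour over 40
    return h * r + max(h - 40, 0) * r / 2


def calculateGrossPay(hoursWorked, hourlyRate):
    overtimeHours = max(hoursWorked - 40, 0)
    overtimeBonus = overtimeHours * hourlyRate / 2
    return hoursWorked * hourlyRate + overtimeBonus


print(pay(45, 20.0))                # 950.0
print(calculateGrossPay(45, 20.0))  # 950.0
```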
- Statement documentation
If variable names are chosen carefully, there is seldom a need to document or explain single statements. Occasionally, operations on data are numerous and somewhat difficult to follow and an explanation of the result of the statement is necessary. An alternative to documenting a complex statement is to separate some of the operations into their own statements with well-chosen variable names to overcome the complexity of the large number of manipulations.
Another instance when a single statement might need documenting is when there is a need to emphasize or point out the existence of the statement. This is done when the programmer thinks someone might overlook the statement. One example is when a return or break statement is included in the middle of a module or task rather than at the end.
- Task documentation
Often several statements are needed to accomplish a particular task. Sometimes the task is complicated enough that extra documentation (beyond good naming) is needed to explain its goal, or perhaps the technique used to accomplish it. At other times, the programmer merely wants to save the reader effort and supplies a comment saying concisely what the code does. In this case the reader need not examine the code in detail to understand what is happening; detailed examination is needed only when the task's code itself is being examined rather than the overall module or program code.
- Module documentation
As with variables, module or function names need to be chosen carefully. Good names can often make the code understandable without additional documentation. When they don't, documentation is needed.
The critical aspects of a module are
- Purpose, what it accomplishes or produces, in general
- Input(s), any data that is required
- Result(s), specifics of what is returned or accomplished by the module
Typically, if needed, this information is included in program comments immediately before or after the module's definition line.
- Program documentation
Program documentation is similar to module documentation but at a more general level. Program purpose, source of data, and specific results need to be identified. Often programmers will include authorship and date information as well. Finally, the general approach to solving the problem will usually need to be included also.
Again, the goal is that programs be self-documenting. Programmers need to consider whether that goal has been met and when it is not, provide appropriate, but minimal program documentation. The use of well-named modules to accomplish nearly all program tasks can minimize the documentation that is needed.
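The sketch below shows how the four levels of documentation might look together in a small, hypothetical Python program. In Python, module documentation is conventionally written as a docstring placed immediately after the definition line. The getLetterGrade function here is an illustrative stand-in for the one called in the earlier quiz code, and its 90/80/70/60 cutoffs are an assumption.

```python
"""Quiz grade reporter (hypothetical example program).

Purpose: report a student's quiz percentage and letter grade.
Data source: a list of quiz scores and the total points possible.
Results: a one-line message with the percentage and letter grade.
Author: (your name)    Date: (date written)
"""


def getLetterGrade(percentage):
    """Return the letter grade for a percentage.

    Input:  percentage, a number from 0 to 100.
    Result: "A", "B", "C", "D", or "F" (assumed 90/80/70/60 cutoffs).
    """
    cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    for minimum, letter in cutoffs:
        if percentage >= minimum:
            return letter  # NOTE: returns from the middle of the loop
    return "F"


def reportGrade(quizScores, totalPossible):
    """Return a message giving the percentage and letter grade for the quizzes."""
    # Task: convert raw points to a percentage.
    percentage = sum(quizScores) * 100 / totalPossible

    return "You earned " + str(percentage) + "%, a grade of " + getLetterGrade(percentage)


print(reportGrade([10, 8, 12], 40))  # You earned 75.0%, a grade of C
```

The single-statement comment flags the early return, the task comment summarizes a step, the docstrings give each module's purpose, input, and result, and the docstring at the top carries the program-level information.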