TITLE: Global Variables Considered
AUTHOR: Eugene Wallingford
DATE: March 29, 2011 8:13 PM
DESC:
-----
BODY:
Last week, a student stopped in to ask a question. He had written a program for one of his courses of which he was especially proud. It consisted in large part of two local functions, and used recursion in a way that created an elegant, clear solution. Yet his professor dinged his grade severely. The student had used a global variable. That prompted his question for me:
But why are we taught not to use global variables?
First, let me say that this is a strong student. He is not the sort to beg for points, and he wasn't asking me this question as a snark or a complaint. He really wanted to know the answer.

My first response was cynical and at least partly tongue-in-cheek. We teach you not to use global variables because we were taught not to use global variables.

My second response was to point out that "global" is a relative term, not an absolute one. In OO languages, we write classes that contain instance variables and methods that operate on them. The instance variables are global to the class's methods and local to the class's clients. The programming world seems to like such "globals" just fine.

That is the beginning of my trouble trying to create an argument that supports the way in which his program was graded. In his program, written in a traditional procedural language, the offending variable was local to one procedure but global to two nested procedures. That sounds awfully similar to an ordinary Java class's instance variables!

On the extreme end of the global/local continuum we have a language like Cobol. All data is declared at the top of a program in an elaborate Data Division, and the "paragraphs" of the Procedure Division refer back to it. Not many computer scientists spend much time defending Cobol, but its design and organization make perfectly good sense in context, and programmers are able to write understandable, large programs.

As the student and I talked, I explained two primary reasons for the historical bias against globals:

Readability. When a variable lives outside the code that manipulates it, there is a chance that it can become separated in space from that code. As a large program evolves over time, it seems that the chance the variable will become separated from the related code approaches 1. That makes the code hard to understand. When the reader encounters a variable, she may have a hard time knowing what it means without seeing the code that uses it.
When she encounters a procedure with a reference to a faraway variable, she may have a hard time knowing what the code does without easy reference to the variable and any other code that uses it.

This force is counteracted effectively in some circumstances. In OOP, we try not to write classes that are too long, which means that the instance vars and the methods will be relatively close to one another in the file or on the printed page. Furthermore, there is a convention that the vars will be declared at the top or bottom of the class, so the reader can always find them easily enough. That's part of what makes Cobol's layout work: readers study the Data Division first and then read the Procedure Division with an eye to the top of the file.

My student's program had a structure that mirrored a small class: a procedure with a variable and two local procedures of reasonable size. I can imagine endorsing the relatively global variable because it was part of an understandable, elegant program.

Not-Obvious Dependencies. When two or more procedures operate on the same variable that lives outside all of them, there is a risk that the lack of readability rises to something worse: an inability to divine how the program works. The two procedures exert influence over each other's behavior through the values stored in the shared variable. In OO programs, this interaction is an expected part of how objects behave, and we try to keep methods and the class as a whole small enough to counteract the problem. In the general case, though, we can end up with variables and procedures scattered throughout a program and interacting in non-obvious ways. A change to one procedure might affect another. Adding another procedure that refers to or changes the variable complicates matters for all existing procedures in the co-dependent relationship. Hidden dependencies are the worst kind of all. This is what really makes global variables bad for us.
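
The hidden-dependency risk, and the way a tighter scope contains it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the student's program: the nested-list "tree" and every name in it (`leaf_count`, `walk_global`, `count_leaves_global`, `count_leaves_scoped`, `record_leaf`, `walk`) are hypothetical.

```python
# Hypothetical illustration; none of these names come from the student's program.

# Version 1: a module-level global. Every function in the file can read or
# change it, so callers depend on each other through it.
leaf_count = 0


def walk_global(node):
    """Recursively count the leaves of a nested-list tree into the global."""
    global leaf_count
    if isinstance(node, list):
        for child in node:
            walk_global(child)
    else:
        leaf_count += 1


def count_leaves_global(tree):
    # Hidden dependency: the result includes whatever earlier calls left
    # behind in leaf_count, because nothing here resets it.
    walk_global(tree)
    return leaf_count


# Version 2: the variable is local to one procedure but "global" to the two
# procedures nested inside it, much like an instance variable in a small class.
def count_leaves_scoped(tree):
    count = 0  # invisible outside count_leaves_scoped

    def record_leaf():
        nonlocal count
        count += 1

    def walk(node):
        if isinstance(node, list):
            for child in node:
                walk(child)
        else:
            record_leaf()

    walk(tree)
    return count
```

With `tree = [1, [2, 3], [[4]]]`, two successive calls to `count_leaves_scoped(tree)` each return 4, while a second call to `count_leaves_global(tree)` returns 4 more than the first, because the count left by the earlier call is still sitting in `leaf_count`. The shared variable plays the same role in both versions; what differs is how far a reader has to look to find everything that touches it.
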
Unless we can counteract this force effectively, we really don't want to use globals.

These are two simple technical reasons that programmers prefer not to use global variables. CS professors tend to simplify them into the dictum, "no global variables allowed", and make it a hard and fast rule for beginners. Unfortunately, sometimes we forget to take the blinders off after our students -- or we ourselves! -- become more accomplished programmers. The dictum becomes dogma, and a substitute for professional judgment.

I have what I regard as a healthy attitude about global variables. But I admitted to my student that I have my own weakness in the dictum-turned-dogma arena. When I teach OOP to beginners, we follow the rule: all instance variables are private. I'm a reasonable guy and so am willing to talk to students who want to violate the rule, but it's pretty much non-negotiable. I've never had a first- or second-year OOP student convince me that one of his instance variables should be protected or -- heaven forbid! -- public. But even in my own code, after a couple of decades of doing OOP, I rarely violate this rule. I say "rarely" only in the interest of being conservative in my assessment. I can't remember the last time I wrote a class with a public instance variable.

Not all teachers are good at giving up their dogma. Some professors don't even realize that what they believe is best thought of as an oversimplification for the purposes of helping novices develop good habits of thought.

Ironically, last semester I ran across the paper "Global Variable Considered Harmful", by Bill Wulf and Mary Shaw. (If you can't get through the ACM paywall, you can also find the paper here.) This is, I think, the first published attempt to explain why the global variable is a bad idea. Read it -- it's a nice treatment of the issues as they existed back in 1973.

Nearly forty years later, I am comfortable using variables that are relatively global to one or more procedures under controlled conditions.
At this point in my programming career, I am willing to use my professional judgment to help me make good programs, not just programs that follow the rules.

I shared the Wulf and Shaw paper with my student. I hope he got a kick out of it, and I hope he used it to inform his already reliable professional judgment. The paper might even launch him ahead of the profs who teach the prohibition on global variables as if it were revealed truth.
-----