Bonus Video: When I taught this course in 2021, I made a short video for the non-quiz part of this session. If you would like to see and hear me work through some of the ideas in these notes, feel free to watch. Note, though: I have changed this session a bit over the last two years, so these notes are a bit different from the material in the video.
While noodling around to get ready for this session, I ran across a short video with an interesting puzzle. Can you solve it?
Try, then explain.
Problems like this are a lot of fun, and they make fine examples of inductive reasoning. But most feel like tricks, puzzles meant to challenge us for the sake of the challenge. Some people like such puzzles (I am one), but they may not be the best way for a computer scientist to learn induction or recursion. We computer scientists work with structures that make induction more real to us.
If you think about it, we just spent two and half weeks learning a programming language, and even a new programming style, doing something like this. I showed you examples of expressions in Racket, with occasional commentary that pointed out similarities and differences. But mostly you learned from examples and filled in a lot of blanks for yourselves.
That this works at all, let alone as well as it does, owes itself to several factors. Human brains are pattern-matching machines, which means that you are able to see and make patterns surprisingly well. You also all have experience with at least two programming languages, which gives you cognitive structures on which to hang knowledge of Racket and functional programming and with which to understand important differences.
But even in Intro to Computing, when you learn your first programming language and style, you mostly learn by example. We don't give you many formal rules or definitions. Yet somehow you learn.
The programs we write aren't so lucky.
Recall that a data type consists of two things:
As programmers, we are also concerned with other pragmatic issues, such as how values are represented as literals in programs and how values are represented for reading and printing. But these are interface issues that are essentially external to the data type itself.
In order to define operations on the values of a data type, we need a way to specify the set of values in the type. That way, we can be sure to handle all possible values of an object. More generally, we want to be able to recognize whether or not a value is in the set in order for our language processors to do type-checking.
Suppose that I want to specify the set of lists made up of symbols. We might write:
We all understand this definition because we intuitively recognize the pattern. This specification relies on the reader to determine the pattern, add one more symbol to the previous list, and to continue it. But what if the "reader" is a computer program? Programs aren't nearly as strong at recognizing conceptual patterns in infinite lists ("--yet", says the old AI guy).
This type of specification can even cause problems to a human reader. Does "and so on" used above mean what we think it means? I think it means, for example, that if s1, s2, and s3 are symbols, then (s1 s2 s3) is in the set. But can I be sure?
So we can only use such specifications safely with human readers if we have a shared understanding of how to identify the pattern. When multiple reasonable patterns exist, we must have a common sense of simplicity. (This is one of the things that always bothered me about analogy problems on exams such as the ACT and SAT.)
Rather than list examples and rely on the reader to determine the pattern, we can usually write a more concise and more explicit definition of a set by defining the pattern in terms of other members of the set. For lists of symbols, I might write:
Now we know that (a b c) is the next member in the set and that, using this definition, we can generate all list of symbols of length 0 or more. The only thing we depend upon is the cons operation.
This way to define a set is called inductive.
Inductive definitions are a powerful form of shorthand. Though many of you cringe at the idea of an inductive proof, most of you think about certain problems in this way with ease. (Writing a proof requires a few proof-making skills, along with more precision -- and thus more discipline.)
The second part of our definition above is the inductive part. It defines one member of the set in terms of another member of the set.
We can use this style to define other kinds of sets, too. For example, if I want to describe the sequence 1, 2, 4, 8, 16, ..., I can write:
Quick Exercise: Using the 2n inductive definition above, how might we convince someone that 32 is in the set?
We could reason forward:
1 is in the set. (Rule 1)
Therefore, 2 is in the set. (Rule 2)
Therefore, 4 is in the set. (Rule 2)
Therefore, 8 is in the set. (Rule 2)
Therefore, 16 is in the set. (Rule 2)
Therefore, 32 is in the set. (Rule 2)
Or we could reason backward:
32 would be in the set if 16 were. (Rule 2)
16 would be in the set if 8 were. (Rule 2)
8 would be in the set if 4 were. (Rule 2)
4 would be in the set if 2 were. (Rule 2)
2 would be in the set if 1 were. (Rule 2)
1 is in the set. (Rule 1)
Therefore, 32 is in the set.
These two arguments correspond roughly to the forward chaining and backward chaining techniques we learn in a course on artificial intelligence.
Side note: It turns out that the effort needed to convert our two demonstrations into honest-to-goodness proofs is not that large. The first derivation leads naturally to a proof by construction. The second is the basis of a proof by contradiction that begins, "Assume that 32 is not in the set." Proof by contradiction generalizes to much larger numbers, too, because we can use Rule 2 n times to reach 0 from any n > 0. Proof by contradiction can generate an efficient proof, because it is directed toward a goal, and not left with a bunch of decisions about how to derive the statement to be proved. This is useful when there are many possible rules to use reasoning forward.
Reasoning inductively is something humans do pretty well automatically, and it turns out to be a useful technique to embody definitions in computer programs, too. We relied on an inductive definition in Session 4 when we said A list is a pair whose second item is a list.
We can even use an inductive definition to specify what counts as an arithmetic expression:
This can be handy for type checking computer programs... and much more. We return to this idea soon!
Inductive specifications are so plentiful and so useful that computer scientists have developed a specialized notation -- a language -- for writing them.
In case you haven't noticed, most of computer science is about notation -- and thus about language -- in one form or another. |
The notation we use to write inductive definitions is called Backus-Naur Form, or BNF. You will see BNF used in a number of places, including most programming language references, and computer scientists will expect that you know what BNF is.
BNF has three major components:
You will sometimes see terminals ignored altogether, because they are obvious to the reader. Suppose that I wished to define a "list of numbers". I may assume that my readers will know what I mean by a "number" and so not define it with a rule. If that turns out to be a problem (for example, do negative numbers count?), I can always add a rule or two to define numbers.
Here is the BNF definition for a list of numbers in Racket:
<list-of-numbers> ::= () <list-of-numbers> ::= (<number> . <list-of-numbers>)
This definition is inductive, because the second rule defines one list of numbers in terms of another (shorter!) list of numbers. It matches the catchy definition I gave for a Racket list back in Session 4: a list is a pair whose cdr (second item) is a list.
There are a few things to notice about the BNF definition:
. ( )are the terminals in this definition.
What about <number>? Is it a terminal or a non-terminal? It is written as a non-terminal. We might assume that it stands in place of the numeric literals 0, 1, 2, and so on, and thus requires no formal definition. But we really should define <number> formally, too:
<number> ::= 0 <number> ::= <number>+1
As you can see from these definitions, there may be more than one rule for a non-terminal. This happens so often that there is a special notation for it:
<list-of-numbers> ::= () | (<number> . <list-of-numbers>) <number> ::= 0 | <number>+1
We read the vertical bar, |, as "... or ...".
As a shorthand, we will sometimes use another piece of notation, called the Kleene star, denoted by ^{*}. The star says that the preceding element is repeated zero or more times. Using the Kleene star, we can define a list of numbers simply as:
<list-of-numbers> ::= ( <number>* )
Another variation is the Kleene plus, denoted by ^{+}. This notation differs from the Kleene star in that the structure within the brackets must appear one or more times.
In this course, we usually write our definitions of data types using the more verbose notation given in the first definition above. The more verbose notation provides us with more guidance as we think about the code that processes our data.
We can use our inductive definitions, whether in BNF or not, to demonstrate that a given element is or is not a member of a particular data type. This is just what we did above when I asked "Convince me that...".
For example, consider the list (4 3). We can use the technique of syntactic derivation to show that it is a member of the data type <list-of-numbers>:
<list-of-numbers> (<number> . <list-of-numbers>) (4 . <list-of-numbers>) (4 . (<number> . <list-of-numbers>)) (4 . (3 . <list-of-numbers>)) (4 . (3 . ()))
Recall that(4 3) and (4 . (3 . ())) are two ways of writing the same Racket value!
You shouldn't be surprised that syntactic derivation resembles Racket's substitution model. They are closely related. As in the substitution model, the order in which we derive the sub-expressions is not important.
We have just seen how we can specify a list of numbers using BNF. We can use BNF to specify other data types as well. For example, we can specify a binary tree that has numbers as leaf nodes and symbols as internal nodes as:
<tree> ::= <number> | (<symbol> <tree> <tree>)
This BNF specification describes trees that look like the following:
1 (foo 1 2) (bar (foo 1 2) (baz 3 4)) (bif 1 (foo (baz 3 4) 5))
We will see many more examples of BNF throughout the course. It goes hand in hand with inductive definitions. Earlier today, we defined a simple set of arithmetic expressions inductively. (Can you give a BNF definition for that set?) That definition specifies a set of numbers, but it also specifies the sort of expressions we write in computer programs. Is that resemblance accidental? No, it reveals something deeper.
An important use of BNF is in specifying the syntax of programming languages. It may seem odd that a method we just described as being useful for specifying data types would be useful for specifying syntax, but remember: a compiler is just a program that treats other programs as data. In other words, syntax is just another data type!
This is a powerful idea. Let it sink in.
The notion that syntax is just a data type actually takes us a bit farther along our path of studying programming languages. How might we understand a language? With some help from Shriram Krishnamurthi's book, Programming Languages: Application and Interpretation I can think of at least four ways:
All four of these are important to programmers, especially those who want to use the language to build big systems. In a course on programming languages, our focus is on syntax and semantics, with an emphasis on the latter: How can a program implement the behaviors we desire? How can a programming language describe those behaviors? How can an interpreter or compiler process a program?
Though it might seem like it to yet, because you are still early in your study of computer science, but in many ways, syntax is the least important of these four features. Familiar or not, terse or not, it just doesn't matter. You'll see.
BNF gives us a tool for describing syntax. Programs that process BNF descriptions recursively give us a tool for understanding behavior.
A data type consists of two things:
This session shows how we can use BNF notation to define sets of values. Next session, we will begin learning a family of techniques for writing programs that process values in the sets, using the BNF definition as a guide.
Following an inductive definition will help us write complete programs, ones that handle all possible inputs. But doing so will also help us write correct programs, converting a big problem into three or more smaller problems that are easier to solve. On to Session 9!
I sometimes introduce John Backus in Session 1, because he was the leader of the first compiler project in history. For this and his many other contributions to computing, Backus received the Turing Award in 1977. The Turing Award is computing's equivalent of the Nobel Prize. The other half of Backus-Naur is Peter Naur, who received the Turing Award in 2005. Naur's award also recognizes fundamental contributions to programming languages: For fundamental contributions to programming language design and the definition of Algol 60, to compiler design, and to the art and practice of computer programming.
The first Turing Award was given in 1966 to Alan Perlis -- another name we've seen this semester, in this quote on languages changing how we think. By now, I hope that you are not surprised that so many of the giants of computing made their biggest contributions to the discipline in the area of programming languages. As mentioned above, most of computer science is about notation -- and thus about language -- in one form or another.