New languages come along all the time, sometimes with the sort of corporate oomph that helped Swift and Go become popular quickly.
Most of the reasons why languages are favorites are common from class to class: simplicity, utility, familiarity. This time around, a few students mentioned that they like object-oriented programming or higher-order functions, and that their favorite language supports them well.
I suspect that Python's popularity derives much from how easy it is to whip up useful programs quickly. For example, to generate the tag cloud above from the list of languages and their counts, I needed a text file containing each language's name as often as it appeared in the survey. It took me only a minute or so to write this Python program to generate the text file for me. This combination of speed and utility stands out starkly in comparison to Java, C, and Ada, the other languages most of you know. Languages like Ruby would fare as well.
A common 'favorite' comment is: "... because it is the only one I know". This is perfectly reasonable, and the only answer possible if you know only one language, or know only one language well. I hope that this course, and the others you take in our program, give every CS student the chance to make a reasoned choice from a much larger set of choices. Knowing more languages won't diminish your love for your favorite; it will give depth to your preference.
My favorite answer this time to "Why is this language your favorite?" was "I liked the challenges we had to solve while I was learning Java." That says nothing about the language, of course, but it reminds me that solving fun problems is why so many of us like to program at all.
Now we know which programming languages we know. Let's begin to learn what we can know about programming languages and learn a new one as both a case study and a tool. The new language will help us to achieve the course objectives.
In this course, we will describe languages and their behavior in two ways: in natural language and in programs.
We could also use mathematics to reason about programs and computation, but that is not a major concern of this course.
Natural language works well for introducing ideas because it lets us tell stories that people understand quickly. People are motivated by good stories. So why do we need to use programs -- or anything other than natural language -- to describe languages?
English allows too much ambiguity. For example, suppose that I tell you,
The meaning of a procedure call, f(E), is running f on E.
I have left many essential questions unanswered. Where do we find E? How do we interpret the expression? Is E passed by value or by reference? We can ask the same questions about f as well!
We need a language with formal semantics in order to eliminate such ambiguities. For this reason, we will use computer programs as our main way of explaining what sentences in a language mean. In particular, we will use an interpreter, which takes a sentence and produces its behavior.
But wait a minute... Can a program be unclear as an explanation? It surely can! Consider a few possibilities:
We will do our best to circumvent these sources of ambiguity. First, we do not use the first three in our interpreters. Second, we try to avoid the effects of the fourth through abstraction: an abundant number of sub-programs and high-level data structures.
This is where functional programming and Scheme come in handy as tools. Functional programming encourages decomposition into smallish procedures, with no assignment statements, and Scheme's flexibility allows us to abstract in ways that other languages do not.
We begin our study of programming languages by learning Scheme. This part of the course allows us to do three different things:
Let us begin.
What kinds of things must be present in every programming language?
Your answer to this question is almost certainly limited by the languages you know now, both the number and the variety. If you learn only one language, or one kind of language, then your perspective on what is essential will be determined by those experiences.
Every programming language has three kinds of things: primitives, means of combination, and means of abstraction.
Some languages have more kinds of features than these, but they cover most of most languages.
Quick Exercise: Make a list of five features from a programming language you are familiar with. Categorize each item on your list as a primitive, a means of combination, a means of abstraction, or none. Where, for example, would you classify conditional statements such as if?
Notice that there are lots of things not on the list. For instance, if statements are not one of the three things that every language has. Do you know of any programming language without conditional statements? Probably not.
In CS1, you may have learned that, in order to implement the computations that programmers need to solve most problems, a language must support three kinds of control flow: sequence, selection, and repetition. So any complete programming language will offer programmers a way to make choices.
But there is a difference between being able to make a choice and having an if statement, or any explicit conditional statement, in a language.
Take, for example, Smalltalk. I use Smalltalk as an example occasionally throughout this course, because it is different from the languages you know in several ways. Smalltalk has no conditional statement, nor any special form for making selections. (We'll talk about the notion of a "special form" soon.) How can that be?
It turns out that Smalltalk has no statements or special forms for control flow of any kind. All Smalltalk has is message passing: objects sending messages to other objects. In this language, True and False are objects. They respond to messages like any other object. Of course, they respond to the same messages, only in different ways.
This gives programmers the ability to make decisions in a program. When I send a message to True, it behaves one way (If you are true, ...). When I send a message to False, it behaves another way (If you are false, ...). So Smalltalk does provide the ability to make choices, but not through a conditional statement. There is no place in the Smalltalk compiler where you can find the behavior for "if...".
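To make this idea concrete, here is a minimal sketch written in Racket rather than Smalltalk. It is only an analogy, not how Smalltalk is implemented. It assumes `#lang racket`, and the names `true-object`, `false-object`, and `describe` are hypothetical. Each "boolean" is a procedure that accepts the same message (two actions) and responds in its own way, so a choice gets made without any conditional expression. The `lambda` form used here is one we will study soon.

```racket
#lang racket
;; Hypothetical stand-ins for Smalltalk's True and False objects.
;; Both accept the same "message" (two zero-argument actions),
;; but each responds in its own way.
(define true-object
  (lambda (when-true when-false) (when-true)))
(define false-object
  (lambda (when-true when-false) (when-false)))

;; describe makes a choice without any conditional expression:
;; the boolean object itself decides which action runs.
(define describe
  (lambda (bool)
    (bool (lambda () "it is true")
          (lambda () "it is false"))))

(describe true-object)    ; => "it is true"
(describe false-object)   ; => "it is false"
```

Notice that there is no if anywhere in this code. The selection lives in the objects, not in the language's syntax.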
Your view of what a programming language must have should be changing already.
Likewise, a language need not have a looping statement such as for or while. How so? As Java programmers are learning in the newest version of the language, a collection of objects can know how to return a subset that meets some condition, return a sorted version of itself, or determine whether a particular kind of object exists. The language still has for or while statements, but programmers won't use them much. We can imagine them disappearing entirely.
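The same style is available in Racket. The sketch below assumes `#lang racket` and a made-up list named `scores`; `filter`, `sort`, and `ormap` are standard Racket procedures. It shows a collection answering all three kinds of question with no for or while in sight.

```racket
#lang racket
;; scores is hypothetical sample data.
(define scores (list 72 95 58 88 64))

;; A subset that meets some condition.
(filter (lambda (n) (>= n 70)) scores)   ; the list 72 95 88

;; A sorted version of the collection.
(sort scores <)                          ; the list 58 64 72 88 95

;; Does a particular kind of element exist?
(ormap (lambda (n) (> n 90)) scores)     ; => #t
```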
Why does it help us to know that some things must be present in a language, but others need not? It helps us to know what to look for. If I think that a programming language must have an if statement, then when I encounter a language that doesn't, I may become disoriented, disappointed, or even angry. None of those emotions help me to learn the new language, and they may well close my mind to learning something useful.
Whenever you are faced with the task of learning a new language, first try determining what is primitive in the language, how things get combined, and how details are abstracted away. This will give you a framework to guide your task. Over the next several lectures, we will be learning Scheme. Our efforts will be guided by this framework.
Learn a new language and get a new soul.
-- Czech proverb
(... you may even get a new personality!)
Scheme has two kinds of primitive expressions, both of which will be quite familiar to you:
The set of values that a variable can hold includes the values that can be expressed literally, as well as higher-order objects. The most surprising of these to you may be procedures.
These primitive expressions can be combined to create more complex expressions. There is exactly one means of combination: the operator application. In Scheme, all non-primitive expressions have the following features:
Note that there are no "statements" in Scheme. Every combination is an expression formed like any other. Nearly every such expression has a value. This is an important difference from most other languages you know, because we will use these values to drive our programming in Scheme.
The syntax of an application expression is:
(<operator> <operand1> <operand2> <operand3> ...)
Most of the operators we use are procedures. To evaluate a compound expression built with a procedure, evaluate each of the operands and pass their values to the procedure, which then produces the value of the expression.
Finally, Scheme has a number of mechanisms for abstraction. For now, we will focus on just one of them, the ability to define a name for a value. We create definitions in the same way we create any compound expression, where the operator is the keyword define and the operands have specific meaning:
(define <name> <expression>)
The <expression> is evaluated and associated with the <name>. Definitions are evaluated for their side effect -- the naming of a value -- and not for their value.
For the most part, definition is the only form of side effect we will use. It is how we name our programs, the top-level data used by our programs, and the data we use for testing.
Notice that define is not a procedure, because it does not evaluate all of its arguments. The first argument is taken literally, as the symbol to be used as the name. In Scheme, we call such an operator a special form. In a session or two, we will study another important special form, lambda, which creates procedures.
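Here is a small sketch of definitions in action, assuming `#lang racket`. The names `radius`, `diameter`, and `double` are made up for illustration. In each case the name is taken literally and only the expression is evaluated.

```racket
#lang racket
(define radius 10)                     ; name a literal value
(define diameter (* 2 radius))         ; the expression is evaluated, then named
(define double (lambda (n) (* 2 n)))   ; a procedure is a value, too

diameter          ; => 20
(double radius)   ; => 20
```

If define were an ordinary procedure, Scheme would try to evaluate radius before the name had any value, and the first line would fail.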
In addition to define and lambda, Scheme must offer a few other special forms:
We will study all of these in due time. We will also see that two of these forms are not strictly necessary, because Scheme has another way to implement the same behavior.
Every other operator in Scheme is either a procedure or a syntactic abstraction, something that is defined in terms of something else. We will also study the idea of syntactic abstraction in some detail later in the course.
Let's explore some examples of Scheme expressions, using our Dr. Racket interpreter. This will give us a chance to learn a bit about how Scheme works and also get to know our programming environment better.
Some things to pay attention to in Dr. Racket:
Primitives and Simple Expressions.
When we enter some expression at the prompt, for instance, a number, Dr. Racket will print the result of evaluating the number. Because numbers are numeric literals, the value is the same as the expression.
    > 25                 ;; a number
    25
    > 1.2                ;; handles integers and floats in the same way
    1.2
    > #t                 ;; a boolean ... also #f
    #t
    > #\a                ;; a character ... we won't use these much
    #\a
    > "Eugene"           ;; a string ... ditto
    "Eugene"
    > 'a                 ;; a symbol -- both identifier and value ...
    a                    ;; we use these a lot as data!  Notice the quote.
    > (quote a)          ;; quote is a special form
    a
    > 'a-symbol          ;; a symbol -- Scheme has fewer constraints on what
    a-symbol             ;; can be a symbol than most other languages
    > '123->321          ;; see what I mean?
    123->321
Here we see an important behavior of Scheme: the Scheme interpreter evaluates every expression it reads. All primitive objects evaluate to themselves, and the "print form" of the object (what we see in an answer) is usually the same as the form we write. Those of you who know Python have probably used an interpreter such as IDLE and seen similar behavior.
What happens if we evaluate the symbol a, an "a" without the character escape sequence and without the quote?
    > a
    a: undefined; cannot reference undefined identifier
    > (define a 5)
    > a
    5
Symbols serve as identifiers in Scheme; that is, they can name values. We will talk quite a bit more about definitions, identifiers, and values soon.
Some identifiers have values when we first start a Scheme session. Watch this:
    > min
    #<procedure:min>             ; a primitive procedure on numbers
    > not
    #<procedure:not>             ; a primitive procedure on booleans
    > string-length
    #<procedure:string-length>   ; a primitive procedure on strings
    > list
    #<procedure:mlist>           ; a primitive procedure on any args
    > +
    #<procedure:+>               ; even + is a procedure!
These are some of Scheme's primitive procedures, the built-in behaviors provided by the language.
Scheme procedures are named by symbols, just like the variable names we would use in Java for ints and objects. That is correct: all Scheme procedures, even the primitive operation for adding numbers. This turns out to be a remarkably powerful and useful idea, one that we'll come back to later.
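One consequence is easy to try for yourself. In this sketch, which assumes `#lang racket` and uses the made-up names `add` and `smallest`, we give a second name to an existing procedure just as we would name a number.

```racket
#lang racket
;; + and min are ordinary values, so define can give them new names.
(define add +)
(define smallest min)

(add 3 5)          ; => 8
(smallest 4 2 9)   ; => 2
```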
Combinations and Compound Expressions.
We combine primitive objects with operators to form more complex expressions. As noted above, Scheme's mechanism for building compound expressions is the prefix expression. A compound expression is always enclosed in parentheses. This is probably different from your experience with other programming languages, where parentheses are usually optional. Always keep this in mind:
You cannot insert or delete parentheses from any Scheme expression without changing its meaning.
Random programming by inserting or deleting parentheses will generally get you nowhere, even more so than in other languages. Think about what you want to say, and ask questions if you don't know how to make it work.
Here are some more examples from Dr. Racket:
    > (* 2 2)
    4
    > (- 4 2)
    2
    > (+ 3 5.2)       ; handles integers and floats with equanimity
    8.2
    > (/ 4 2)
    2
    > (/ 1 3)         ; and rationals are numbers, too!
    1/3
    > (- -3 -5)
    2
What happens if we insert a pair of parentheses somewhere?
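One way to find out is to try it. The sketch below assumes `#lang racket`; the name `try-extra-parens` is hypothetical, and `with-handlers` is a Racket form we have not studied, used here only so that the program keeps running after the error. Wrapping (* 2 2) in a second pair of parentheses asks Scheme to apply the value 4 as if it were a procedure.

```racket
#lang racket
;; ((* 2 2)) means "evaluate (* 2 2), then apply the result to no arguments".
;; The result is 4, which is not a procedure, so the application fails.
(define try-extra-parens
  (lambda ()
    (with-handlers ((exn:fail? (lambda (e) 'not-a-procedure)))
      ((* 2 2)))))

(try-extra-parens)   ; the handler returns the symbol not-a-procedure
```

Typed directly at the prompt without the handler, ((* 2 2)) produces an error saying that the thing being applied is not a procedure.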
There are several important points for you to note about this Scheme session. First, note that the leftmost element in a compound expression is the operator, followed by the operands. The Scheme evaluator determines the value of the expression by applying the procedure specified by the operator to the values specified by the operands.
That last sentence is extremely important, and more complicated than you might think at first, so make sure you understand what it says.
Notice that in the last expression, the "-" occurs three times and means two different things. When it's the leftmost element in the expression, it represents the operation that is to take place; when it's appended to the front of a number, it means that the number is negative. Spacing is important here: (- - 3 -5) would produce an error:
    > (- - 3 -5)
    -: contract violation
      expected: number?
      given: #<procedure:->
      argument position: 1st
      other arguments...:
       3
       -5
This points out a feature of Scheme we will talk more about soon: we are allowed to pass procedures as arguments to other procedures! (But not to -.)
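As a preview, here is a minimal sketch of procedures that do accept other procedures as arguments. It assumes `#lang racket`, and `apply-twice` is a hypothetical helper written for this example; `map` is a standard procedure.

```racket
#lang racket
;; apply-twice takes a procedure f and a value x, and applies f two times.
(define apply-twice
  (lambda (f x)
    (f (f x))))

(apply-twice - 5)                        ; (- (- 5)) => 5
(apply-twice (lambda (n) (* n n)) 3)     ; (3 squared) squared => 81

;; map applies a procedure to every item in a list.
(map - (list 1 2 3))                     ; the list -1 -2 -3
```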
Another thing that you will note is that Scheme's procedures for numbers accept both integers and real numbers without any explicit type coercion or casting. The result of adding 3 to 5.2 is 8.2. That's what most people would say, too. The result of dividing 1 by 3 is 1/3, a fraction, or a rational number. This, too, is obvious to people with no programming experience. Rational numbers are a data type in Scheme.
Other computer languages usually make all sorts of distinctions among different types of numbers, but those distinctions are driven by the implementation of the language and processors for it, and not by our understanding of numbers.
Somehow, we programmers become conditioned by our languages into thinking that these distinctions are necessary ones. They are not. Scheme characterizes numbers as exact or inexact and makes distinctions in behavior driven by this mathematical idea.
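A few expressions show the exact/inexact distinction at work. This sketch assumes `#lang racket`; `exact?`, `inexact?`, and `exact->inexact` are standard procedures.

```racket
#lang racket
(exact? 1/3)             ; => #t   rationals are exact
(exact? 5)               ; => #t   so are integers
(inexact? 0.5)           ; => #t   numbers written with a decimal point are inexact

(+ 1/3 2/3)              ; => 1    exact arithmetic stays exact
(exact->inexact 1/4)     ; => 0.25

;; Mixing exact and inexact operands yields an inexact result.
(exact? (+ 1/2 0.5))     ; => #f
```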
If you need one more example of how Scheme hides implementation details about its numbers, execute this Scheme program. Try that in Java [ loop or recursive ], Python or Ada!! You can see the results in this file of sample runs.
Note, too, that while Scheme has no for or while loops, it does quite nicely with a deeply recursive program, thank you. You will learn the "magic" that makes this possible in just a few weeks.
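If you do not have the program at hand, the following sketch gives the flavor. It is my own stand-in, not necessarily the program referred to above: a plain recursive `factorial`, assuming `#lang racket`. Scheme's integers grow as large as they need to, so the answers remain exact well past the range of a 64-bit integer.

```racket
#lang racket
;; A straightforward recursive definition, with no loop and no assignment.
(define factorial
  (lambda (n)
    (if (= n 0)
        1
        (* n (factorial (- n 1))))))

(factorial 20)     ; => 2432902008176640000
(factorial 1000)   ; an exact integer with thousands of digits
```

Try (factorial 1000) at the prompt and compare what happens to the same computation using int or long in Java.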
NOTE: By the way, these programs, along with the sample interactions, are available in the .zip file for today's session notes. I'll bundle up a .zip file of code for you for each class session. Be sure to download the code, study it, run it, and modify it. That's the best way to learn the ideas we are studying!
Benefits and Costs of Prefix Notation.
The prefix notation that Scheme uses has several advantages over other notations, such as infix:
      3
      4
      8
      6
      5
      4
      7
      6
      5
      8
    + 9
    ---
     65
Using prefix notation, however, the issues of precedence are clear without anyone memorizing precedence rules. For example, the infix expression 3 + 4 * 5 / 6 - 7 would be written (- (+ 3 (/ (* 4 5) 6)) 7) in prefix notation.
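Both advantages are easy to check at the prompt. This sketch assumes `#lang racket`. The first expression passes eleven operands to a single +, and the second is the fully parenthesized prefix expression from the text.

```racket
#lang racket
;; One operator, many operands.
(+ 3 4 8 6 5 4 7 6 5 8 9)      ; => 65

;; The nesting alone determines the order of operations:
;; (* 4 5) is 20, (/ 20 6) is 10/3, (+ 3 10/3) is 19/3, and 19/3 - 7 is -2/3.
(- (+ 3 (/ (* 4 5) 6)) 7)      ; => -2/3
```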
Now, you may be saying to yourself at this point, "This expression isn't clear at all". But it is; it simply requires attention to different details than you are used to. I think you will find your comfort level with such expressions is mostly a matter of exposure. You will be as comfortable with this system as any other after you use it for a while.
which builds a complex expression out of two simpler ones. Of course, this nesting can go on for any number of levels. We'll see later how Scheme deals with nested expressions. Hint: They are nothing special.
One problem with prefix notation is that sometimes there are so many parentheses that we get lost:
(* (* (+ 3 5) (- 3 (/ 4 3))) (- (* (+ 4 5) (+ 7 6)) 4))
If we really tried, we could probably figure this out, but it is not clear what this means upon a casual reading. In order to help with this kind of confusion, most Scheme programmers adopt some sort of indentation standard that allows them to read programs easily. If we write the above expression over a number of lines and indent them carefully, we can more easily see how it should be evaluated:
    (* (* (+ 3 5)
          (- 3 (/ 4 3)))
       (- (* (+ 4 5)
             (+ 7 6))
          4))
Of course, this isn't the only way to indent this expression, and you may prefer another style. But I think anyone will agree that this is "better" than the original. It doesn't matter what style of indentation you use in this course as long as your meaning can be clearly understood.
Let your editor help you, too. Dr. Racket will indent your code for you, and you can use the Racket > Reindent menu option to indent code that you've edited in strange ways. You'll have to indent data expressions on your own.
Finally, we can use abstraction to hide such complexity as well. Would anyone really write this arithmetic expression in a program? Probably not. If we did have use for such an expression, then we would almost certainly attribute meaning to parts of the expression, say, to the sub-expression (* (+ 4 5) (+ 7 6)). Why not name it what it means? When we do, two levels of nesting -- and 6 parentheses -- disappear from the expression.
    (define area (* (+ 4 5) (+ 7 6)))
    (* (* (+ 3 5) (- 3 (/ 4 3)))
       (- area 4))
One of the ways that Scheme programmers avoid this sort of parenthesis paralysis is to use local names and write shorter, more expressive code. Another is to keep the bodies of our procedures short, so that they do not introduce too much nesting. These are the things you will want to learn as we proceed.
By the way, Dr. Racket tells me that the answer is 4520/3.
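The idea of local names can be sketched with let, a form we have not yet studied. The code below assumes `#lang racket`. The names `width` and `height` are invented for illustration, since the numbers in this expression have no real meaning.

```racket
#lang racket
;; let gives names to sub-expressions, but only within its own body.
(define result
  (let ((width  (+ 3 5))
        (height (- 3 (/ 4 3)))
        (area   (* (+ 4 5) (+ 7 6))))
    (* (* width height)
       (- area 4))))

result   ; => 4520/3
```

The final expression now has only two levels of nesting, and each part says what it is for.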
Scheme is an interactive language. Rather than writing a large program as a whole and then translating it "in batch", in Scheme one writes a little piece of program, translates it, and then does the same with another piece. Quite different from batch-style programming, this allows quick turnaround in software development -- which accounts for the prevalent use of interactive languages in (rapid) prototyping.
Do you use any interactive languages? Some of you probably do: Python, Perl, and shell scripting are among them. If you're lucky, you may even have tried Ruby.
An interactive language can be either interpreted or compiled. So can batch languages, but there usually isn't much point in interpreting them. Dr. Racket both interprets and compiles: It interprets expressions but compiles procedure definitions. The result is faster response time than pure interpretation. Generally, though, I will refer to "the Scheme interpreter".
Perl and Java are interpreted, but they aren't generally used in an interactive fashion. Scheme's cousin Common Lisp is another interactive language that uses both interpretation and compilation.
What does the Scheme interpreter do with the Scheme expressions that it reads? It evaluates them! The Scheme interpreter works in much the same way that any interactive language interpreter works: it reads an expression, evaluates it, and prints the result.
The "top-level" behavior of the interpreter is to cycle through this sequence of actions repeatedly. In Scheme this is implemented recursively. A simplified version of the process might look like:
    (define run
      (lambda ()
        (print (eval (read)))
        (run)))
This cycle is called the read-eval-print loop. This behavior is the foundation for all that we will be doing with language interpretation in this course. Be sure to understand it!
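To experiment with the cycle without typing at a prompt, here is a variant that reads its expressions from a string. It is a sketch under several assumptions: it uses `#lang racket`, the name `run-from` is hypothetical, and it passes an explicit namespace to `eval` (created with `make-base-namespace`), which Racket needs when `eval` is called from inside a program. Unlike run, it stops at the end of the input and returns the values it printed.

```racket
#lang racket
;; A namespace in which eval can find +, *, and friends.
(define ns (make-base-namespace))

;; run-from reads from the given port, evaluates and prints each
;; expression, and returns the list of values when the input runs out.
(define run-from
  (lambda (port)
    (let ((expr (read port)))                ; read
      (if (eof-object? expr)
          '()
          (let ((value (eval expr ns)))      ; eval
            (write value)                    ; print
            (newline)
            (cons value (run-from port)))))))  ; loop, by recursion

(run-from (open-input-string "(* 2 2) (/ 1 3) (- -3 -5)"))
;; prints 4, 1/3, and 2 on separate lines
```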
The technical term for the read step is parsing, which translates the string of characters that the user enters into a data structure on which the evaluator can operate. During this semester, we will discuss how syntax is described, and we will use Scheme's parser for our own interpreters, but we will not dig into the mechanics of parsing. They are properly a topic of the compiler course, 810:155 Translation of Programming Languages -- a course I strongly recommend!
The evaluation of expressions is the main focus of this course. The basic mechanism for evaluation is one with which you are probably already familiar, if only by example in other languages: To evaluate the expression (operator operand1 operand2 ...),
Because some operators are special forms, we know that a Scheme interpreter must evaluate the operator first and the operands next. Otherwise, it wouldn't know whether to use the standard procedure application rule or the special form's rule.
eval is recursive. Evaluating the subexpressions means making a recursive call to eval for each expression. If one of the subexpressions is a compound expression, then it, too, is evaluated in this same way. Simplicity can give rise to power...
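The recursive structure is easier to see in a toy version. The sketch below is not Scheme's eval. It is a hypothetical `tiny-eval`, assuming `#lang racket`, that handles only numbers and four arithmetic operators and ignores variables and special forms entirely. The helper `lookup-operator` is likewise made up for this example.

```racket
#lang racket
;; lookup-operator maps an operator symbol to the procedure it names.
(define lookup-operator
  (lambda (sym)
    (cond ((eq? sym '+) +)
          ((eq? sym '-) -)
          ((eq? sym '*) *)
          ((eq? sym '/) /)
          (else (error "unknown operator:" sym)))))

;; tiny-eval: a number evaluates to itself; a compound expression is
;; evaluated by evaluating its operator, recursively evaluating each
;; operand, and applying the procedure to the resulting values.
(define tiny-eval
  (lambda (expr)
    (if (number? expr)
        expr
        (let ((proc (lookup-operator (car expr)))
              (args (map tiny-eval (cdr expr))))
          (apply proc args)))))

(tiny-eval '(+ 1 2))                                                   ; => 3
(tiny-eval '(* (* (+ 3 5) (- 3 (/ 4 3))) (- (* (+ 4 5) (+ 7 6)) 4)))   ; => 4520/3
```

Nested expressions need no special treatment: the call to map hands each operand back to tiny-eval, however deep the nesting goes.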
This explanation is a bit oversimplified, because it omits answers to some potentially important questions. For example, in what order are the subexpressions evaluated? But this general algorithm will serve as a workhorse for us this semester.
Quick Exercise: Can you think of a situation in which the order of evaluation matters? Think of a scenario in another programming language, such as Java. (Try x + ++x.)
Printing the result of evaluating an expression -- which is itself an expression! -- is the inverse of reading one. The printer must translate the internal representation of the expression into a string of characters that can be written to the terminal. Again, we will largely rely on the Scheme interpreter's printing mechanism this semester.
The exact interface to any Scheme interpreter is implementation-specific. But the basic read (an expression) - eval(uate the expression) - print (the result) loop is a constant.
In our first session, we discussed the goals of this course in very broad terms. Such a discussion is useful for getting a "feel" for the course but does little to prepare you for the tasks ahead. Now let's state what we intend to accomplish in greater detail. At the end of this semester, you should be able to:
These skills are the essential ideas in the study of programming languages because:
With a new mindset about languages, you will be more inclined to learn how to program the environments you use, including tools such as bash, emacs, and Eclipse. (One of my favorite examples from an alumnus: a Flash plug-in to Zend.)