Session 2

Learning a New Language: Scheme


CS 3540
Programming Languages and Paradigms


Survey Results: Your Favorite Programming Languages

Numbers.

My favorite comment was: "... because it is the only one I know". This is perfectly reasonable, and the only answer possible if you know only one language, or one language well. I hope that this course, and the others you take in our program, give every CS student the chance to make a reasoned choice from a much larger set of choices. Knowing more languages won't diminish your love for your favorite; it will give your preference depth.



Our Objectives This Semester

In our first session, we discussed the goals of this course in very broad terms. Such a discussion is useful for getting a "feel" for the course but does little to prepare you for the tasks ahead. Now let's state what we intend to accomplish in greater detail. At the end of this semester, you should be able to:

These skills are the essential ideas in the study of programming languages because:

With a new mindset about languages, you will be more inclined to learn how to program the environments you use, including tools such as bash, emacs, and Eclipse. (One of my favorite examples from an alumnus: a Flash plug-in to Zend.)



Describing Programming Languages

In this course, we will describe languages in two ways:

We could also use mathematics to reason about programs and computation, but that is not a major concern of this course.

Natural language works well for introducing ideas because it lets us tell stories that people understand. People are motivated by good stories. So why do we need to use programs -- something other than natural language -- to describe languages?

English allows too much ambiguity. For example, suppose that I tell you, "The meaning of a procedure call, f(E), is running f on E". I have left many essential questions unanswered. How and where is E found? Is E passed by value or by reference? We can ask the same questions about f as well!

We need a language with formal semantics in order to eliminate such ambiguities. So we will use computer programs as our main way of explaining what sentences in a language mean. In particular, we will use an interpreter, which takes a sentence and produces its behavior.

But wait a minute... Can a program be unclear as an explanation? It surely can! Consider a few possibilities:

We will do our best to circumvent these sources of ambiguity. First, we do not use the first three in our interpreters. Second, we try to avoid the effects of the fourth through abstraction: an abundant number of sub-programs and high-level data structures.

This is where functional programming and Scheme come in handy as tools. Functional programming encourages decomposition into smallish procedures, with no assignment statements, and Scheme's flexibility allows us to abstract in ways that other languages do not.



Learning a New Language

We begin our study of programming languages by learning Scheme. This part of the course allows us to do three different things:

  1. As you learn the language, we can discuss it in the terms that we will use to study languages in general. This provides a concrete context for learning the terminology and techniques of PL study.
  2. It makes it possible for us to use Scheme as a tool for studying languages and paradigms by simulating their features in Scheme programs.
  3. It introduces you to a second (or third) programming paradigm: functional programming.

Let's begin.

What kinds of things must be present in every programming language?

Your answer to this question is almost certainly limited by the languages you know now, both the number and the variety. If you learn only one language, or one kind of language, then your perspective on what is essential will be determined by those experiences.

Every programming language has three kinds of things:

Some languages have more kinds of features than these, but they cover most of most languages.

Quick Exercise: Make a list of five features from a programming language you are familiar with. Categorize the items on your list as primitive, means of combination, means of abstraction, or none. Where, for example, would you classify conditional statements such as if?

Notice that there are lots of things not on the list. For instance, if statements are not one of the three things that every language has. Do you know of any programming language without conditional statements? Probably not.

In CS1, you may have learned that, in order to implement the computations that programmers need to solve most problems, a language must support three kinds of control flow: sequence, selection, and repetition. So any complete programming language will offer programmers a way to make choices.

But there is a difference between being able to make a choice and having an if statement, or any explicit conditional statement, in a language.

Take, for example, Smalltalk. I use Smalltalk as an example occasionally throughout this course, because it is different from the languages you know in several ways. Smalltalk has no conditional statement, nor any special form for making selections. (We'll talk about the notion of a "special form" soon.) How can that be?

It turns out that Smalltalk has no statements or special forms for control flow of any kind. All Smalltalk has is message passing: objects sending messages to other objects. In this language, True and False are Boolean objects. They respond to messages like any other object. Of course, they usually respond to the same messages, only differently, and that gives a programmer the ability to make decisions in a program: "If you are true, ...". So Smalltalk does provide the ability to make choices, but not through a special conditional statement or form.
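To make the idea concrete, here is a minimal sketch, written in Scheme rather than Smalltalk, of booleans that make choices without any conditional form. The names true-object, false-object, and choose are hypothetical, invented for this illustration, and the sketch relies on lambda, which we will not study for another session or two. Each "boolean" is a procedure that receives two alternatives, packaged as procedures of no arguments, and runs exactly one of them.

```scheme
;; Hypothetical illustration, not Smalltalk's actual implementation.
;; Each "boolean" takes two zero-argument procedures and calls one.
(define true-object
  (lambda (if-true if-false)
    (if-true)))

(define false-object
  (lambda (if-true if-false)
    (if-false)))

;; choose hands both alternatives to whichever boolean it is given.
;; The boolean itself decides which alternative runs.
(define choose
  (lambda (bool)
    (bool (lambda () 'took-the-true-branch)
          (lambda () 'took-the-false-branch))))

(choose true-object)     ; => took-the-true-branch
(choose false-object)    ; => took-the-false-branch
```

Notice that no if appears anywhere in this code. The two objects respond to the same request, only differently, which is the essence of the Smalltalk approach described above.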

Already, your view of what a programming language simply must have should be changing.

A more conventional way to eliminate the if control structure is through the use of a while statement. What does this piece of code do?

    while ( x )
    {
        y;
        break;
    }

This loop behaves just like an if statement! So we could eliminate the if statement without diminishing the range of behavior available in programs.

Why does it help us to know that some things must be present in a language, but others do not? It helps us to know what to look for. If I think that a programming language must have an if statement, then when I encounter a language that doesn't, I may become disoriented or disappointed or even angry. None of those emotions help me to learn the new language, and they may well close my mind to learning something useful.

Whenever you are faced with the task of learning a new language, first try determining what is primitive in the language, how things get combined, and how abstraction is done. This will give you a framework to guide your task. Over the next several lectures, we will be learning Scheme. Our efforts will be guided by this framework.



Learning Scheme

Learn a new language and get a new soul.
-- Czech proverb

(... you may even get a new personality!)


Scheme has several different kinds of primitive expressions, many of which will be quite familiar to you:

These primitive expressions can be combined to form more complex expressions. The sole means of combination is the operator application. In Scheme, non-primitive expressions have the following features:

Note that there are no "statements" in Scheme. Every combination is an expression formed like any other. Nearly every such expression has a value. This is an important difference from most other languages you know, because we will use these values to drive our programming in Scheme.

The syntax of an application expression is:

     (<operator> <operand1> <operand2> <operand3> ...)

Most of the operators we use are procedures. To evaluate a compound expression built with a procedure, evaluate each of the operands and pass their values to the procedure, which then produces the value of the expression.
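Here is how that rule plays out on a small nested expression. This is only an illustration of the rule as stated above; the comments trace the steps by hand.

```scheme
;; The operator is +.  Its two operands are themselves compound expressions.
;; First, each operand is evaluated:
;;   (* 2 3)   evaluates to 6
;;   (- 10 4)  evaluates to 6
;; Then the values 6 and 6 are passed to the procedure +.
(+ (* 2 3) (- 10 4))    ; => 12
```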

Finally, Scheme has a number of mechanisms for abstraction. For now, we will focus on just one of them, the ability to define a name. Definitions look just like expressions, where the <operator> is the keyword define and the operands have specific meaning:

     (define <name> <expression>)

The <expression> is evaluated and associated with the <name>. Definitions are evaluated for their side effect -- the naming of a value -- and not for their value.

For the most part, definition is the only form of side effect we will use. It is how we name our programs, the top-level data used by our programs, and the data we use for testing.
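A short sketch of definitions at work. The names width, height, and room-area are hypothetical, chosen only for this example.

```scheme
(define width 4)
(define height 5)

;; The expression (* width height) is evaluated when the definition
;; is processed, and the resulting value is associated with room-area.
(define room-area (* width height))

room-area               ; => 20
```

Once a name is defined, it can be used anywhere a value is expected, including inside later definitions.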

Notice that define is not a procedure, because it does not evaluate all of its arguments. The first argument is taken literally, as the symbol to be used as the name. In Scheme, we call such an operator a special form. Above, I mentioned another special form, lambda, which is used to create procedures. We will study lambda in more detail in a session or two.

In addition to define and lambda, Scheme must offer a few other special forms:

Every other operator in Scheme is a procedure or a syntactic abstraction, something that is defined in terms of something else. We will study the idea of syntactic abstraction in some detail in the course. We'll also learn about quote, if, cond, let, set!, and begin as we go along. We will also learn that two of these forms are not strictly necessary, because Scheme has another way to implement the same behavior.
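To see why a form such as if cannot be an ordinary procedure, consider this minimal sketch. It assumes the usual shape (if <test> <then> <else>), which we have not yet studied in detail, and the name divisor is hypothetical.

```scheme
(define divisor 0)

;; if evaluates its test and then only ONE of the two remaining
;; operands.  Because divisor is 0, the division is never attempted.
(if (= divisor 0)
    'undefined
    (/ 10 divisor))     ; => undefined
```

If if were a procedure, all three operands would be evaluated before the call, and the division by zero would signal an error before any choice could be made.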



Examples from our Scheme Interpreter

Let's explore some examples of Scheme expressions, using our Dr. Racket interpreter. This will give us a chance to learn a bit about how Scheme works and also get to know our programming environment better.

Some things to pay attention to in Dr. Racket:

Primitives and Simple Expressions.

When we enter some expression at the prompt, for instance, a number, Dr. Racket will print the result of evaluating the number. Because numbers are numeric literals, the value is the same as the expression.

    > 25                 ;; a number
    25

    > 1.2                ;; handles integers and floats in the same way
    1.2

    > #t                 ;; a boolean ... also #f
    #t

    > #\a                ;; a character ... we won't use these much
    #\a

    > "Eugene"           ;; a string ... ditto
    "Eugene"

    > 'a                 ;; a symbol -- both identifier and value ...
    a                    ;; we use these a lot as data!  Notice the quote.

    > (quote a)          ;; quote is a special form
    a

    > 'a-symbol          ;; a symbol -- Scheme has fewer constraints on what
    a-symbol             ;; can be a symbol than most other languages

    > '123->321          ;; see what I mean?
    123->321

Here we see an important behavior of Scheme: the Scheme interpreter evaluates every expression it reads. All primitive objects evaluate to themselves, and the "print form" of the object (what we see in an answer) is usually the same as the form we write.

What happens if we evaluate the symbol a, an "a" without the character escape sequence and without the quote?

    > a
    a: undefined;
    cannot reference undefined identifier

Symbols serve as identifiers in Scheme; that is, they can name values. We will talk quite a bit more about symbols and identifiers and values soon.

Some identifiers have values when we first start a Scheme session. Watch this:

    > min
    #<procedure:min>             ; a primitive procedure on numbers

    > not
    #<procedure:not>             ; a primitive procedure on booleans

    > string-length
    #<procedure:string-length>   ; a primitive procedure on strings

    > list
    #<procedure:mlist>           ; a primitive procedure on any args

    > +
    #<procedure:+>               ; even + is a procedure!

These are some of Scheme's primitive procedures, the built-in behaviors provided by the language.

Scheme procedures are named by symbols, just like the variable names we would use in Java for ints and objects. That is correct: all Scheme procedures, even the primitive operation for adding numbers. This turns out to be a remarkably powerful and useful idea, one that we'll come back to later.
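One small consequence, sketched here with hypothetical names plus and smallest: because a procedure is simply the value of a name, we can give the same procedure another name with define.

```scheme
(define plus +)         ; plus now names the same procedure as +
(plus 2 3)              ; => 5

(define smallest min)   ; smallest now names the same procedure as min
(smallest 7 3 9)        ; => 3
```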

Combinations and Compound Expressions.

We combine primitive objects with operators to form more complex expressions. As noted above, Scheme's mechanism for building compound expressions is the prefix expression. A compound expression is always enclosed in parentheses. This is probably different from your experience with other programming languages, where parentheses are usually optional. Always keep this in mind:

You cannot insert or delete parentheses from any Scheme expression without changing its meaning.

Random programming by inserting or deleting parentheses will generally get you nowhere, more so than in other languages. Think about what you want to say, and ask questions if you don't know how to make it work.

Here are some more examples from Dr. Racket:

    > (* 2 2)
    4

    > (- 4 2)
    2

    > (+ 3 5.2)            ; handles integers and floats with equanimity
    8.2

    > (/ 4 2)
    2

    > (/ 1 3)              ; and rationals are numbers, too!
    1/3

    > (- -3 -5)
    2

What happens if we insert a pair of parentheses somewhere?
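One way to explore that question is sketched below. The expressions that signal errors are left as comments so that the rest can be run as written; the exact error messages depend on the Scheme implementation, so none are shown.

```scheme
(* 2 2)                      ; => 4

;; Each of these adds one pair of parentheses to (* 2 2) and is an error:
;;   ((* 2 2))   asks Scheme to apply the value 4 as if it were a procedure
;;   (* (2) 2)   asks Scheme to apply 2 as a procedure with no operands

;; An extra pair of parentheses can also change the meaning silently.
;; Here list receives three arguments: the procedure +, 1, and 2.
(length (list + 1 2))        ; => 3

;; Here list receives one argument: the value of (+ 1 2).
(length (list (+ 1 2)))      ; => 1
```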

There are several important points for you to note about this Scheme session. First, note that the leftmost element in a compound expression is the operator, followed by the operands. The Scheme evaluator determines the value of the expression by applying the procedure specified by the operator to the values of the operands.

That last sentence is extremely important, and more complicated than you might think at first, so make sure you understand what it says.

Notice that in the last expression, the "-" occurs three times and means two different things. When it's the leftmost element in the expression, it represents the operation that is to take place; when it's appended to the front of a number, it means that the number is negative. Spacing is important here: (- - 3 -5) would produce an error:

    > (- - 3 -5)
    -: contract violation
         expected: number?
         given: #<procedure:->
         argument position: 1st
         other arguments...:
          3
          -5

This points out a feature of Scheme we will talk more about soon: we are allowed to pass procedures as arguments to other procedures! (But not to -.)
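A minimal sketch of a procedure that does accept procedures as arguments. The name combine-3-and-5 is hypothetical, and the sketch uses lambda ahead of our study of it.

```scheme
;; combine-3-and-5 receives a procedure and applies it to 3 and 5.
(define combine-3-and-5
  (lambda (operation)
    (operation 3 5)))

(combine-3-and-5 +)     ; => 8
(combine-3-and-5 -)     ; => -2
(combine-3-and-5 max)   ; => 5
```

Here +, -, and max are passed without parentheses around them, so they are treated as values to hand over, not as operations to perform.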

Another thing that you will note is that Scheme's procedures for numbers accept both integers and real numbers without any explicit type coercion or casting. The result of adding 3 to 5.2 is 8.2. That's what most people would say, too. The result of dividing 1 by 3 is 1/3, a fraction, or a rational number. This, too, is obvious to people with no programming experience. Rational numbers are a data type in Scheme.

Other computer languages usually make all sorts of distinctions among different types of numbers, but those distinctions are driven by the implementation of the language and processors for it, and not by our understanding of numbers.

Somehow, we programmers become conditioned by our languages into thinking that these distinctions are necessary ones. They are not. Scheme characterizes numbers as exact or inexact and makes distinctions in behavior driven by this mathematical idea.
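The standard predicates exact? and inexact? let us ask a number which kind it is. A brief sketch:

```scheme
(exact? 1/3)            ; => #t   a rational is exact
(exact? 8.2)            ; => #f   a decimal literal is inexact
(inexact? (+ 3 5.2))    ; => #t   one inexact operand makes the sum inexact
(+ 1/3 2/3)             ; => 1    exact arithmetic loses nothing
(exact->inexact 1/3)    ; an inexact approximation of 1/3
```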

If you need one more example of how Scheme hides implementation details about its numbers, execute this Scheme program. Try that in Java [ loop or recursive ], Python or Ada!! You can see the results in this file of sample runs.

Note, too, that while Scheme has no for or while loops, it does quite nicely with a deeply recursive program, thank you. You will learn the "magic" that makes this possible in just a few weeks.
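The program and sample runs mentioned above are not reproduced in these notes. As a stand-in, here is a minimal sketch that assumes a simple recursive factorial is representative of the kind of program meant. It uses if and lambda before we have studied them.

```scheme
;; A recursive factorial: n! = n * (n-1)!, with 0! = 1.
(define factorial
  (lambda (n)
    (if (= n 0)
        1
        (* n (factorial (- n 1))))))

(factorial 5)     ; => 120
(factorial 25)    ; => 15511210043330985984000000
```

The second result is too large for a 64-bit integer, yet Scheme computes it exactly, with no special declarations on our part.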

NOTE: By the way, these programs, along with the sample interactions, are available in the .zip file for today's session notes. I'll bundle up a .zip file of code for you for each class session. Be sure to download the code, study it, run it, and modify it. That's the best way to learn the ideas we are studying!

Benefits and Costs of Prefix Notation.

The prefix notation that Scheme uses has several advantages over other notations, such as infix:

One problem with prefix notation is that sometimes there are so many parentheses that we get lost:

    (* (* (+ 3 5) (- 3 (/ 4 3))) (- (* (+ 4 5) (+ 7 6)) 4))

If we really try, we could probably figure this out, but it is not clear what this means upon a casual reading. In order to help with this kind of confusion, most Scheme programmers adopt some sort of indentation standard that allows them to read programs easily. If we write the above expression over a number of lines and indent them carefully, we can more easily see how it should be evaluated:

    (* (* (+ 3 5)
          (- 3 (/ 4 3)))
       (- (* (+ 4 5)
             (+ 7 6))
          4))

Of course, this isn't the only way to indent this expression, and you may prefer another style. But I think anyone will agree that this is "better" than the original. It doesn't matter what style of indentation you use in this course as long as your meaning can be clearly understood.

Let your editor help you, too. Dr. Racket will indent your code for you, and you can use the Racket > Reindent menu option to indent code that you've edited in strange ways. You'll have to indent data expressions on your own.

Finally, we can use abstraction to hide such complexity as well. Would anyone really write this arithmetic expression in a program? Probably not. If we did have use for such an expression, then we would almost certainly attribute meaning to parts of the expression, say, to the sub-expression (* (+ 4 5) (+ 7 6)). Why not name it what it means? When we do, two levels of nesting -- and 6 parentheses -- disappear from the expression.

    (define area
      (* (+ 4 5)
         (+ 7 6)))

    (* (* (+ 3 5)
          (- 3 (/ 4 3)))
       (- area 4))

One of the ways that Scheme programmers avoid this sort of parenthesis paralysis is to use local names and write shorter, more expressive code. Another is to keep the bodies of our procedures short, so that they do not introduce too much nesting. These are the things you will want to learn as we proceed.
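As a preview of local names, here is a minimal sketch that uses let, one of the special forms we will study later. It binds the name area only for the duration of the let expression, so nothing is added to the top level.

```scheme
;; area is a local name, visible only inside the body of the let.
(let ((area (* (+ 4 5) (+ 7 6))))
  (* (* (+ 3 5)
        (- 3 (/ 4 3)))
     (- area 4)))       ; => 4520/3
```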

By the way, Dr. Racket tells me that the answer is 4520/3.



Wrap Up



Reading: The Read-Eval-Print Loop

Scheme is an interactive language. Rather than writing a large program as a whole and then translating it "in batch", in Scheme one writes a little piece of program, translates it, and then does the same with another piece. Quite different from batch-style programming, this allows quick turnaround in software development -- which accounts for the prevalent use of interactive languages in (rapid) prototyping.

Do you use any interactive languages? Some of you probably do, Python, Perl, and shell scripting among them. If you're lucky, you may even have tried Ruby.

An interactive language can be either interpreted or compiled. So can batch languages, but there usually isn't much point in interpreting them. Dr. Racket both interprets and compiles: It interprets expressions but compiles procedure definitions. The result is faster response time than pure interpretation. Generally, though, I will refer to "the Scheme interpreter".

Perl and Java are interpreted, but they aren't generally used in an interactive fashion. Scheme's cousin Common Lisp is another interactive language that uses both interpretation and compilation.

What does the Scheme interpreter do with the Scheme expressions that it reads? It evaluates them! The Scheme interpreter works in much the same way that any interactive language interpreter works:

  1. It reads an expression.
  2. It evaluates the expression.
  3. It prints the value of the expression.

The "top-level" behavior of the interpreter is to cycle through this sequence of actions repeatedly. In Scheme this is implemented recursively. A simplified version of the process might look like:

    (define run
      (lambda ()
        (print (eval (read)))
        (run)))

This cycle is called the read-eval-print loop. This behavior is the foundation for all that we will be doing with language interpretation in this course. Be sure to understand it!

Reading Expressions

The technical term for the read step is parsing, which translates the string of characters that the user enters into a data structure on which the evaluator can operate. During this semester, we will discuss how syntax is described, and we will use Scheme's parser for our own interpreters, but we will not dig into the mechanics of parsing. They are properly a topic of the compiler course, 810:155 Translation of Programming Languages -- a course I strongly recommend!

Evaluating Expressions

The evaluation of expressions is the main focus of this course. The basic mechanism for evaluation is one with which you are probably already familiar, if only by example in other languages: To evaluate the expression (operator operand1 operand2 ...),

  1. Evaluate each of the subexpressions, including the operator.
  2. Apply the leftmost result (which, in most of our code, will be a procedure) to the values of the operands.

Because some operators are special forms, we know that a Scheme interpreter must evaluate the operator first and the operands next. Otherwise, it wouldn't know whether to use the standard procedure application rule or the special form's rule.

eval is recursive. Evaluating the subexpressions means making a recursive call to eval for each expression. If one of the subexpressions is a compound expression, then it, too, is evaluated in this same way. Simplicity can give rise to power...

This explanation is a bit oversimplified, because it omits answers to some potentially important questions. For example, in what order are the subexpressions evaluated? But this general algorithm will serve as a workhorse for us this semester.
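To make the recursion visible, here is a minimal sketch of an evaluator for a tiny arithmetic language. It is not Scheme's actual eval. The name tiny-eval is hypothetical, it handles only numbers and applications of +, -, and *, and it recognizes the operator by name instead of evaluating it. It also uses list operations (car, cdr, map, apply) that we have not yet studied.

```scheme
;; tiny-eval: a hypothetical evaluator for numbers and for
;; applications of +, -, and * to any number of operands.
(define tiny-eval
  (lambda (expr)
    (cond ((number? expr) expr)              ; a number is its own value
          ((pair? expr)
           (let ((operator (car expr))
                 ;; recursive step: evaluate every operand
                 (args (map tiny-eval (cdr expr))))
             (cond ((eq? operator '+) (apply + args))
                   ((eq? operator '-) (apply - args))
                   ((eq? operator '*) (apply * args))
                   (else (error "tiny-eval: unknown operator" operator)))))
          (else (error "tiny-eval: unknown expression" expr)))))

(tiny-eval 7)                        ; => 7
(tiny-eval '(* (+ 3 5) (- 10 4)))    ; => 48
```

The call to map is where tiny-eval calls itself on each subexpression, which is how a nested expression of any depth gets evaluated by the same two-step rule.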

Quick Exercise: Can you think of a situation in which the order of evaluation matters? Think of a scenario in another programming language, such as Java. (Try x + ++x.)

Printing Expressions

Printing the result of evaluating an expression -- which is itself an expression! -- is the inverse of reading one. The printer must translate the internal representation of the expression into a string of characters that can be written to the terminal. Again, we will largely rely on the Scheme interpreter's printing mechanism this semester.

The exact interface to any Scheme interpreter is implementation-specific. But the basic read (an expression) - eval(uate the expression) - print (the result) loop is a constant.



Eugene Wallingford ..... wallingf@cs.uni.edu ..... January 17, 2013