Session 4

Racket Data Structures

CS 3540
Programming Languages and Paradigms

Opening Exercise

Suppose that we are writing a function to compute the letter grade for a student in Programming Languages, given the student's final percentage score. It might be 0.95, or 0.77, or 0.83.

Write a Racket expression to fill in the blank:
        (define letter-grade-for
          (lambda (value)
            ;; FILL IN THE BLANK

For example:

    > (define student-grade (/ 248 285))
    > (letter-grade-for student-grade)
    'B
    > (letter-grade-for 0.95)
    'A
    > (letter-grade-for 0.77)
    'C

(Don't tell me you've already forgotten our grading scale...)


This is a choice, so it maps onto a Racket if expression:

    (if (>= student-grade 0.90) 'A
        (if (>= student-grade 0.80) 'B
            (if (>= student-grade 0.70) 'C
                (if (>= student-grade 0.60) 'D
                    'F))))

With several levels of nesting, this problem is an even better match for a cond expression:

    (cond ((>= student-grade 0.90) 'A)
          ((>= student-grade 0.80) 'B)
          ((>= student-grade 0.70) 'C)
          ((>= student-grade 0.60) 'D)
          (else 'F))

This expression is more compact and easier to read.
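To answer the opening exercise, we can drop the cond expression into the body of the lambda, testing the parameter value instead of the global student-grade. This is one possible sketch, assuming the grading scale shown in the cond above:

```racket
#lang racket

;; One way to fill in the blank from the opening exercise.
;; The cutoffs are the ones used in the cond expression above.
(define letter-grade-for
  (lambda (value)
    (cond ((>= value 0.90) 'A)
          ((>= value 0.80) 'B)
          ((>= value 0.70) 'C)
          ((>= value 0.60) 'D)
          (else 'F))))

(define student-grade (/ 248 285))   ; an exact rational, roughly 0.87

(letter-grade-for student-grade)     ; 'B
(letter-grade-for 0.95)              ; 'A
(letter-grade-for 0.77)              ; 'C
```

The clauses are tried in order, so each test can assume that all the earlier tests failed.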

Racket provides a few other selection operators, including a case expression:

    > (case (random 6)
        ((0) 'zero)
        ((1) 'one)
        ((2) 'two)
        (else 'many))

All we will need this semester, though, are if and cond.

Basic Data Structures in Racket

In addition to primitive and compound expressions that deal with behavior, most programming languages also provide one or more primitive data types and ways of combining and abstracting them. For any data type, we are interested in:

  1. the set of values represented by the type, and
  2. the set of operations that can be executed on that type.

For pragmatic reasons, we are also often interested in how the values are represented textually when written in programs and when displayed to users. Later, we will classify the different operations on a data type according to what they do.

We briefly discussed some of Racket's primitive data types in Session 2. With a few exceptions, you should recognize these types from other programming languages. For your reference, I have provided an on-line summary of Racket's atomic data types.

Programming languages usually also provide aggregate data types. A data aggregate consists of a group of data objects. You may think of an aggregate data type as a means of combining data objects into larger structures. In Python, you probably used lists, dictionaries, and even tuples; all are aggregate data types. In Java, you might have used arrays and classes.

Sometimes, something can look like a separate data type but not be one. For example, in C++ you can create and use arrays. But an array is really derived from pointers to objects in heap-allocated memory. The array notation is created for the convenience of programmers. To us C++ programmers, arrays seem like a data type, even if they aren't. They are syntactic sugar -- an idea we'll explore in some detail in a few weeks.

Racket provides three primary aggregate data types: pair, vector, and hash table. Racket also provides the list, which is derived from the pair for programming convenience. We will use pairs and lists extensively this semester, vectors occasionally, and hash tables not at all.

The Pair

In Racket, we can construct an aggregate of two data objects called a pair. After we make a pair, we can refer to it by name, use it as an operand to a procedure, and in general treat it just like we'd treat any atomic object, such as a number.

The function that creates a pair is called cons. This function takes two arguments of any type and returns a pair consisting of the two items. If we have a pair, we can access the individual parts using the functions car and cdr, which refer to the first part of the pair and the second part of the pair, respectively. These names are purely historical: they come from the IBM 704, the machine on which Lisp was first implemented.

We call cons a constructor and car and cdr accessors. These terms should be familiar to many of you from Java or C++ programming. All aggregate data types we discuss in this course, whether Racket's built-in types or the ones we create, will have constructors and accessors.

We might use cons, car, and cdr as follows:

    > (define a (cons 10 20))
    > (define b (cons 3 7))
    > (car a)
    10
    > (car b)
    3
    > (cdr a)
    20
    > (cdr b)
    7
    > (define c (cons a b))
    > (car (car c))
    10
    > (car c)
    (10 . 20)                   ; "dotted pair" notation
    > c
    ((10 . 20) 3 . 7)
    > (cdr (car c))
    20
    > (cdr c)
    (3 . 7)
    > (cdr a)
    20
    > (cdr (car a))
    cdr: contract violation
      expected: pair?
      given: 10
    > (car (car a))
    car: contract violation
      expected: pair?
      given: 10

What do those error messages tell us about car and cdr? For one thing, they are strongly typed; they accept only pairs as an argument. A contract violation is Racket's way of telling us that we sent a value of an unexpected type to a function.
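If we are not sure whether a value is a pair, we can ask before we call car or cdr. The sketch below uses Racket's pair? predicate. The helper safe-car is a hypothetical name made up for this illustration; it is not part of Racket.

```racket
#lang racket

(define a (cons 10 20))

(pair? a)            ; #t: a was built by cons
(pair? (car a))      ; #f: 10 is a number, not a pair

;; safe-car is a hypothetical helper, not a Racket built-in.
;; It returns the car of a pair and #f for any other value.
(define safe-car
  (lambda (value)
    (if (pair? value)
        (car value)
        #f)))

(safe-car a)         ; 10
(safe-car (car a))   ; #f, instead of a contract violation
```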

Box and Pointer Diagrams

We can use pictures to help us understand the operation of cons. You surely have done something similar in your study of linked data structures... If not, we have done you a great disservice! One of the best ways to understand how code manipulates a data structure is to draw a picture.

The pictures we draw are called box and pointer diagrams, for reasons that will be obvious.

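The box and pointer diagram for a in the preceding example would look like this:

    [Diagram: one pair. Its car pointer leads to 10 and its cdr pointer leads to 20.]

Similarly, the box and pointer diagram for b would look like:

    [Diagram: one pair. Its car pointer leads to 3 and its cdr pointer leads to 7.]

Of course, a cons cell can point to something other than numbers, as c shows:

    [Diagram: one pair. Its car pointer leads to the pair for a and its cdr pointer leads to the pair for b.]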

A pair really is an aggregate containing pointers to two values. Once we gain experience programming with pairs, though, we rarely have to think about them at this level. The operations we use hide the details of memory reference from us. However, understanding pairs at this level is quite helpful when we need to figure out why a piece of code doesn't work the way we think it should. That happens a lot while we are learning.
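We can see the pointers at work from the interpreter. The sketch below rebuilds a, b, and c from the earlier example and uses eq?, which asks whether two values are the very same object, to show that c refers to the existing pairs rather than to copies of them.

```racket
#lang racket

(define a (cons 10 20))
(define b (cons 3 7))
(define c (cons a b))

;; The car and cdr of c are the same objects as a and b.
(eq? (car c) a)                  ; #t
(eq? (cdr c) b)                  ; #t

;; A freshly built pair with the same contents is a different object...
(eq? (car c) (cons 10 20))       ; #f
;; ... even though it has the same structure and values.
(equal? (car c) (cons 10 20))    ; #t
```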

Some Practice with cons


In abstract terms, a list is an ordered sequence of elements. In some languages, the elements in a list must all be the same type. In Racket, elements of a list may be of any type. For example, we can make a list that contains numbers, booleans, and symbols:

    '(1 #t eugene 2 #f wallingford)

This is consistent with Racket's dynamic typing. It also turns out to be enormously useful as we write programs.

Here are some examples of lists:

    ()          The empty list -- a null pointer
    (3)         A list with one element: the number 3
    (3 4)       A list with two elements: the number 3 and the number 4
    (3 #t)      A list with two elements: the number 3 and the boolean #t
    ((3 4 5))   A list with one element!  (The element is a list.)

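Lists have a particular form in memory. If we draw a box and pointer diagram for the list (3 4 5), it looks like this:

    [Diagram: three pairs in a chain. The car pointers lead to 3, 4, and 5. Each cdr pointer leads to the next pair, and the last cdr pointer leads to the empty list.]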

A list is a pair whose second item is also a list.

(We will look at the implications of this definition in more detail later.)

We can get our pairs to look like the above box and pointer diagram if we use cons in this way:

    > (cons 3 (cons 4 (cons 5 '())))     ;; why quote the innermost ()?
    '(3 4 5)

We can also make a list using the list constructor, the standard procedure list:

    > (list 3 4 5)
    '(3 4 5)

The list procedure constructs a new list containing the zero or more arguments passed to it.

Now you may begin to see why we draw diagrams of this sort. Notice that the diagram for the cdr of a list of numbers is topologically similar to the diagram for the entire list. Later we will benefit from this similarity when we write recursive programs. Racket lists are an example of designing data structures in a way that makes it easier to write the algorithms we want to write -- an idea that you learn about in your courses on data structures and algorithms.
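A short check at the interpreter confirms both points: the two ways of building the list give the same result, and the cdr of a list is again a list. The names by-cons and by-list are made up for this sketch.

```racket
#lang racket

(define by-cons (cons 3 (cons 4 (cons 5 '()))))
(define by-list (list 3 4 5))

(equal? by-cons by-list)        ; #t: same structure, same items

;; A non-empty list is a pair whose cdr is also a list.
(pair? by-list)                 ; #t
(list? (cdr by-list))           ; #t
(cdr by-list)                   ; '(4 5)

;; Taking cdr three times walks off the end, to the empty list.
(cdr (cdr (cdr by-list)))       ; '()
```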

Interlude: Quotation

If you can't hear me,
it's because I'm in parentheses.

-- Steven Wright

Steven Wright -- right!

Let's take a small detour to consider the question in the comment above: Why do we quote the innermost ()? This is one of the most common questions students have as they learn about Racket lists. Quotation is a small feature of Racket that simplifies many of our interactions. You will come to see that it is indispensable.

Recall that Racket has a list function that takes n arguments and returns a list with those n items in it.

    > (list 2 4 6 5 2)
    '(2 4 6 5 2)

Calls to list can be nested to create arbitrarily complex list structures:

    > (list (list 2 4 (list 6 5) 2) 4 7 (list 4 5) 9 2 (list 1))
    '((2 4 (6 5) 2) 4 7 (4 5) 9 2 (1))

    > (list (list 2 4 (list 6 (list 1 2 3 4) 5) 2) 4 7
            (list 4 (list 2 8) 5) 9 2 (list 1))
    '((2 4 (6 (1 2 3 4) 5) 2) 4 7 (4 (2 8) 5) 9 2 (1))

In each of these situations, we use list to create a list of items. The term "list" doesn't add any information about the structure of the list, though; it merely tells Racket to build it. It would be nice if we didn't have to use list in these cases -- if we could specify the structure and content of the list directly. That would save us a bunch of function calls (but not the parentheses that a nested list entails!)

For example, instead of ...

    (define a-tree (list 3 4 3 1 2))

... we might like to write:

    (define a-tree (3 4 3 1 2))

Unfortunately, the Racket interpreter would complain. It would try to evaluate the second argument to define, because it can't tell the difference between that expression and any other list, such as

    (define a-number (* 4 3 1 2))

The problem is that, in Racket, data and programs look alike! Lists look just like expressions in Racket. As we will later learn, this common form of data and program is one of the sources of Racket's power, but it also means that we need a way to tell the interpreter that some expressions are meant to be taken literally and not evaluated. We do this with the quote special form. Using quote, we can write lists such as these directly, like so:

    (define a-tree (quote ((3 4) (3 (1 2)))))
    (define another-tree (quote (* 3 (+ 1 2))))

Since we use quote so frequently, Racket provides shorthand in the form of a special character, ':

    (define a-tree '((3 4) (3 (1 2))))
    (define another-tree '(* 3 (+ 1 2)))
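To see that a quoted expression really is data, we can take it apart with the list operations. This sketch repeats the definition of another-tree so that it runs on its own.

```racket
#lang racket

(define another-tree '(* 3 (+ 1 2)))

;; The shorthand and the long form produce the same list.
(equal? another-tree (quote (* 3 (+ 1 2))))   ; #t

;; The quoted expression is a three-item list, not the number 9.
(length another-tree)                         ; 3
(car another-tree)                            ; '*  (a symbol, not the function)
(symbol? (car another-tree))                  ; #t
(car (cdr (cdr another-tree)))                ; '(+ 1 2)

;; Without the quote, Racket evaluates the expression.
(* 3 (+ 1 2))                                 ; 9
```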

The quote character in Racket, much as in English, tells Racket to take the next symbol, list, or other expression literally (but not "literally"). This gives us the power to create lists of symbols as well as numbers. Take, for example, the following exchange with the interpreter:

    > (define x 56)
    > (define y 30)
    > (list x y)
    (56 30)
    > (list 'x 'y)
    (x y)
    > (list 'x y)
    (x 30)
    > (list x 'y)
    (56 y)

Lists created as literals are just like lists created using cons and list. We can perform all the expected operations on them. For instance:

    > (car '(3 4 5))
    3

Quick Exercise: When can we not use a quote in place of a call to list?

More on Lists

Because lists are built out of pairs, we can use car and cdr to access the elements of a list. Remember that car returns whatever the first pointer points to and that cdr returns whatever the second pointer points to. Because a list is a pair whose second item is also a list, we know that the cdr of a list will always be a list, too.

However, when we are working on lists, we sometimes like to think in terms of list items. Racket provides first and rest as synonyms for car and cdr, though first and rest work only with lists that are not empty; they reject pairs that are not lists. That may seem like a big restriction, but it turns out not to be in practice. Empty lists are a special case that we handle separately.

(Racket also provides functions for accessing the second through tenth items of a list!)

For instance, on the above diagram, car returns a 3 and cdr returns another list, (4 5). Note that this is different than when we made a pair of numbers and the car and cdr both pointed to numbers.

Let's give our list a name and play with it, using car, cdr, and cons to access elements and construct new lists:

    > (define 3-to-5 (list 3 4 5))
    > (car 3-to-5)
    3
    > (cdr 3-to-5)
    (4 5)
    > (car (cdr 3-to-5))
    4
    > (car (cdr (cdr 3-to-5)))
    5
    > (cons 2 3-to-5)
    '(2 3 4 5)
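The same accesses read a little more naturally with the list-flavored names. This sketch assumes #lang racket, which makes first, rest, second, and third available.

```racket
#lang racket

(define 3-to-5 (list 3 4 5))

(first 3-to-5)       ; 3, same as (car 3-to-5)
(rest 3-to-5)        ; '(4 5), same as (cdr 3-to-5)
(second 3-to-5)      ; 4, same as (car (cdr 3-to-5))
(third 3-to-5)       ; 5, same as (car (cdr (cdr 3-to-5)))

;; car accepts any pair, but first insists on a non-empty list.
(car (cons 3 4))     ; 3
;; (first (cons 3 4)) would signal a contract violation.
```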

The following do not result in lists:

    > (cons 3-to-5 5)
    ((3 4 5) . 5)
    > (cons (cdr 3-to-5) (car 3-to-5))
    ((4 5) . 3)
    > (cons 3 4)
    (3 . 4)

Why not? We saw the dot notation earlier when playing with pairs. Here, it shows us that the second item of the pair is not a list.

Quick Exercise: Draw box-and-pointer diagrams for these examples. From your pictures and the definition of a Racket list, it should be clear that these are not lists.
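After drawing the pictures, we can check our answers with Racket's list? predicate, which returns #t only for the empty list and for pairs whose chain of cdrs ends in the empty list.

```racket
#lang racket

(define 3-to-5 (list 3 4 5))

(list? (cons 2 3-to-5))                     ; #t: the cdr is a list
(list? (cons 3-to-5 5))                     ; #f: the cdr is the number 5
(list? (cons (cdr 3-to-5) (car 3-to-5)))    ; #f: the cdr is the number 3
(list? (cons 3 4))                          ; #f: the cdr is the number 4

;; All of them are still pairs.
(pair? (cons 3 4))                          ; #t
```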

Note that the last pointer in the last pair of any list is something called nil. Look back to when we used only cons to form the list (3 4 5). The innermost cons looked like this: (cons 5 '()). These two anomalies are different views on the same idea: We need to be able to talk about lists that contain no items.

What is nil? The word "nil" is a contraction of the Latin word for nothing -- and it means just that. We use it to represent the empty list. In box-and-pointer diagrams, we indicate a pointer to an empty list with a slash.
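In Racket code, the empty list is written '(). The names null and empty are bound to the same value, and the predicates null? and empty? test for it. The name nil itself is not defined in #lang racket, so we use it only when talking about lists. A short sketch:

```racket
#lang racket

(null? '())                        ; #t
(empty? '())                       ; #t
(equal? null '())                  ; #t
(equal? (list) '())                ; #t: list with no arguments

;; The empty list is a list, but it is not a pair.
(list? '())                        ; #t
(pair? '())                        ; #f

;; The last pointer in the last pair of (3 4 5) is the empty list.
(cdr (cdr (cdr (list 3 4 5))))     ; '()
```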

Now we have all the parts we need to define a list:

    A list is either the empty list or a pair whose second item is a list.

This is an inductive definition. We will return to the idea of inductive definitions soon and take great advantage of this definition for lists when we write recursive programs to manipulate them.

Having the last cdr in each of our lists be the empty list is important from both a practical and a theoretical standpoint. It is nice in theory because that means the cdr of every non-empty list is always another list. The cdr of the last pair, therefore, has to be a list -- but, since it contains no items, it must be the empty list.

It will be nice in practice because we will define many operations that use recursion to process lists. Our base case can usually check for the empty list.
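As a preview, here is a minimal sketch of that pattern. The function count-items is a hypothetical example written for this illustration; Racket's built-in length does the same job.

```racket
#lang racket

;; count-items is a hypothetical example, not a Racket built-in.
;; Base case: the empty list has no items.
;; Recursive case: one item, plus however many are in the cdr.
(define count-items
  (lambda (lst)
    (if (null? lst)
        0
        (+ 1 (count-items (cdr lst))))))

(count-items '())            ; 0
(count-items '(3 4 5))       ; 3
(count-items '((3 4 5)))     ; 1: the one item is itself a list
```

The recursion works because the cdr of every non-empty list is a list, so each call receives a legal argument until we reach the empty list.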

Sample Problems

If you would like a bit more practice, draw box-and-pointer diagrams for the following objects:

A Functional Data Structure

A few words on the Racket list as a functional data structure.

... lists in Python are mutable. We can change their value.

... Racket pairs and lists are immutable. Racket does offer mutable pairs (mcons), but that's not how cons, first/rest, and car/cdr work.

Make a list. "Replace" its first item. Copies. Sharing works if we don't allow code to modify values in the list.

Replace the *second* item.
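The two "replacements" might look like the sketch below. The names original, new-first, and new-second are made up for this illustration. We use eq? to check which parts of the new lists are shared with the original.

```racket
#lang racket

(define original (list 1 2 3))

;; "Replace" the first item: build one new pair and reuse the rest.
(define new-first (cons 10 (cdr original)))
new-first                                            ; '(10 2 3)
(eq? (cdr new-first) (cdr original))                 ; #t: the tail is shared

;; "Replace" the second item: rebuild the first pair around a new
;; second pair, and share everything after the second item.
(define new-second
  (cons (car original)
        (cons 20 (cdr (cdr original)))))
new-second                                           ; '(1 20 3)
(eq? (cdr (cdr new-second)) (cdr (cdr original)))    ; #t: the tail is shared

;; The original list is unchanged by either operation.
original                                             ; '(1 2 3)
```

The deeper the item we replace, the more pairs we have to rebuild in front of it, but everything after it is shared.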

Functional data structures: functions, multithreading. Not sharing is more expensive if you want all the copies to exist at the same time. Garbage collection takes care of the rest.

Many of you use Git for version control. Many people argue that the best way to understand Git is as a purely functional data structure. (If you'd like to see a quick introduction to immutable lists, read that post up to the heading Getting Git. If you want to understand Git better, keep reading.)

Wrap Up

Eugene Wallingford ..... ..... January 18, 2018