CS 3540: Session 4

Session 4
Racket Data Structures

Opening Exercise: Choosing a Value

Suppose that we are writing a function to compute the letter grade for a student in Programming Languages, given the student's final percentage score. It might be 0.95, or 0.77, or 0.83.

Write a Racket expression to fill in the blank:

(define letter-grade-for
  (lambda (value)
    ;; FILL IN THE BLANK
  ))

For example:

> (define student-grade (/ 248 285))
> (letter-grade-for student-grade)
'B
> (letter-grade-for 0.95)
'A
> (letter-grade-for 0.77)
'C

(Don't tell me you've already forgotten our grading scale...)

Solutions: Choices in Racket

In Session 3, we saw conditional expressions and so know that Racket has an if expression. We can use nested if expressions to implement this choice:

(if (>= student-grade 0.90)
  'A
  (if (>= student-grade 0.80)
      'B
      (if (>= student-grade 0.70)
          'C
          (if (>= student-grade 0.60)
              'D
              'F))))

With several levels of nesting and simple values, this problem is an even better match for a cond expression:

(cond ((>= student-grade 0.90) 'A)
      ((>= student-grade 0.80) 'B)
      ((>= student-grade 0.70) 'C)
      ((>= student-grade 0.60) 'D)
      (else 'F))

This expression is more compact and easier to read. A cond expression can be helpful in cases where we have many alternatives, especially when the expressions are relatively short.
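Dropping the cond expression into the blank from the opening exercise gives one possible solution. This is a sketch that assumes the cutoffs shown above and renames student-grade to the parameter name value:

```racket
#lang racket
(require rackunit)

;; letter-grade-for: number -> symbol
;; Maps a final score between 0 and 1 to a letter grade,
;; using the same cutoffs as the cond expression above.
(define letter-grade-for
  (lambda (value)
    (cond ((>= value 0.90) 'A)
          ((>= value 0.80) 'B)
          ((>= value 0.70) 'C)
          ((>= value 0.60) 'D)
          (else 'F))))

;; the examples from the opening exercise, recorded as tests
(check-equal? (letter-grade-for (/ 248 285)) 'B)
(check-equal? (letter-grade-for 0.95) 'A)
(check-equal? (letter-grade-for 0.77) 'C)
```

Because cond tries its clauses in order, each test only needs to check the lower bound of its range.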

Quick Exercise: Is cond a function or a special form? How do we know?

Hint: If it were a function, how would its arguments be evaluated?

Racket provides a few other selection operators, too, including a case expression:

> (case (random 6)
  ((0) 'zero)
  ((1) 'one)
  ((2 3 4) 'few)
  (else 'many))
'many

case tries to match its key value against the members of one or more lists. We can't use this operator for our letter grade function because we can't list all the possible values for the student grade.

All we will need this semester to make choices are if and cond. You'll find that I generally use if for two-way choices and cond for multi-way choices. You may use whichever you find more helpful in any given situation.

Bonus Racket: the exact->inexact function.
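A small sketch of why exact->inexact comes up in this exercise: dividing two exact integers produces an exact rational, not a decimal. The particular numbers are only illustrations.

```racket
#lang racket

;; Dividing two exact integers gives an exact rational...
(/ 248 285)                    ; => 248/285

;; ...which exact->inexact converts to a floating-point number.
(exact->inexact (/ 1 4))       ; => 0.25
(exact->inexact (/ 248 285))   ; roughly 0.87
```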

A Quick Review of Rackunit

Notice how I use check-equal? expressions in the letter-grade-for code file to record the examples I created for the exercise. I mentioned Rackunit in Session 3's notes as a way of checking our expectations in code. It also makes for a simple, lightweight unit testing framework, so we will be using it for most of the code we write — beginning with Homework 2.

check-equal? does not work well for comparing floating-point values, which cannot always be represented exactly in a computer. Rackunit provides another test operator, check-=, which works only for numbers. It takes a third argument, a tolerance, and checks whether its first two arguments are within that tolerance of each other. We will want to use check-= whenever the function we are testing returns a floating-point number.
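Here is a minimal sketch of the difference. The function half-of is hypothetical, made up for this illustration, and the tolerance 0.0001 is an arbitrary choice.

```racket
#lang racket
(require rackunit)

;; half-of: a hypothetical function that returns a floating-point number
(define half-of
  (lambda (n)
    (/ n 2.0)))

;; check-= takes the actual value, the expected value, and a tolerance
(check-= (half-of 0.3) 0.15 0.0001)

;; Floating-point arithmetic is not exact: (+ 0.1 0.2) is not equal? to 0.3,
;; so check-equal? would fail here, but check-= with a tolerance passes.
(check-false (equal? (+ 0.1 0.2) 0.3))
(check-= (+ 0.1 0.2) 0.3 0.0001)
```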

Rackunit defines a number of other useful testing functions, some of which we will use over the course of the semester. I use check-true and check-false in the homework02.rkt template file to test in-range?, which returns boolean values.

When writing code, don't break the tests. Your code may fail a test, but it should always load and run without causing a Racket error. Keep in mind this partial hierarchy of correctness.

Hint: When you begin working on Homework 2, the tests for all five functions will fail. You can only work on one function at a time, and meanwhile every time you run your code all of the other tests will fail. So: comment out tests for problems you aren't solving yet. Don't forget to uncomment them when you start working on the function!

More Than A Hint: Use the exact filenames given in the assignment. Reasons include software engineering and grading.

Basic Data Structures in Racket

In addition to primitive and compound expressions that deal with behavior, most programming languages also provide one or more primitive data types and ways of combining and abstracting them. For any data type, we are interested in:

the set of values represented by the type, and
the set of operations that can be executed on that type.

For pragmatic reasons, we are also often interested in how the values are represented textually when written in programs and when displayed to users. Later, we will classify the different operations on a data type according to what they do.

We briefly discussed some of Racket's primitive atomic data types in Session 2. With a few exceptions, you should recognize those types from other programming languages. For your reference, I have provided an on-line summary of Racket's atomic data types.

Programming languages usually also provide aggregate data types. A data aggregate consists of a group of data objects. You may think of an aggregate data type as a means of combining data objects into larger structures. In Python, you probably used lists, dictionaries, and perhaps tuples. All are data types that aggregate other values into groups. In Java, you might have used arrays and instances of various classes such as ArrayList.

Sometimes, something can look like a separate data type but not be one. For example, in C and C++, we can create and use arrays. But array indexing is really defined in terms of pointers into a block of memory: a[i] means *(a + i). The array notation is created for the convenience of programmers. To us C and C++ programmers, arrays seem like a data type, even if they aren't. They are syntactic sugar, an idea we'll explore in some detail in a few weeks.

Racket provides three primary aggregate data types: pair, vector, and hash table. Racket also provides the list, which is derived from the pair for programming convenience. We will use pairs and lists extensively this semester, vectors occasionally, and hash tables not at all.

The Pair

In Racket, we can construct an aggregate of two data objects called a pair. After we make a pair, we can refer to it by name, use it as an argument to a function, and in general treat it just like we'd treat any atomic object, such as a number.

The function that creates a pair is called cons. This function takes two arguments of any type and returns a pair consisting of the two items. If we have a pair, we can access the individual parts using the functions car and cdr, which refer to the first part of the pair and the second part of the pair, respectively. These names are purely historical.

We call cons a constructor and car and cdr accessors. These terms may be familiar to many of you from Java or C++ programming. All aggregate data types we discuss in this course, whether Racket's built-in types or the ones we create, will have both constructors and accessors.

We might use cons, car, and cdr as follows:

> (define a (cons 10 20))
> (define b (cons 3 7))
> (car a)
10
> (car b)
3
> (cdr a)
20
> (cdr b)
7
> a
(10 . 20)                   ; "dotted pair" notation
> (define c (cons a b))
> (car (car c))
10
> (car c)
(10 . 20)                   ; there's that dot again
> c
((10 . 20) 3 . 7)
> (cdr (car c))
20
> (cdr c)
(3 . 7)
> (cdr a)
20
> (cdr (car a))
cdr: contract violation
  expected: pair?
  given: 10
> (car (car a))
car: contract violation
  expected: pair?
  given: 10

What do those error messages tell us about car and cdr? For one thing, they are strongly typed; they accept only pairs as an argument. A contract violation is Racket's way of telling us that we sent a value of an unexpected type to a function.
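The predicate pair? lets us ask the same question that car and cdr ask before they do their work. The helper safe-car below is hypothetical, not part of Racket, and returning #f for a non-pair is just one possible design choice.

```racket
#lang racket

(define a (cons 10 20))

(pair? a)         ; => #t
(pair? (car a))   ; => #f, because (car a) is the number 10

;; safe-car: a hypothetical helper, not part of Racket.
;; Returns the car of a pair, or #f when given anything else.
(define safe-car
  (lambda (v)
    (if (pair? v)
        (car v)
        #f)))

(safe-car a)         ; => 10
(safe-car (car a))   ; => #f, instead of a contract violation
```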

Box and Pointer Diagrams

We can use pictures to help us understand the operation of cons. You surely have done something similar in your study of linked data structures... If not, we have done you a great disservice! One of the best ways to understand how code manipulates a data structure is to draw a picture.

The pictures we draw are called box and pointer diagrams, for reasons that will be obvious.

The box and pointer diagram for a in the preceding example would look like this:

[Diagram: a cons cell labeled a with a car of 10 and a cdr of 20]

Similarly, the box and pointer diagram for b would look like:

[Diagram: a cons cell labeled b with a car of 3 and a cdr of 7]

Of course, a cons cell can point to something other than numbers, as c showed us:

[Diagram: a cons cell labeled c with a car pointing to the a cell and a cdr pointing to the b cell]

A pair really is an aggregate containing pointers to two values. Once we gain experience programming with pairs, though, we rarely have to think about them at this level. The operations we use hide the details of memory reference from us. However, understanding pairs at this level is quite helpful when we need to figure out why a piece of code doesn't work the way we think it should. That happens a lot while we are learning.

Some Practice with cons

First, use the box and pointer diagrams shown above to understand the interaction shown in the preceding example. In particular, can you tell why each error occurred?

Then:

Draw box and pointer diagrams for this sequence of expressions, in order:

(define x (cons 1 2))
(define y (cons x x))
(define z (cons y 4))
(define w (cons z (cons y x)))

We will find that pairs are really useful in our programming this semester, but mostly because they are the elementary components of lists.

Lists

In abstract terms, a list is an ordered sequence of elements. In some languages, the elements in a list must all be the same type. In Racket, elements of a list may be of any type. For example, we can make a list that contains numbers, booleans, symbols, and strings:

'(1 #t eugene 2 #f "wallingford")

This is consistent with Racket's dynamic typing. It also turns out to be enormously useful as we write programs.

Here are some examples of lists:

()          The empty list -- a null pointer
(3)         A list with one element: the number 3
(3 4)       A list with two elements: the number 3 and the number 4
(3 #t)      A list with two elements: the number 3 and the boolean #t
(3 4 5)     A list with three elements...
((3 4 5))   A list with one element!  (The element is a list.)

That last example may remind you of Homework 1...

Lists have a particular form in memory. If we draw a box and pointer diagram for the list (3 4 5) it looks like this:

[Diagram: a series of three cons cells chained by their cdr links, with cars of 3, 4, and 5, respectively]

A list is a pair whose second item is also a list.

We will look at the implications of this definition in more detail later.

We can get our pairs to look like the above box and pointer diagram if we use cons in this way:

> (cons 3 (cons 4 (cons 5 '())))     ;; why quote the innermost ()?
'(3 4 5)

We can also make a list using the list constructor, the function list:

> (list 3 4 5)
'(3 4 5)

The list function constructs a new list containing the zero or more arguments passed to it.
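A quick check, as a sketch, that the two ways of building the list give the same structure. The names by-cons and by-list are made up for this example.

```racket
#lang racket

(define by-cons (cons 3 (cons 4 (cons 5 '()))))
(define by-list (list 3 4 5))

(equal? by-cons by-list)   ; => #t
(length by-list)           ; => 3
(cdr by-list)              ; => '(4 5)
(list)                     ; => '(), the zero-argument case
```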

Now you may begin to see why we draw diagrams of this sort. Notice that the diagram for the cdr of a list of numbers is topologically similar to the diagram for the entire list. Later we will benefit from this similarity when we write recursive programs. Racket lists are an example of how the design of a data structure can make it easier to write the algorithms we want to write — an idea that you learn about in your courses on data structures and algorithms.

Interlude: Quotation

If you can't hear me,
it's because I'm in parentheses.

— Steven Wright

Let's take a small detour to consider the question in the comment above: Why do we quote the innermost ()? This is one of the most common questions students have as they learn about Racket lists. Quotation, which we discussed briefly last session, is a small feature of Racket that simplifies so many of our interactions. You will come to see that it is indispensable.

As we just learned, Racket has a list function that takes n arguments and returns a list containing those n items.

> (list 2 4 6 5 2)
'(2 4 6 5 2)

Calls to list can be nested to create arbitrarily complex list structures:

> (list (list 2 4 (list 6 5) 2) 4 7 (list 4 5) 9 2 (list 1))
'((2 4 (6 5) 2) 4 7 (4 5) 9 2 (1))

> (list (list 2 4 (list 6 (list 1 2 3 4) 5) 2) 4 7
        (list 4 (list 2 8) 5) 9 2 (list 1))
'((2 4 (6 (1 2 3 4) 5) 2) 4 7 (4 (2 8) 5) 9 2 (1))

In both of these examples, we use list to create a list of items. The term "list" doesn't add any information about the structure of the list, though; it merely tells Racket to build it. It would be nice if we didn't have to use list in these cases -- if we could specify the structure and content of the list directly. That would save us a bunch of function calls (but not the parentheses that a nested list entails!)

For example, instead of ...

(define a-tree (list 3 4 3 1 2))

... we might like to write:

(define a-tree (3 4 3 1 2))

Unfortunately, the Racket interpreter would complain. Why? It would try to evaluate the second argument to define, which is a parenthesized prefix expression. So it evaluates the first item in the list, to determine whether it is a function or a special form. When it is neither, we receive an error message.

In such cases, the Racket interpreter can't tell the difference between (3 4 3 1 2) and any other list, such as

(define a-number (* 4 3 1 2))

This situation follows from the fact that, in Racket, data and programs look alike! Lists look just like expressions in Racket. As we will later learn, this common form for data and program is one of the sources of Racket's power. It also means, though, that we need a way to tell the interpreter that some expressions are meant to be taken literally and not evaluated. We do this with the quote special form. Using quote, we can define lists like the two above, including nested ones, as literal expressions:

(define a-tree (quote ((3 4) (3 (1 2)))))
(define another-tree (quote (* 3 (+ 1 2))))

Since we use quote so frequently, Racket provides shorthand in the form of a special character, ':

(define a-tree '((3 4) (3 (1 2))))
(define another-tree '(* 3 (+ 1 2)))
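To see that a quoted expression really is data, we can take it apart like any other list. This sketch reuses the another-tree definition from above:

```racket
#lang racket

(define another-tree '(* 3 (+ 1 2)))

(equal? '(3 4) (quote (3 4)))   ; => #t, ' is shorthand for quote
(length another-tree)           ; => 3
(car another-tree)              ; => '*, a symbol, not the multiplication function
(symbol? (car another-tree))    ; => #t
(* 3 (+ 1 2))                   ; => 9, the unquoted expression is evaluated
```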

The quote character in Racket, much as in English, tells Racket to take the next symbol, list, whatever, literally. This gives us the power to create lists of symbols as well as numbers. Take, for example, the following exchange with the interpreter:

> (define x 56)
> (define y 30)
> (list x y)
(56 30)
> (list 'x 'y)
(x y)
> (list 'x y)
(x 30)
> (list x 'y)
(56 y)

Lists created as literals are just like lists created using cons and list. We can perform all the expected operations on them. For instance:

> (car '(3 4 5))
3

Deep-Thinking Exercise: When can we not use a quote in place of a call to list?

More on Lists

Because lists are built out of pairs, we can use car and cdr to access the elements of a list. Remember that car returns whatever the first pointer points to and that cdr returns whatever the second pointer points to. Because a list is a pair whose second item is also a list, we know that the cdr of a list will always be a list, too.

However, when we are working on lists, we usually prefer to think in terms of lists and the items they contain, not in terms of pairs. Racket provides first and rest as synonyms for car and cdr. They are different in one respect: car and cdr accept any pair as an argument, but first and rest work only with a list, which we know to be a pair whose second item is also a list. Both have one further restriction: they work only with lists that are not empty. That may seem like a big restriction, but in practice it turns out not to be. Empty lists are usually a special case, and we will want to handle them separately in our code.
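The difference shows up when we hand each function a pair that is not a list. In this sketch, fails? is a hypothetical helper written only to report whether a call raises a contract violation:

```racket
#lang racket

(define a-pair (cons 1 2))   ; a pair, but not a list
(define a-list (list 1 2))

(car a-pair)     ; => 1
(first a-list)   ; => 1
(rest a-list)    ; => '(2)

;; fails?: a hypothetical helper. It calls thunk and returns #t if
;; the call raises a contract violation, #f otherwise.
(define fails?
  (lambda (thunk)
    (with-handlers ((exn:fail:contract? (lambda (e) #t)))
      (thunk)
      #f)))

(fails? (lambda () (cdr a-pair)))     ; => #f, cdr accepts any pair
(fails? (lambda () (first a-pair)))   ; => #t, a-pair is not a list
(fails? (lambda () (first '())))      ; => #t, the list is empty
(fails? (lambda () (rest '())))       ; => #t, the list is empty
```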

For instance, on the (3 4 5) list we saw earlier, car returns a 3 and cdr returns another list, (4 5). Note that this is different from when we made a pair of numbers and the car and cdr both pointed to numbers.

Let's give our list a name and play with it, using car, cdr, and cons to access elements and construct new lists:

> (define 3-to-5 (list 3 4 5))
> (car 3-to-5)
3
> (cdr 3-to-5)
'(4 5)
> (car (cdr 3-to-5))
4
> (car (cdr (cdr 3-to-5)))
5
> (cons 2 3-to-5)
'(2 3 4 5)
> 3-to-5
'(3 4 5)

The following expressions do not produce lists:

> (cons 3-to-5 5)
((3 4 5) . 5)
> (cons (cdr 3-to-5) (car 3-to-5))
((4 5) . 3)
> (cons 3 4)
(3 . 4)

Why not? We saw the dot notation earlier when playing with pairs. Here, it lets us see that the cdr of each pair is not a list.

Quick Exercise: Draw box-and-pointer diagrams for these examples. From your pictures and the definition of a Racket list, it should be clear that these are not lists.
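Racket's built-in predicates list? and pair? can confirm what the pictures show. A short sketch using the same 3-to-5 list:

```racket
#lang racket

(define 3-to-5 (list 3 4 5))

(list? (cons 2 3-to-5))   ; => #t, the cdr is a list
(list? (cons 3-to-5 5))   ; => #f, the cdr is the number 5
(pair? (cons 3-to-5 5))   ; => #t, it is still a pair
(list? (cons 3 4))        ; => #f
```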

Note that the last pointer in the last pair of any list is something called nil. Look back to when we used only cons to form the list (3 4 5). The innermost cons looked like this: (cons 5 '()). These two observations are different views on the same idea: We need to be able to talk about lists that contain no items.

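What is nil? The word "nil" is a contraction of the Latin word for nothing -- and it means just that. We use it to represent the empty list. Racket itself does not define the name nil; in Racket code we write the empty list as '(), and the names null and empty refer to the same value. In box-and-pointer diagrams, we indicate a pointer to an empty list with a slash.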

Now we have all the parts we need to define a list more completely:

The empty list is a list.
A non-empty list is a pair whose cdr is a list.

This is an inductive definition. We will return to the idea of inductive definitions soon and take great advantage of this definition for lists when we write recursive programs to manipulate them.
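The two-part definition translates almost word for word into code. This is only a sketch: my-list? is a hypothetical name, and Racket already provides the built-in list? for this job.

```racket
#lang racket

;; my-list?: a hypothetical re-implementation of the built-in list?
;; that follows the inductive definition clause by clause.
(define my-list?
  (lambda (v)
    (cond ((null? v) #t)                    ; the empty list is a list
          ((pair? v) (my-list? (cdr v)))    ; a pair whose cdr is a list
          (else #f))))                      ; anything else is not a list

(my-list? '())                ; => #t
(my-list? (list 3 4 5))       ; => #t
(my-list? (cons 3 4))         ; => #f
(my-list? (cons '(3 4 5) 5))  ; => #f
```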

Ending each of our lists with the empty list is important from both a practical and a theoretical standpoint. It is nice in theory because that means the cdr of every non-empty list is always another list. The cdr of the last pair in the list, therefore, has to be a list -- but, since it contains no items, it must be the empty list.

It will be nice in practice because we will define many operations that use recursion to process lists. Our base case can usually check for the empty list.
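As a preview of that pattern, here is a minimal sketch of a recursive function whose base case checks for the empty list. The name count-items is hypothetical; Racket's built-in length does the same job.

```racket
#lang racket

;; count-items: a hypothetical function written for illustration.
;; Counts the top-level items in a list.
(define count-items
  (lambda (lst)
    (if (null? lst)
        0                                   ; base case: the empty list
        (+ 1 (count-items (rest lst))))))   ; the rest is always a list

(count-items '())          ; => 0
(count-items '(3 4 5))     ; => 3
(count-items '((3 4 5)))   ; => 1, the one item is itself a list
```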

Sample Problems

If you would like a bit more practice, draw box-and-pointer diagrams for the following objects:

(list 1 2)
(cons 1 (list 2))
(cons 1 (cons 2 '()))
(cons (list 3 4) (cons 3 (cons 4 (list 4 5))))

A Functional Data Structure

A few words on the Racket list as a functional data structure...

Lists in Python are mutable. If you change the value of one of its elements, you change the list itself.

We can create mutable lists in Racket, but that's not usually what we do. Racket's standard lists do not work that way. In particular, that is not how cons, first/rest, and car/cdr work.

Suppose I make a list:

> (define list-1 '((2 4 (6 (1 2 3 4) 5) 2) 4 7 (4 (2 8) 5) 9 2 (1)))

and want to "replace" its first item:

> (cons 42 (rest list-1))
'(42 4 7 (4 (2 8) 5) 9 2 (1))

There is only one copy of the '(4 7 (4 (2 8) 5) 9 2 (1)) part of these lists. Sharing works if we don't allow code to modify values in the list.
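We can observe the sharing directly with eq?, which asks whether two values are the very same object in memory. A sketch, where list-2 and the copy definitions are names made up for this example:

```racket
#lang racket

(define list-1 '((2 4 (6 (1 2 3 4) 5) 2) 4 7 (4 (2 8) 5) 9 2 (1)))
(define list-2 (cons 42 (rest list-1)))

(eq? (rest list-2) (rest list-1))   ; => #t, the same pairs, not a copy
(first list-2)                      ; => 42
(first list-1)                      ; => '(2 4 (6 (1 2 3 4) 5) 2), unchanged

;; By contrast, two separately built lists are equal? but not eq?.
(define copy-1 (list 4 7))
(define copy-2 (list 4 7))
(equal? copy-1 copy-2)              ; => #t
(eq? copy-1 copy-2)                 ; => #f
```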

What if I "replace" the second item in list-1?

> (cons (first list-1) (cons 42 (rest (rest list-1))))
'((2 4 (6 (1 2 3 4) 5) 2) 42 7 (4 (2 8) 5) 9 2 (1))

A function computes a value; it does not change the world around it. Functional data structures enable us to write functions that compute values with compound data. Not sharing substructures is more expensive if you want all of the copies to exist at the same time: you'd have to actually make copies! Garbage collection takes care of the rest for us. As I've noted a couple of times already this semester, pure functions work better in a multithreaded, multiprocessor world. Functional data structures are essential for this purpose.

Functional data structures are useful in the world, too. Many of you use Git for version control. The best way to understand how Git works is to realize that a Git repository is a purely functional data structure!

Wrap Up

Reading
- Review today's lecture notes, especially any parts we did not cover in class.
- Then read this short introduction to vectors, Racket's indexable data structure.

Homework
- Homework 2 is available and due next session. Note the due time!
