Session 4
Racket Data Structures
Opening Exercise: Choosing a Value
Suppose that we are writing a function to compute the letter grade for a student in Programming Languages, given the student's final percentage score. It might be 0.95, or 0.77, or 0.83.
Write a Racket expression to fill in the blank:
(define letter-grade-for
  (lambda (value)
    ;; FILL IN THE BLANK
    ))
For example:
> (define student-grade (/ 248 285))
> (letter-grade-for student-grade)
'B
> (letter-grade-for 0.95)
'A
> (letter-grade-for 0.77)
'C
(Don't tell me you've already forgotten our grading scale...)
Solutions: Choices in Racket
In Session 3, we saw conditional expressions and so know that Racket has an if expression. We can use nested ifs to implement this choice:
(if (>= student-grade 0.90)
    'A
    (if (>= student-grade 0.80)
        'B
        (if (>= student-grade 0.70)
            'C
            (if (>= student-grade 0.60)
                'D
                'F))))
With several levels of nesting and simple values, this problem is an even better match for a cond expression:
(cond ((>= student-grade 0.90) 'A)
      ((>= student-grade 0.80) 'B)
      ((>= student-grade 0.70) 'C)
      ((>= student-grade 0.60) 'D)
      (else 'F))
This expression is more compact and easier to read. A cond expression can be helpful in cases where we have many alternatives, especially when the expressions are relatively short.
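One possible solution to the opening exercise puts the cond expression inside the function. The sketch below assumes the 0.90/0.80/0.70/0.60 cutoffs used in the expressions above. The check-equal? lines record the examples from the exercise as tests.

```racket
#lang racket
(require rackunit)

;; Returns the letter grade for a score given as a fraction of 1.
;; Assumes the cutoffs used in the cond expression above.
(define letter-grade-for
  (lambda (value)
    (cond ((>= value 0.90) 'A)
          ((>= value 0.80) 'B)
          ((>= value 0.70) 'C)
          ((>= value 0.60) 'D)
          (else 'F))))

;; The examples from the opening exercise, recorded as tests.
;; (/ 248 285) is an exact rational; >= compares it with 0.90 just fine.
(check-equal? (letter-grade-for (/ 248 285)) 'B)
(check-equal? (letter-grade-for 0.95) 'A)
(check-equal? (letter-grade-for 0.77) 'C)
(check-equal? (letter-grade-for 0.83) 'B)
```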
Racket provides a few other selection operators, too, including a case expression:
> (case (random 6)
    ((0) 'zero)
    ((1) 'one)
    ((2 3 4) 'few)
    (else 'many))
'many     ; the result varies: (random 6) returns an integer from 0 through 5
case tries to match its key value against the members of one or more lists. We can't use this operator for our letter grade function because we can't list all the possible values for the student grade.
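To see case behave predictably, we can pass it a fixed key instead of a random number. In the sketch below, describe-count is a hypothetical helper written for illustration. The last check shows why case does not suit grades: a key like 0.83 matches only a clause that lists 0.83 exactly.

```racket
#lang racket
(require rackunit)

;; describe-count is a hypothetical example that classifies a small count.
;; case compares its key against the literal values in each clause.
(define describe-count
  (lambda (n)
    (case n
      ((0) 'zero)
      ((1) 'one)
      ((2 3 4) 'few)
      (else 'many))))

(check-equal? (describe-count 0) 'zero)
(check-equal? (describe-count 3) 'few)
(check-equal? (describe-count 5) 'many)
;; No clause lists 0.83, so it falls through to else.
(check-equal? (describe-count 0.83) 'many)
```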
All we will need this semester to make choices are if and cond. You'll find that I generally use if for two-way choices and cond for multi-way choices. You may use whichever you find more helpful in any given situation.
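Bonus Racket: the exact->inexact function, which converts an exact number, such as the rational number 248/285 produced by (/ 248 285), into an inexact floating-point approximation.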
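The sketch below shows the difference. Dividing two integers gives an exact rational number, and exact->inexact turns it into a float. Because the float is only an approximation, the check compares it with a tolerance rather than for exact equality.

```racket
#lang racket
(require rackunit)

(define student-grade (/ 248 285))   ; an exact rational, 248/285

(check-true (exact? student-grade))
(check-equal? student-grade 248/285)

;; exact->inexact produces a floating-point approximation.
(define as-float (exact->inexact student-grade))
(check-true (inexact? as-float))
(check-= as-float 0.870 0.001)
```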
A Quick Review of Rackunit
Notice how I use check-equal? expressions in the letter-grade-for code file to record the examples I created for the exercise. I mentioned Rackunit in Session 3's notes as a way of checking our expectations in code. It also makes for a simple, lightweight unit testing framework, so we will be using it for most of the code we write, beginning with Homework 2.
check-equal? does not work as well for comparing floating-point values, which cannot always be represented exactly in a computer. Rackunit provides another test operator, check-=, which works only for numbers. It takes a third argument, a tolerance, and checks whether its first two arguments differ by no more than that tolerance. We will want to use check-= whenever the function we are testing returns a floating-point number.
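Here is a small illustration. The decimal values 0.1 and 0.2 have no exact binary representation, so their sum is not exactly the float written as 0.3. A check-equal? test comparing them would fail. check-= passes because the difference is far below the tolerance. The tolerance of 1e-9 is an arbitrary choice for this example.

```racket
#lang racket
(require rackunit)

;; 0.1 and 0.2 are stored as binary approximations, so their sum
;; is not exactly equal to the float written as 0.3.
(check-false (= (+ 0.1 0.2) 0.3))

;; check-= passes when its first two arguments differ by at most
;; the tolerance given as the third argument.
(check-= (+ 0.1 0.2) 0.3 1e-9)
```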
Rackunit defines a number of other useful testing functions, some of which we will use over the course of the semester. I use check-true and check-false in the homework02.rkt template file to test in-range?, which returns boolean values.
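The in-range? function is defined in the homework template, so the sketch below uses built-in predicates instead. It shows the shape of these tests. Each check takes a single expression that should produce #t or #f.

```racket
#lang racket
(require rackunit)

;; check-true and check-false suit predicates, which return booleans.
(check-true (even? 4))
(check-false (even? 7))
(check-true (symbol? 'B))
(check-false (string? 'B))
```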
When writing code, don't break the tests. Your code may fail a test, but it should always load and run without causing a Racket error. Keep in mind this partial hierarchy of correctness.
Hint: When you begin working on Homework 2, the tests for all five functions will fail. You can only work on one function at a time, and meanwhile every time you run your code all of the other tests will fail. So: comment out tests for problems you aren't solving yet. Don't forget to uncomment them when you start working on the function!
More Than A Hint: Use the exact filenames given in the assignment. Reasons include software engineering and grading.
Basic Data Structures in Racket
In addition to primitive and compound expressions that deal with behavior, most programming languages also provide one or more primitive data types and ways of combining and abstracting them. For any data type, we are interested in:
- the set of values represented by the type, and
- the set of operations that can be executed on that type.
For pragmatic reasons, we are also often interested in how the values are represented textually when written in programs and when displayed to users. Later, we will classify the different operations on a data type according to what they do.
We briefly discussed some of Racket's primitive atomic data types in Session 2. With a few exceptions, you should recognize those types from other programming languages. For your reference, I have provided an on-line summary of Racket's atomic data types.
Programming languages usually also provide aggregate data types. A data aggregate consists of a group of data objects. You may think of an aggregate data type as a means of combining data objects into larger structures. In Python, you probably used lists, dictionaries, and perhaps tuples. All are data types that aggregate other values into groups. In Java, you might have used arrays and instances of various classes such as ArrayList.
Sometimes, something can look like a separate data type but not be one. For example, in C and C++, we can create and use arrays. But array indexing is really defined in terms of pointers: a[i] means *(a + i), the object i positions past the start of a block of memory. The array notation is created for the convenience of programmers. To us C and C++ programmers, arrays seem like a data type, even if, in this sense, they aren't. The notation is syntactic sugar, an idea we'll explore in some detail in a few weeks.
Racket provides three primary aggregate data types: pair, vector, and hash table. Racket also provides the list, which is derived from the pair for programming convenience. We will use pairs and lists extensively this semester, vectors occasionally, and hash tables not at all.
The Pair
In Racket, we can construct an aggregate of two data objects called a pair. After we make a pair, we can refer to it by name, use it as an argument to a function, and in general treat it just like we'd treat any atomic object, such as a number.
The function that creates a pair is called cons. This function takes two arguments of any type and returns a pair consisting of the two items. If we have a pair, we can access the individual parts using the functions car and cdr, which refer to the first part of the pair and the second part of the pair, respectively. These names are purely historical.
We call cons a constructor and car and cdr accessors. These terms may be familiar to many of you from Java or C++ programming. All aggregate data types we discuss in this course, whether Racket's built-in types or the ones we create, will have both constructors and accessors.
We might use cons, car, and cdr as follows:
> (define a (cons 10 20))
> (define b (cons 3 7))
> (car a)
10
> (car b)
3
> (cdr a)
20
> (cdr b)
7
> a
'(10 . 20)            ; "dotted pair" notation
> (define c (cons a b))
> (car (car c))
10
> (car c)
'(10 . 20)            ; there's that dot again
> c
'((10 . 20) 3 . 7)
> (cdr (car c))
20
> (cdr c)
'(3 . 7)
> (cdr a)
20
> (cdr (car a))
cdr: contract violation
  expected: pair?
  given: 10
> (car (car a))
car: contract violation
  expected: pair?
  given: 10
What do those error messages tell us about car and cdr? For one thing, they are strongly typed: they accept only pairs as an argument, and Racket checks this when the code runs. A contract violation is Racket's way of telling us that we sent a value of an unexpected type to a function.
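Contract violations are ordinary Racket exceptions. A test can therefore check that a bad call fails, rather than letting it halt the program. The sketch below reuses the pair a from the interaction above. It uses rackunit's check-exn, which takes a predicate on exceptions and a zero-argument function (a thunk) to run.

```racket
#lang racket
(require rackunit)

(define a (cons 10 20))

;; (car a) is 10, which is not a pair, so car and cdr reject it.
;; Racket signals a contract violation with an exn:fail:contract exception,
;; which check-exn can catch in a test.
(check-exn exn:fail:contract? (lambda () (car (car a))))
(check-exn exn:fail:contract? (lambda () (cdr (car a))))

;; The same accessors work fine on the pair itself.
(check-equal? (car a) 10)
(check-equal? (cdr a) 20)
```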
Box and Pointer Diagrams
We can use pictures to help us understand the operation of cons. You have surely done something similar in your study of linked data structures... If not, we have done you a great disservice! One of the best ways to understand how code manipulates a data structure is to draw a picture.
The pictures we draw are called box and pointer diagrams, for reasons that will be obvious.
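The box and pointer diagram for a in the preceding example would look like this:
(Diagram: a single box divided into two cells; the left cell points to 10 and the right cell points to 20.)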
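Similarly, the box and pointer diagram for b would look like:
(Diagram: a single box divided into two cells; the left cell points to 3 and the right cell points to 7.)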
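Of course, a cons cell (another name for a pair) can point to something other than numbers, as c showed us:
(Diagram: a box whose left cell points to the box for a and whose right cell points to the box for b.)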
A pair really is an aggregate containing pointers to two values. Once we gain experience programming with pairs, though, we rarely have to think about them at this level. The operations we use hide the details of memory reference from us. However, understanding pairs at this level is quite helpful when we need to figure out why a piece of code doesn't work the way we think it should. That happens a lot while we are learning.
Some Practice with cons
First, use the box and pointer diagrams shown above to understand the interaction shown in the preceding example. In particular, can you tell why each error occurred?
Then, draw box and pointer diagrams for the values created by these definitions:
(define x (cons 1 2))
(define y (cons x x))
(define z (cons y 4))
(define w (cons z (cons y x)))
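After drawing your diagrams, you can evaluate these definitions in DrRacket and compare your pictures with what Racket prints. Your diagrams should capture one detail that the printed form hides: both halves of y point to the same pair x, not to two copies of it. Racket's eq? returns true only when its arguments are the very same object, so we can use it to confirm the sharing.

```racket
#lang racket
(require rackunit)

(define x (cons 1 2))
(define y (cons x x))
(define z (cons y 4))
(define w (cons z (cons y x)))

;; eq? is true only when both arguments are the same object in memory.
(check-true (eq? (car y) (cdr y)))   ; both halves of y point to x
(check-true (eq? (car z) y))         ; z's first pointer leads to y itself
(check-true (eq? (car (cdr w)) y))   ; so does the car of w's second pair

;; A freshly built (1 . 2) has the same contents as x, but it is a different pair.
(check-true (equal? (cons 1 2) x))
(check-false (eq? (cons 1 2) x))
```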
We will find that pairs are really useful in our programming this semester, but mostly because they are the elementary components of lists.
Lists
In abstract terms, a list is an ordered sequence of elements. In some languages, the elements in a list must all be the same type. In Racket, elements of a list may be of any type. For example, we can make a list that contains numbers, booleans, symbols, and strings:
'(1 #t eugene 2 #f "wallingford")
This is consistent with Racket's dynamic typing. It also turns out to be enormously useful as we write programs.
Here are some examples of lists:
()          The empty list -- a null pointer
(3)         A list with one element: the number 3
(3 4)       A list with two elements: the number 3 and the number 4
(3 #t)      A list with two elements: the number 3 and the boolean #t
(3 4 5)     A list with three elements...
((3 4 5))   A list with one element! (The element is a list.)
That last example may remind you of Homework 1...
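Lists have a particular form in memory. If we draw a box and pointer diagram for the list (3 4 5), it looks like this:
(Diagram: a chain of three boxes. Their left cells point to 3, 4, and 5 in turn; each right cell points to the next box, and the right cell of the last box holds a slash for the empty list.)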
A list is a pair whose second item is also a list.
We will look at the implications of this definition in more detail later.
We can get our pairs to look like the above box and pointer diagram if we use cons in this way:
> (cons 3 (cons 4 (cons 5 '())))     ;; why quote the innermost ()?
'(3 4 5)
We can also make a list using the list constructor, the function list:
> (list 3 4 5)
'(3 4 5)
The list function constructs a new list containing the zero or more arguments passed to it.
Now you may begin to see why we draw diagrams of this sort. Notice that the diagram for the cdr of a list of numbers is topologically similar to the diagram for the entire list. Later we will benefit from this similarity when we write recursive programs. Racket lists are an example of how the design of a data structure can make it easier to write the algorithms we want to write, an idea that you learn about in your courses on data structures and algorithms.
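The sketch below checks these claims in code. Nested cons calls ending in '() build the same structure as a call to list. Calling list with no arguments gives the empty list. The cdr of a non-empty list is itself a list, which is the similarity the diagrams reveal.

```racket
#lang racket
(require rackunit)

;; Building the list with nested cons calls and with list
;; gives structurally identical results.
(define by-cons (cons 3 (cons 4 (cons 5 '()))))
(define by-list (list 3 4 5))
(check-equal? by-cons by-list)

;; list with no arguments produces the empty list.
(check-equal? (list) '())

;; The cdr of a non-empty list is itself a list.
(check-true (list? (cdr by-list)))
(check-equal? (cdr by-list) '(4 5))
```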
Interlude: Quotation
If you can't hear me,
it's because I'm in parentheses.
— Steven Wright
Let's take a small detour to consider the question in the comment above: Why do we quote the innermost ()? This is one of the most common questions students have as they learn about Racket lists. Quotation, which we discussed briefly last session, is a small feature of Racket that simplifies so many of our interactions. You will come to see that it is indispensable.
As we just learned, Racket has a list function that takes n arguments and returns a list containing those n items.
> (list 2 4 6 5 2)
'(2 4 6 5 2)
Calls to list can be nested to create arbitrarily complex list structures:
> (list (list 2 4 (list 6 5) 2) 4 7 (list 4 5) 9 2 (list 1))
'((2 4 (6 5) 2) 4 7 (4 5) 9 2 (1))
> (list (list 2 4 (list 6 (list 1 2 3 4) 5) 2) 4 7 (list 4 (list 2 8) 5) 9 2 (list 1))
'((2 4 (6 (1 2 3 4) 5) 2) 4 7 (4 (2 8) 5) 9 2 (1))
In both of these examples, we use list to create a list of items. The term "list" doesn't add any information about the structure of the list, though; it merely tells Racket to build it. It would be nice if we didn't have to use list in these cases -- if we could specify the structure and content of the list directly. That would save us a bunch of function calls (but not the parentheses that a nested list entails!)
For example, instead of ...
(define a-tree (list 3 4 3 1 2))
... we might like to write:
(define a-tree (3 4 3 1 2))
Unfortunately, the Racket interpreter would complain. Why? It would try to evaluate the second argument to define, which is a parenthesized prefix expression. So it evaluates the first item in the list, to determine whether it is a function or a special form. When it is neither, we receive an error message.
In such cases, the Racket interpreter can't tell the difference between the list (3 4 3 1 2) and any other parenthesized expression, such as the (* 4 3 1 2) in:
(define a-number (* 4 3 1 2))
This situation follows from the fact that, in Racket, data and programs look alike! Lists look just like expressions in Racket. As we will later learn, this common form for data and program is one of the sources of Racket's power. It also means, though, that we need a way to tell the interpreter that some expressions are meant to be taken literally and not evaluated. We do this with the quote special form. Using quote, we can define lists like these as literal expressions:
(define a-tree (quote ((3 4) (3 (1 2)))))
(define another-tree (quote (* 3 (+ 1 2))))
Since we use quote so frequently, Racket provides shorthand in the form of a special character, ':
(define a-tree '((3 4) (3 (1 2))))
(define another-tree '(* 3 (+ 1 2)))
The quote character in Racket, much as in English, tells Racket to take the next symbol, list, whatever, literally. This gives us the power to create lists of symbols as well as numbers. Take, for example, the following exchange with the interpreter:
> (define x 56)
> (define y 30)
> (list x y)
'(56 30)
> (list 'x 'y)
'(x y)
> (list 'x y)
'(x 30)
> (list x 'y)
'(56 y)
Lists created as literals are just like lists created using cons and list. We can perform all the expected operations on them. For instance:
> (car '(3 4 5))
3
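The sketch below states these facts as tests. The ' character is shorthand for quote, and a quoted list has the same structure as one built with list. Quoted "code" stays data: the first element of another-tree is the symbol *, and nothing gets multiplied.

```racket
#lang racket
(require rackunit)

;; 'datum is shorthand for (quote datum).
(check-equal? '(3 4 5) (quote (3 4 5)))

;; A quoted list and a list built with list have the same structure.
(check-equal? '(3 4 5) (list 3 4 5))
(check-equal? (car '(3 4 5)) 3)

;; Quoted code is just data: its first element is the symbol *,
;; and the multiplication is never performed.
(define another-tree '(* 3 (+ 1 2)))
(check-equal? (car another-tree) '*)
(check-true (symbol? (car another-tree)))
(check-equal? (length another-tree) 3)
```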
Deep-Thinking Exercise: When can we not use a quote in place of a call to list?
More on Lists
Because lists are built out of pairs, we can use car and cdr to access the elements of a list. Remember that car returns whatever the first pointer points to and that cdr returns whatever the second pointer points to. Because a list is a pair whose second item is also a list, we know that the cdr of a list will always be a list, too.
However, when we are working on lists, we usually prefer to think in terms of lists and the items they contain, not in terms of pairs. Racket provides first and rest as alternatives to car and cdr. They are different in one respect: car and cdr accept any pair as an argument, but first and rest work only with a list, which we know to be a pair whose second item is also a list. Like car and cdr, they also reject the empty list, which is not a pair: first and rest work only with lists that are not empty. That may seem like a big restriction, but in practice it turns out not to be. Empty lists are usually a special case, and we will want to handle them separately in our code.
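The sketch below tests this difference. first and rest come from racket/list, which #lang racket includes. On a list they agree with car and cdr. On a pair that is not a list, or on the empty list, they raise a contract error.

```racket
#lang racket
(require rackunit)

(define 3-to-5 (list 3 4 5))

;; On a non-empty list, first/rest agree with car/cdr.
(check-equal? (first 3-to-5) (car 3-to-5))
(check-equal? (rest 3-to-5) (cdr 3-to-5))

;; car accepts any pair, even one whose cdr is not a list...
(check-equal? (car (cons 3 4)) 3)
;; ...but first rejects a pair that is not a list.
(check-exn exn:fail:contract? (lambda () (first (cons 3 4))))

;; first and rest also reject the empty list.
(check-exn exn:fail:contract? (lambda () (first '())))
(check-exn exn:fail:contract? (lambda () (rest '())))
```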
For instance, on the (3 4 5) list we saw earlier, car returns 3 and cdr returns another list, (4 5). Note that this is different from when we made a pair of numbers and the car and cdr both pointed to numbers.
Let's give our list a name and play with it, using car, cdr, and cons to access elements and construct new lists:
> (define 3-to-5 (list 3 4 5))
> (car 3-to-5)
3
> (cdr 3-to-5)
'(4 5)
> (car (cdr 3-to-5))
4
> (car (cdr (cdr 3-to-5)))
5
> (cons 2 3-to-5)
'(2 3 4 5)
> 3-to-5
'(3 4 5)
The following expressions do not produce lists:
> (cons 3-to-5 5)
'((3 4 5) . 5)
> (cons (cdr 3-to-5) (car 3-to-5))
'((4 5) . 3)
> (cons 3 4)
'(3 . 4)
Why not? We saw the dot notation earlier when playing with pairs. Here, it shows us that the second item of the last pair is not a list.
Quick Exercise: Draw box-and-pointer diagrams for these examples. From your pictures and the definition of a Racket list, it should be clear that these are not lists.
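Once you have drawn the diagrams, you can confirm your conclusion with Racket's predicates. pair? is true for any pair. list? is true only when following the cdrs eventually reaches the empty list.

```racket
#lang racket
(require rackunit)

(define 3-to-5 (list 3 4 5))

;; Each of these is a pair...
(check-true (pair? (cons 3-to-5 5)))
(check-true (pair? (cons 3 4)))

;; ...but not a list, because its final cdr is a number, not the empty list.
(check-false (list? (cons 3-to-5 5)))
(check-false (list? (cons (cdr 3-to-5) (car 3-to-5))))
(check-false (list? (cons 3 4)))

;; Consing an item onto a list, by contrast, yields a list.
(check-true (list? (cons 2 3-to-5)))
```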
Note that the last pointer in the last pair of any list is something called nil. Look back to when we used only cons to form the list (3 4 5). The innermost cons looked like this: (cons 5 '()). These two anomalies are different views on the same idea: We need to be able to talk about lists that contain no items.
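What is nil? The word "nil" is a contraction of nihil, the Latin word for nothing, and it means just that. We use it to represent the empty list. Racket itself has no variable named nil: we write the empty list as '(), and Racket also provides the names null and empty for it. In box-and-pointer diagrams, we indicate a pointer to an empty list with a slash.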
Now we have all the parts we need to define a list more completely:
- The empty list is a list.
- A non-empty list is a pair whose cdr is a list.
This is an inductive definition. We will return to the idea of inductive definitions soon and take great advantage of this definition for lists when we write recursive programs to manipulate them.
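As a preview, the two-part definition translates almost clause for clause into a predicate. In the sketch below, my-list? is a hypothetical name. Racket's built-in list? gives the same answers.

```racket
#lang racket
(require rackunit)

;; my-list? is a hypothetical name; Racket's built-in list? gives the same result.
;; Each cond clause mirrors one part of the inductive definition.
(define my-list?
  (lambda (v)
    (cond ((null? v) #t)                   ; the empty list is a list
          ((pair? v) (my-list? (cdr v)))   ; a pair is a list if its cdr is a list
          (else #f))))                     ; anything else is not a list

(check-true (my-list? '()))
(check-true (my-list? '(3 4 5)))
(check-false (my-list? (cons 3 4)))
(check-false (my-list? 42))
```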
Ending each of our lists with the empty list is important from both a practical and a theoretical standpoint. It is nice in theory because that means the cdr of every non-empty list is always another list. The cdr of the last pair in a list, therefore, has to be a list -- but, since it contains no items, it must be the empty list.
It will be nice in practice because we will define many operations that use recursion to process lists. Our base case can usually check for the empty list.
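Here is a small example of that pattern. count-items is a hypothetical helper that does the job of Racket's built-in length. Its base case checks for the empty list. Its recursive case works on the cdr, which the definition guarantees is also a list.

```racket
#lang racket
(require rackunit)

;; count-items is a hypothetical helper (Racket's built-in length does this job).
;; Base case: the empty list has no items.
;; Recursive case: one item plus the items in the cdr, which is also a list.
(define count-items
  (lambda (lst)
    (if (null? lst)
        0
        (+ 1 (count-items (cdr lst))))))

(check-equal? (count-items '()) 0)
(check-equal? (count-items '(3 4 5)) 3)
(check-equal? (count-items '((3 4 5))) 1)
```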
Sample Problems
(list 1 2)
(cons 1 (list 2))
(cons 1 (cons 2 nil))
(cons (list 3 4) (cons 3 (cons 4 (list 4 5))))
A Functional Data Structure
A few words on the Racket list as a functional data structure...
Lists in Python are mutable. If you change the value of one of its elements, you change the list itself.
We can create mutable lists in Racket, but that's not usually what we do. Racket's standard lists do not work that way. In particular, that is not how cons, first/rest, and car/cdr work.
Suppose I make a list:
> (define list-1 '((2 4 (6 (1 2 3 4) 5) 2) 4 7 (4 (2 8) 5) 9 2 (1)))
and want to "replace" its first item:
> (cons 42 (rest list-1))
'(42 4 7 (4 (2 8) 5) 9 2 (1))
There is only one copy of the '(4 7 (4 (2 8) 5) 9 2 (1)) part of these lists: the cdr of the new list points to the same pairs as the cdr of list-1. This sharing is safe only because we don't allow code to modify values in the list.
What if I "replace" the second item in list-1?
> (cons (first list-1) (cons 42 (rest (rest list-1))))
'((2 4 (6 (1 2 3 4) 5) 2) 42 7 (4 (2 8) 5) 9 2 (1))
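We can use eq? to check the sharing. eq? is true only when both arguments are the very same object. The names list-2 and list-3 below are introduced just for this sketch. It shows that the "new" lists reuse the old pairs, and that list-1 itself is unchanged.

```racket
#lang racket
(require rackunit)

(define list-1 '((2 4 (6 (1 2 3 4) 5) 2) 4 7 (4 (2 8) 5) 9 2 (1)))

;; "Replace" the first item by building one new pair onto the old tail.
(define list-2 (cons 42 (rest list-1)))

;; The tails are not copies: both lists point to the very same pairs.
(check-true (eq? (rest list-1) (rest list-2)))

;; list-1 itself is unchanged.
(check-equal? (first list-1) '(2 4 (6 (1 2 3 4) 5) 2))
(check-equal? (first list-2) 42)

;; "Replacing" the second item shares both the first item and everything after the second.
(define list-3 (cons (first list-1) (cons 42 (rest (rest list-1)))))
(check-true (eq? (first list-1) (first list-3)))
(check-true (eq? (rest (rest list-1)) (rest (rest list-3))))
```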
A function computes a value; it does not change the world around it. Functional data structures enable us to write functions that compute values with compound data. Sharing substructure keeps this approach affordable: without sharing, keeping several versions of a list alive at the same time would require actually making copies! When we no longer need an old version, garbage collection reclaims it for us. As I've noted a couple of times already this semester, pure functions work better in a multithreaded, multiprocessor world. Functional data structures are essential for this purpose.
Functional data structures are useful in the world, too. Many of you use Git for version control. The best way to understand how Git works is to realize that a Git repository is a purely functional data structure!
Wrap Up
- Reading
  - Review today's lecture notes, especially any parts we did not cover in class.
  - Then read this short introduction to vectors, Racket's indexable data structure.
- Homework
  - Homework 2 is available and due next session. Note the due time!