Session 13
An Application of Structural Recursion: Variable Binding
Quick Review of Homework 5
Problems 2, 3, and 4 create operations for X-lists that we could implement over flat lists using map and apply. The mutual recursion pattern makes them almost as easy to implement over the nested lists.
Implement one or two, for example max-length.
The solution to prefix->infix is so very simple. Follow the data structure...
Congratulations! You have written your first translator. This function converts a legal program in one form into a legal program in another form. At its heart, this is what a compiler does. The output of prefix->infix is a legal Python program.
We can use the same idea to write another prefix expression translator that translates Racket expressions into legal programs in languages like Forth and Joy. (We will see Joy again later in the semester.)
Where We Are
For the last few sessions, we have been discussing different techniques for writing recursive programs, all based on the fundamental technique of structural recursion. Last time, we introduced a new topic in the study of programming languages: the static properties of variables. That included the definition of a little programming language that will serve as our testbed for studying the topic.
We review that reading in class.
Connect the idea to Racket free variables.
Today, we use our techniques for writing recursive programs to write a program that processes programs in our little language. Our task is straightforward:
Does a variable occur bound in a given piece of code?
When we write programs to process other programs, we see quickly why knowing how to write recursive programs is so important: Programming language specifications are almost always highly inductive!
Formal Definitions of Free and Bound Variables
As we learned last time, if a program feature is static, then its value can be determined by looking at the text of a program. A person can look at the code, of course, but what about another program? The text of a program is data, so we ought to be able to give the text as input to another program that determines the value of a static feature. This is just what compilers, type checkers, IDEs, and all sorts of other programming tools do: examine a program to extract its static features.
Let's write another program that takes a program as input. The input program will be written in the little language we saw last time:
<exp> ::= <varref> | (lambda (<var>) <exp>) | (<exp> <exp>)
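To make the grammar concrete, here is a minimal sketch of how programs in the little language might be written down as quoted Racket data, one for each arm of the grammar. The names e1 through e4 are hypothetical and chosen only for illustration.

```racket
#lang racket
;; Hypothetical sample programs in the little language,
;; written as quoted Racket data.
(define e1 'x)                        ; <varref>
(define e2 '(lambda (x) x))           ; (lambda (<var>) <exp>)
(define e3 '(f x))                    ; (<exp> <exp>)
(define e4 '((lambda (x) (x y)) z))   ; an application whose operator is a lambda
```

Each value is ordinary list-and-symbol data, so a Racket function can take it apart and examine it.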
We will define a function named (occurs-bound? v exp), which answers this question:
Does a given variable reference v occur bound in expression exp from the little language?
Writing this function will help us in at least three ways:
- It will help us to understand the definitions of free and bound variables more deeply, by seeing the definitions come alive in a piece of code that implements them. The working code will also be a testbed for experimenting with the little language.
- It will help us to see how the same recursive programming techniques we've been learning can be used to process a program written in some language.
- It will help us see again that we can write a program to process other programs.
Last time, we learned the terms occurs bound and occurs free. A variable "is bound" or "occurs bound" in an expression if it refers to the formal parameter in an expression that contains it. A variable reference "is free" or "occurs free" in an expression if it occurs but is not bound.
To write code that implements these definitions, though, we need more formal definitions of occurs free and occurs bound. Because our language definition is inductive, we can give these terms inductive definitions, too.
First, occurs bound:
A variable v occurs bound in an expression exp if and only if:
- exp is of the form (lambda (var) body) and either v occurs bound in body, or v occurs free in body and v is the same as var.
- exp is of the form (exp1 exp2) and v occurs bound in either exp1 or exp2.
By definition, no variable occurs bound in an expression that consists of a single variable reference.
Now, occurs free:
A variable v occurs free in an expression exp if and only if:
- exp is a variable reference and is the same as v.
- exp is of the form (lambda (var) body), v is different from var, and v occurs free in body.
- exp is of the form (exp1 exp2) and v occurs free in either exp1 or exp2.
With these definitions, we are ready to write our function.
Syntax Procedures for the Little Language
But wait... How can we know if exp is of the form (lambda (var) body)?
We could implement a function to verify that exp is a list of size three, whose first item is the symbol lambda, whose second item is a list of one symbol, and whose third item is a legal expression in the language. Such code will obscure our definition of what it means to "occur bound" and make it much harder to read!
Indeed, we will be using Racket lists to represent two different kinds of expression. Some lists denote lambda expressions. Other lists denote applications of functions to arguments.
This will require us to use many cars and cdrs, or firsts and rests, or seconds and thirds to access parts of the data. What's worse, they will mean different things in the different parts of the same function!
This data type begs us to use the Syntax Procedures design pattern. I ask you to read about this pattern for next time. For now, we will see it in action on our problem.
Before we begin to implement our solution, I have created these syntax procedures for our little language. There are three kinds of syntax procedure in the file:
- type predicates, which test whether an expression is a variable reference, a function, or an application,
- access procedures, or "accessors", which extract the parts of compound expressions (functions and applications), and
- constructor functions, or "constructors", which create an object out of its parts.
Demonstrate the syntax procedures.
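Here is a sketch of what such syntax procedures might look like, assuming that programs are represented as quoted Racket lists and symbols. The names varref?, app?, lambda->param, lambda->body, app->proc, and app->arg are the ones used later in these notes. The names lambda?, make-varref, make-lambda, and make-app are assumptions, and the file in today's download may define things differently.

```racket
#lang racket
;; A sketch of syntax procedures for the little language, assuming a
;; list-and-symbol representation. lambda? and the make-* constructors
;; are assumed names; they do not appear in the notes.

;; Type predicates. These checks are shallow: they look only at the
;; outer shape and do not validate subexpressions.
(define varref? (lambda (exp) (symbol? exp)))
(define lambda?
  (lambda (exp)
    (and (list? exp)
         (= (length exp) 3)
         (eq? (first exp) 'lambda)
         (list? (second exp))
         (= (length (second exp)) 1)
         (symbol? (first (second exp))))))
(define app?
  (lambda (exp)
    (and (list? exp)
         (= (length exp) 2))))

;; Accessors for the parts of compound expressions.
(define lambda->param (lambda (exp) (first (second exp))))
(define lambda->body  (lambda (exp) (third exp)))
(define app->proc     (lambda (exp) (first exp)))
(define app->arg      (lambda (exp) (second exp)))

;; Constructors, which build an expression out of its parts.
(define make-varref (lambda (sym) sym))
(define make-lambda (lambda (param body) (list 'lambda (list param) body)))
(define make-app    (lambda (proc arg) (list proc arg)))
```

With these in place, (lambda->param (make-lambda 'x 'x)) gives back the symbol x without the caller ever mentioning first or second.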
We are used to Racket data types having type predicates, such as symbol?, number?, and list?. We have also seen that Racket provides access procedures for its data structures: for example, car and cdr, first and rest, and vector-ref. Finally, we have also seen that Racket provides constructors for its data structures, such as cons for pairs and list for lists. I have simply defined analogous functions for our data type, the syntax of the little language.
These procedures allow us to write occurs-bound? in terms of the little language, rather than in terms of Racket's cars and cdrs, firsts and rests. They let us think only about the problem spec and the language, not the underlying implementation. The difference will be noticeable.
Implementing occurs-bound?
Finally, we are ready to begin writing occurs-bound?.
As always, we base our function on the inductive definition of the data type it manipulates. An expression in the language can be one of three alternatives. Following the Structural Recursion pattern, our function will make a three-way choice, with one arm in the function for each arm in the definition.
Option: Discuss a fourth case, error.
Let's use a cond expression instead of an if, to simplify the layout of our code:
    (define occurs-bound?
      (lambda (s exp)
        (cond ((varref? exp)
               ;; handle a variable reference
               )
              ((app? exp)
               ;; handle an application
               )
              (else
               ;; handle a lambda expression
               ))))
I swapped the order for handling applications and lambdas because the definition of "occurs bound?" is simpler in the application case than in the lambda case. Putting default cases and other simple cases at the top of a function makes it easier to read. I also like doing this because it encourages me to solve the easier cases first.
Handling variable references is easy. Our definition says, No variable occurs bound in an expression consisting of a single variable reference, so:
    (define occurs-bound?
      (lambda (s exp)
        (cond ((varref? exp) #f)
              ((app? exp)
               ;; handle an application
               )
              (else
               ;; handle a lambda expression
               ))))
How can a variable occur bound in a function application? The application itself doesn't bind a variable; it is simply a list of two expressions. So s can occur bound in an application only if it occurs bound either in the function expression or in the argument expression:
    (define occurs-bound?
      (lambda (s exp)
        (cond ((varref? exp) #f)
              ((app? exp)
               (or (occurs-bound? s (app->proc exp))
                   (occurs-bound? s (app->arg exp))))
              (else
               ;; handle a lambda expression
               ))))
The toughest case is the lambda expression. s can occur bound in a lambda in two different ways. s can occur bound within the body of the lambda OR it can occur free in the body and be the same as the formal parameter of the lambda expression.
    (define occurs-bound?
      (lambda (s exp)
        (cond ((varref? exp) #f)
              ((app? exp)
               (or (occurs-bound? s (app->proc exp))
                   (occurs-bound? s (app->arg exp))))
              (else   ; lambda
               (or (occurs-bound? s (lambda->body exp))
                   (and (eq? s (lambda->param exp))
                        (occurs-free? s (lambda->body exp))))))))
Notice that the definition of occurs-bound? calls occurs-free?. This is another example of mutually recursive functions. Here, though, the mutual recursion results not from two data definitions that are mutually inductive, but because the definitions for the two terms are themselves mutually inductive!
In order to test this solution, we need to define occurs-free?, too. I've done that for you, with the function given in the code download for today. However, try to write occurs-free? on your own first before you read it. Doing so will give you some practice doing what we have just done. Then look at my solution, compare them, and make sure you understand any differences.
There are several things to notice about this function:
Notice how the use of structural recursion made this code relatively easy to write. It told us which cases to consider and, when we are considering each, we don't have to think about the other two cases at all.
Notice how the use of syntax procedures made this code relatively easy to write. They enabled us to program using the same terms that are used in the definitions. While writing the function, we had to think only of the definition of bound variables; we didn't have to worry about which sequence of firsts and rests to use in order to manipulate the underlying list implementation. Furthermore, if we decide to change the underlying representation of programs to some other data structure, we won't need to modify this code at all. We will need only to write syntax procedures for the new representation.
Notice, too, how the use of syntax procedures makes this code relatively easy to read. We can read it in much the same way as we read the prose definition of occurs bound. Understanding the code requires as little reliance on the syntax of Racket lists as possible, because it follows our language grammar and the definition of our terms to a tee. This is an example of how using a program to describe a concept can be just as clear as a prose definition, if not clearer. And, because it is executable, we can verify that it is unambiguous. (Run the tests!!)
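Here is one way those tests might look, as a self-contained sketch. It assumes a list-based representation with shallow syntax procedures, and it uses rackunit for the checks; both are my assumptions and may differ from the code in today's download. To make the file runnable, it includes one possible occurs-free?, transcribed directly from the inductive definition above. If you plan to write occurs-free? yourself first, do that before reading the middle of this block.

```racket
#lang racket
(require rackunit)

;; Assumed list-based syntax procedures (shallow checks only).
(define varref? (lambda (exp) (symbol? exp)))
(define app?    (lambda (exp) (and (list? exp) (= (length exp) 2))))
(define lambda->param (lambda (exp) (first (second exp))))
(define lambda->body  (lambda (exp) (third exp)))
(define app->proc     (lambda (exp) (first exp)))
(define app->arg      (lambda (exp) (second exp)))

;; One possible occurs-free?, following the inductive definition.
(define occurs-free?
  (lambda (s exp)
    (cond ((varref? exp) (eq? s exp))
          ((app? exp)
           (or (occurs-free? s (app->proc exp))
               (occurs-free? s (app->arg exp))))
          (else   ; lambda
           (and (not (eq? s (lambda->param exp)))
                (occurs-free? s (lambda->body exp)))))))

;; occurs-bound? as developed in this section.
(define occurs-bound?
  (lambda (s exp)
    (cond ((varref? exp) #f)
          ((app? exp)
           (or (occurs-bound? s (app->proc exp))
               (occurs-bound? s (app->arg exp))))
          (else   ; lambda
           (or (occurs-bound? s (lambda->body exp))
               (and (eq? s (lambda->param exp))
                    (occurs-free? s (lambda->body exp))))))))

;; A few checks, one or more per arm of the definition.
(check-false (occurs-bound? 'x 'x))
(check-true  (occurs-bound? 'x '(lambda (x) x)))
(check-false (occurs-bound? 'x '(lambda (x) y)))  ; x is declared but never referenced
(check-false (occurs-bound? 'y '(lambda (x) y)))
(check-true  (occurs-bound? 'x '(f (lambda (x) (g x)))))
(check-true  (occurs-bound? 'x '(lambda (y) (lambda (x) x))))
```

If every check passes, rackunit prints nothing. The third check is worth a second look: a parameter that is declared but never referenced in the body does not occur bound.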
Today's zip file includes source code for occurs-bound? and occurs-free?. Play with these functions, both to be sure you understand how to write such code and also to be sure you understand the ideas of free and bound variables. For example...
A Study Question for Quiz 2: Are occurs-free? and occurs-bound? inverses of one another? That is, for a single variable v and a single expression exp:
- If (occurs-bound? v exp) is true, then (occurs-free? v exp) is always false.
- If (occurs-bound? v exp) is false, then (occurs-free? v exp) is always true.
Why or why not?
Unbound Variables
I said last time that we cannot evaluate an expression containing a completely free variable, because at run-time, the variable needs to have a binding. Such a free variable needs to be bound within an enclosing expression or at the "top-level". Racket primitives are like that. Symbols such as car and + are free in our expressions, but they are bound to their primitive values at the top level of the REPL.
(By the way... How do you think that works?)
Technically, my statement is not quite true. We can evaluate an expression that contains a free variable — as long as the variable is never evaluated. How could that happen?
Here are two trivial examples:
    > (if (zero? 0) 1 foo)
    1
    > (and #f foo)
    #f
foo is unbound, but it will never be evaluated. The value of this if expression is always 1, and the evaluation rule for the special form if never evaluates foo. This works even when foo is not bound at the top level.
The rest of this section is for home study. See if you can follow the argument. Don't worry; I won't ask you to do gymnastics like this on the quiz!
Let's try a bit of Racket mental gymnastics. Can you create an expression that:
- doesn't use a conditional,
- contains an unbound variable, and
- whose value is not affected by the value of the unbound variable?
We could try a lambda expression:
(lambda (x) y)
y is unbound, but if no one ever applies the lambda expression to an argument, then y will never be evaluated!
But if someone does apply the lambda to an argument, the interpreter will evaluate the y. So the value of applying the lambda depends on the value of y. That's another wrinkle. Can we iron it out?
Here is a hint: Suppose I have a function that makes no use of its formal parameter. That is, its value is independent of the value of any argument that is passed to it. Here is an example:
(lambda (x) (lambda (y) y))
This function takes some value, binds it to x, and then ignores it, returning a lambda expression that doesn't refer to it.
Can you use this idea to create an expression that contains a free variable and whose value doesn't depend on the value of that variable? How about this example?
( (lambda (x) (lambda (y) y)) x )
The value of this expression doesn't depend on the value of x! Alas, unlike with the if expression and the first lambda expression above, the Racket interpreter will evaluate the x here, in order to pass it as an argument. This does, however, show us that the value of an expression can be independent of the value of a free variable it contains. That is a step forward.
Now we know what we have to do: write a function that:
- takes an argument containing a free variable that is never evaluated, and
- never evaluates its argument.
If we pass a lambda expression that contains a free variable, the variable won't be evaluated until the lambda is applied. But if the receiving function never uses its argument, the free variable will never be evaluated!
So:
    > ( (lambda (x) (lambda (z) z)) (lambda (x) y) )
    #<procedure>
The y in the function passed as an argument is free, but never evaluated. We can apply this function to any argument:
    > (((lambda (x) (lambda (z) z)) (lambda (x) foo)) 'x)
    'x
Yes, this is only an academic exercise. You won't ever need such a function, certainly not in this course. DrRacket won't even let you do it in source code! But it's a useful little puzzle to help us explore and understand better the idea of free and bound variables.
Sometimes, computer scientists like to play fun little games that other people might not see as fun! I hope you at least find it instructive.
Wrap Up
- Reading
  - Study these notes, especially the section in which we build the occurs-bound? function.
  - Read a mini-lecture on syntax procedures. This idea may seem natural to you, especially if you have experience with object-oriented programming. It is an important element of data abstraction in many styles. Syntax procedures are also especially helpful in making Racket programs more readable!
- Homework
  - Homework 5 was due yesterday.
  - Homework 6 is available and due on Monday. It gives you more practice writing recursive programs and working with expressions in the little language. Ask questions early, so you can finish in time to begin studying for the next quiz.