Session 18

Variable References


CS 3540
Programming Languages and Paradigms


Where Are We?

Last time, we examined the idea of recursive local procedures. Like other local variables, they are a syntactic abstraction. Unlike other local variables, they require something more complex than a simple rewrite to an application of a lambda expression.

Recall that a syntactic abstraction is a feature of a language that is convenient for programmers to have but not essential to the expressiveness of the language. In the last few weeks, we have learned that a number of standard language features are really syntactic abstractions of more primitive features, including local variables and recursive local procedures.

Today you learn that variable names are not necessary: they are really syntactic sugar. I'll support this claim by showing how a piece of code without explicit variable references can convey the same information as one that uses variable names.

Before our main discussion, let's riff on declared-vars, is-declared?, and free-vars, functions you wrote for Homework 6, Exam 2, and Homework 7, respectively. They extended ideas we first saw when we wrote occurs-bound? and occurs-free? in Session 13. All of these functions deal with variable references and thus prepare us to discuss the names of variables in more detail. After our main discussion, we will finish the day by reviewing Homework 7.



Opening Exercise

All of the functions listed in the previous paragraph do static analysis, the processing of a program to extract some meaningful information from the code. They all ask about variable declarations and references. Our topic today is variable references, so let's write another function of that sort as a warm-up.

Write the structurally recursive function (all-varrefs exp) that returns a set of all variable references in exp.

all-varrefs takes as input an expression in the original subset of the little language, defined by this BNF expression:

     <exp> ::= <varref>
             | (lambda (<var>) <exp>)
             | (<exp> <exp>)

all-varrefs returns a set of all the variable references that occur in exp, whether free or bound. For example:

    > (all-varrefs 'x)
    '(x)

    > (all-varrefs '(lambda (y) (x y)))
    '(x y)

    > (all-varrefs '(square x))
    '(x square)

    > (all-varrefs '((lambda (y) (sqrt y))
                     (sqrt x)))
    '(x y sqrt)

Assume that you have a set ADT with these operations:
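The solution below needs at least a constructor and a union. A minimal list-backed sketch of those two operations (my reconstruction, not necessarily the class's actual ADT) might look like:

```racket
#lang racket

;; set: build a set from its arguments, dropping duplicates.
;; NOTE: this shadows Racket's built-in `set` from racket/set,
;; which is fine inside this small sketch.
(define (set . elements)
  (remove-duplicates elements))

;; set-union: combine two sets, again dropping duplicates.
(define (set-union s1 s2)
  (remove-duplicates (append s1 s2)))
```

With these, (set-union (set 'x) (set 'x 'y)) evaluates to '(x y).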



A Solution

How do we start? ...   Structural Recursion.

    (define all-varrefs
      (lambda (exp)
        (cond ((varref? exp) ...)
              ((lambda? exp) ...)
              (else          ...) ; app
        )))

Notice: There is no '() case! The empty list is not a legal expression in the little language.

Now, for each arm, we need to return a set of variable references that occur when we see an expression of that type. Where can variable references occur...

... in the varref? arm? The expression is a variable reference:

    (define all-varrefs
      (lambda (exp)
        (cond ((varref? exp) (set exp))
              ((lambda? exp) ...)
              (else          ...) ; application
        )))

... in the lambda? arm? Anywhere in the body:

    (define all-varrefs
      (lambda (exp)
        (cond ((varref? exp) (set exp))
              ((lambda? exp) (all-varrefs (lambda->body exp)))
              (else          ...) ; application
        )))

... in the app? arm? In either the procedure part or the argument part:

    (define all-varrefs
      (lambda (exp)
        (cond ((varref? exp) (set exp))
              ((lambda? exp) (all-varrefs (lambda->body exp)))
              (else          ; application
                (set-union (all-varrefs (app->proc exp))
                           (all-varrefs (app->arg  exp)))))))

We are done. Is this a difficult problem? It involves only one idea, a variable reference. This is an idea that you understand well in at least one or two other languages, plus Racket. It does require you to think about a language and its grammar -- which is the point of this course.

Notice: There are no car's and no cdr's! We access the parts of an expression using our syntax procedures for the little language. This program doesn't need to know anything about the concrete syntax of the language, only the parts of each kind of expression.
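The syntax procedures themselves are small. Here is a hedged sketch of definitions consistent with the BNF above; the names match the ones used in all-varrefs, but these particular definitions are my reconstruction, not necessarily the ones from class:

```racket
#lang racket

;; A varref is any symbol.
(define (varref? exp)
  (symbol? exp))

;; (lambda (<var>) <exp>)
(define (lambda? exp)
  (and (pair? exp) (eq? (car exp) 'lambda)))
(define (lambda->body exp)
  (caddr exp))

;; (<exp> <exp>)
(define (app->proc exp) (car exp))
(define (app->arg  exp) (cadr exp))
```

Only these procedures know the concrete syntax; all-varrefs sees an expression purely through them.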

To solve problems of this sort, you must understand the BNF description of the language's syntax. This is not new; for several weeks, we wrote recursive procedures that required us to study and understand the BNF definition of the data structure being processed. It helps a lot to study and understand the code that we have written in class.

As always, let structure of the data guide you. It helps you ask the right questions.

Quick exercise: What if we want our answer to contain repeats, so that we could count how many times each name appears? Hint: We can't use a set for our answer anymore...

If you would like to get more practice processing programs in this little language, return to Session 13 and the sessions since Exam 2. You can re-implement any of the functions we create or talk about there. If you'd like other ideas, feel free to ask.



Lexical Addresses

Last time, we returned to the idea of a variable's scope, which we often use to refer to the part of a program where a variable can be seen. We saw that most modern languages structure regions as blocks that can lie in sequence or contain other blocks.

For example, consider this lambda expression:

    (lambda (x y)         ;; Block 0
       ((lambda (a)       ;; Block 1
           (x (a y)))
        x))

Block 1 lies within Block 0. The reference to a in Block 1 is to the variable declared in that block. The references to x and y are to variables declared in Block 0. In this sense, the variable references x and y are "deeper" than the reference a, because the corresponding declarations are one block farther away.

If we decided to change the names of the variables declared in Block 0 from x and y to, say, foo and bar, we would need to change the references to x and y as well. Whatever we call them, x or foo refers to the first variable declared in the block, and y or bar refers to the second.
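We can check this concretely by wrapping both versions in a define and comparing their behavior (a quick sketch; the names original and renamed are mine):

```racket
#lang racket

;; Block 0's variables renamed from x and y to foo and bar;
;; every reference changes in lockstep, so behavior is unchanged.
(define original
  (lambda (x y)
     ((lambda (a)
         (x (a y)))
      x)))

(define renamed
  (lambda (foo bar)
     ((lambda (a)
         (foo (a bar)))
      foo)))
```

For example, (original add1 3) and (renamed add1 3) both evaluate to 5: each computes (x (x y)) for its own names.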

With these ideas of depth and position, we can reason ourselves to an interesting conclusion: each variable reference can be determined uniquely by what block the variable is declared in and the position of that declaration within the block. Let's make this concept more concrete.

The lexical address of a variable reference gives the depth of the reference from the block in which the variable was declared and the position of the variable's declaration in that block. A lexical address can take the form (v : d p), where v is the name of the variable, d is the depth of the reference, and p is the position of the declaration within its block.

We will treat depth and position as zero-based counters. That is, the depth tells us how many block boundaries we must cross to get from a variable reference to its declaration, and the position tells us how many steps we need to take down the list of local declarations to find the declaration.

For example, in our lambda expression above:

    (lambda (x y)         ;; Block 0
       ((lambda (a)       ;; Block 1
           (x (a y)))
        x))

The x in the last line is in Block 0 and refers to the parameter x declared in Block 0, which is the first declaration in that block. So the address of this x has depth 0 and position 0, or (x : 0 0).

The references to x and y in the third line are also to the parameters declared in Block 0. They appear in Block 1, but no declarations in that block shadow the original declarations. So the addresses of those references are (x : 1 0) and (y : 1 1), respectively.

Finally, the reference to a in the same line is to the formal parameter of Block 1, so its address is (a : 0 0).
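We can automate this reasoning. Here is a hedged sketch of a helper that finds the address of a reference, given the parameter lists of the enclosing blocks, innermost first. The name lexical-address and the representation of the declarations are my choices, not from class:

```racket
#lang racket
(require racket/list)   ; for index-of

;; lexical-address: find (v : depth position) for variable v, given
;; `decls`, a list of parameter lists, innermost block first.
;; Returns #f if v is free.
(define (lexical-address v decls)
  (let loop ((depth 0) (blocks decls))
    (cond ((null? blocks) #f)            ; crossed every block: v is free
          ((index-of (car blocks) v)     ; declared in this block?
           => (lambda (pos) (list v ': depth pos)))
          (else (loop (+ depth 1) (cdr blocks))))))
```

Inside Block 1 above, the enclosing declarations are '((a) (x y)), so (lexical-address 'y '((a) (x y))) evaluates to '(y : 1 1).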

Note that this idea is not specific to Racket, to a language with lambda expressions, or to a language with a copious number of parentheses. The lambda expression above is similar in form to this Java-like code:

    {                            // Block 0
      Classname1 x = ...;
      Classname2 y = ...;

      {                          // Block 1
        Classname1 a = ...;
        x.doOneThing( a.doAnotherThing(y) );
      }

      return x;
    }

Exercises

Determine the lexical addresses of the variable references in the following expressions:

     (lambda (f g)                             ;; Problem 1
        (lambda (x)
           (f (g x))))

     ((lambda (x) (x 3))                       ;; Problem 2
      (lambda (x) (* x x)))

     (lambda (x)                               ;; Problem 3
        (lambda (y)
           ((lambda (x)
               (x y))
            x)))

     (define x                                 ;; Problem 4 ... sample usage:
        (lambda (x)                            ;;    > (x '(1 2 3))
           (map (lambda (x) (add1 x)) x)))     ;;    (2 3 4)



Removing Variable References

We could annotate the variable references in a lambda expression with each reference's lexical address. Consider the first lambda expression in the previous section:

     (lambda (x y)
        ((lambda (a)
            ((x : 1 0) ((a : 0 0) (y : 1 1))))
         (x : 0 0)))

This sort of annotation can be useful to a program that manipulates this expression, because the variable names themselves are meaningless inside the machine. For example, a compiler needs to be able to compute the location of a referenced variable in memory so that it can write the addressing code into the assembly language it generates. A lexical address could be part of that computation.

But can we go even one step further and remove the variable references altogether? Let's see...

     (lambda (x y)
        ((lambda (a)
            ((: 1 0) ((: 0 0) (: 1 1))))
         (: 0 0)))

Have we lost any information? Um... no! Each lexical address specifies exactly which formal parameter is referred to at each point in the code. Given a lexical address, we can look up the associated variable name.

Very cool! The code in blocks no longer refers to variables by name, but we can reconstruct the names if we need them.
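Looking a name up from an address takes only a pair of list-ref calls, assuming we keep the stack of enclosing parameter lists, innermost block first (a sketch; name-at is a hypothetical helper):

```racket
#lang racket

;; name-at: recover the variable name at address (: depth pos), given
;; the enclosing blocks' parameter lists, innermost first.
(define (name-at depth pos decls)
  (list-ref (list-ref decls depth) pos))
```

For instance, (name-at 1 1 '((a) (x y))) evaluates to 'y, matching the address (y : 1 1) from the example above.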

As strange as this may sound, let's ask the next question: Can we eliminate the variable names themselves? Let's see. What happens if we replace the parameter list with a number that indicates how many parameters are declared:

     (lambda 2
        ((lambda 1
            ((: 1 0) ((: 0 0) (: 1 1))))
         (: 0 0)))

Have we lost any information? Yes -- the variable names themselves are gone. But maybe no, if we look at the program from a different perspective.

On this latter point: we have lost nothing essential. We can still compute the same answers that we were able to compute before.

The semantics of the expression -- its meaning when executed -- have been preserved. This must mean that variable names are syntactic sugar. We can translate any piece of code that uses names into a behaviorally-equivalent form that uses no variable names. Our examples here demonstrate that variable names really are a syntactic abstraction.
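For the one-parameter subset defined by the BNF at the top of this session, the translation itself is short. Here is a hedged sketch; the name unname and the representation details are mine:

```racket
#lang racket
(require racket/list)   ; for index-of

;; unname: strip variable names from the little language, replacing
;; each reference with (: depth 0) and each parameter list with the
;; count 1. `decls` holds the enclosing parameters, innermost first.
(define (unname exp [decls '()])
  (cond ((symbol? exp)                       ; varref
         (list ': (index-of decls exp) 0))
        ((eq? (car exp) 'lambda)             ; (lambda (<var>) <exp>)
         (list 'lambda 1
               (unname (caddr exp)
                       (cons (caadr exp) decls))))
        (else                                ; (<exp> <exp>)
         (list (unname (car exp) decls)
               (unname (cadr exp) decls)))))
```

For example, (unname '(lambda (x) (lambda (y) x))) evaluates to '(lambda 1 (lambda 1 (: 1 0))).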

I remember my first encounter with this idea. It came in graduate school when I was learning to program in Smalltalk. I brought up a debugger on a block of code that had been compiled, only to find that all of my variable names and parameter names had been replaced with the generic names t1, t2, and so on.

    multiplyAndScale: t1 and: t2 
      "multiplies the given numbers and returns the result times 4"
      | t3 |
      t3 := t1 * t2.
      ^t3 * 4

In order to show me my source code, the debugger had to re-construct the code from its bytecode-compiled form, in which all identifiers had been replaced with lexical addresses. It could not re-create my identifier names, so it just created a sequence of unique symbols to put in their places. Not too surprisingly, the code behaved just the same anyway.

Ironically, the idea that variable names are a syntactic abstraction demonstrates just how important identifier names can be to programmers. Imagine having to read and write expressions containing only lexical addresses, or randomly generated variable names... You may feel similarly disoriented while reading some of my code. I know I sometimes feel equally disoriented reading yours. :-)

Replacing variable references with lexical addresses carries the "not-very-descriptive variable name" problem to its comic extreme. Even so, try to learn from these examples. Remember that code with poorly-named variables can begin to look this opaque pretty quickly to readers who are unfamiliar with your code, or who are otherwise unprepared to interpret the variable names you have chosen.

Use variable names that are as descriptive as possible when you write your code, for the benefit of human readers, knowing that your interpreter or compiler will eliminate them from the internal representation of your code.

Quick Question: Why did I use numbers in place of the variable declarations? What do these numbers help me do or know?

Exercises

Write Racket expressions that are equivalent to the following lexical address expressions from which variable names have been removed:

     (lambda 1                                    ;; Problem 1
        (lambda 1
           (: 1 0)))

     (lambda (x)                                  ;; Problem 2
        (lambda (x)
           (: 1 0)))

     (lambda 3                                    ;; Problem 3
        ((: 0 1) ((lambda 2
                     ((: 0 0) (: 1 0) (: 0 1)))
                  (: 0 2))))



Closing

What are lexical addresses used for?

In this course, they help us to understand how our languages work, what is essential and what is not. Beyond this course, this idea is used when implementing interpreters, compilers, and IDEs such as Eclipse. Syntactic analysis is an essential part of practical software engineering whenever code and structured data are involved.

Even variable names are syntactic sugar.

Let that sink in. It helps us to see why the list of three things every programming language has doesn't contain the concrete items that we all might expect to find on it. If not even variable names are essential, then what constitutes the core of a programming language really is different from what our previous experience may have led us to believe.



A Few Thoughts on Homework 7

Why write make-varref?

Language processors are software, too. When we write them, we benefit from applying the lessons we learn throughout computer science: make-varref is part of our data abstraction.

Why do true and false show up as free variables?

Because our syntax procedures still treat them as variable references. We need to update the syntax procedures and the core language processors to reflect that we have added a new core type to the language: booleans.
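The kind of fix needed is small. A hedged sketch, assuming a varref? syntax procedure like the one used throughout these sessions (the helper name boolean-lit? is mine):

```racket
#lang racket

;; boolean-lit?: recognize the new core literals.
(define (boolean-lit? exp)
  (or (eq? exp 'true) (eq? exp 'false)))

;; varref?: a symbol is a variable reference only if it is not
;; one of the boolean literals.
(define (varref? exp)
  (and (symbol? exp)
       (not (boolean-lit? exp))))
```

With this change, free-vars no longer reports true and false, because varref? no longer classifies them as references.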

Why are we doing all of this?

    concrete syntax of full LL → [    syntax procedures   ] →
    abstract syntax of full LL → [       preprocessor     ] →
    abstract syntax of core LL → [ analysis and execution ]

... a quick run through the code, especially booleans, and anything else you want.



Wrap Up



Eugene Wallingford ..... wallingf@cs.uni.edu ..... March 8, 2018