Session 21
Creating New Syntax

or:   Racket, A Language for Making Languages

Introduction

When teaching you a new language, especially one that is very different from the languages you already know, professors often try to convince you that you can do all the things you are used to doing in the languages you know — Python, Java, ...

But then you may wonder, why do I need a new language at all?

What makes Racket different? Compelling? Why learn it?

We have seen higher-order functions and the idea that code==data, but there is more.

This is the story... of how Lisp-like languages really do come from a different place, and how they are inspiring the designers of other languages.

This session is about ideas. The code I show you is to illustrate those ideas. I certainly won't ask you to write code like this on Quiz 3. But please try to understand the ideas.

The Set-Up

In Unit 3, you wrote a preprocessor for the little language we studied. Recently you have been writing and extending a preprocessor and an evaluator for the Huey language. Racket has a preprocessor and an evaluator, too.

We've also seen that Racket exposes its machinery to us in ways that other languages usually do not. We can add new functions and operators to the language, thus affecting how the evaluator works. What would it be like to add syntax to the language, thus affecting how the preprocessor works?

Example 1: A Python-Like for Loop

Instead of writing many for loops, Python programmers write list comprehensions. For example:

roots = [sqrt(i) for i in range(0, 10)]

is equivalent to this Python for loop:

roots = []
for i in range(0, 10):
    roots.append(sqrt(i))

More generally, we can think of the for loop as:

for var in lst:
    exp-using-var

It might be nice to add a Python-like for loop to Racket, such as:

(for i in (range 0 10):
  (sqrt i))

or, more generally:

(for var in lst:
  exp-using-var)

Sidenote: Racket already has a fine set of for loops, with many different features to suit different needs. This is just a simple example for us to explore, and a way to see how such loops work.

This functional loop is equivalent to a Racket map expression:

(map (lambda (i)
        (sqrt i))
     (range 0 10))

This is an example of a syntactic abstraction. We can write code to translate the abstraction into a core form:

(for var in lst :    =====>     (map (lambda (var)
     exp-using-var)                    exp-using-var)
                                     lst)

Opening Exercise: Make It So

[Photo: Captain Picard from Star Trek: The Next Generation, sitting in his chair on the bridge, saying "Make it so."]

Write a function for-to-map that takes as input an expression of the form
(for <var> in <lst> : <exp>)
and returns an expression of the form
(map (lambda (<var>) <exp>)
     <lst>)
for, in, and : are all symbols.

For example:

> (for-to-map '(for i in lst : exp))
'(map (lambda (i) exp) lst)

> (for-to-map '(for n in (range 0 10):
                    (sqrt n)))
'(map (lambda (n) (sqrt n))
      (range 0 10))

Note: The input is always a list of size 6, and the output is always a list of size 3. All you need are the list function and a few list accessors.

Implementing for-to-map

We can write a simple list-to-list translator that converts the for loop to an equivalent map expression:

(define for-to-map
  (lambda (for-exp)
    (let ((var (second for-exp))
          (lst (fourth for-exp))
          (exp (sixth for-exp)))
      (list 'map
            (list 'lambda (list var) exp)
            lst))))

This code handles only the surface syntax of the new form. To add it to the language, we'd have to recursively translate the form. But this simple function alone demonstrates the idea of translational semantics, and shows just how easy it can be to convert a simple syntactic abstraction into an equivalent core form.

This enables me to pass in code with Python's for syntax and produce executable Racket code, as a Racket list.
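
Here is a small sketch of what "executable Racket code, as a Racket list" can mean in practice. It repeats a compact version of for-to-map so that it stands alone, and then hands the translated list to eval. The namespace anchor (the names anchor and ns are my own choices) is one common way to give eval access to map, sqrt, and range from inside a module. It is an assumption of this sketch, not part of the exercise.

```racket
#lang racket

;; for-to-map, repeated in compact form so that this sketch stands alone
(define for-to-map
  (lambda (for-exp)
    (list 'map
          (list 'lambda (list (second for-exp)) (sixth for-exp))
          (fourth for-exp))))

;; a namespace in which eval can find map, lambda, range, and *
;; (anchor and ns are illustrative names)
(define-namespace-anchor anchor)
(define ns (namespace-anchor->namespace anchor))

(define translated
  (for-to-map '(for n in (range 0 4) : (* n n))))

translated            ; '(map (lambda (n) (* n n)) (range 0 4))
(eval translated ns)  ; '(0 1 4 9)
```

Notice the friction: we have to quote the loop, translate it, and then evaluate the result ourselves.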

We can do this! We have the technology — and you have the knowledge to write the preprocessor. Racket's simple, parenthesized syntax helps us here.

If only we could build this process into the language somehow: remove the friction, and let Racket do most of the work.

We can.

Implementing for-to-map as Racket Syntax

Racket gives us a better option. The syntax-rules operator enables us to define patterns of the form:

pattern → expansion

and add them to Racket's preprocessor.

Here is the for-to-map transformer that we wrote as a Racket function, rewritten using syntax-rules:

(define-syntax for-p
  (syntax-rules (in :)
    ( (for-p var in lst : exp)
        (map (lambda (var) exp) lst) )  ))

So easy. So powerful. And relatively clear, even if you have never seen the syntax-rules operator before. Look at the two parts of the rule: the pattern and its expansion.

This does more than translate surface syntax in the form of a Racket list; it enables the Racket language processor to expand the expression in place and execute the result:

> (for-p i in (range 0 10):
    (sqrt i))
'(0
  1
  1.4142135623730951
  ...
  2.8284271247461903
  3)

syntax-rules lets us write a syntax transformer that translates (or expands) a syntactic abstraction into a core expression. Historically, and in many other languages, such transformers are called macros.
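
One consequence is worth a quick sketch. The for-to-map function translated only the outermost form, but Racket's expander keeps expanding until only core forms remain, so a for-p nested inside another for-p is handled with no extra work on our part. The example below repeats the for-p definition so that it stands alone. The name table is mine, chosen only for illustration.

```racket
#lang racket

;; for-p, repeated from above so that this sketch stands alone
(define-syntax for-p
  (syntax-rules (in :)
    ((for-p var in lst : exp)
     (map (lambda (var) exp) lst))))

;; the inner loop is the body of the outer loop;
;; the expander translates both
(define table
  (for-p i in (range 1 4) :
    (for-p j in (range 0 i) :
      (* i j))))

table   ; '((0) (0 2) (0 3 6))
```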

Notice, though: This happens before run-time:

[Figure: the read-preprocess-evaluate pipeline]

Other languages have preprocessors, too. For example, C's preprocessor provides operators such as include, ifndef, and define. The preprocessor does a simple text replacement of the macro pattern with its expansion.

Lisp, Racket's grandparent, offered that and more, though at a lower level than Racket.

This is what I mean when I say that Racket is a language for making languages. It gives us operators that define syntax at the level of the code we want to be able to write.

You can find both the for-to-map function and the for-p macro in this file.

Example 2: A Wordy if Expression

Now, let's try something more practical for us to use.

Back in Session 4, we wrote an if expression to solve the opening exercise:

(if (>= student-grade 0.90)
    'A
    (if (>= student-grade 0.80)
        'B
        (if (>= student-grade 0.70)
            'C
            (if (>= student-grade 0.60)
                'D
                'F))))

We were just learning to write Racket expressions, so this was good practice. With a cond expression, we can write something a bit shorter:

(cond ((>= student-grade 0.90) 'A)
      ((>= student-grade 0.80) 'B)
      ((>= student-grade 0.70) 'C)
      ((>= student-grade 0.60) 'D)
      (else 'F))

That's better, but... still wordy. Many languages include a case statement that switches on a single variable. Racket does, too:

(case transaction
  ((withdraw) withdraw)
  ((deposit)  deposit)
  ((balance)  balance)
  (else       error))

Unfortunately for us, Racket's case looks for an exact match, so it can't help us with our grade evaluator. What we'd like to write is something like this:

(range-case student-grade
  ((>= 0.90) 'A)
  ((>= 0.80) 'B)
  ((>= 0.70) 'C)
  ((>= 0.60) 'D)
  (else 'F))

and have it generate the if expression for us.

What can we do? After the last few weeks, we know how to write code that translates a range-case expression into an equivalent cond or if. To make this available in our code, though, we might imagine that we would have to add an arm to the Racket preprocessor. This is risky (what if we break it?) and potentially quite difficult (how big would the Racket preprocessor be?).

Racket adopts a different approach: it lets the programmer instruct the preprocessor by defining a new special form. We have seen several of Racket's primitive special forms:

Racket lets us define new syntax.

Implementing a range-case Expression

Take a look at a solution for range-case.
Study the parts of the macro.
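
For readers without the linked file at hand, here is a minimal sketch of how such a macro might look. It is my reconstruction, not necessarily the code in the file: the pattern variable names (id, op, bound, result, clause) and the letter-grade function are assumptions made for illustration. One rule handles the final else clause, and the other peels off one range clause and recurs on the rest.

```racket
#lang racket

;; a sketch of range-case; names of pattern variables are illustrative
(define-syntax range-case
  (syntax-rules (else)
    ;; base case: only the else clause remains
    ((_ id (else result))
     result)
    ;; recursive case: test one clause, then expand the rest
    ((_ id ((op bound) result) clause ...)
     (if (op id bound)
         result
         (range-case id clause ...)))))

;; a hypothetical function that uses the new form
(define (letter-grade student-grade)
  (range-case student-grade
    ((>= 0.90) 'A)
    ((>= 0.80) 'B)
    ((>= 0.70) 'C)
    ((>= 0.60) 'D)
    (else 'F)))

(letter-grade 0.85)   ; 'B
(letter-grade 0.55)   ; 'F
```

In this sketch, a range-case with no else clause matches neither rule once its clauses run out, so Racket reports a syntax error rather than silently returning nothing.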

We can use the functions expand-once and expand to see how Racket's preprocessor translates the abstraction into a core form:

> (expand-once #'(range-case taxable-income
                   ((<=  12000) '(     0 0.044     0.00))
                   ((<=  60000) '( 12000 0.0482  528.00))
                   ((<= 150000) '( 60000 0.057  2841.60))
                   (else        '(150000 0.06   7971.60))))

(if (<= taxable-income 12000)
    '(0 0.044 0.0)
    (range-case taxable-income
      ((<= 60000)  '(12000 0.0482 528.0))
      ((<= 150000) '(60000 0.057 2841.6))
      (else        '(150000 0.06 7971.6))))

This approach works great when the key is a simple expression, such as an identifier. But if the key (id in the macro) is a compound expression, it will be repeated throughout the generated code, and thus evaluated multiple times. Can we do better?

Yes! We can evaluate the key expression once and bind its value to a new local variable, to save recomputation. See the new version of range-case at the bottom of the source file linked above. This special form uses the original range-case to do the recursive work. Most important, Racket guarantees to use a local variable name that does not collide with any name in the range-case expression. This is good hygiene.
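
A sketch of the idea, under my own assumptions: the wrapper is named range-case-once here only to keep it distinct from the original, and the helper next-grade, the counter calls, and the user variable key are all hypothetical. The wrapper binds the key expression to a local variable and hands that variable to the original range-case. The user's own variable named key shows the hygiene: it is not captured by the macro's local key.

```racket
#lang racket

;; the original form, repeated in sketch form so this example stands alone
(define-syntax range-case
  (syntax-rules (else)
    ((_ id (else result)) result)
    ((_ id ((op bound) result) clause ...)
     (if (op id bound) result (range-case id clause ...)))))

;; evaluate the key expression once, then delegate to range-case
(define-syntax range-case-once
  (syntax-rules ()
    ((_ key-exp clause ...)
     (let ((key key-exp))
       (range-case key clause ...)))))

;; a key expression with a visible side effect
(define calls 0)
(define (next-grade)
  (set! calls (+ calls 1))
  0.65)

;; a user variable that happens to share the macro's local name
(define key 'D)

(define result
  (range-case-once (next-grade)
    ((>= 0.90) 'A)
    ((>= 0.80) 'B)
    ((>= 0.70) 'C)
    ((>= 0.60) key)
    (else 'F)))

result   ; 'D, the user's key, not the number 0.65
calls    ; 1, because the key expression ran only once
```

With the original range-case, the same expression would call next-grade four times before reaching the matching clause.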

Racket Macros

Racket enables us to define pattern → expansion templates as new special forms. To support complex forms:

And keep in mind: this is all happening before run-time.

Implementing a Different range-case Expression

Note: We do not cover this in class.

What if we decide we want a more verbose syntax, such as:

((0.90 1.00) 'A)
((0.80 0.90) 'B)
...

This would allow for non-sequential and overlapping ranges.

We can do that. This solution defines range-case to use a different pattern and a different expansion template.
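
Again for readers without the linked file, here is one way such a definition could look. It is a sketch under assumptions of my own: I treat each range as inclusive at both ends, let the first matching clause win when ranges overlap, and use letter-grade as a hypothetical client. The linked solution may make different choices.

```racket
#lang racket

;; a sketch of range-case with (low high) ranges;
;; both ends inclusive is an assumption of this sketch
(define-syntax range-case
  (syntax-rules (else)
    ((_ key-exp (else result))
     result)
    ((_ key-exp ((lo hi) result) clause ...)
     (let ((key key-exp))              ; evaluate the key only once
       (if (<= lo key hi)
           result
           (range-case key clause ...))))))

;; a hypothetical function that uses the new form
(define (letter-grade g)
  (range-case g
    ((0.90 1.00) 'A)
    ((0.80 0.90) 'B)
    ((0.70 0.80) 'C)
    ((0.60 0.70) 'D)
    (else 'F)))

(letter-grade 0.85)   ; 'B
(letter-grade 0.90)   ; 'A, because the first matching range wins
```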

Change the pattern, change the translation, BOOM! A new special form.

Don't worry about the details of the code. We won't be defining our own syntax this semester. But please note: This is just Racket code. We are using the language we are writing in to extend the language we are writing in — on the fly.

Macros in Other Languages

Other languages have macros. What languages with macros are you likely to encounter?

Old-Style Macros

C and assembly language have rudimentary macro systems, implemented as text-based preprocessors. The C preprocessor works by simple textual search-and-replace at the token level, rather than the character level. This allows some powerful forms of conditional processing, but working at the token level creates problems. If you are interested in learning more, check out the bonus reading for today.
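
A classic instance of the problem: given the C definition #define SQUARE(x) x*x, the use SQUARE(1+2) becomes the tokens 1+2*1+2, which evaluates to 5 rather than 9. The sketch below shows the same idea as a Racket macro. The name square is mine, for illustration only. Because the Racket macro works on whole expressions rather than loose tokens, the operand stays intact.

```racket
#lang racket

;; a hypothetical squaring macro, for illustration only
(define-syntax square
  (syntax-rules ()
    ((_ x) (* x x))))

;; expands to (* (+ 1 2) (+ 1 2)), so the operand keeps its grouping
(square (+ 1 2))   ; 9
```

The operand still appears twice in the expansion, so it is evaluated twice. That is the same recomputation issue we saw with range-case, and it has the same fix: bind the value to a local variable.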

If you publish research papers in CS, you might use a tool named LaTeX. TeX is a computer typesetting system written by Donald Knuth in the 1970s and 1980s. LaTeX is a derivative of TeX, with most of its functionality implemented as macros in TeX.

Macros at this low level are hard to work with, are error prone, and are not always as powerful as we'd like.

Embeddable Languages

PHP is one of the most common languages used on the web. PHP programs are embedded into HTML files. The PHP processor recognizes code fragments using the markers <?php and ?> and executes the code to modify or extend the HTML.

More generally, programs in embeddable languages can be embedded in free-format text, or in the source code of other languages. This is similar to a textual macro language, but embeddable languages are usually much more powerful, as they are full-featured programming languages. (Racket comes with Scribble, a tool for writing documents that allow Racket expressions as embedded programs.)

Modern Macro Systems

Among programming languages created in the last decade or so, Rust and Elixir stand out for their hygienic syntax-level macro systems. Rust excels at systems programming, while Elixir has found a niche in web development. Both languages have borrowed ideas from Racket and adapted them to the specific syntax and capabilities of those languages.

Even among these new languages with modern macro systems, though, Racket stands out.

Creating a New Language in Racket

My notes for the rest of this section are incomplete. I will complete and improve them soon.

We have just seen that Racket lets us modify its expander, the preprocessor that translates syntactic sugar into core expressions.

If we could also modify Racket's reader, we could define an entirely different language using Racket.

We can.

Racket is a language-making language. It treats languages as libraries to be loaded, mixed, and matched.

This is one example.

Matthew Butterick is a lawyer, a programmer, and a typographer. He decided to write a book called Practical Typography. He could have used Word or LaTeX, but neither gave him the flexibility or the power he wanted. As a programmer, he knew he didn't have to settle for other people's tools. So he went looking for programming languages to use. Nothing seemed quite right.

Then he discovered Racket. Racket is a language-making language, so he decided to create his own publishing system, which became an entire language within Racket: Pollen.

Note: You will need to install Pollen to run this code. Use the menu command File | Install Package.... Type pollen into the Package Source box and click Install. When it's done, relaunch DrRacket.

Demonstrate Pollen.

Remember: All the reading and expanding happens before run-time.

Now we can write Pollen files, er, programs, and run them in Racket. Butterick has written two books using Pollen, at the same time creating wonderful web sites from the same source code:

If you want to learn more about how to make languages such as Pollen, check out Beautiful Racket. It's a very good book.

This is one example of a document language written in Racket. In Session 28, we will see a programming language written in Racket — one that doesn't use Racket's parenthesized prefix notation!

Wrap Up