Session 21
Creating New Syntax
or: Racket, A Language for Making Languages
Introduction
When teaching you a new language, especially one that is very different from the languages you already know, professors often try to convince you that you can do all the things you are used to doing in the languages you know — Python, Java, ...
But then you may wonder, why do I need a new language at all?
What makes Racket different? Compelling? Why learn it?
We have seen higher-order functions and the idea that code==data, but there is more.
This is the story... of how Lisp-like languages really do come from a different place, and how they are inspiring the designers of other languages.
This session is about ideas. The code I show you is to illustrate those ideas. I certainly won't ask you to write code like this on Quiz 3. But please try to understand the ideas.
The Set-Up
In Unit 3, you wrote a preprocessor for the little language we studied. Recently you have been writing and extending a preprocessor and an evaluator for the Huey language. Racket has a preprocessor and an evaluator, too.
We've also seen that Racket exposes its machinery to us in ways that other languages usually do not. We can add new functions and operators to the language, thus affecting how the evaluator works. What would it be like to add syntax to the language, thus affecting how the preprocessor works?
Example 1: A Python-Like for Loop
Instead of writing many for loops, Python programmers write list comprehensions. For example:
roots = [sqrt(i) for i in range(0, 10)]
is equivalent to this Python for loop:
roots = []
for i in range(0, 10):
    roots.append(sqrt(i))
More generally, we can think of the for loop as:
for var in lst: exp-using-var
It might be nice to add a Python-like for loop to Racket, such as:
(for i in (range 0 10): (sqrt i))
or, more generally:
(for var in lst: exp-using-var)
Sidenote: Racket already has a fine set of for-loops. They have many different features, depending on our needs. This is just a simple example for us to explore, and to see how those loops work.
This functional loop is equivalent to a Racket map expression:
(map (lambda (i) (sqrt i)) (range 0 10))
This is an example of a syntactic abstraction. We can write code to translate the abstraction into a core form:
(for var in lst : exp-using-var)  =====>  (map (lambda (var) exp-using-var) lst)
Opening Exercise: Make It So
Write a function named for-to-map that takes as input an expression of the form
(for <var> in <lst> : <exp>)
and returns an expression of the form
(map (lambda (<var>) <exp>) <lst>)
for, in, and : are all symbols.
For example:
> (for-to-map '(for i in lst : exp))
'(map (lambda (i) exp) lst)
> (for-to-map '(for n in (range 0 10): (sqrt n)))
'(map (lambda (n) (sqrt n)) (range 0 10))
Note: The input is always a list of size 6, and the output is always a list of size 3. All you need are the list function and a few list accessors.
Implementing for-to-map
We can write a simple list-to-list translator that converts the for loop to an equivalent map expression:
(define for-to-map
  (lambda (for-exp)
    (let ((var (second for-exp))
          (lst (fourth for-exp))
          (exp (sixth for-exp)))
      (list 'map (list 'lambda (list var) exp) lst))))
This code handles only the surface syntax of the new form. To add it to the language, we'd have to recursively translate the form. But this simple function alone demonstrates the idea of translational semantics, and shows just how easy it can be to convert a simple syntactic abstraction into an equivalent core form.
This enables me to pass in code with Python's for syntax and produce executable Racket code, as a Racket list.
- ... run for-to-map on a simple expression
- ... run the result expression as code
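The two demo steps above can be sketched as follows. This is a minimal sketch, not the in-class demo itself. It repeats the for-to-map definition from above and assumes a sample expression built with (list 1 4 9) rather than range, so that every name it uses exists in the namespace returned by make-base-namespace. The names ns and translated are illustrative.

```racket
#lang racket

;; The translator from above: a 6-element for list in, a 3-element map list out.
(define for-to-map
  (lambda (for-exp)
    (let ((var (second for-exp))
          (lst (fourth for-exp))
          (exp (sixth for-exp)))
      (list 'map (list 'lambda (list var) exp) lst))))

;; eval needs a namespace when it is called from inside a module.
(define ns (make-base-namespace))

;; Step 1: run for-to-map on a simple expression.
(define translated
  (for-to-map '(for n in (list 1 4 9) : (sqrt n))))
translated              ; '(map (lambda (n) (sqrt n)) (list 1 4 9))

;; Step 2: run the resulting list as code.
(eval translated ns)    ; '(1 2 3)
```

The friction is visible here: we have to quote the input, call the translator ourselves, and then hand the result to eval.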
We can do this! We have the technology — and you have the knowledge to write the preprocessor. Racket's simple, parenthesized syntax helps us here.
If only we could build this process into the language somehow: remove the friction, and let Racket do most of the work.
We can.
Implementing for-to-map as Racket Syntax
Racket gives us a better option. The syntax-rules operator enables us to define patterns of the form:
pattern → expansion
and add them to Racket's preprocessor.
Here is the for-to-map "transformer" that we wrote as a Racket function, now written using syntax-rules:
(define-syntax for-p
  (syntax-rules (in :)
    ((for-p var in lst : exp)
     (map (lambda (var) exp) lst))))
So easy. So powerful. And relatively clear, even if you have never seen the syntax-rules operator before. Look at the two patterns...
This does more than translate surface syntax in the form of a Racket list; it enables the Racket language processor to expand the expression in place and execute the result:
> (for-p i in (range 0 10): (sqrt i))
'(0 1 1.4142135623730951 ... 2.8284271247461903 3)
syntax-rules lets us write a syntax transformer that translates (or expands) a syntactic abstraction into a core expression. Historically, and in many other languages, such transformers are called macros.
Notice, though: This happens before run-time.
Other languages have preprocessors, too. For example, C's preprocessor provides operators such as include, ifndef, and define. The preprocessor does a simple text replacement of the macro pattern with its expansion.
Lisp, Racket's grandparent, offered that and more, though also at a lower level than Racket.
This is what I mean when I say that Racket is a language for making languages. It gives us operators that define syntax at the level of the code we want to be able to write.
You can find both the for-to-map function and the for-p macro in this file.
Example 2: A Wordy if Expression
Now, let's try something more practical for us to use.
Back in Session 4, we wrote an if expression to solve the opening exercise:
(if (>= student-grade 0.90)
    'A
    (if (>= student-grade 0.80)
        'B
        (if (>= student-grade 0.70)
            'C
            (if (>= student-grade 0.60)
                'D
                'F))))
We were just learning to write Racket expressions, so this was good practice. With a cond expression, we can write something a bit shorter:
(cond ((>= student-grade 0.90) 'A)
      ((>= student-grade 0.80) 'B)
      ((>= student-grade 0.70) 'C)
      ((>= student-grade 0.60) 'D)
      (else 'F))
That's better, but... still wordy. Many languages include a case statement that switches on a single variable. Racket does, too:
(case transaction
  ((withdraw) withdraw)    ; each clause lists the unquoted datums it matches
  ((deposit)  deposit)
  ((balance)  balance)
  (else       error))
Unfortunately for us, Racket's case looks for an exact match, so it can't help us with our grade evaluator. What we'd like to write is something like this:
(range-case student-grade
  ((>= 0.90) 'A)
  ((>= 0.80) 'B)
  ((>= 0.70) 'C)
  ((>= 0.60) 'D)
  (else 'F))
and have it generate the if expression for us.
What can we do? After the last few weeks, we know how to write code that translates a range-case expression into an equivalent cond or if. To make this available in our code, though, we might imagine that we would have to add an arm to the Racket preprocessor. This is risky (what if we break it?) and potentially quite difficult (how big would the Racket preprocessor be?).
Racket adopts a different approach: it lets the programmer instruct the preprocessor by defining a new special form. We have seen several of Racket's primitive special forms:
- Some (define, quote, if) have syntax that looks just like calling a function, each with its own evaluation rule.
- Others (lambda, let, letrec) have what appears to be a new syntax.
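To see why a form such as if needs its own evaluation rule, it can help to try writing it as an ordinary function. The sketch below is only an illustration. The names my-if, note!, and evaluated are hypothetical helpers, not part of Racket or of the course code.

```racket
#lang racket

;; A hypothetical attempt to write if as an ordinary function.
(define (my-if test then-value else-value)
  (if test then-value else-value))

;; Record which branch expressions actually get evaluated.
(define evaluated '())
(define (note! tag)
  (set! evaluated (cons tag evaluated))
  tag)

;; A function call evaluates every argument before the body runs.
(my-if #t (note! 'then) (note! 'else))
(define with-function (reverse evaluated))       ; '(then else)

;; The special form evaluates only the branch it selects.
(set! evaluated '())
(if #t (note! 'then) (note! 'else))
(define with-special-form (reverse evaluated))   ; '(then)
```

Both calls return 'then, but the function version also evaluates the branch it does not need. That difference is what "its own evaluation rule" means.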
Racket lets us define new syntax.
Implementing a range-case Expression
Take a look at a solution for range-case. Study the parts of the macro.
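The linked solution is not reproduced in these notes. As a rough guide to what such a macro might look like, here is a minimal sketch written to be consistent with the range-case usage shown earlier. It assumes that every use ends with an else clause, and the pattern variable names (id, op, bound, result, clause) and the letter-grade function are my own.

```racket
#lang racket

;; A sketch of range-case using syntax-rules.
;; else is listed as a literal so that it matches only itself.
(define-syntax range-case
  (syntax-rules (else)
    ;; Base case: only the else clause remains.
    ((_ id (else result))
     result)
    ;; Recursive case: turn the first clause into an if,
    ;; then let range-case handle the remaining clauses.
    ((_ id ((op bound) result) clause ...)
     (if (op id bound)
         result
         (range-case id clause ...)))))

(define (letter-grade student-grade)
  (range-case student-grade
    ((>= 0.90) 'A)
    ((>= 0.80) 'B)
    ((>= 0.70) 'C)
    ((>= 0.60) 'D)
    (else 'F)))

(letter-grade 0.85)   ; 'B
```

Each expansion step peels off one clause, so the final result is the same nested if expression we wrote by hand in Session 4.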
We can use the functions expand-once and expand to see how Racket's preprocessor translates the abstraction into a core form:
> (expand-once #'(range-case taxable-income
                   ((<= 12000)  '(     0 0.044     0.00))
                   ((<= 60000)  '( 12000 0.0482  528.00))
                   ((<= 150000) '( 60000 0.057  2841.60))
                   (else        '(150000 0.06   7971.60))))
(if (<= taxable-income 12000)
    '(0 0.044 0.0)
    (range-case taxable-income
      ((<= 60000) '(12000 0.0482 528.0))
      ((<= 150000) '(60000 0.057 2841.6))
      (else '(150000 0.06 7971.6))))
This approach works great if we are choosing a value based on a single value, such as an identifier. But if id is a compound expression, it will be repeated throughout the generated code, and thus evaluated multiple times. Can we do better?
Yes! We can evaluate the key expression once and bind its value to a new local variable, to save recomputation. See the new version of range-case at the bottom of the source file linked above. This special form uses the original range-case to do the recursive work. Most important, Racket guarantees to use a local variable name that does not collide with any name in the range-case expression. This is good hygiene.
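Here is a minimal sketch of that idea, under stated assumptions. The range-case definition is my own reconstruction rather than the code in the linked file, and the wrapper name range-case/once is hypothetical. The counter and the user-level variable named key exist only to make the two claims (one evaluation, no name collision) observable.

```racket
#lang racket

;; Assumed definition of the original range-case (a reconstruction).
(define-syntax range-case
  (syntax-rules (else)
    ((_ id (else result)) result)
    ((_ id ((op bound) result) clause ...)
     (if (op id bound) result (range-case id clause ...)))))

;; Hypothetical wrapper: evaluate the key expression once, bind it to a
;; local variable, and let range-case do the recursive work.
(define-syntax range-case/once
  (syntax-rules ()
    ((_ key-exp clause ...)
     (let ((key key-exp))
       (range-case key clause ...)))))

;; A key expression with a visible side effect.
(define evaluations 0)
(define (next-grade)
  (set! evaluations (add1 evaluations))
  0.65)

;; The user's own variable happens to be named key, too.
(define key 'user-value)

(define grade
  (range-case/once (next-grade)
    ((>= 0.90) 'A)
    ((>= 0.80) 'B)
    ((>= 0.70) 'C)
    ((>= 0.60) key)     ; refers to the user's key, not the macro's
    (else 'F)))

grade          ; 'user-value
evaluations    ; 1
```

Even though the expansion tests the grade against four bounds, next-grade runs only once, and the macro's local key does not capture the key written in the fourth clause.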
Racket Macros
Racket enables us to define pattern → expansion templates as new special forms. To support complex forms:
- It allows the use of an ellipsis to describe compound patterns.
- It allows one special form to expand to another special form.
- It even allows a syntax rule to be recursive.
And keep in mind: this is all happening before run-time.
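Two small macros can illustrate these three features. They are sketches for illustration only. The names my-let and my-and are hypothetical stand-ins for forms that Racket already provides as let and and.

```racket
#lang racket

;; Ellipsis: (name val) ... matches any number of bindings, and
;; name ... and val ... replay the matched pieces in the expansion.
(define-syntax my-let
  (syntax-rules ()
    ((_ ((name val) ...) body ...)
     ((lambda (name ...) body ...) val ...))))

;; Recursion: the third rule expands into an if (another special form)
;; whose consequent is a smaller my-and expression.
(define-syntax my-and
  (syntax-rules ()
    ((_) #t)
    ((_ e) e)
    ((_ e1 e2 ...) (if e1 (my-and e2 ...) #f))))

(my-let ((x 2) (y 3)) (* x y))           ; 6
(my-and 1 2 3)                           ; 3
(my-and 1 #f (error "never evaluated"))  ; #f
```

The last call shows that my-and short-circuits: because it expands into nested if expressions, the error expression is never evaluated.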
Implementing a Different range-case Expression
Note: We do not cover this in class.
What if we decide we want a more verbose syntax, such as:
((0.90 1.00) 'A) ((0.80 0.90) 'B) ...
This would allow for non-sequential and overlapping ranges.
We can do that.
This solution defines range-case to use a different pattern and a different expansion template.
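The linked solution is not shown in these notes, so here is one way such a form might be sketched. Everything in it is an assumption: the name interval-case (chosen to avoid confusion with the real solution), the choice to treat both endpoints as inclusive, the rule that the first matching range wins, and the required else clause.

```racket
#lang racket

;; A sketch of a range-based form with explicit (low high) pairs.
(define-syntax interval-case
  (syntax-rules (else)
    ((_ key-exp ((low high) result) ... (else default))
     (let ((key key-exp))                  ; evaluate the key only once
       (cond ((<= low key high) result)    ; one cond clause per range
             ...
             (else default))))))

(define (letter-grade student-grade)
  (interval-case student-grade
    ((0.90 1.00) 'A)
    ((0.80 0.90) 'B)
    ((0.70 0.80) 'C)
    ((0.60 0.70) 'D)
    (else 'F)))

(letter-grade 0.85)   ; 'B
(letter-grade 0.90)   ; 'A, because the first matching range wins
```

Because each clause carries its own bounds, the ranges can be listed in any order and may overlap.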
Change the pattern, change the translation, BOOM! A new special form.
Don't worry about the details of the code. We won't be defining our own syntax this semester. But please note: This is just Racket code. We are using the language we are writing in to extend the language we are writing in — on the fly.
Macros in Other Languages
Other languages have macros. What languages with macros are you likely to encounter?
Old-Style Macros
C and assembly language have rudimentary macro systems, implemented as text-based preprocessors. The C preprocessor works by simple textual search-and-replace at the token level rather than the character level. This allows some powerful forms of conditional processing, but working at the token level creates problems. If you are interested in learning more, check out the bonus reading for today.
If you publish research papers in CS, you might use a tool named LaTeX. TeX is a computer typesetting system written by Donald Knuth in the 1970s and 1980s. LaTeX is a derivative of TeX, with most of its functionality implemented as macros in TeX.
Macros at this low level are hard to work with, are error prone, and are not always as powerful as we'd like.
Embeddable Languages
PHP is one of the most common languages used on the web. PHP programs are embedded into HTML files. The PHP processor recognizes code fragments using the markers <?php and ?> and executes the code to modify or extend the HTML.
More generally, programs in embeddable languages can be embedded in free-format text, or in the source code of other languages. This is similar to a textual macro language, but embeddable languages are usually much more powerful, as they are full-featured programming languages. (Racket comes with Scribble, a tool for writing documents that allow Racket expressions as embedded programs.)
Modern Macro Systems
Among programming languages created in the last decade or so, Rust and Elixir stand out for their hygienic syntax-level macro systems. Rust excels at systems programming, while Elixir has found a niche in web development. Both languages have borrowed ideas from Racket and adapted them to the specific syntax and capabilities of those languages.
Even among these new languages with modern macro systems, though, Racket stands out.
Creating a New Language in Racket
My notes for the rest of this section are incomplete. I will complete and improve them soon.
We have just seen that Racket lets us modify its expander, the preprocessor that translates syntactic sugar into core expressions.
If we could also modify Racket's reader, we could define an entirely different language using Racket.
We can.
Racket is a language-making language. It treats languages as libraries to be loaded, mixed, and matched.
This is one example.
Matthew Butterick is a lawyer, a programmer, and a typographer. He decided to write a book called Practical Typography. He could have used Word or LaTeX, but neither gave him the flexibility or even the power he wanted. As a programmer, he knew he didn't have to settle for other people's tools. So he went looking for programming languages to use. Nothing seemed quite right.
Then he discovered Racket. Racket is a language-making language, so he decided to create his own publishing system, which became an entire language within Racket: Pollen.
Note: You will need to install Pollen to run this code. Use the menu command File | Install Package... and type pollen into the Package Source box, then click Install. When it's done, relaunch DrRacket.
Demonstrate Pollen.
- Show a Pollen file.
- Run in DrRacket. Look at the output.
- Run at the command line: racket poem.html.pp
- Run and re-direct the output: racket poem.html.pp > poem.html
- Pollen can do that for us: raco pollen render poem.html.pp
- Open the output.
Remember: All the reading and expanding happens before run-time.
Now we can write Pollen files, er, programs, and run them in Racket. Butterick has written two books using Pollen, at the same time creating wonderful web sites from the same source code.
If you want to learn more about how to make languages such as Pollen, check out Beautiful Racket. It's a very good book.
This is one example of a document language written in Racket. In Session 28, we will see a programming language written in Racket — one that doesn't use Racket's parenthesized prefix notation!
Wrap Up
- Reading
  - Review these lecture notes. Peek at the code for the session. Pay more attention to the ideas than the details, unless you really want to go deeper. (In which case, let me know!)
  - If you like the ideas in this session and would like to see more, check out this short reading with associated code examples.
- Homework
  - Homework 10 is available and due on Thursday.
  - Homework 11 will be available then.