TITLE: What a Tiny Language Can Teach Us About Gigantic Systems
AUTHOR: Eugene Wallingford
DATE: October 18, 2015 10:42 AM
DESC:
-----
BODY:
StrangeLoop is
long in the books
for most people, but I'm still thinking about some of the things
I learned there. This is the first of what I hope to be a few
more posts on talks and ideas still on my mind.
The conference opened with a keynote address by
Peter Alvaro,
who does research at the intersection of distributed systems and
programming languages. The talk was titled "I See What You Mean",
but I was drawn in more by his alternate title: "What a Tiny
Language Can Teach Us About Gigantic Systems". Going in, I had
no idea what to expect from this talk and so, in an attitude whose
pessimism surprised me, I expected very little. Coming out, I had
been surprised in the most delightful way.
Alvaro opened with the confounding trade-off of all abstractions:
Hiding the distracting details of a system can illuminate the critical
details (yay!), but the boundaries of an abstraction lock out the
people who want to work with the system in a different way (boo!).
He illustrated the frustration felt by those who are locked out with
a tweet from
@pxlplz:
SELECT bs FROM table WHERE sql="arrgh" ORDER BY hate
From this base, Alvaro moved on to his personal interests: query
languages, semantics, and distributed systems. When modeling
distributed systems, we want a language that is resilient to failure
and tolerant of a loose ordering on the execution of operations. But
we also need a way to model what programs written in the language mean.
The common semantic models express a common split in computing:
- operational semantics: a program means what it does
- model-theoretic semantics: a program means the set of facts
that makes it true
With query languages, we usually think of programs in terms of the
databases of facts that make them true. In many ways, the streaming
data of a distributed system is a dual to the database query model.
In the latter, program control flows down to fixed data. In
distributed systems, data flows down to fixed control units. If I
understood Alvaro correctly, his work seeks to find a sweet spot amid
the tension between these two models.
Alvaro walked through three approaches to applicative programming.
In the simplest form, we have three operators:
select (σ),
project (Π), and
join (⋈).
The database language SQL adds to this set negation (¬).
The Prolog subset
Datalog
makes computation of the least fixed point a basic operation. Datalog
is awesome, says Alvaro, but not if you add ¬! That
creates a language with too much power to allow the kind of reasoning
we want to do about a program.
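To make the operators concrete, here is a minimal sketch in Python, assuming relations are modeled as sets of tuples. The function names and the edge/path example are mine, not something from the talk. It builds select, project, and join, then uses them to compute a Datalog-style least fixed point (transitive closure) by applying the rules until no new facts appear.

```python
# Relations are modeled as sets of tuples. All names here are illustrative.

def select(relation, predicate):
    """sigma: keep only the rows that satisfy the predicate."""
    return {row for row in relation if predicate(row)}

def project(relation, *columns):
    """Pi: keep only the given column positions of each row."""
    return {tuple(row[c] for c in columns) for row in relation}

def join(left, right, left_col, right_col):
    """Join: concatenate pairs of rows whose chosen columns agree."""
    return {l + r for l in left for r in right if l[left_col] == r[right_col]}

def least_fixed_point(edges):
    """Transitive closure in the style of the Datalog rules
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    The rules are applied repeatedly until no new facts are derived."""
    paths = set(edges)
    while True:
        # Join path(X, Y) with edge(Y, Z), then keep only (X, Z).
        derived = project(join(paths, edges, 1, 0), 0, 3)
        new_paths = paths | derived
        if new_paths == paths:
            return paths
        paths = new_paths

edges = {("a", "b"), ("b", "c"), ("c", "d")}
paths = least_fixed_point(edges)

# Which nodes are reachable from "a"?
from_a = project(select(paths, lambda row: row[0] == "a"), 1)
print(sorted(paths))
print(sorted(from_a))
```

The loop only ever adds facts, which is why it is guaranteed to stop on a finite relation. Negation would let a rule retract a conclusion as more facts arrive, and that simple argument no longer holds.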
Declarative programs don't have assignment statements, because assignment
introduces time into a model. An assignment statement
effectively partitions the past (in which an old value holds) from the
present (characterized by the current value). In a program with state,
there is a hidden clock inside the program.
We all know the difficulty of managing state in a standard system.
Distributed systems create a new challenge. They need to deal with
time, but a relativistic time in which different programs seem to
be working on their own timelines. Alvaro gave a couple of common
examples:
- a sender crashes, then restarts and begins to replay a set of
transactions
- a receiver enters garbage collection, then comes back to life
and begins to respond to queued messages
A language that helps us write better distributed systems must give
us a way to model relativistic time without a hidden universal clock.
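The two scenarios above can be sketched in a few lines of Python. This is my own illustration, not code from the talk, and the message format is an assumption. One receiver updates a counter by assignment, so its answer depends on exactly what arrived. The other accumulates facts by set union, so replayed and reordered messages lead to the same result.

```python
import random

def count_deliveries(messages):
    """State by assignment: every delivery overwrites the old total."""
    total = 0
    for _ in messages:
        total = total + 1
    return total

def collect_facts(messages):
    """State as a growing set of facts: union ignores order and duplicates."""
    facts = set()
    for message in messages:
        facts |= {message}
    return facts

sent = [("txn", 1), ("txn", 2), ("txn", 3)]

# The sender crashes after two messages, restarts, and replays everything.
replayed = sent[:2] + sent

# The receiver pauses, then drains its queue in some arbitrary order.
shuffled = replayed[:]
random.Random(0).shuffle(shuffled)

print(count_deliveries(sent), count_deliveries(replayed))
print(collect_facts(sent) == collect_facts(replayed) == collect_facts(shuffled))
```

The counter reports 3 for the clean run and 5 for the replayed one, while the set-based receiver reaches the same state in all three runs.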
The rest of the talk looked at some of Alvaro's experiments aimed at
finding such languages for distributed systems, building on the ideas
he had introduced earlier.
The first was
Dedalus,
billed as "Datalog in time and space". In Dedalus, knowledge is
local and ephemeral. It adds two temporal operators to the set found
in SQL: @next, for making assertions about the future, and
@async, for making assertions of independence between
operations. Computation in Dedalus is a rendezvous between data and
control. Program state is a deduction.
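Here is a rough Python sketch of how I picture these ideas. It is not Dedalus itself: the rule notation in the comments is my approximation, and the relation names, the `max_delay` parameter, and the use of a random delay to stand in for @async are all assumptions. Every fact carries a timestamp and holds only at that time. A fact survives only if a rule carries it to the next step, and the current state is whatever can be deduced for a given time.

```python
import random

def step(facts, t, rng, max_delay=3):
    """Derive new timestamped facts from the facts that hold at time t."""
    derived = set()
    for (relation, value, time) in facts:
        if time != t:
            continue
        if relation == "stored":
            # stored(X)@next :- stored(X).   Persistence must be stated explicitly.
            derived.add(("stored", value, t + 1))
        elif relation == "incoming":
            # stored(X)@next :- incoming(X).
            derived.add(("stored", value, t + 1))
        elif relation == "send":
            # incoming(X)@async :- send(X).  Arrival time is not ours to choose.
            derived.add(("incoming", value, t + rng.randint(1, max_delay)))
    return derived

def run(initial_facts, horizon, seed=0):
    """Apply the rules at each time step from 0 up to horizon - 1."""
    rng = random.Random(seed)
    facts = set(initial_facts)
    for t in range(horizon):
        facts |= step(facts, t, rng)
    return facts

def state_at(facts, t):
    """Program state is a deduction: the stored values that hold at time t."""
    return {value for (relation, value, time) in facts
            if relation == "stored" and time == t}

facts = run({("send", "a", 0), ("send", "b", 0)}, horizon=10)
print(state_at(facts, 0))  # nothing has been stored yet
print(state_at(facts, 9))  # both values have arrived and persisted
```

Nothing is ever assigned to a variable called "state". Which step each message arrives at depends on the seed, but with delays capped at three steps both values are stored well before time 9.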
But what of semantics? Alas, a Dedalus program has an infinite number
of models, each model itself infinite. The best we can do is to pull
at all of the various potential truths and hope for quiescence. That's
not comforting news if you want to know what your program will mean
while operating out in the world.
Dedalus as the set of operations {σ, Π,
⋈, ¬, @next, @async} takes us
back to the beginning of the story: too much power for effective
reasoning about programs.
However, Dedalus minus ¬ seems to be a sweet spot. As an
abstraction, it hides state representation and control flow and
illuminates data, change, and uncertainty. This is the direction
Alvaro and his team are moving in now. One result is
Bloom,
a small new language founded on the Dedalus experiment. Another is
Blazes,
a program analysis framework that identifies potential inconsistencies
in a distributed program and generates the code needed to ensure
coordination among the components in question. Very interesting stuff.
Alvaro closed by returning to the idea of abstraction and the role of
programming languages. He is often asked why he creates new programming
languages rather than working in existing languages. In either approach,
he points out, he would be creating abstractions, whether with an API or
a new syntax. And he would have to address the same challenges:
- Respect users. We are they.
- Abstractions leak. Accept that and deal with it.
- It is better to mean well than to feel good. Programs have
to do what we need them to do.
Creating a language is an act of abstraction. But then, so is all
of programming. Creating a language specific to distributed systems
is a way to make very clear what matters in the domain and to
provide both helpful syntax and clear, reliable semantics.
Alvaro admits that this answer hides the real reason that he creates
new languages:
Inventing languages is dope.
At the end of this talk, I understood its title, "I See What You
Mean", better than I did before it started. The unintended double
entendre made me smile. This talk showed how language interacts
with problems in all areas of computing, the power language gives us
as well as the limits it imposes. Alvaro delivered a most excellent
keynote address and opened StrangeLoop on a high note.
Check out
the full talk
to learn about all of this in much greater detail, with the many
flourishes of Alvaro's story-telling.