TITLE: What a Tiny Language Can Teach Us About Gigantic Systems AUTHOR: Eugene Wallingford DATE: October 18, 2015 10:42 AM DESC: ----- BODY: StrangeLoop is long in the books for most people, but I'm still thinking about some of the things I learned there. This is the first of what I hope to be a few more posts on talks and ideas still on my mind. The conference opened with a keynote address by Peter Alvaro, who does research at the intersection of distributed systems and programming languages. The talk was titled "I See What You Mean", but I was drawn in more by his alternate title: "What a Tiny Language Can Teach Us About Gigantic Systems". Going in, I had no idea what to expect from this talk and so, in an attitude whose pessimism surprised me, I expected very little. Coming out, I had been surprised in the most delightful way. Alvaro opened with the confounding trade-off of all abstractions: Hiding the distracting details of a system can illuminate the critical details (yay!), but the boundaries of an abstraction lock out the people who want to work with the system in a different way (boo!). He illustrated the frustration felt by those who are locked out with a tweet from @pxlplz:

SELECT bs FROM table WHERE sql="arrgh" ORDER BY hate

From this base, Alvaro moved on to his personal interests: query languages, semantics, and distributed systems. When modeling distributed systems, we want a language that is resilient to failure and tolerant of a loose ordering on the execution of operations. But we also need a way to model what programs written in the language mean. The common semantic models express a common split in computing:

operational semantics: a program means what it does
model-theoretic semantics: a program means the set of facts that makes it true

With query languages, we usually think of programs in terms of the databases of facts that makes them true. In many ways, the streaming data of a distributed system is a dual to the database query model. In the latter, program control flows down to fixed data. In distributed systems, data flows down to fixed control units. If I understood Alvaro correctly, his work seeks to find a sweet spot amid the tension between these two models. Alvaro walked through three approaches to applicative programming. In the simplest form, we have three operators: select (σ), project (Π), and join (⋈). The database language SQL adds to this set negation (¬). The Prolog subset Datalog makes computation of the least fixed point a basic operation. Datalog is awesome, says Alvaro, but not if you add ¬! That creates a language with too much power to allow the kind of reasoning we want to do about a program. Declarative programs don't have assignment statements, because they introduce time into a model. An assignment statement effectively partitions the past (in which an old value holds) from the present (characterized by the current value). In a program with state, there is an hidden clock inside the program. We all know the difficulty of managing state in a standard system. Distributed systems create a new challenge. They need to deal with time, but a relativistic time in which different programs seem to be working on their own timelines. Alvaro gave a couple of common examples:

a sender crashes, then restarts and begins to replay a set of transaction
a receiver enters garbage collection, then comes back to life and begins to respond to queued messages

A language that helps us write better distributed systems must give us a way to model relativistic time without a hidden universal clock. The rest of the talk looked at some of Alvaro's experiments aimed at finding such languages for distributed systems, building on the ideas he had introduced earlier. The first was Dedalus, billed as "Datalog in time and space". In Dedalus, knowledge is local and ephemeral. It adds two temporal operators to the set found in SQL: @next, for making assertions about the future, and @async, for making assertions of independence between operations. Computation in Dedalus is rendezvous between data and control. Program state is a deduction. But what of semantics? Alas, a Dedalus program has an infinite number of models, each model itself infinite. The best we can do is to pull at all of the various potential truths and hope for quiescence. That's not comforting news if you want to know what your program will mean while operating out in the world. Dedalus as the set of operations {σ, Π, ⋈, ¬, @next, @async} takes us back to the beginning of the story: too much power for effective reasoning about programs. However, Dedalus minus ¬ seems to be a sweet spot. As an abstraction, it hides state representation and control flow and illuminates data, change, and uncertainty. This is the direction Alvaro and his team are moving in now. One result is Bloom, a small new language founded on the Dedalus experiment. Another is Blazes, a program analysis framework that identifies potential inconsistencies in a distributed program and generates the code needed to ensure coordination among the components in question. Very interesting stuff. Alvaro closed by returning to the idea of abstraction and the role of programming language. He is often asked why he creates new programming languages rather than working in existing languages. In either approach, he points out, he would be creating abstractions, whether with an API or a new syntax. And he would have to address the same challenges:

Respect users. We are they.
Abstractions leak. Accept that and deal with it.
It is better to mean well than to feel good. Programs have to do what we need them to do.

Creating a language is an act of abstraction. But then, so is all of programming. Creating a language specific to distributed systems is a way to make very clear what matters in the domain and to provide both helpful syntax and clear, reliable semantics. Alvaro admits that this answer hides the real reason that he creates new languages:

Inventing languages is dope.

At the end of this talk, I understood its title, "I See What You Mean", better than I did before it started. The unintended double entendre made me smile. This talk showed how language interacts with problems in all areas of computing, the power language gives us as well as the limits it imposes. Alvaro delivered a most excellent keynote address and opened StrangeLoop on a high note. Check out the full talk to learn about all of this in much greater detail, with the many flourishes of Alvaro's story-telling.