TITLE: StrangeLoop: Compilers, Compilers, Compilers AUTHOR: Eugene Wallingford DATE: September 24, 2013 4:38 PM DESC: ----- BODY:

[My notes on StrangeLoop 2013: Table of Contents]

I went to a lot of talks about compilation. There seemed to be more this year than last, but perhaps I was suffering from a perception bias. I'm teaching compilers this semester and have been reading a bit about V8 and Crankshaft on the elliptical of late. Many of the talks I saw revolved around a common theme: dynamic run-time systems. Given the prominence these days of Javascript, Python, Ruby, Lua, and their like, it's not surprising that finding better ways to organize dynamic run-times and optimize their performance are receiving a lot of attention. The problem of optimizing dynamic run-time systems is complicated by the wide range of tasks being performed dynamically: type checking, field access, function selection, and the most common whipping horse of my static-language friends, garbage collection. Throw in eval, which allows the execution of arbitrary code, possibly changing even the definition of core classes, and it's amazing that our dynamic languages can run in human time at all. That's a tribute to the people who have been creating compilers for us over the years. As I listened to these talks, my ears were tuned to ideas and topics that I need to learn more about. That's what my notes captured best. Here are a few ideas that stood out. The JavaScript interpreter, interpreted. Martha Girdler gave a quick, jargon-free tour of how Javascript works, using Javascript code to illustrate basic ideas like contexts. This sort of talk can help relatively inexperienced developers understand the common "pain points" of the language, such as variable hoisting. Fast and Dynamic. Maxime Chevalier-Boisvert went a level deeper, tracing some of the fundamental ideas used to implement run-time systems from their historical roots in Lisp, Smalltalk, and Self up to research prototypes such as Chevalier-Boisvert's own Higgs compiler. Many of the ideas are familiar to anyone who has had an undergrad compiler course, such as type tagging and microcoded instructions. Others are simple extensions of such ideas, such as inline caching, which is a workhorse in any dynamic compiler. Still others have entered popular discussion only recently. Maps, which are effectively hidden classes, originated in Self and are now being applied and extended in a number of interesting ways. Two ideas from this talk that I would like to learn more about are hybrid type inference, which Chevalier-Boisvert mentioned in the context of Chrome and Firefox, and basic block versioning, a technique being explored in the Higgs compiler. In closing, the speaker speculated on the better compilers of the future. Some of the advances will come from smarter CPUs, which might execute potential future paths in parallel, and more principled language design. But many will come from academic research that discovers new techniques and improves exiting ones. Some of the ideas of the future are probably already available and simply haven't caught on yet. Chevalier-Boisvert offered three candidates: direct manipulation of the AST, pattern matching, and the persistent image. I certainly hear a lot of people talking about the first of these, but I've yet to see a compelling implementation yet. Ruby Doesn't Have to Be Slow. In this session, Alex Gaynor explained why dynamic languages don't have to be slow. Though Ruby was his working example, everything he said applies to Javascript, Python, Lua, and other dynamic languages. He then talked about how he is putting these ideas to work in Topaz, a fast Ruby interpreter written in RPython. Topaz uses a number of advanced techniques, including a tracing JIT, type-specialized field look-up, maps, quasi-immutable fields, and escape analysis. It supports a subset of Ruby, though much of what is missing now is simply standard library classes and methods. Two of the more interesting points of this talk for me were about meta-issues. First, he opened with an elaboration of the claim, "Ruby is slow", which he rightfully rejected as too imprecise to be meaningful. What people probably mean is something like, "Code written in Ruby executes CPU-bound tasks slower than other languages." I would add that, for many of my CS colleagues, the implicit benchmark is compiled C. Further, Ruby users tend to respond to this claim poorly. Rather than refute it, they accept its premise and dance around its edges. Saddest, he says, is when they say, "If it turns out to matter, we can rewrite the program in some serious language." The compiler nerd in him says, "We can do this." Topaz is, in part, an attempt to support that claim. Second, in response to an audience question, he claimed that people responsible for Java got something right fifteen years: they convinced people to abandon their C extensions. If the Ruby world followed course, and moved away from external dependencies that restrict what the compiler and run-time system can know, then many performance improvements would follow. Throughout this talk, I kept coming back to JRuby in my mind... The Art of Finding Shortcuts. Vyacheslav " @mraleph" Egorov's talk was ostensibly about an optimizing compiler for Dart, but like most of the compiler talks this year, it presented ideas of value for handling any dynamic language. Indeed, this talk gave a clear introduction to what an optimizing compiler does, what in-line caching is, and different ways that the compiler might capitalize on them. According to Egorov, writing an optimizing compiler for language like Dart is the art of finding -- and taking -- shortcuts. The three key issues to address are representation, resolution, and redundancy. You deal with representation when you design your run-time system. The other two fall to the optimizing compiler. Resolution is fundamentally a two-part question. Given the expression obj.prop, In-line caches eliminate redundancy by memoizing where/what pairs. The goal is to use the same hidden class maps to resolve property access whenever possible. Dart's optimizer uses in-line caching to give type feedback for use in improving the performance of loads and stores. Egorov was one of the most quotable speakers I heard at StrangeLoop this year. In addition to "the art of finding shortcuts", I noted several other pithy sayings that I'll probably steal at some point, including: Two ideas I plan to read more about after hearing this talk are allocation sinking and load forwarding.

~~~~

I have a lot of research to do now. -----