October 30, 2015 4:35 PM

Taking Courses Broad and Wide

Nearly nine years ago, digital strategist Russell Davies visited the University of Oregon to work with students and faculty in the advertising program and wrote a blog entry about his stint there. Among his reflections on what the students should be doing and learning, he wrote:

We're heading for a multi-disciplinary world and that butts right up against a university business model. If I were preparing myself for my job right now I'd do classes in film editing, poetry, statistics, anthropology, business administration, copyright law, psychology, drama, the history of art, design, coffee appreciation, and a thousand other things. Colleges don't want you doing that, that destroys all their efficiencies, but it's what they're going to have to work out.

I give similar advice to prospective students of computer science: If they intend to take their CS degrees out into the world and make things for people, they will want to know a little bit about many different things. To maximize the possibilities of their careers, they need a strong foundation in CS and an understanding of all the things that shape how software and software-enhanced gadgets are imagined, made, marketed, sold, and used.

Just this morning, a parent of a visiting high school student said, after hearing about all the computer science that students learn in our programs, "So, our son should probably drop his plans to minor in Spanish?" They got a lot more than a "no" out of me. I talked about the opportunities to engage with the growing population of Spanish-speaking Americans, even here in Iowa; the opportunities available to work for companies with international divisions; and how learning a foreign language can help students study and learn programming languages differently. I was even able to throw in a bit about grammars and the role they play in my compiler course this semester.

I think the student will continue with his dream to study Spanish.

I don't think that the omnivorous course of study that Davies outlines is at odds with the "efficiencies" of a university at all. It fits pretty well with a liberal arts education, which even our B.S. students have time for. But it does call for some thinking ahead, planning to take courses from across campus that aren't already on some department's list of requirements. A good advisor can help with that.

I'm guessing that computer science students and "creatives" are not the only ones who will benefit from seeking a multi-disciplinary education these days. Davies is right. All university graduates will live in a multi-disciplinary world. It's okay for them (and their parents) to be thinking about careers when they are in school. But they should prepare for a world in which general knowledge and competencies buoy up their disciplinary knowledge and help them adapt over time.

Posted by Eugene Wallingford | Permalink | Categories: General, Teaching and Learning

October 22, 2015 4:22 PM

Aramaic, the Intermediate Language of the Ancient World

My compiler course is making the transition from the front end to the back end. Our attention is on static analysis of abstract syntax trees and will soon turn to other intermediate representations.

In the compiler world, an "intermediate representation" or intermediate language is a notation used as a stepping stone between the abstract syntax tree and the machine language that is ultimately produced. Such a stepping stone allows the compiler to take smaller steps in the translation process and makes it easier to improve the code before getting down into the details of machine language.
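To make the stepping-stone idea concrete, here is a minimal sketch of one common kind of intermediate representation, three-address code. The nested-tuple AST shape and the names lower and walk are assumptions made for this illustration, not the design used in my course.

```python
from itertools import count


def lower(ast):
    """Translate a nested-tuple AST into three-address instructions.

    Assumed AST shape (hypothetical, for this sketch only): a node is a
    number, a variable name, or a tuple (op, left, right).
    """
    code = []
    temps = count(1)  # supplies fresh temporary names t1, t2, ...

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)  # leaves are already usable as operands
        op, left, right = node
        left_operand = walk(left)
        right_operand = walk(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_operand} {op} {right_operand}")
        return temp

    result = walk(ast)
    return code, result


# The expression a + b * 2 as an AST
ast = ("+", "a", ("*", "b", 2))
code, result = lower(ast)
for line in code:
    print(line)
```

Each instruction does one small thing, which is what makes a form like this easier to analyze and improve than either the tree or the final machine code.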

We sometimes see intermediate languages in the "real world", too. They tend to arise as a result of cultural and geopolitical forces and, while they usually serve different purposes in human affairs than in compiler affairs, they still tend to be practical stepping stones to another language.

Consider the case of Darius I, whose Persian armies conquered most of the Middle East around 500 BC. As John McWhorter writes in The Atlantic, at the time of Darius's conquest,

... Aramaic was so well-entrenched that it seemed natural to maintain it as the new empire's official language, instead of using Persian. For King Darius, Persian was for coins and magnificent rock-face inscriptions. Day-to-day administration was in Aramaic, which he likely didn't even know himself. He would dictate a letter in Persian and a scribe would translate it into Aramaic. Then, upon delivery, another scribe would translate the letter from Aramaic into the local language. This was standard practice for correspondence in all the languages of the empire.

For sixty years, many compiler writers have dreamed of a universal intermediate language that would ease the creation of compilers for new languages and new machines, to no avail. But for several hundred years, Aramaic was the intermediate representation of choice for a big part of the Western world! Alas, Greek and Arabic later came along to supplant Aramaic, which now seems to be on a path to extinction.

This all sounds a lot like the world of programming, in which languages come and go as we develop new technologies. Sometimes a language, human or computer, takes root for a while as the result of historical or technical forces. Then a new regime or a new culture rises, or an existing culture gains in influence, and a different language comes to dominate.

McWhorter suggests that English may have risen to prominence at just the right moment in history to entrench itself as the world's intermediate language for a good long run. We'll see. Human languages and computer languages may operate on different timescales, but history treats them much the same.

Posted by Eugene Wallingford | Permalink | Categories: Computing, General

October 18, 2015 10:42 AM

What a Tiny Language Can Teach Us About Gigantic Systems

StrangeLoop is long in the books for most people, but I'm still thinking about some of the things I learned there. This is the first of what I hope to be a few more posts on talks and ideas still on my mind.

The conference opened with a keynote address by Peter Alvaro, who does research at the intersection of distributed systems and programming languages. The talk was titled "I See What You Mean", but I was drawn in more by his alternate title: "What a Tiny Language Can Teach Us About Gigantic Systems". Going in, I had no idea what to expect from this talk and so, in an attitude whose pessimism surprised me, I expected very little. Coming out, I had been surprised in the most delightful way.

Alvaro opened with the confounding trade-off of all abstractions: Hiding the distracting details of a system can illuminate the critical details (yay!), but the boundaries of an abstraction lock out the people who want to work with the system in a different way (boo!). He illustrated the frustration felt by those who are locked out with a tweet from @pxlplz:

SELECT bs FROM table WHERE sql="arrgh" ORDER BY hate

From this base, Alvaro moved on to his personal interests: query languages, semantics, and distributed systems. When modeling distributed systems, we want a language that is resilient to failure and tolerant of a loose ordering on the execution of operations. But we also need a way to model what programs written in the language mean. The common semantic models express a common split in computing:

  • operational semantics: a program means what it does
  • model-theoretic semantics: a program means the set of facts that makes it true

With query languages, we usually think of programs in terms of the databases of facts that make them true. In many ways, the streaming data of a distributed system is a dual to the database query model. In the latter, program control flows down to fixed data. In distributed systems, data flows down to fixed control units. If I understood Alvaro correctly, his work seeks to find a sweet spot amid the tension between these two models.

Alvaro walked through three approaches to applicative programming. In the simplest form, we have three operators: select (σ), project (Π), and join (⋈). The database language SQL adds to this set negation (¬). The Prolog subset Datalog makes computation of the least fixed point a basic operation. Datalog is awesome, says Alvaro, but not if you add ¬! That creates a language with too much power to allow the kind of reasoning we want to do about a program.
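For readers who have not met these operators, here is a rough sketch of them over relations stored as Python sets of tuples, followed by a Datalog-style least fixed point that computes reachability in a graph. This is my own illustration, not code from the talk; the function names and the tuple representation are assumptions.

```python
def select(rows, pred):
    """Keep only the tuples that satisfy pred."""
    return {r for r in rows if pred(r)}


def project(rows, *cols):
    """Keep only the given column positions of each tuple."""
    return {tuple(r[c] for c in cols) for r in rows}


def join(left, right):
    """Join pairs (a, b) with pairs (b, c) on the shared value b."""
    return {(a, c) for (a, b) in left for (b2, c) in right if b == b2}


def least_fixed_point(edges):
    """Compute the facts implied by two Datalog-style rules:

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).
    """
    path = set(edges)
    while True:
        new = path | join(path, edges)
        if new == path:  # nothing new was derived: fixed point reached
            return path
        path = new


edges = {("a", "b"), ("b", "c"), ("c", "d")}
paths = least_fixed_point(edges)
print(sorted(paths))
```

Because the rules only ever add facts, the loop over a finite set of edges must eventually reach a point where nothing changes. Negation breaks that tidy property, since a fact derived later can invalidate a conclusion drawn earlier.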

Declarative programs don't have assignment statements, because assignments introduce time into a model. An assignment statement effectively partitions the past (in which an old value holds) from the present (characterized by the current value). In a program with state, there is a hidden clock inside the program.
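One way to see the hidden clock is to make it visible. The sketch below is only an illustration under my own assumptions: facts are (name, time, value) tuples, and balance_at and next_balance are hypothetical helpers. Instead of overwriting a variable, each step derives a new fact stamped with an explicit time, so the past remains available.

```python
def balance_at(facts, t):
    """Look up the single balance fact recorded for time t."""
    (value,) = [v for (name, time, v) in facts if name == "balance" and time == t]
    return value


def next_balance(facts, t, delta):
    """Return a fact set that also holds the balance at time t + 1."""
    return facts | {("balance", t + 1, balance_at(facts, t) + delta)}


# With assignment, `balance = balance - 30` would erase the old value.
# Here the old fact survives alongside the new one.
facts0 = frozenset({("balance", 0, 100)})
facts1 = next_balance(facts0, 0, -30)

print(balance_at(facts1, 0), balance_at(facts1, 1))
```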

We all know the difficulty of managing state in a standard system. Distributed systems create a new challenge. They need to deal with time, but a relativistic time in which different programs seem to be working on their own timelines. Alvaro gave a couple of common examples:

  • a sender crashes, then restarts and begins to replay a set of transactions
  • a receiver enters garbage collection, then comes back to life and begins to respond to queued messages

A language that helps us write better distributed systems must give us a way to model relativistic time without a hidden universal clock. The rest of the talk looked at some of Alvaro's experiments aimed at finding such languages for distributed systems, building on the ideas he had introduced earlier.

The first was Dedalus, billed as "Datalog in time and space". In Dedalus, knowledge is local and ephemeral. It adds two temporal operators to the set found in SQL: @next, for making assertions about the future, and @async, for making assertions of independence between operations. Computation in Dedalus is rendezvous between data and control. Program state is a deduction.

But what of semantics? Alas, a Dedalus program has an infinite number of models, each model itself infinite. The best we can do is to pull at all of the various potential truths and hope for quiescence. That's not comforting news if you want to know what your program will mean while operating out in the world.

Dedalus as the set of operations {σ, Π, ⋈, ¬, @next, @async} takes us back to the beginning of the story: too much power for effective reasoning about programs.

However, Dedalus minus ¬ seems to be a sweet spot. As an abstraction, it hides state representation and control flow and illuminates data, change, and uncertainty. This is the direction Alvaro and his team are moving in now. One result is Bloom, a small new language founded on the Dedalus experiment. Another is Blazes, a program analysis framework that identifies potential inconsistencies in a distributed program and generates the code needed to ensure coordination among the components in question. Very interesting stuff.

Alvaro closed by returning to the idea of abstraction and the role of programming languages. He is often asked why he creates new programming languages rather than working in existing languages. In either approach, he points out, he would be creating abstractions, whether with an API or a new syntax. And he would have to address the same challenges:

  • Respect users. We are they.
  • Abstractions leak. Accept that and deal with it.
  • It is better to mean well than to feel good. Programs have to do what we need them to do.

Creating a language is an act of abstraction. But then, so is all of programming. Creating a language specific to distributed systems is a way to make very clear what matters in the domain and to provide both helpful syntax and clear, reliable semantics.

Alvaro admits that this answer hides the real reason that he creates new languages:

Inventing languages is dope.

At the end of this talk, I understood its title, "I See What You Mean", better than I did before it started. The unintended double entendre made me smile. This talk showed how language interacts with problems in all areas of computing, the power language gives us as well as the limits it imposes. Alvaro delivered a most excellent keynote address and opened StrangeLoop on a high note.

Check out the full talk to learn about all of this in much greater detail, with the many flourishes of Alvaro's story-telling.

Posted by Eugene Wallingford | Permalink | Categories: Computing, Software Development

October 15, 2015 8:18 AM

Perfection Is Not A Pre-Requisite To Accomplishing Something Impressive

In Not Your Typical Role Model, mathematician Hannah Fry tells us some of what she learned about Ada Lovelace, "the 19th century programmer", while making a film about her. Not all of it was complimentary. She concludes:

Ada was very, very far from perfect, but perfection is not a pre-requisite to accomplishing something impressive. Our science role models shouldn't always be there to celebrate the unachievable.

A lot of accomplished men of science were far from perfect role models, too. In the past, we've often been guilty of covering up bad behavior to protect our heroes. These days, we sometimes rush to judge them. Neither inclination is healthy.

By historical standards, it sounds like Lovelace's imperfections were all too ordinary. She was human, like us all. Lovelace thought some amazing things and wrote them down for us. Let's celebrate that.

Posted by Eugene Wallingford | Permalink | Categories: Computing, General

October 08, 2015 4:20 PM

Marick's Law

This morning, I wanted to send Michael Feathers a link to Marick's Law. The only link I could find to it was a tweet of Brian's. This law is too important to be left vulnerable to the vagaries of an internet service, so let's give it a permanent home:

In software, anything of the form "X's Law" is better understood by replacing the word "Law" with "Fervent Desire".

This is a beautiful observation, lovingly and consciously self-referential. I think of it almost daily. Use it well.


Historical note. When I searched for Marick's Law, I did find reference to another law going by the same name. Uncle Bob calls this Marick's Law: "When it comes to code, it never pays to rush." This is a useful aphorism as well, and perhaps Brian once called it Marick's Law, too. Uncle Bob's post is dated November 2008. Brian's coining tweet is dated April 2009. I'm going to stick with the first-person post as definitive and observe that Brian, like Whitman, is large and contains multitudes.

I cannot help but notice that we can and should apply the definitive Marick's Law to the secondhand quote given by Uncle Bob. Many of us fervently desire it to be true that, when it comes to code, it never pays to rush. If it's not, then many of our best practices need an overhaul. Besides, we fear deep in our hearts that sometimes it probably does pay to rush.

Posted by Eugene Wallingford | Permalink | Categories: Software Development