TITLE: Strange Loop 2010, Day 1 Afternoon
AUTHOR: Eugene Wallingford
DATE: October 14, 2010 11:17 PM
DESC:
-----
BODY:
I came back from a lunchtime nap ready for more ideas.

You Already Use Closures in Ruby

This is one of the talks I chose for a specific personal reason. I was looking for ideas I might use in a Cedar Valley Tech Talk on functional programming later this month, and more generally in my programming languages course. I found one, a simple graphical notation for talking about closures as a combination of code and environment. Something the speaker said about functions sharing state also gave me an idea for how to use the old koan on the dual "a closure is a poor man's object / an object is a poor man's closure".

NoSQL at Twitter

Speaker Kevin Weil started by decrying his own title. Twitter uses MySQL and relational databases everywhere. They use distributed data stores for applications that need specific performance attributes.

Weil traced Twitter's evolution toward different distributed data solutions. They started with Syslog for logging applications. It served their early needs but didn't scale well. They then moved to Scribe, which was created by the Facebook team to solve the same problem and then open-sourced.

This move led to a realization. Scribe solved Twitter's problem and opened new vistas. It made logging data so easy that they started logging more. Having more data gave them better insight into the behavior of their users, behaviors they didn't even know to look for before. Now, data analytics is one of Twitter's most interesting areas.

The Scribe solution works, scaling with no changes to the architecture as data throughput doubles; they just add more servers. But this data creates a 'write' problem: given today's technology, it takes something like 42 hours to write 12 TB to a single hard drive. This led Twitter to add Hadoop to its toolset. Hadoop is both scalable and powerful. Weil mentioned a 4,000-node cluster at Yahoo!
that had sorted one terabyte of integers in 62 seconds.

The rest of Weil's talk focused on data analytics. The key point underlying all he said was this: It is easy to answer questions. It is hard to ask the right questions. This makes experimental programming valuable, and by extension a powerful scripting language and short turnaround times. They need time to ask a lot of questions, looking for good ones and refining promising questions into more useful ones.

Hadoop is a Java platform, which doesn't fit those needs. So, Twitter added Pig, a high-level language that sits atop Hadoop. Programs written in Pig are easy to read and almost English-like. Equivalent SQL programs would probably be shorter, but Pig compiles to MapReduce jobs that run directly on Hadoop. Pig exacts a performance penalty, but the Twitter team doesn't mind. Weil captured why in another pithy sentence: I don't mind if a 10-minute job runs in 12 minutes if it took me 10 minutes to write the script instead of an hour.

Twitter works on several kinds of data-analytic problems. A couple stood out:

correlating big data. How do different kinds of users behave -- mobile, web, 3rd-party clients? What features hook users? How do user cohorts work? What technical details go wrong at the same time, leading to site problems?

research on big data. What can we learn from a user's tweets, the tweets of those they follow, or the tweets of those who follow them? What can we learn from asymmetric follow relationships about social and personal interests?
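
To make the MapReduce shape that Pig scripts compile down to a little more concrete, here is a minimal Python sketch that counts events per client type. It is neither Pig nor Hadoop code, and it is not anything Weil showed. The sample records and the names EVENTS, map_phase, and reduce_phase are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical log records: (user_id, client_type). Invented sample data.
EVENTS = [
    (1, "mobile"),
    (2, "web"),
    (3, "mobile"),
    (4, "third-party"),
    (5, "web"),
    (6, "mobile"),
]


def map_phase(records):
    """Emit a (key, 1) pair for each record, keyed by client type."""
    for _user_id, client in records:
        yield client, 1


def reduce_phase(pairs):
    """Sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


counts = reduce_phase(map_phase(EVENTS))
print(counts)
```

On a real cluster the map and reduce steps would run in parallel across many machines. The appeal of a language like Pig is that the analyst writes the question and leaves that plumbing to the compiler.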

As much as Weil had already described, there was more! HBase, Cassandra, FlockDB, .... Big data means big problems and big opportunities, which lead to hybrid solutions that optimize competing forces. Interesting stuff.

Solutions to the Expression Problem

This talk was about Clojure, which interests me for obvious reasons, but the real reason I chose this talk was that I wanted to know what the expression problem is! Like many in the room, I had experienced the expression problem without knowing it by this name:

The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).
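
A small sketch may help make the two directions of extension concrete. The Python below is my own hypothetical example, not one from the talk. It uses functools.singledispatch, which dispatches on the type of the first argument and so is only loosely similar in spirit to type-based dispatch in Clojure. Python is dynamically typed, so the sketch says nothing about the static type safety half of the problem. The names Num, Add, Neg, evaluate, and show are invented for illustration.

```python
from dataclasses import dataclass
from functools import singledispatch


# Original cases of a tiny expression datatype (hypothetical example).
@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


# An existing function over the datatype, defined case by case.
@singledispatch
def evaluate(expr):
    raise TypeError(f"no evaluate case for {type(expr).__name__}")


@evaluate.register(Num)
def _(expr):
    return expr.value


@evaluate.register(Add)
def _(expr):
    return evaluate(expr.left) + evaluate(expr.right)


# Extension 1: a new function, added without touching Num or Add.
@singledispatch
def show(expr):
    raise TypeError(f"no show case for {type(expr).__name__}")


@show.register(Num)
def _(expr):
    return str(expr.value)


@show.register(Add)
def _(expr):
    return f"({show(expr.left)} + {show(expr.right)})"


# Extension 2: a new case, added without touching the code above.
@dataclass
class Neg:
    operand: object


@evaluate.register(Neg)
def _(expr):
    return -evaluate(expr.operand)


@show.register(Neg)
def _(expr):
    return f"-{show(expr.operand)}"


expr = Add(Num(2), Neg(Num(5)))
print(show(expr), "=", evaluate(expr))  # prints: (2 + -5) = -3
```

Both extensions leave the earlier definitions unmodified, which is the part of the problem this sketch addresses. A missing case still shows up only at run time, as a TypeError, rather than at compile time.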

Speaker Chris Houser used an example in which we need to add a behavior to an existing class that is hard or impossible to modify. He then stepped through four possible solutions: the adapter pattern and monkey patching, which are available in languages like Java and Ruby, and multimethods and protocols, which are available in Clojure.

I liked two things about this talk. First, he taught in a "failure-driven" way: pose a problem, solve it using a known technique, expose a weakness, and move on to a more effective solution. I often use this technique when teaching design patterns. Second, the breadth of the problem and its potential solutions encouraged language geeks to talk about ideas in language design. The conversation included not only Java and Clojure but also JavaScript, C#, and macros.

Guy Steele

My notes on this talk aren't that long, but it was important enough to have its own entry. Besides, this entry is already long enough!
-----