November 20, 2017 2:58 PM

Tests That Protect Your Code From Without

I recently ran across an old post from Spotify about how the confluence of Spotify's generous policy for usernames, the oddities of Unicode, and a change in standard Python libraries led to a situation in which a person could hijack an existing username. The post explains how Spotify tracked down the problem and fixed it -- a good read.

The paragraph before the closing "Some Take-Aways" section says:

So changes in the standard python library from one python version to the next introduced a subtle bug in twisted's nodepre.prepare() function which in turn introduced a security issue in Spotify's account creation.

... which, to my mind, hints at a bonus takeaway not listed in the closing. It would really have been nice to have had a test that noticed when nodeprep's method for canonicalizing (ugh) usernames was no longer idempotent.

Not having such a test is understandable. I don't write tests for all or even many of the library functions I use. This means trusting the creators of the libraries I use. (It also means choosing libraries judiciously.) In this case, there wasn't even a bug in twisted's original code; a change in Python introduced one. It is hard to anticipate when a language upgrade will create a bug in a library that I am using.

However, when something like this occurs, I find that it is a good time to add a test. This bug exposed just how important it is for the code to ensure idempotence when lower-casing a username. A new test or two can protect the code base from unexpected errors from without without placing an unreasonable burden on me as a developer. And those tests give me peace of mind, which is usually more than worth an extra millisecond when running my tests.


Posted by Eugene Wallingford | Permalink | Categories: Software Development

November 15, 2017 4:03 PM

A Programming Digression: Kaprekar Numbers

Earlier this week I learned about Kaprekar numbers when someone re-tweeted this my way:

Kaprekar numbers are numbers whose square in that base can be split into 2 parts that add up to the original number

So, 9 is a Kaprekar number, because 9 squared is 81 and 8+1 equals 9. 7777 is, too, because 7777 squared is 60481729 and 6048 + 1729 equals 7777.

This is the sort of numerical problem that is well-suited for the language my students are writing a compiler for this semester. I'm always looking out for fun little problems that I can to test their creations. In previous semesters, I've blogged about computing Farey sequences and excellent numbers for just this purpose.

Who am I kidding. I just like to program, even in small language that feels like integer assembly language, and these problems are fun!

So I sat down and wrote Klein functions to determine if a given number is a Kaprekar number and to generate all of the Kaprekar numbers less than a given number. I made one small change to the definition, though: I consider only numbers whose squares consist of an even-number of digits and thus can be split in half, a lá excellent numbers.

Until we have a complete compiler for our class language, I always like to write a reference program in a language such as Python so that I can validate my logic. I had a couple of meetings this morning, which gave just the time I needed to port my solution to a Klein-like subset of Python.

When I finished my program, I still had a few meeting minutes available, so I started generating longer and longer Kaprekar numbers. I noticed that there are a bunch more 6-digit Kaprekar numbers than at any previous length:

 1: 1
 2: 3
 3: 2
 4: 5
 5: 4
 6: 24
Homer Simpson says, 'D'oh!'

I started wondering why that might be... and then realized that there are a lot more 6-digit numbers overall than 5-digit -- ten times as many, of course. (D'oh!) My embarrassing moment of innumeracy didn't kill my curiosity, though. How does that 24 compare to the trend line of Kaprekar numbers by length?

 1: 1    of        9  0.11111111
 2: 3    of       90  0.03333333
 3: 2    of      900  0.00222222
 4: 5    of     9000  0.00055555
 5: 4    of    90000  0.00004444
 6: 24   of   900000  0.00002666

There is a recognizable drop-off at each length up to six, where the percentage is an order of magnitude different than expected. Are 6-digit numbers a blip or a sign of a change in the curve? I ran another round. This took much longer, because my Klein-like Python program has to compute operations like length recursively and has no data structures for caching results. Eventually, I had a count:

 7: 6    of  9000000  0.00000066

A big drop, back in line with the earlier trend. One more round, even slower.

 8: 21   of 90000000  0.00000023

Another blip in the rate of decline. This calls for some more experimentation... There is a bit more fun to have the next time I have a couple of meetings to fill.

Image: courtesy of the Simpsons wiki.


Posted by Eugene Wallingford | Permalink | Categories: Computing

November 14, 2017 3:52 PM

Thinking about Next Semester's Course Already

We are deep into fall semester. The three teams in my compilers course are making steady progress toward a working compiler, and I'm getting so excited by that prospect that I've written a few new programs for them to compile. The latest two work with Kaprekar numbers.

Yet I've also found myself thinking already quite a bit about my spring programming languages course.

I have not made significant changes to this course (which introduces students to Racket, functional programming, recursive programming over algebraic data types, and a few principles of programming languages) in several years. I don't know if I'm headed for a major re-design yet, but I do know that several new ideas are commingling in my mind and encouraging me to think about improvements to the course.

The first trigger was reading How to Switch from the Imperative Mindset, which approaches learning functional style explicitly as a matter establishing new habits. My students come to the course having learned an imperative style in Python, perhaps with some OO in Java thrown in. Most of them are not yet 100% secure in their programming skills, and the thought of learning a new style is daunting. They don't come to the course asking for a new set of habits.

One way to develop a new set of habits is to recognize the cues that trigger an old habit, learn a new response, and then rehearse that response until it becomes a new habit. The How to Switch... post echoes a style that I have found effective when teaching OOP to programmers with experience in a procedural language, and I'm thinking about how to re-tool part of my course to use this style more explicitly when teaching FP.

My idea right now is something like this. Start with simple examples from the students' experience processing arrays and lists of data. Then work through solutions in sequence, such as:

  1. first, use a loop of the sort with which they are familiar, the body of which acts on each item in the collection
  2. then, move the action into a function, which the loop calls for each item in the collection
  3. finally, map the function over the items in the collection

We can also do this with built-in functions, perhaps to start, which eliminates the need to write a user-defined function.

In effect, this refactors code that the students are already comfortable with toward common functional patterns. I can use the same sequence of steps for mapping, folding, and reducing, which will reinforce the thinking habits students need to begin writing FP code from the original cues. I'm only just beginning to think about this approach, but I'm quite comfortable using a "refactoring to patterns" style in class.

Going in this direction will help me achieve another goal I have in mind for next semester: making class sessions more active. This was triggered by my post-mortem of the last course offering. Some early parts of the course consist of too much lecture. I want to get students writing small bits of code sooner, but with more support for taking small, reliable steps.

Paired this change to what happens in class are changes to what happens before students come to class. Rather than me talking about so many things in class, I hope to have

  • students reading clear expositions of the material, in small units that are followed immediately by
  • students doing more experimentation on their own in Dr. Racket, learning from the experiments and learning find information about language features as they need them.

This change will require me to package my notes differently and also to create triggers and scaffolding for the students' experimentation before coming to class. I'm thinking of this as something like a flipped classroom, but with "watching videos" replaced by "playing with code".

Finally, this blog post triggered a latent desire to make the course more effective for all students, wherever they are on the learning curve. Many students come to the course at roughly same level of experience and comfort, but a few come in struggling from their previous courses, and a few come in ready to take on bigger challenges. Even those broad categories are only approximate equivalence classes; each student is at a particular point in the development we hope for them. I'd like to create experiences that can help students all of these students learn something valuable for them.

I've only begun to think about the ideas in that post. Right now, I'm contemplating two of ideas from the section on getting to know my students better: gathering baseline data early on that I can use to anchor the course, and viewing grading as planning. Anything that can turn the drudgery of grading into a productive part of the course for me is likely to improve my experience in the course, and that is likely to improved my students' experience, too.

I have more questions than answers at this point. That's part of the fun of re-designing a course. I expect that things will take better shape over the next six weeks or so. If you have any suggestions, email me or tweet to me at @wallingf.


Posted by Eugene Wallingford | Permalink | Categories: Patterns, Teaching and Learning

November 05, 2017 9:42 AM

One Way I'm Like Maurice Sendak

In this conversation, the interviewer asked Maurice Sendak, then age eighty-three, how long he could work in one stretch. The illustrator said that two hours was a long stretch for him.

Because I'm older, I get tired and nap. I love napping. Working and napping and reading and seeing my students. They're awfully nice. They're young and they're hopeful.

I'm not quite eighty-three, but I agree with every sentence in Sendak's answer. I could do worse than be as productive and as cantankerous for as long as he was.


Posted by Eugene Wallingford | Permalink | Categories: General, Personal

October 31, 2017 3:44 PM

Chuck Moore, Extreme Programmer

If you have read any of Chuck Moore's work, you know that he is an extreme programmer in the literal sense of the word: a unique philosophy that embraces small, fast, self-contained programs in a spare language and an impressive development environment. But as I read the opening sections of Moore's book Programming a Problem-Oriented Language, I began to mistake him for an XP-style extreme programmer.

Here is You Aren't Gonna Need It or, as Moore says, Do Not Speculate!:

Do not put code in your program that might be used. Do not leave hooks on which you can hang extensions. The things you might want to do are infinite; that means that each one has 0 probability of realization. If you need an extension later, you can code it later - and probably do a better job than if you did it now. And if someone else adds the extension, will they notice the hooks you left? Will you document that aspect of your program?

Moore also has a corollary called Do It Yourself!, which encourages you, in general, to write your own subroutines rather than import one a canned subroutine from a library. That doesn't sound like Doing the Simplest Thing That Can Work, but his intent is similar: an external routine is a dependency, and it probably doesn't match the specific needs of the problem you are solving. Indeed, Moore closes this section with:

Let me take a stand: I can't solve the problems of the world. With luck, I can write a good program.

That could well be the manifesto of XP.

In the next section, a preview of the rest of the book, Moore tells his readers that they won't find flowcharts in the pages to come documenting the algorithms he describes:

I've never liked them, for they seem to include a useless amount of information - either too little or too much. Besides they imply a greater rigidity in program structure than usually exists. I will be quite specific about what I think you should do and how you should do it. But I will use words, and not diagrams. I doubt that you would give a diagram the attention it deserved, anyway. Or that I would in preparing it.

Other than using a few words, he lets the code stands for itself. Documentation would not add enough value to outweigh the cost of preparing it, or reading it. Usually, the best thing to do with Moore's books is to Listen To The Code: settle into your Forth environment, type the code in, run it, and learn from it.

There is no accounting for my weird fascination with stack-based languages. Chuck Moore always give me more to think about than I expected. I suspect that he would be at odds with some of the practices encouraged by XP, but I often feel the spirit of XP in his philosophy.


Posted by Eugene Wallingford | Permalink | Categories: Patterns, Software Development

October 12, 2017 3:35 PM

Sometimes, We Need to Make a Better Tool

I learned about a couple of cool CLI tools from Nikita Sobolev's Using Better CLIs. hub and tig look like they may be worth a deeper look. This article also reminded me of one of the examples in the blog entry I rmed the other day. It reflects a certain attitude about languages and development.

One of the common complaints about OOP is that what would be a single function in other programming styles usually ends up distributed across multiple classes in an OO program. For example, instead of:

    void draw(Shape s) {
       case s of
          Circle : [code for circle]
          Square : [code for square]
          ...
    }
the code for the individual shapes ends up in the classes for Circle, Square, and so on. If you have to change the drawing code for all of the shapes, you have to track down all of the classes and modify them individually.

This is true, and it is a serious issue. We can debate the relative benefits and costs of the different designs, of course, but we might also think about ways that our development tools can help us.

As a grad student in the early 1990s, I worked in a research group that used VisualWorks Smalltalk to build all of its software. Even within a single Smalltalk image, we faced this code-all-over-the-place problem. We were adding methods to classes and modifying methods all the time as part of our research. We spent a fair amount of time navigating from class to class to work on the individual methods.

Eventually, one of my fellow students had an epiphany: we could write a new code browser. We would open this browser on a particular class, and the browser would provide a pane listing and all of its subclasses, and all of their subclasses. When we selected a method in the root class, the browser enabled us to click on any of the subclasses, see the code for the subclass's corresponding method, and edit it there. If the class didn't have an overriding method, we could add one in the empty pane, with the method signature supplied by the browser.

This browser didn't solve all of the problems we had learning to manage a large code base spread out over many classes, but it was a huge win for dealing with the specific issue of an algorithm being distributed across several kinds of object. It also taught me two things:

  • to appreciate the level of control that Smalltalk gave developers to inspect code and shape the development experience
  • to appreciate the mindset that creating new tools is the way to mitigate many problems in software development, if not to solve them completely

The tool-making mindset is one that I came to appreciate and understand more and more as the years past. I'm disappointed whenever I don't put it to good use, but oftentimes I wise up and make the tools I need to help me do my work.


Posted by Eugene Wallingford | Permalink | Categories: Patterns, Software Development

October 10, 2017 4:10 PM

I'm Not Correcting Someone Who Was Wrong on the Internet

Yesterday, I wrote an entire blog entry that I rm'ed before posting. I had read a decent article on a certain flavor of primitive obsession and design alternatives, which ended with what I thought was a misleading view of object-oriented programming. My blog entry set out to right this wrong.

In a rare moment of self-restraint, I resisted the urge to correct someone who was wrong on the internet. There is no sense in subjecting you to that. Instead, I'll just say that I like both OOP and functional programming, and use both regularly. I remain, in my soul, object-oriented.

On a more positive note, this morning I read an old article that made me smile, Why I'm Productive in Clojure. I don't use Clojure, but this short piece brought to mind many of the same feelings in me, but about Smalltalk. Interestingly, the sources of my feelings are similar to the author's: the right amount of syntax, facilities for meta-programming, interactive development. The article gave me a feeling that is the opposite of schadenfreude: pleasure from the pleasure of others. Some Buddhists call this mudita. I felt mudita after reading this blog entry. Rock on, Clojure dude.


Posted by Eugene Wallingford | Permalink | Categories: Personal, Software Development

October 04, 2017 3:56 PM

A Stroll Through the Gates CS Building

sneaking up on the Gates-Hillman Complex at Carnegie Mellon from Forbes St.
 
sneaking up on the Gates-Hillman Complex
from Forbes St., Pittsburgh, PA

I had a couple of hours yesterday between the end of the CS education summit and my shuttle to the airport. Rather than sit in front of a computer for two more hours, I decided to take advantage of my location, wander over to the Carnegie Mellon campus, and take a leisurely walk through the Gates Center for Computer Science. I'm glad I did.

At the beginning of my tour, I was literally walking in circles, from the ground-level entrance shown in its Wikipedia photo up to where the CS offices seem to begin, up on the fourth floor. This is one of those buildings that looks odd from the outside and is quite confusing on the inside, at least to the uninitiated. But everyone inside seemed to feel at home, so maybe it works.

It didn't take long before my mind was flooded by memories of my time as a doctoral student. Michigan State's CS program isn't as big as CMU's, but everywhere I looked I saw familiar images: Students sitting in their labs or in their offices, one or two or six at a time, hacking code on big monitors, talking shop, or relaxing. The modern world was on display, too, with students lounging comfy chairs or sitting in a little coffee shop, laptops open and earbuds in place. That was never my experience as a student, but I know it now as a faculty member.

I love to wander academic halls, in any department, really, and read what is posted on office doors and hallway walls. At CMU, I encountered the names of several people whose work I know and admire. They came from many generations... David Touretzky, whose Lisp textbook taught me a few things about programming. Jean Yang, whose work on programming languages I find cool. (I wish I were going to SPLASH later this month...) Finally, I stumbled across the office of Manuel Blum, the 1995 Turing Award winner. There were a couple of posters outside his door showing the work of his students on problems of cryptography and privacy, and on the door itself were several comic strips. The punchline of one read, "I'll retire when it stops being fun." On this, even I am in sync with a Turing Award winner.

Everywhere I turned, something caught my eye. A pointer to the Newell/Simon bridge... Newell-and-Simon, the team, were the like the Pied Piper to me when I began my study of AI. A 40- or 50-page printout showing two old researchers (Newell and Simon?) playing chess. Plaques in recognition of big donations that had paid for classrooms, labs, and auditoria, made by Famous People who were either students or faculty in the school.

CMU is quite different from my school, of course, but there are many other schools that give off a similar vibe. I can see why people want to be at an R-1, even if they aspire to be teachers more than research faculty. There is so much going on. People, labs, sub-disciplines, and interdisciplinary projects. Informal talks, department seminars, and outside speakers. Always something going on. Ideas. Energy.

On the ride to the airport later in the day, I sat in some slow, heavy traffic going one direction and saw slower, heavier traffic going in the other. As much as I enjoyed the visit, I was glad to be heading home.


Posted by Eugene Wallingford | Permalink | Categories: Computing, Personal

October 03, 2017 12:23 PM

Why Do CS Enrollments Surges End?

The opening talk of the CS Education Summit this week considered the challenges facing CS education in a time of surging enrollments and continued concerns about the diversity of the CS student population. In the session that followed, Eric Roberts and Jodi Tims presented data that puts the current enrollment surge into perspective, in advance of a report from the National Academy of Science.

In terms of immediate takeaway, Eric Roberts's comments were gold. Eric opened with Stein's Law: If something is unsustainable, it will stop. Stein was an economist whose eponymous law expresses one of those obvious truths we all seem to forget about in periods of rapid change: If something cannot go on forever, it won't. You don't have to create a program to make it stop. A natural corollary is: If it can't go on for long, you don't need a program to deal with it. It will pass soon.

Why is that relevant to the summit? Even without continued growth, current enrollments in CS majors is unsustainable for many schools. If the past is any guide, we know that many schools will deal with unsustainable growth by limiting the number of students who start or remain in their major.

Roberts has studied the history of CS boom-and-bust cycles over the last thirty years, and he's identified a few common patterns:

  • Limiting enrollments is how departments respond to enrollment growth. They must: the big schools can't hire faculty fast enough, and most small schools can't hire new faculty at all.

  • The number of students graduating with CS degrees drops because we limit enrollments. Students do not stop enrolling because the number of job opportunities goes down or any other cause.

    After the dot-com bust, there was a lot of talk about offshoring and automation, but the effects of that were short-term and rather small. Roberts's data shows that enrollment crashes do not follow crashes in job openings; they follow enrollment caps. Enrollments remain strong wherever they are not strictly limited.

  • When we limit enrollments, the effect is bigger on women and members of underserved communities. These students are more likely to suffer from impostor syndrome, stereotype bias, and other fears, and the increased competitiveness among students for fewer openings combines with discourages them from continuing.

So the challenge of booming enrollments exacerbates the challenge to increase diversity. The boom might decrease diversity, but when it ends -- and it will, if we limit enrollments -- our diversity rarely recovers. That's the story of the last three booms.

In order to grow capacity, the most immediate solution is to hire more professors. I hope to write more about that soon, but for now I'll mention only that the problem of hiring enough faculty to teach all of our students has at east two facets. The first is that many schools simply don't have the money to hire more faculty right now. The second is that there aren't enough CS PhDs to go around. Roberts reported that, of last year's PhD grads, 83% took positions at R1 schools. That leaves 17% for the rest of us. "Non-R1 schools can expect to hire a CS PhD every 27 years." Everyone laughed, but I could see anxiety on more than a few faces.

The value of knowing this history is that, when we go to our deans and provosts, we can do more than argue for more resources. We can show the effect of not providing the resources needed to teach all the students coming our way. We won't just be putting the brakes on local growth; we may be helping to create the next enrollment crash. At a school like mine, if we teach the people of our state that we can't handle their CS students, then the people of our state will send their students elsewhere.

The problem for any one university, of course, is that it can act only based on its own resources and under local constraints. My dean and provost might care a lot about the global issues of demand for CS grads and need for greater diversity among CS students. But their job is to address local issues with their own (small) pool of money.

I'll have to re-read the papers Roberts has written about this topic. His remarks certainly gave us plenty to think about, and he was as engaging as ever.


Posted by Eugene Wallingford | Permalink | Categories: Computing, Teaching and Learning

October 02, 2017 12:16 PM

The Challenge Facing CS Education

Today and tomorrow, I am at a CS Education Summit in Pittsburgh. I've only been to Pittsburgh once before, for ICFP 2002 (the International Conference on Functional Programming) and am glad to be back. It's a neat city.

The welcome address for the summit was given by Dr. Farnam Jahanian, the interim president at Carnegie Mellon University. Jahanian is a computer scientist, with a background in distributed computing and network security. His resume includes a stint as chair of the CS department at the University of Michigan and a stint at the NSF.

Welcome addresses for conferences and workshops vary in quality. Jahanian gave quite a good talk, putting the work of the summit into historical and cultural context. The current boom in CS enrollments is happening at a time when computing, broadly defined, is having an effect in seemingly all disciplines and all sectors of the economy. What does that mean for how we respond to the growth? Will we see that the current boom presages a change to the historical cycle of enrollments in coming years?

Jahanian made three statements in particular that for me capture the challenge facing CS departments everywhere and serve as a backdrop for the summit:

  • "We have to figure out how to teach all of these students."

    Unlike many past enrollment booms, "all of these students" this time comprises two very different subsets: CS majors and non-majors. We have plenty of experience teaching CS majors, but how do you structure your curriculum and classes when you have three times as many majors? When numbers go up far enough fast enough, many schools have a qualitatively different problem.

    Most departments have far less experience teaching computer science (not "literacy") to non-majors. How do you teach all of these students, with different backgrounds and expectations and needs? What do you teach them?

  • "This is an enormous responsibility."

    Today's graduates will have careers for 45 years or more. That's a long time, especially in a world that is changing ever more rapidly, in large part due to our own discipline. How different are the long-term needs of CS majors and non-majors? Both groups will be working and living for a long time after they graduate. If computing remains a central feature of the world in the future, how we respond to enrollment growth now will have an outsized effect on every graduate. An enormous responsibility, indeed.

  • "We in CS have to think about impending cultural changes..."

    ... which means that we computer science folks will need to have education, knowledge, and interests much broader than just CS. People talk all the time about the value of the humanities in undergraduate education. This is a great example of why. One bit of good news: as near as I can tell, most of the CS faculty in this room, at this summit, do have interests and education bigger than just computer science (*). But we have to find ways to work these issues into our classrooms, with both majors and non-majors.

Thus the idea of a CS education summit. I'm glad to be here.

(*) In my experience, it is much more likely to find a person with a CS or math PhD and significant educational background in the humanities than to find a person with a humanities PhD and significant educational background in CS or math (or any other science, for that matter). One of my hopes for the current trend of increasing interest in CS among non-CS majors is that we an close this gap. All of the departments on our campuses, and thus all of our university graduates, will be better for it.


Posted by Eugene Wallingford | Permalink | Categories: Computing, Teaching and Learning