TITLE: A Laugh at My Own Expense AUTHOR: Eugene Wallingford DATE: September 10, 2013 3:40 PM DESC: ----- BODY: This morning presented a short cautionary tale for me and my students, from a silly mistake I made in a procmail filter. Back story: I found out recently that I am still subscribed to a Billy Joel fan discussion list from the 1990s. The list has been inactive for years, or I would have been filtering its messages to a separate mailbox. Someone has apparently hacked the list, as a few days ago it started spewing hundreds of spam messages a day. I was on the road for a few days after the deluge began and was checking mail through a shell connection to the mail server. Because I was busy with my trip and checking mail infrequently, I just deleted the messages by hand. When I got back, Mail.app soon learned they were junk and filtered them away for me. But the spam was still hitting my inbox on the mail server, where I read my mail occasionally even on campus. After a session on the server early this morning, I took a few minutes to procmail them away. Every message from the list has a common pattern in the Subject: line, so I copied it and pasted it into a new procmail recipe to send all list traffic to /dev/null :
    :0
    * ^Subject.*[billyjoel]
    /dev/null
Do you see the problem? Of course you do. I didn't at the time. My blindness probably resulted from a combination of the early hour, a rush to get over to the gym, and the tunnel vision that comes from focusing on a single case. It all looked obvious. This mistake offers programming lessons at several different levels. The first is at the detailed level of the regular expression. Pay attention to the characters in your regex -- all of them. Those brackets really are in the Subject: line, but by themselves mean something else in the regex. I need to escape them:
    * ^Subject.*\[billyjoel\]
This relates to a more general piece of problem-solving advice. Step back from individual case you are solving and think about the code you are writing more generally. Focused on the annoying messages from the list, the brackets are just characters in a stream. Looked at from the perspective of the file of procmail recipes, they are control characters. The second is at the level of programming practice. Don't /dev/null something until you know it's junk. Much better to send the offending messages to a junk mbox first:
    * ^Subject.*\[billyjoel\]
    in.tmp.junk
Once I see that all and only the messages from the list are being matched by the pattern, I can change that line send list traffic where it belongs. That's a specific example of the sort of defensive programming that we all should practice. Don't commit to solutions too soon. This, too, relates to more general programming advice about software validation and verification. I should have exercised a few test cases to validate my recipe before turning it loose unsupervised on my live mail stream. I teach my students this mindset and program that way myself, at least most of the time. Of course, the time you most need test cases will be the time you don't write them. The day provided a bit of irony to make the story even better. The topic of today's session in my compilers course? Writing regular expressions to describe the tokens in a language. So, after my mail admin colleague and I had a good laugh at my expense, I got to tell the story to my students, and they did, too. -----