TITLE: Taking Plain Text Seriously Enough AUTHOR: Eugene Wallingford DATE: March 14, 2022 11:55 AM DESC: ----- BODY: Or, Plain Text and Spreadsheets -- Giving Up and Giving In One day a couple of weeks ago, a colleague and I were discussing a student. They said, I think I had that student in class a few semesters ago, but I can't find the semester with his grade." My first thought was, "I would just use grep to...". Then I remembered that my colleagues all use Excel for their grades. The next day, I saw Derek Sivers' recent post that gives the advice I usually give when asked: Write plain text files. Over my early years in computing, I lost access to a fair bit of writing that was done using various word processing applications. All stored data in proprietary formats. The programs are gone, or have evolved well beyond the version and OS I was using at the time, and my words are locked inside. Occasionally I manage to pull something useful out of one of those old files, but for the most part they are a graveyard. No matter how old, the code and essays I wrote in plaintext are still open to me. I love being able to look at programs I wrote for my undergrad courses (including the first parser I ever wrote, in Pascal) and my senior honors project (an early effort to implement Swiss System pairings for chess tournament). All those programs have survived the move from 5-1/4" floppies, through several different media, and still open just fine in emacs. So do the files I used to create our wedding invitations, which I wrote in troff(!). The advice to write in plain text transfers nicely from proprietary formats on our personal computers to tools that run out on the web. The same week as Sivers posted his piece, a prolific Goodreads reviewer reported losing all his work when Goodreads had a glitch. The reviewer may have written in plain text, but his reviewers are buried in someone else's system. I feel bad for non-tech folks when they lose their data to a disappearing format or app. I'm more perplexed when a CS prof or professional programmer does. We know about plain text; we know the history of tools; we know that our plain text files will migrate into the future with us, usable in emacs and vi and whatever other plain text editors we have available there. I am not blind, though, to the temptation. A spreadsheet program does a lot of work for us. Put some numbers here, add a formula or two over there, and boom! your grades are computed and ready for entry -- into the university's appalling proprietary system, where the data goes to die. (Try to find a student's grade from a forgotten semester within that system. It's a database, yet there are no affordances available to users for the simplest tasks...) All of my grade data, along with most of what I produce, is in plain text. One cost of this choice is that I have to write my own code to process it. This takes a little time, but not all that much, to be honest. I don't need all of Numbers or Excel; all I need most of the time is the ability to do simple computations and a bit of sorting. If I use a comma-separated values format, all of my programming languages have tools to handle parsing, so I don't even have to do much input processing to get started. If I use Racket for my processing code, throwing a few parens into the mix enables Racket to read my files into lists that are ready for mapping and filtering to my heart's content. Back when I started professoring, I wrote all of my grading software in whatever language we were using in the class in that semester. That seemed like a good way for me to live inside the language my students were using and remind myself what they might be feeling as they wrote programs. One nice side effect of this is that I have grading code written in everything from Cobol to Python and Racket. And data from all those courses is still searchable using grep, filterable using cut, and processable using any code I want to write today. That is one advantage of plain text I value that Sivers doesn't emphasize: flexibility. Not only will plain text survive into the future... I can do anything I want with it. I don't often feel powerful in this world, but I feel powerful when I'm making data work for me. In the end, I've traded the quick and easy power of Excel and its ilk for the flexible and power of plain text, at a cost of writing a little code for myself. I like writing code, so this sort of trade is usually congenial to me. Once I've made the trade, I end up building a set of tools that I can reuse, or mold to a new task with minimal effort. Over time, the cost reaches a baseline I can live with even when I might wish for a spreadsheet's ease. And on the day I want to write a complex function to sort a set of records, one that befuddles Numbers's sorting capabilities, I remember why I like the trade. (That happened yet again last Friday.) A recent tweet from Michael Nielsen quotes physicist Steven Weinberg as saying, "This is often the way it is in physics -- our mistake is not that we take our theories too seriously, but that we do not take them seriously enough." I think this is often true of plain text: we in computer science forget to take its value and power seriously enough. If we take it seriously, then we ought to be eschewing the immediate benefits of tools that lock away our text and data in formats that are hard or impossible to use, or that may disappear tomorrow at some start-up's whim. This applies not only to our operating system and our code but also to what we write and to all the data we create. Even if it means foregoing our commercial spreadsheets except for the most fleeting of tasks. -----