TITLE: Taking Plain Text Seriously Enough
AUTHOR: Eugene Wallingford
DATE: March 14, 2022 11:55 AM
DESC:
-----
BODY:
Or, Plain Text and Spreadsheets -- Giving Up and Giving In
One day a couple of weeks ago, a colleague and I were discussing a
student. They said, "I think I had that student in class a few
semesters ago, but I can't find the semester with his grade."
My first thought was, "I would just use grep to...". Then I
remembered that my colleagues all use Excel for their grades.
The next day, I saw Derek Sivers' recent post that gives the advice I
usually give when asked:
"Write plain text files."
Over my early years in computing, I lost access to a fair bit of writing
that was done using various word processing applications. All stored
data in proprietary formats. The programs are gone, or have evolved
well beyond the version and OS I was using at the time, and my words
are locked inside. Occasionally I manage to pull something useful out
of one of those old files, but for the most part they are a graveyard.
No matter how old, the code and essays I wrote in plaintext are still
open to me. I love being able to look at programs I wrote for my
undergrad courses (including the first parser I ever wrote, in Pascal)
and my senior honors project (an early effort to implement Swiss System
pairings for chess tournaments). All those programs have survived the
move from 5-1/4" floppies, through several different media, and still
open just fine in emacs. So do the files I used to create our wedding
invitations, which I wrote in troff(!).
The advice to write in plain text transfers nicely from proprietary
formats on our personal computers to tools that run out on the web.
The same week that Sivers posted his piece, a prolific Goodreads reviewer
reported losing all his work
when Goodreads had a glitch. The reviewer may have written in plain
text, but his reviews are buried in someone else's system.
I feel bad for non-tech folks when they lose their data to a disappearing
format or app. I'm more perplexed when a CS prof or professional
programmer does. We know about plain text; we know the history of
tools; we know that our plain text files will migrate into the future
with us, usable in emacs and vi and whatever other plain text editors
we have available there.
I am not blind, though, to the temptation. A spreadsheet program does
a lot of work for us. Put some numbers here, add a formula or two over
there, and boom! your grades are computed and ready for entry
-- into the university's appalling proprietary system, where the data
goes to die. (Try to find a student's grade from a forgotten semester
within that system. It's a database, yet there are no
affordances available to users for the simplest tasks...)
All of my grade data, along with most of what I produce, is in plain
text. One cost of this choice is that I have to write my own code to
process it. This takes a little time, but not all that much, to be
honest. I don't need all of Numbers or Excel; all I need most of the
time is the ability to do simple computations and a bit of sorting. If
I use a comma-separated values format, all of my programming languages
have tools to handle parsing, so I don't even have to do much input
processing to get started. If I use Racket for my processing code,
throwing a few parens into the mix enables Racket to read my files into
lists that are ready for mapping and filtering to my heart's content.
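
A minimal sketch of that kind of processing, in Python rather than Racket, might look like the following. The file layout, the column names, and the student names are all hypothetical, invented only for illustration; the point is that a standard CSV parser plus a few lines of mapping and sorting covers the "simple computations and a bit of sorting" described above.

```python
import csv
import io

# Hypothetical grade data; the columns and names are illustrative only.
GRADES_CSV = """student,hw1,hw2,exam
student_a,90,85,88
student_b,70,95,91
student_c,80,60,75
"""


def load_grades(text):
    """Parse CSV text into a list of (student, [scores]) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        student = row.pop("student")
        scores = [float(value) for value in row.values()]
        records.append((student, scores))
    return records


def averages(records):
    """Return (student, mean score) pairs, highest mean first."""
    means = [(student, sum(scores) / len(scores)) for student, scores in records]
    return sorted(means, key=lambda pair: pair[1], reverse=True)


ranked = averages(load_grades(GRADES_CSV))
for student, mean in ranked:
    print(f"{student:12s} {mean:6.2f}")
```

Reading from a real file instead would only mean replacing the `io.StringIO` wrapper with an opened file handle.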
Back when I started professoring, I wrote all of my grading software in
whatever language we were using in the class in that semester. That
seemed like a good way for me to live inside the language my students
were using and remind myself what they might be feeling as they wrote
programs. One nice side effect of this is that I have grading code
written in everything from Cobol to Python and Racket. And data from
all those courses is still searchable using grep, filterable
using cut, and processable using any code I want to write today.
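
As a rough illustration of why this matters, the search from the opening anecdote takes only a few lines when every semester lives in its own plain text file. The sketch below assumes a hypothetical directory of per-semester CSV files; the file names, the columns, and the helper `find_student` are made up for the example, and the function does roughly what a case-insensitive `grep` over those files would do.

```python
import tempfile
from pathlib import Path


def find_student(root, name):
    """Return (file name, line) pairs for lines that mention name."""
    hits = []
    for path in sorted(Path(root).glob("*.csv")):
        for line in path.read_text().splitlines():
            if name.lower() in line.lower():
                hits.append((path.name, line))
    return hits


# Build a throwaway directory of hypothetical per-semester grade files.
with tempfile.TemporaryDirectory() as root:
    Path(root, "2019-fall.csv").write_text("student,grade\nlee,B+\nkim,A\n")
    Path(root, "2020-spring.csv").write_text("student,grade\npatel,A-\n")
    hits = find_student(root, "kim")

for filename, line in hits:
    print(f"{filename}: {line}")
```

Because the match is a plain substring test, it shares grep's looseness: a search for one name also finds any longer name that contains it.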
That is one advantage of plain text I value that Sivers doesn't emphasize:
flexibility. Not only will plain text survive into the future...
I can do anything I want with it.
I don't often feel powerful in this world, but I feel powerful when I'm
making data work for me.
In the end, I've traded the quick and easy power of Excel and its ilk
for the flexible power of plain text, at a cost of writing a little
code for myself. I like writing code, so this sort of trade is usually
congenial to me. Once I've made the trade, I end up building a set of
tools that I can reuse, or mold to a new task with minimal effort. Over
time, the cost reaches a baseline I can live with even when I might wish
for a spreadsheet's ease. And on the day I want to write a complex
function to sort a set of records, one that befuddles Numbers's sorting
capabilities, I remember why I like the trade. (That happened yet again
last Friday.)
A recent tweet from Michael Nielsen
quotes physicist Steven Weinberg as saying, "This is often the way it
is in physics -- our mistake is not that we take our theories too
seriously, but that we do not take them seriously enough." I think
this is often true of plain text: we in computer science forget to take
its value and power seriously enough. If we take it seriously, then we
ought to be eschewing the immediate benefits of tools that lock away our
text and data in formats that are hard or impossible to use, or that may
disappear tomorrow at some start-up's whim. This applies not only to our
operating system and our code but also to what we write and to all the
data we create. Even if it means forgoing our commercial spreadsheets
except for the most fleeting of tasks.
-----