TITLE: My Experience Storing an Entire Course Directory in Git AUTHOR: Eugene Wallingford DATE: January 08, 2021 2:12 PM DESC: ----- BODY: Last summer, I tried something new: I stored the entire directory of materials for my database course in Git. This included all of my code, the course website, and everything else. It worked well. The idea came from a post or tweet many years ago by Martin Fowler who, if I recall correctly, had put his entire home directory under version control. It sounded like the potential advantages might be worth the cost, so I made a note to try it myself sometime. I wasn't quite ready last summer to go all the way, so I took a baby step by creating my new course directory as a git repo and growing it file by file. My context is pretty simple. I do almost all of my work on a personal MacBook Pro or a university iMac in my office. My main challenge is to keep my files in sync. When I make changes to a small number of files, or when the stakes of a missing file are low, copying files by hand works fine, with low overhead and no tooling necessary. When I make a lot of changes in a short period of time, however, as I sometimes do when writing code or building my website, doing things by hand becomes more work. And the stakes of losing code or web pages are a lot higher than losing track of some planning notes or code I've been noodling with. To solve this problem, for many years I have been using rsync and a couple of simple shell scripts to manage code directories and my course web sites. So, the primary goal for using Git in a new workflow was to replace rsync. Not being a Git guru, as many of you are, I figured that this would also force me to live with git more often and perhaps expand my tool set of handy commands. My workflow for the semester was quite simple. When I worked in the office, there were four steps:
  1. git merge laptop
  2. [ do some work ]
  3. git commit
  4. git push
On my laptop, the opening and closing git commands changed:
  1. git pull origin main
  2. [ do some work ]
  3. git commit
  4. git push origin laptop
My work on a course is usually pretty straightforward. The most common task is to create files and record information with commit. Every once in a while, I had to back up a step with checkout. You may say, "But you are not using git for version control!" You would be correct. The few times I checked out an older version of a file, it was usually to eliminate a spurious conflict, say, a .DS_Store file that was out of sync. Locally, I don't need a lot of version control, but using Git this way was a form of distributed version control, making sure that, wherever I was working, I had the latest version of every file. I think this is a perfectly valid way to use Git. In some ways, Git is the new Unix. It provided me with a distributed filesystem and a file backup system all in one. The git commands ran effectively as fast as their Unix counterparts. My repo was not very much bigger than the directory would have been on its own, and I always had a personal copy of the entire repo with me wherever I went, even if I had to use another computer. Before I started, several people reminded me that Git doesn't always work well with large images and binaries. That didn't turn out to be much of a problem for me. I had a couple of each in the repo, but they were not large and never changed. I never noticed a performance hit. The most annoying hiccup all semester was working with OS X's .DS_Store files, which record screen layout information for OS X. I like to keep my windows looking neat and occasionally reorganize a directory layout to reflect what I'm doing. Unfortunately, OS X seems to update these files at odd times, after I've closed a window and pushed changes. Suddenly the two repos would be out of sync only because one or more .DS_Store files had changed after the fact. The momentary obstacle was quickly eliminated with a checkout or two before merging. Perhaps I should have left the .DS_Stores untracked... All in all, I was pretty happy with the experience. I used more git, more often, than ever before and thus am now a bit more fluent than I was. (I still avoid the hairier corners of the tool, as all right-thinking people do whenever possible.) Even more, the repository contains a complete record of my work for the semester, false starts included, with occasional ruminations about troubles with code or lecture notes in my commit messages. I had a little fun after the semester ended looking back over some of those messages and making note of particular pain points. The experiment went well enough that I plan to track my spring course in Git, too. This will be a bigger test. I've been teaching programming languages for many years and have a large directory of files, both current and archival. Not only are there more files, there are several binaries and a few larger images. I'm trying decide if I should put the entire folder into git all at once upfront or start with an empty folder a lá last semester and add files as I want or need them. The latter would be more work at early stages of development but might be a good way to clear out the clutter that has built up over twenty years. If you have any advice on that choice, or any other, please let me know by email or on Twitter. You all have taught me a lot over the years. I appreciate it. -----