TITLE: Data Ingestion
AUTHOR: Eugene Wallingford
DATE: June 25, 2013 2:46 PM
DESC:
-----
BODY:
This sentence in Reid Draper's "Data Traceability" made me laugh recently:
I previously worked in the data ingestion team at a music data company.
Nice turn of phrase. I suppose that another group digests the data, and yet another expels it.

Draper's sentence came to mind again yesterday while I was banging my head on a relatively simple problem: transforming a CSV file generated by my university's information system, replete with embedded quotes and commas, into something more manageable. As data ingestion goes, this isn't much of a problem at all. There are plenty of libraries that do the heavy lifting for you, in almost any language you choose, Ruby included. Of course, I was just writing a quick-and-dirty script, so I was rolling my own CSV-handling code. As usual, "quick and dirty" is often dirty but rarely quick.

I tweeted a bit of my frustration, in response to which @geoffwozniak wrote:
Welcome to the world of enterprise data ingress.
If I had to deal with these files every day, I might head for the egress... or master a good library, so that I could bang my head on more challenging data ingestion problems. -----