Friday, April 12, 2013

Week 28: Working with CSV files

Michelle had asked me to try to figure out the best way to handle Vict Bib records that have multiple authors.  The data will need to be extracted as discretely as possible.  The problem right now on the Vict Bib website is that multiple authors are displayed as one entity.  Also, there are some glitches where editors, authors and translators are interchanged, mostly when a user performs a search.  It seems like a bit of Drupal might solve the problem.  The Feeds Tamper module provides preprocessing functionality before the data is mapped to the entity fields.  Its Explode plugin will "explode" a single delimited value into an array of separate values, so each author can be stored on its own.

I then met with Michelle and we discovered together that the CSV file will need to be tweaked to transform the commas that separate authors into pipes.  Commas also appear elsewhere in the data, for example in titles, so they cannot reliably mark where one author ends and the next begins.  So, the best thing is to use the "|" instead.  However, this requires that we control the CSV export.  Michelle recommends that we pass on what I've worked on so far to the programmers to see what kind of export they can provide for me.  From there, I will start the ingestion of either the CSV file into cb2bib or BibTeX into Zotero.  I only have five hours of my internship remaining, so it will have to be one or the other.
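To illustrate why the pipe is the safer separator, here is a minimal Python sketch of the "explode" idea. It is not the Feeds Tamper code itself, and the column names (`title`, `authors`), the sample rows, and the helper names `explode` and `read_records` are all made up for illustration; the real Vict Bib export may look quite different.

```python
import csv
import io

# Hypothetical export: the column names and rows are invented placeholders,
# not real Vict Bib data.
SAMPLE_EXPORT = (
    "title,authors\n"
    '"An Example Title, with a Comma","Doe, Jane"\n'
    '"Another Example Title","Doe, Jane|Roe, Richard"\n'
)


def explode(value, delimiter="|"):
    """Split a delimited field into a list of trimmed, non-empty values."""
    return [part.strip() for part in value.split(delimiter) if part.strip()]


def read_records(text):
    """Parse CSV text and explode the authors column into a list per record."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"title": row["title"], "authors": explode(row["authors"])}
        for row in reader
    ]


records = read_records(SAMPLE_EXPORT)
for record in records:
    print(record["title"], "->", record["authors"])
```

Splitting the same authors field on commas would break "Doe, Jane" into two pieces, which is exactly the ambiguity the pipe avoids.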
