I didn't get many things done this week, but I really got into the Vict Bib project. Earlier in the week I began a list of Vict Bib fields, based on a spreadsheet and an examination of the possible fields on the website. This was a loooooong process because there isn't a way to make the website display all fields automatically. I had to scroll through many records (pages and pages, actually) to make sure that I had identified all of the fields.
Later in the week I started reading documentation on cb2Bib and Text2Bib. cb2Bib is a free, open-source, multiplatform application for rapidly extracting unformatted or unstandardized bibliographic references from email alerts, journal web pages, and PDF files. Text2Bib is a PHP script for converting references to BibTeX format. However, it seems that Text2Bib cannot detect some of the document types that Vict Bib uses. Lastly, I read quite a few forum posts on the subject of data extraction. So, again, this week was not much "doing," but a lot of preparation for what's to come.