Friday, April 26, 2013

Reading #9


Foulonneau, M., & Riley, J. (2008). The future of metadata. In Metadata for Digital Resources: Implementation, Systems Design and Interoperability (pp. 187-197). Oxford: Chandos Publishing.
Metadata began as a simple set of functions, primarily for cataloging.  Increasingly, though, the flexibility and extensibility of metadata are what give it value.  A new challenge facing the cultural heritage sector is developing ways to conceptualize and gather large amounts of data, relate it to the relevant resources, and use and reuse the metadata across a wide spectrum of applications.  The authors address four trends that are influencing and will continue to influence metadata work in the years to come.  First, automated metadata generation is predicted to keep its place in digital workflows.  Some types of metadata are well suited to automated generation, while others, like descriptive metadata, pose a problem.  DC Dot, for example, is a tool that can automatically generate metadata from web pages and can suggest keywords after an analysis of the text.  Tools like these are still in development, and manual generation is usually also needed.  The second trend is the influence of Web 2.0: the authors predict that user participation, such as recommendations/reviews, tagging and content sharing, has great potential for enriching digital library applications.  The third trend concerns strategies for metadata management.  The authors suggest broad and accommodating, yet clearly defined, usage conditions for metadata records in order to provide the best possible flexibility for future use.  Lastly, the authors argue that as metadata changes, so must the institution’s mission statement, which should in particular address cooperation between institutions.  An institution’s ability to position itself within the circle in which its users and colleagues operate is directly related to its ability to fulfill its primary mission.
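
As a rough illustration of the kind of text analysis such tools automate (this is only a sketch with invented names, not DC Dot's actual code), something like the following pulls a page's title and meta keywords and suggests extra keywords by simple word frequency:

```python
# A rough sketch of automated metadata generation from a web page (not DC Dot's
# actual code). It collects the <title> and declared <meta> keywords with the
# standard library, then suggests extra keywords by word frequency.
from collections import Counter
from html.parser import HTMLParser

class MetaHarvester(HTMLParser):
    """Collects the title, declared keywords, and body text of a page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self.text = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            self.keywords = [k.strip() for k in (attrs.get("content") or "").split(",")]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.text.append(data)

def suggest_keywords(html, n=5):
    parser = MetaHarvester()
    parser.feed(html)
    words = [w.lower().strip(".,;:") for w in " ".join(parser.text).split() if len(w) > 4]
    return parser.title.strip(), parser.keywords, [w for w, _ in Counter(words).most_common(n)]

demo = ('<html><head><title>Victorian Poetry</title>'
        '<meta name="keywords" content="poetry, Victorian"></head>'
        '<body>Poetry anthologies gather Victorian poetry and criticism.</body></html>')
print(suggest_keywords(demo))
```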

Wednesday, April 24, 2013

Week 29: Wrapping Up

So, not much can be said of this week.  Unfortunately, the work on Vict Bib that other people were doing is not finished yet, so I cannot complete my part.  I am quite disappointed and really wish that I could have left here on a high note.  Instead of writing about what I did this week, I thought I'd wrap up this blog by addressing a question that I have actually been asked multiple times by multiple people outside of the library/information science world.  The question is, "what are the differences between HTML and XML?"  I have tried in many ways to answer the question, but once and for all, I want to have my answer here.

So, what are the main differences between HTML and XML?  (Not in any particular order; there's a small example after the list.)
1. HTML documents are static, while XML is designed to carry data between platforms, which is why it is often described as dynamic.
2. HTML has pre-defined tags, while XML is more flexible and allows the inclusion of custom tags created by the author of the document.
3. HTML is more relaxed about closing tags than XML, which requires every element to be closed and well-formed.
4. HTML was created for design and presentation, whereas XML was designed to store and transport data, for example between an application and a database.
5. The most important distinction is that HTML is concerned with how the data looks, but XML exists to describe the data and is concerned with presentation only if it further reveals the meaning within the data.
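
To make point 5 concrete, here is a tiny made-up example: the HTML only says how the record should look, while the XML, with its author-defined tags, says what the data means and can be handed straight to a parser:

```python
# A quick illustration of point 5: the HTML below describes presentation only,
# while the XML describes the data itself, so it can be parsed back into data.
import xml.etree.ElementTree as ET

html_snippet = "<p><b>Middlemarch</b>, <i>George Eliot</i>, 1871</p>"  # nothing says which part is the author

xml_snippet = """
<book>
  <title>Middlemarch</title>
  <author>George Eliot</author>
  <year>1871</year>
</book>
"""

book = ET.fromstring(xml_snippet)
print(book.findtext("author"))  # George Eliot: the data is self-describing
```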

So, I guess I need to write down a quick breakdown of what I have done during my time here at the DLP.

1. I have rewritten the Schematron for validating TEI-encoded Victorian Women Writers Project texts.  Through this process I have gained experience with XML, XPath and quality control of electronic texts while increasing my comfort level with TEI.  (See the first sketch after this list for a taste of what that looks like.)

2. In collaboration with metadata experts across IU Libraries, I helped to define a core metadata set for use in the ICO and Photocat.  I developed my analysis and survey-writing skills while juggling the need to think both broadly and on a small scale.  I became acquainted with MODS while mapping the Photocat fields to MODS and then later tweaking existing XSLT and drafting a new XSLT for the new core set.  (The second sketch after this list shows the shape of that kind of crosswalk.)

3. I was introduced to METS during the METS Navigator project.  I came to appreciate the transition, and the great amount of work, necessary to migrate sets of data from one iteration to the next.  I did further mapping and analysis work and tried my hand at writing specifications to be turned over to programmers to create a new and improved METS Navigator 3.0, with pop-ups reached through finding aids.

4. The last project, as short as my time with it was, allowed me to work with CSV files, a format I had not worked with before, and introduced me to online data extraction tools (cb2bib, text2bib).  I also came to have a greater understanding of, and respect for, the steps necessary and the care it takes to migrate large amounts of data.
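
Since items 1 and 2 were the most hands-on XML work, here are two tiny sketches of what they look like in practice.  Neither is real DLP code: the rule, the stylesheet, the field names and the file names are all invented, and lxml is simply one convenient way to run them.

First, a single Schematron rule run against a TEI file:

```python
# Minimal sketch, not the actual VWWP Schematron. One ISO Schematron rule is
# compiled with lxml and run against a TEI file; the file name is a placeholder.
from lxml import etree, isoschematron

RULES = """
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <pattern>
    <rule context="tei:TEI">
      <assert test="tei:teiHeader">Every TEI document needs a teiHeader.</assert>
    </rule>
  </pattern>
</schema>
"""

schematron = isoschematron.Schematron(etree.fromstring(RULES))
doc = etree.parse("vwwp_text.xml")   # placeholder path
print(schematron.validate(doc))      # True if the rule passes
```

Second, the shape of a Photocat-to-MODS style crosswalk, applied as an XSLT transform:

```python
# Toy sketch only, not the real Photocat-to-MODS stylesheet. A local
# <photoTitle> field (an invented name) is rewritten as MODS titleInfo/title.
from lxml import etree

XSLT = """
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mods="http://www.loc.gov/mods/v3">
  <xsl:template match="/record">
    <mods:mods>
      <mods:titleInfo>
        <mods:title><xsl:value-of select="photoTitle"/></mods:title>
      </mods:titleInfo>
    </mods:mods>
  </xsl:template>
</xsl:stylesheet>
"""

transform = etree.XSLT(etree.fromstring(XSLT))
record = etree.fromstring("<record><photoTitle>Sample Hall, 1910</photoTitle></record>")
print(etree.tostring(transform(record), pretty_print=True).decode())
```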

Although I knew that working in any environment with other people demands a great deal of patience, I have to admit my patience was tried multiple times here, especially during the last part of my second semester.  While I was frustrated a lot in the beginning of my internship, that was more with myself, trying to figure out Schematron.  This time, I became frustrated because I felt I had to wait a lot for other people to finish their parts of the project before I could do mine.  I don't blame anyone, and I understand it is just part of a working environment.  That being said, I really valued my time here and I can honestly say I've learned more than I could have by just taking classes.  I don't know what the future holds for me, but I know I am better prepared than I was before my digital library life.

Friday, April 12, 2013

Week 28: Working with CSV files

Michelle had asked me to try and figure out the best way to handle Vict Bib records that have multiple authors.  The data will need to be extracted as discretely as possible.  The problem right now on the Vict Bib website is that multiple authors are displayed as one entity.  Also, there are some glitches where editors, authors and translators are interchanged, mostly when a user performs a search.  It seems like a bit of Drupal might solve the problem.  The Feeds Tamper module provides preprocessing functionality before the data is mapped to the entity fields, and its Explode plugin will "explode" a single value into an array.
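
In plain Python terms, the Explode step amounts to something like this (the sample string and the pipe delimiter are my assumptions, not actual Vict Bib data):

```python
# What the Feeds Tamper "Explode" plugin does, sketched in Python: split one
# packed author value on a delimiter so each name becomes a separate entry.
packed = "Brontë, Charlotte|Brontë, Emily|Brontë, Anne"
authors = [name.strip() for name in packed.split("|")]
print(authors)  # ['Brontë, Charlotte', 'Brontë, Emily', 'Brontë, Anne']
```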

I then met with Michelle and we discovered together that the CSV file will need to be tweaked to transform the commas separating multiple values into pipes.  Commas can appear in other places, for example in titles, so the best thing is to use "|" as the separator instead.  However, this requires that we control the CSV export.  Michelle recommends that we pass on what I've worked on so far to the programmers to see what kind of export they can provide for me.  From there, I will start the ingestion of either the CSV file into cb2bib or BibTeX into Zotero.  I only have five hours of my internship remaining, so it will have to be one or the other.
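
Here is a sketch of the pipe idea (the record below is made up, not Vict Bib data): if the export joins multiple authors with "|", the commas that legitimately occur in names and titles no longer collide with the field separator.

```python
# Made-up record: join multiple authors with "|" on export so commas inside
# names and titles no longer collide with the separator between authors.
import csv, sys

record = {
    "authors": ["Brontë, Charlotte", "Brontë, Emily"],
    "title": "Poems by Currer, Ellis, and Acton Bell",
    "year": "1846",
}

writer = csv.writer(sys.stdout)
writer.writerow(["authors", "title", "year"])
writer.writerow(["|".join(record["authors"]), record["title"], record["year"]])
```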

Reading #8


Weir, R. O. (2012).  Making Electronic Resources Accessible. In Managing Electronic Resources (pp. 69-86). Chicago: ALA.
 
All libraries need to develop the best possible service and take responsibility for the development and maintenance of online access tools, regardless of the variations between libraries and the situations within each one.  Having selected appropriate tools, or at least the best tools available, librarians must keep in mind the purpose of those tools and what libraries are trying to accomplish with them.  Libraries and librarians are in competition with other information services, so there needs to be a focus on how users actually find information.  The chapter outlines some general principles that should guide librarians in the development of a successful ASER (access system for electronic resources), regardless of the online access tools that are chosen: 1) provide quality metadata, 2) ensure convenience for the user, 3) keep it simple, 4) eliminate all unnecessary steps to access content, 5) make branding ubiquitous, 6) solicit feedback and 7) make assessments.

Friday, April 5, 2013

Week 27: Working with bibliographic data

After completing the list of Vict Bib fields I moved on and made the decision to use cb2bib for the data extraction.  Once I decided that, I began to read manuals on cb2bib configuration.  It appears that I may have to do some command line work, so I will need to look into that more.  While researching cb2bib I also found some information about BibTeX (the format cb2bib will turn the CSV file into).  I learned that BibTeX is reference management software for formatting lists of references; it makes it easy to cite sources in a consistent manner by separating bibliographic information from presentation.  Zotero supports this format and can be used to output BibTeX data.  After bringing all of this together in my mind, I began the process of mapping the Vict Bib fields to BibTeX fields.  Mapping to me is like a puzzle and I really enjoy that part.  It's like translation, and as a language person, I understand it pretty easily.  While there is not always an equivalence between fields, it's fun and challenging to find the closest elements.
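
Just to capture the shape of that mapping (the field names here are invented for illustration, not the real Vict Bib-to-BibTeX crosswalk), it boils down to a lookup table plus a little formatting:

```python
# Sketch of a field-mapping exercise with invented field names: a dict maps
# local fields to BibTeX fields, then a record is written as a BibTeX entry.
FIELD_MAP = {                      # hypothetical Vict Bib field -> BibTeX field
    "Author(s)": "author",
    "Title": "title",
    "Journal Title": "journal",
    "Publication Year": "year",
}

record = {"Author(s)": "Eliot, George", "Title": "Silly Novels by Lady Novelists",
          "Journal Title": "Westminster Review", "Publication Year": "1856"}

fields = ",\n".join(f"  {FIELD_MAP[k]} = {{{v}}}" for k, v in record.items() if k in FIELD_MAP)
print(f"@article{{eliot1856,\n{fields}\n}}")   # citation key chosen by hand here
```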

Friday, March 29, 2013

Week 26: cb2bib or text2bib?

I didn't do many things this week, but I really got into the Vict Bib project.  Earlier in the week I began a list of Vict Bib fields based on a spreadsheet and an examination of possible fields on the website.  This was a loooooong process because there isn't a way to have the website automatically display all fields.  I had to scroll through many records, pages and pages actually, in order to make sure that I had identified all of the fields.

Later in the week I started reading documentation on cb2bib and text2bib.  The cb2Bib is a free, open source, multiplatform application for rapidly extracting unformatted or unstandardized bibliographic references from email alerts, journal Web pages, and PDF files.  Text2Bib is a PHP script for converting references to BibTeX format; however, it seems like it cannot detect some of the document types that Vict Bib uses.  Lastly, I read quite a few forum posts on the subject of data extraction.  So, again, this week was not much "doing", but a lot of preparation for what's to come.

Reading #7


 
Weir, R. O. (2012).  Gathering, Evaluating and Communicating Statistical Usage Information for Electronic Resources. In Managing Electronic Resources (pp. 87-119). Chicago: ALA.
When evaluating e-resources, it is vital to take a close look at the usefulness patrons derive from them compared to the investment in purchasing and licensing.  Usage data and statistics play a substantial role in making and justifying e-resource renewal decisions.  The challenges faced in gathering data, creating and processing statistics, and reaching responsible conclusions are quickly increasing.  Before tackling these challenges, an evaluator must be certain of what he or she wants to achieve and how he or she plans to achieve it, deciding upon a scale of usage analysis that is meaningful and sustainable in each library’s individual context.  The library and user communities must be aware of the subtle variations that exist in the definitions of usage between vendors; awareness of the attempts to achieve standards, and of the nuances of specific data items, equips the librarian to use usage data wisely.
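
One common way to make that usage-versus-investment comparison concrete is cost per use.  A quick, made-up illustration (titles and numbers invented, not from the chapter):

```python
# A made-up illustration of comparing usefulness to investment: cost per use
# for a few e-resources. Resource names, costs and download counts are invented.
subscriptions = [
    # (resource, annual cost in dollars, full-text downloads for the year)
    ("Journal Package A", 12000, 4800),
    ("Database B",         7500,  300),
]

for name, cost, uses in subscriptions:
    cost_per_use = cost / uses if uses else float("inf")
    print(f"{name}: ${cost_per_use:.2f} per use")
```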