My Experience at the Digital Library Program: November 2012

Friday, November 16, 2012

Week 13: Survey Revision and Indexing

After the meeting on Thursday, I made some changes to the survey. Members of the group raised several important points. One was that we need to take into account the difference between item-level cataloging and collection-level cataloging, which the survey as previously written would not have captured. There was also the question of whether Photocat is primarily a cataloging tool for the collection or an end-user discovery tool. With this in mind, I went back and revised the survey. I added a question asking collection managers whether their collection includes information that is not recorded anywhere else. I also added a Venn diagram accompanied by a question asking whether Photocat shares information with the collection's external records. Lastly, I added a two-part question asking whether a list of metadata fields (those currently used by at least 50% of all collections) would satisfy their core set needs and, if not, which of the listed fields would. We hope to get this survey out right after Thanksgiving.
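To make the "used by at least 50% of all collections" cutoff concrete, here is a minimal Python sketch of how such a list could be computed. The collection names and the fields each one uses are invented for illustration; they are not the actual Photocat data, and this is not necessarily how the list for the survey was produced.

```python
from collections import Counter

# Hypothetical data: the set of fields each collection uses.
# Collection names and field assignments are made up for this sketch.
collection_fields = {
    "Collection A": {"Creator", "Date Taken", "Country", "Caption"},
    "Collection B": {"Creator", "Date Taken", "US State"},
    "Collection C": {"Creator", "Country", "Topical Subject"},
    "Collection D": {"Date Taken", "Country", "Caption"},
}


def fields_meeting_threshold(collection_fields, threshold=0.5):
    """Return the fields used by at least `threshold` of the collections."""
    # Count how many collections use each field.
    counts = Counter(
        field for fields in collection_fields.values() for field in fields
    )
    total = len(collection_fields)
    return sorted(field for field, n in counts.items() if n / total >= threshold)


core_candidates = fields_meeting_threshold(collection_fields)
print(core_candidates)
```

With this sample data, a field has to appear in at least two of the four collections to make the list.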

Another thing I did this week was think about indexable fields for Photocat. After reviewing how the collections use the fields, and considering what kind of information could be easily searched for, I narrowed the list down to five indexable fields: Country, Date Taken, Creator, Topical Subject, and US State. After I discussed this with other members of the group, it seemed that almost everyone had come to a similar conclusion.

Lastly, I did a bit of review of Schematron this week in order to keep it fresh in my mind since I will be picking that back up in January. Now that I know what I'm doing, it should be easier.

Friday, November 9, 2012

Reading #8

Greenberg, J. (2001). A Quantitative Categorical Analysis of Metadata Elements in Image-Applicable Metadata Schemas. Journal of the American Society for Information Science and Technology, 52(11), 917-924.

This article presents a quantitative analysis of the metadata schemas Dublin Core, VRA Core, REACH and EAD with regard to their usefulness in describing visual images. The metadata elements comprising the schemas were individually studied and grouped according to the four metadata classes established for the study: discovery, use, authentication, and administration, taking care to evaluate the applicability of each element to both print and digital images. Each metadata element was assigned to at least one class and at most four. Any metadata element that met the qualifications of more than one class was considered multi-functional. Each of the metadata schemas had elements that supported the functions of each of the classes established by the study. The results illuminate the need for a reconsideration of metadata schemas and perhaps a move away from cataloging-based schemas towards a class-oriented, functionality-based metadata schema for images across multiple domains.

Thursday, November 8, 2012

Week 12: Semantic Grouping of ICO Field Names

After the last meeting with the Metadata Working Group, each member was asked to either write up or put into a spreadsheet their ideas for grouping duplicate fields and to send them to me so that I could write up a summary. Semantically, there are several fields that are similar or identical, and in order to work on determining a core set, some of these fields may need to be grouped together. Alternatively, some of them may need to be de-emphasized in favor of a more universal field name.

One of my tasks this week was to write up a summary comparing two members' suggestions for dealing with duplicate or similar fields. I identified the following:

Brad suggested using only Accession Number and doing away with Acquisition Date, Donor Name, Donor Notes, Location Code (archives notes this in their accession record), Physical Location (archives notes this in their accession record), Physical Location Shelf/Box/Folder (archives notes this in their accession record), Lily Location (archives notes this in their accession record), Seller and Provenance.

Ronda did not suggest removing any of these field names, but she did divide and group them. She combined Provenance, Seller and Donor Name into one category that she named Provenance. She then tentatively grouped Acquisition Date and Donor Notes into another category that she called Internal Technical/Administrative Information. Location Code, Physical Location, Physical Location Shelf/Box/Folder and Lily Location were then all combined into a Location of Original category, along with Accession Number. She felt that these were all used to identify the location of the original item or parent, but it was not completely clear from the descriptors how much they overlap.
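For anyone who finds it easier to see the two proposals side by side, the comparison above can be sketched in a few lines of Python. The field and category names are the ones listed in this post; the variable and function names are just illustrative, and this is only one way to lay it out, not something the group actually used.

```python
# Ronda's proposed categories and the fields she placed in each (from the summary above).
ronda_groups = {
    "Provenance": ["Provenance", "Seller", "Donor Name"],
    "Internal Technical/Administrative Information": [
        "Acquisition Date",
        "Donor Notes",
    ],
    "Location of Original": [
        "Location Code",
        "Physical Location",
        "Physical Location Shelf/Box/Folder",
        "Lily Location",
        "Accession Number",
    ],
}

# Brad's proposal: keep only Accession Number out of this set of fields.
brad_keep = {"Accession Number"}


def invert_groups(groups):
    """Map each field name to the category it was placed in."""
    field_to_group = {}
    for group, fields in groups.items():
        for field in fields:
            if field in field_to_group:
                raise ValueError(f"{field!r} appears in more than one group")
            field_to_group[field] = group
    return field_to_group


field_to_group = invert_groups(ronda_groups)

# Fields Brad would drop, shown with the category Ronda would keep them under.
brad_drop = sorted(set(field_to_group) - brad_keep)
for field in brad_drop:
    print(f"{field}: dropped by Brad, kept by Ronda under {field_to_group[field]!r}")
```

Laying it out this way makes it easy to see that every field Brad would remove still has a home in one of Ronda's categories.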

Brad then proposed combining six fields into one repeatable field: Alt ID, Call Number, Title Control Number (but this links to IUCAT), Donor ID, External URL (this links to something), and Roll and Frame #. The main concern with removing these fields would be losing semantics if a field links or points to something.

Ronda addressed these fields as well, but did not suggest considering their removal. Again, she tried to group them semantically. She combined Alt ID, Call Number, Donor ID and Roll and Frame #. She sees this group as various ID numbers assigned specifically to the item (as opposed to the parent or collection unit). Title Control Number and External URL were combined into a Supplemental Metadata category. She questioned whether Accession Number could function similarly and therefore belong in that category as well. She mentioned that the Title Control Number could potentially be used to link out to a collection-level MARC record. External URL is more generic so it could maybe be used for both, but she points out that it could only work if the external resource could be identified.

Brad questioned whether Abstract, Caption, Physical Description and Photographer’s Description could all be combined into one free-text field. For example, “The physical description is albumen print. . .” or “The photographer described this photo, in full, blah. . .”

Ronda grouped Abstract, Caption and Photographer’s Description into one category called Description. She thought these could be changed into something more generic, maybe with a dropdown box to indicate the source (cataloger, caption, photographer, person pictured, etc.). Ronda did not include Physical Description in that grouping; instead she placed it in a category she named Description of Physical Object, with other fields like Material and Film Type.

Ronda’s other groupings that do not overlap with Brad’s ideas are on the Metadata Subgroup wiki.

Thursday, November 1, 2012

Week 11: Field Names, Display Labels and Surveys

This was the first full week where I worked on the Image Collection Online (ICO). During our last meeting with the Metadata Working Group I was given the task of comparing field names with the actual display labels that collection managers were using when describing their digital resources in ICO. The collection managers and other people working in the different collections use Photocat to enter metadata about the items in the collection. I was provided screenshots for all of the collections that use Photocat so that I could see the difference between the field name and the display label. I then put this information into a spreadsheet. I had columns for field type and then individual columns for the name of each collection, each accompanied by another column where I could denote whether the label was viewable to the public. I actually created two spreadsheets, one for live collections and one for non-live collections.

As I started entering the data, I began to see that not only do collections use some field names differently than intended by the DLP, but there are also inconsistencies in the way the field names are perceived amongst the collections. To mark the field names that every collection uses in the same way, I highlighted those rows in green. For example, all of the collections that use the field name 'Photographer' also use the same display label, 'Photographer'. But the field name 'City' is not used similarly amongst all the collections that use that field name. The display labels differ: some use 'City' and others use 'City/Town/Village'. It is this kind of information, laid out in a spreadsheet, that may help the members of the Metadata Working Group get an idea of how collection managers utilize the field names for the purposes of their collection. This could also help in determining a core set.
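The "highlight in green" check I did by hand in the spreadsheet could also be sketched in Python. The 'Photographer' and 'City' labels below are the ones mentioned above; the collection names are placeholders, and the row layout is an assumption about how the spreadsheet data might be exported, not its real structure.

```python
from collections import defaultdict

# Each row: (collection, field name, display label).
# Collection names are placeholders; the labels come from the examples above.
rows = [
    ("Collection A", "Photographer", "Photographer"),
    ("Collection B", "Photographer", "Photographer"),
    ("Collection A", "City", "City"),
    ("Collection B", "City", "City/Town/Village"),
]


def labels_by_field(rows):
    """Collect the distinct display labels used for each field name."""
    labels = defaultdict(set)
    for _collection, field, label in rows:
        labels[field].add(label)
    return dict(labels)


def consistent_fields(rows):
    """Field names that every collection displays with the same label."""
    return sorted(
        field for field, labels in labels_by_field(rows).items() if len(labels) == 1
    )


print(consistent_fields(rows))     # fields that would be highlighted in green
print(labels_by_field(rows)["City"])  # the competing labels for 'City'
```

A field with a single distinct label corresponds to a green row; a field with more than one label is worth bringing up with the group.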

Another method we are using to help narrow down a core set is the distribution of a survey. I have begun drafting a short survey, no more than 5 questions, that will try to determine how people are using Photocat. I enjoy this aspect because it uses what I've learned here at SLIS, which is communicating with users to identify how best to create a system or service that helps them with their information needs.