My Experience at the Digital Library Program

Friday, December 7, 2012

Reading #9

Park, J. R., & Tosaka, Y. (2010). Metadata quality control in digital repositories and collections: Criteria, semantics, and mechanisms. Cataloging & Classification Quarterly, 48(8), 696-715.

This article reports a systematic assessment of the practices and issues that affect metadata quality in digital repositories and collections. The researchers distributed a web-based survey to approximately 600 participants, mostly heads of cataloging and technical services, via mailing lists relevant to the field. A total of 303 people completed the survey. The survey results fall into three categories: the perceived importance of metadata quality control, the criteria used to measure metadata quality, and the use of quality control mechanisms in digital repositories. Notably, the study found that metadata semantics is perceived to be less important than content standards for quality control. This contrasts with the finding that 45% and 41% of respondents, respectively, named semantic overlaps and semantic ambiguities as the two most significant problems in applying Dublin Core to their collections. The study emphasizes the need for strong awareness of content-based metadata quality control, together with metadata guidelines, to ensure consistent resource description within and across digital collections.

Thursday, December 6, 2012

Weeks 14 & 15: Finishing Up

So, these past two weeks I have not done much work because I have basically finished up my hours. I have been coming in about once a week, and those hours may end up counting towards my internship next semester.
Last week (the week of 11/26), I attended the Metadata Working Group meeting to discuss the survey findings. We received 13 responses, which was great! Based on the survey results, we further refined the core set and drew the following conclusions:
          1. There is not much of a unique cataloging emphasis.
          2. The primary goal of the collections/collection managers is end-user discovery.
          3. The majority of collections’ external records share some info with Photocat.
          4. The core set as presented in the survey is not satisfactory.

The following fields are now all in the core set (some more confidently than others):

ABSTRACT
CAPTION
CITY
CREATOR
COPYRIGHT OWNER
COPYRIGHT STATUS
COUNTRY
FEATURED
MODIFYING USER
TITLE
TOPICAL SUBJECT
US STATE

I have also begun compiling the core set definitions. I have sent an email to the members of the metadata subgroup and hope to hear back soon.

This week (the week of 12/3), I came in on 12/6 to meet with Michelle to finalize everything and have her sign the evaluation. After our meeting, I started mapping the core fields the group has decided on to MODS. So far, I have mappings for all but three fields: COPYRIGHT STATUS, FEATURED and MODIFYING USER. I am not sure these can be mapped to MODS at all, so I will wait until next semester to ask Julie about them.

So, I guess I am finished for this semester. I really appreciate the opportunity I had to work here with the great people at the DLP. I am really glad that I got a good, gentle introduction to XPath and XSL, and I hope to get better acquainted with them in the future. I also learned that Schematron is not easy to work with, especially when problems come up. It is not as ubiquitous as XML or TEI, so it was hard to find a good community to consult when we ran into trouble. It was also hard to find relevant literature on Schematron beyond guidelines and documentation. Hopefully Schematron will catch on, because it is a really useful tool. I will not wrap this up entirely, since I will be back in a few weeks to document another 4 1/2 months of my experiences at the Digital Library Program.

Friday, November 16, 2012

Week 13: Survey Revision and Indexing

After the meeting on Thursday, I made some changes to the survey. Some members of the group brought up really important points. One was that we need to account for the difference between item-level cataloging and collection-level cataloging, which the survey as originally written would not have captured. Another was whether Photocat is primarily a cataloging tool for the collection or an end-user discovery tool. With this in mind, I revised the survey. I added a question asking collection managers whether their collection includes information that is not available anywhere else. I also added a Venn diagram with a question asking whether Photocat shares information with the collection's external records. Lastly, I added a two-part question asking whether a set of metadata fields (those currently used by at least 50% of all collections) would satisfy the needs of a core set and, if not, which fields would. We hope to get this survey out right after Thanksgiving.
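The "at least 50% of all collections" criterion for choosing the fields in the two-part question amounts to a simple threshold count. The Python below is a minimal sketch of that count, assuming each collection's field usage can be exported as a set of field names; the collection names, field sets and the `fields_used_by_share` helper are hypothetical placeholders, not actual Photocat data or DLP tooling.

```python
from collections import Counter

# Hypothetical placeholder data: collection name -> set of Photocat fields it uses.
collections = {
    "Collection A": {"Title", "Creator", "Country", "Abstract"},
    "Collection B": {"Title", "Creator", "City"},
    "Collection C": {"Title", "Country", "City", "Caption"},
    "Collection D": {"Title", "Abstract"},
}


def fields_used_by_share(collections, threshold=0.5):
    """Return, sorted, the fields used by at least `threshold` of all collections."""
    if not collections:
        return []
    # Each collection's fields are a set, so a field counts at most once per collection.
    counts = Counter(field for fields in collections.values() for field in fields)
    total = len(collections)
    return sorted(field for field, n in counts.items() if n >= threshold * total)


print(fields_used_by_share(collections))
```

With the placeholder data, Caption (used by one of four collections) falls below the cutoff, while every field used by two or more collections is kept. Storing each collection's fields as a set keeps a field that appears several times within one collection from inflating its share.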

Another thing I did this week was think about indexable fields for Photocat. After reviewing how the collections use the fields, and considering what kind of information could be easily searched for, I narrowed a list down to 5 indexable fields: Country, Date Taken, Creator, Topical Subject and US State. After discussing with other members of the group, it seemed like most everyone had come to a similar conclusion.

Lastly, I reviewed some Schematron this week to keep it fresh in my mind, since I will be picking it back up in January. Now that I know what I'm doing, it should be easier.

Friday, November 9, 2012

Reading #8

Greenberg, J. (2001). A quantitative categorical analysis of metadata elements in image-applicable metadata schemas. Journal of the American Society for Information Science and Technology, 52(11), 917-924.

A quantitative analysis of four metadata schemas (Dublin Core, VRA Core, REACH and EAD) with regard to their usefulness in describing visual images. The elements making up each schema were studied individually and grouped according to the four metadata classes established for the study (discovery, use, authentication and administration), with attention to each element's applicability to both print and digital images. Each element was assigned to at least one and at most four classes, and elements that qualified for more than one class were considered multifunctional. Each schema had elements supporting the functions of every class established by the study. The results point to a need to reconsider metadata schemas, perhaps moving away from cataloging-based schemas toward a class-oriented, functionality-based metadata schema for images across multiple domains.

Thursday, November 8, 2012

Week 12: Semantic Grouping of ICO Field Names

After the last meeting with the Metadata Working Group, each member was asked to send me their ideas for grouping duplicate fields, either written up or in a spreadsheet, so that I could write a summary. Several fields are semantically similar or identical, and to determine a core set, some of them may need to be grouped together. Alternatively, some may need to be de-emphasized in favor of a more universal field name.

One of my tasks this week was to write up a summary comparing two members' suggestions for dealing with duplicate or similar fields. I identified the following:

Brad suggested using only Accession Number and doing away with Acquisition Date, Donor Name, Donor Notes, Location Code, Physical Location, Physical Location Shelf/Box/Folder, Lily Location, Seller and Provenance. (The archives already note Location Code, Physical Location, Physical Location Shelf/Box/Folder and Lily Location in their Accession record.)

Ronda did not suggest removing any of these field names, but she did divide and group them. She combined Provenance, Seller and Donor Name into one category that she named Provenance. She then tentatively grouped Acquisition Date and Donor Notes into another category that she called Internal Technical/Administrative Information. Location Code, Physical Location, Physical Location Shelf/Box/Folder and Lily Location were all combined into a Location of Original category, along with Accession Number. She felt that these fields are all used to identify the location of the original item or parent, but it was not completely clear from the descriptors how much they overlap.

Brad then proposed combining six fields into a single repeatable field: Alt ID, Call Number, Title Control Number (though this one links to IUCAT), Donor ID, External URL (which links to something) and Roll and Frame #. The main concern with removing these fields is losing meaning where a field links or points to something else.

Ronda addressed these fields as well, but did not suggest removing any of them. Again, she grouped them semantically. She combined Alt ID, Call Number, Donor ID and Roll and Frame #, which she sees as various ID numbers assigned specifically to the item (as opposed to the parent or collection unit). She placed Title Control Number and External URL in a Supplemental Metadata category and questioned whether Accession Number could function similarly and therefore belong there as well. She mentioned that Title Control Number could potentially be used to link out to a collection-level MARC record. External URL is more generic, so it could perhaps serve both purposes, but she points out that this would only work if the external resource could be identified.

Brad questioned whether Abstract, Caption, Physical Description and Photographer’s Description could all be combined into one free-text field. For example, “The physical description is albumen print. . .” or “The photographer described this photo, in full, blah. . .”

Ronda grouped Abstract, Caption and Photographer’s Description into one category called Description. She thought these could be changed into something more generic, perhaps with a dropdown box to indicate the source (cataloger, caption, photographer, person pictured, etc.). Ronda did not include Physical Description in that grouping, but instead placed it in a category she named Description of Physical Object, along with other fields like Material and Film Type.

Ronda’s other groupings that do not overlap with Brad’s ideas are on the Metadata Subgroup wiki.

Thursday, November 1, 2012

Week 11: Field Names, Display Labels and Surveys

This was my first full week working on the Image Collection Online (ICO). During our last meeting with the Metadata Working Group, I was given the task of comparing field names with the display labels that collection managers actually use when describing their digital resources in ICO. The collection managers and others working in the different collections use Photocat to enter metadata about the items in their collections. I was given screenshots for all of the collections that use Photocat so that I could see the difference between each field name and its display label, and I then put this information into a spreadsheet. It had a column for field type, a column for each collection (headed with the collection's name), and beside each of those a column where I could note whether the label was viewable to the public. I actually created two spreadsheets: one for live collections and one for non-live collections.

As I started entering the data, I began to see that not only do collections use some field names differently than the DLP intended, but the field names are also interpreted inconsistently across collections. To show which field names are used the same way by every collection, I highlighted those rows in green. For example, all of the collections that use the field name 'Photographer' also use the same display label, 'Photographer'. The field name 'City', however, is not used consistently: some collections display it as 'City' and others as 'City/Town/Village'. Laid out in a spreadsheet, this kind of information may help the members of the Metadata Working Group get an idea of how collection managers use the field names for the purposes of their collections. It may also help in determining a core set.
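The green-highlighting rule is essentially a per-field check: a field name counts as consistent when every collection that uses it shows the same display label. Here is a minimal Python sketch of that check, assuming the screenshot data has been transcribed into a dictionary; the collection names and the `label_consistency` helper are hypothetical, and only the 'Photographer' and 'City' labels come from the example in the paragraph above.

```python
# Hypothetical transcription of the screenshots:
# collection name -> {Photocat field name: display label shown in that collection}.
display_labels = {
    "Collection A": {"Photographer": "Photographer", "City": "City"},
    "Collection B": {"Photographer": "Photographer", "City": "City/Town/Village"},
    "Collection C": {"Photographer": "Photographer"},  # does not use City
}


def label_consistency(display_labels):
    """Return {field name: True if every collection using it shows the same label}."""
    labels_by_field = {}
    for labels in display_labels.values():
        for field, label in labels.items():
            # Collect the distinct display labels seen for each field name.
            labels_by_field.setdefault(field, set()).add(label)
    return {field: len(seen) == 1 for field, seen in labels_by_field.items()}


for field, consistent in sorted(label_consistency(display_labels).items()):
    status = "same label everywhere (highlight green)" if consistent else "labels differ"
    print(f"{field}: {status}")
```

Only collections that actually use a field contribute labels to it, so a collection that leaves City out (like the hypothetical Collection C) neither confirms nor breaks City's consistency.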

Another method we are using to help narrow down a core set is a survey. I have begun drafting a short survey, no more than 5 questions, that will try to determine how people are using Photocat. I enjoy this part because it uses what I've learned here at SLIS: communicating with users to identify how best to create a system or service that meets their information needs.

Friday, October 26, 2012

Reading #7

Lim, S., & Liew, C. (2011). Metadata quality and interoperability of GLAM digital images. Aslib Proceedings, 63(5), 484-498. doi:10.1108/00012531111164978

An exploration of how metadata have been used in galleries, libraries, archives and museums (GLAM) in New Zealand, with an analysis of metadata quality with regard to interoperability. Data were collected in two stages. First, the metadata records of 16 GLAM institutions in New Zealand were analyzed for the kinds and extent of metadata used. Because these records were accessed publicly, however, metadata kept from public view could not be examined, so staff from the institutions were also interviewed. The study found, first, that digital image metadata records differed across the four types of institutions in their emphasis on metadata types and functions. Second, the metadata lacked variety. Third, too few institutions include technical metadata in their records, which could result in the loss of important data. Many institutions appear to treat their digital images as surrogates for physical collections. Further research is proposed on which types of data are most important from the user's perspective for effective retrieval and interoperability.