Friday, October 26, 2012
Reading #7
Lim, S. & Liew, C. (2011). Metadata quality and interoperability of GLAM digital images. Aslib Proceedings, 63(5), 484-498. doi:10.1108/00012531111164978
Thursday, October 25, 2012
Week 10: A Very 'Duh' Moment
It's funny. Most of the time, the message 'Validation failed' with the little red icon is such a letdown. Now, it's a pleasant experience. Seeing that message means that I'm doing something correct! It means that my XPath expressions are correct and that my assertions and reports are well-written. After roughly two and a half weeks of stalling due to my silly mistake (I feel really bad for wasting everyone's time), the problem with Schematron has finally been worked out. It turns out that my XPath expressions were not properly written: I had been including, in the rule's context XPath, the child node that I wanted to test for, when the context XPath has to end at the parent node, with the child tested in the assert or report itself. Once I made all of the corrections, things started moving along again.
I'm still playing around with some assertions, such as the one that will check for date format. I tried comparing string lengths, but I kept receiving an error saying that a date and a string length cannot be compared. I tried matches(), but apparently matches() needs two arguments. I have solved some other things, though, such as how to check that an @href value contains 'http://purl.dlib.indiana.edu/' (a sketch of these checks is below). I'm relieved that this is back up and running.
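For my own reference, here is a minimal sketch of the kind of rule I mean. The element names (tei:publicationStmt, tei:date, tei:idno) are placeholders rather than the actual VWWP markup, and it assumes the schema uses the XSLT 2 query binding so that matches() is available. The point is that the rule context ends at the parent node, the children are tested in the asserts, matches() takes the value and a regular expression as its two arguments, and contains() handles the @href check:

    <pattern xmlns="http://purl.oclc.org/dsdl/schematron">
      <rule context="tei:publicationStmt">
        <!-- matches() needs two arguments: the value and a regex -->
        <assert test="matches(tei:date, '^\d{4}(-\d{2}(-\d{2})?)?$')">
          The date should be YYYY, YYYY-MM, or YYYY-MM-DD.
        </assert>
        <!-- check that the identifier points at an Indiana PURL -->
        <assert test="contains(tei:idno/@href, 'http://purl.dlib.indiana.edu/')">
          The @href value should contain 'http://purl.dlib.indiana.edu/'.
        </assert>
      </rule>
    </pattern>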
I have also started some preliminary work on the Image Collections Online project. I've reviewed the elements and attributes and begun to see redundancies. I've read the Wiki pages for the project and am getting a good idea of the project's workflow. Now that Schematron is working again, I'm not sure if I'll still be working on the ICO project. There is a meeting tomorrow that I am going to attend, and we will all discuss whether there is a need for an intern and, if so, what more specifically I can do. While some of the work that I would do has already been completed, it was done by several different people over a long period of time, and some of the end products may no longer be relevant or accurate.
Hopefully, I will have the Header section of the VWWP Schematron complete by the end of today and then finish the Body section tomorrow.
I also plan on attending the Digital Brown Bag next Wednesday about the Wikipedia GLAM project. I had never heard of it, so I decided to do some reading on it and hopefully will learn something interesting.
Thursday, October 18, 2012
Week 9: A Change of Plans
So, it seems like things have really changed. After the weeks-long stall due to the technical problems running Schematron, I spoke with Michelle about where to go from here. Until she is able to get in touch with people from Mulberry Technologies and other Schematron list groups, there is very little I can do with Schematron besides some editing here and there.
During our meeting, she suggested that I take a break from Schematron and start looking at another potential project. Based on my desire to expose myself to additional metadata standards (other than Dublin Core and RDF), she mentioned a project working with Image Collections Online. This project would further acquaint me with metadata and metadata mapping. DLP first began working with other collections to help them with their cataloging and metadata needs. Originally, DLP gave these special collections (e.g., the Liberian Photograph Collections) a core set of fields. Slowly, these collections started asking for more specialized fields to account for their diverse cataloging needs. While this may have started off well, and DLP was glad to be flexible, the process of customizing fields soon became too overwhelming and chaotic. In addition, even though a collection may have a large number of specialized fields, not all of them will show up in the interface anyway. So now DLP wants to start moving back toward a core set of fields.
What Michelle thinks I can do is play the part of a metadata analyst of sorts. I would first establish what the core fields are, then analyze the divergent fields and see if any of them are actually more similar than previously thought. We could then merge those fields to allow the collections to express what they feel is necessary, but without overwhelming the system. Then, I would figure out what metadata can be mapped to MODS (a sketch of what such a mapping might look like is below).
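To give a sense of the kind of mapping involved (the local field names and values here are made up for illustration, not taken from any actual ICO collection), a collection's 'Title' and 'Date taken' fields might map into MODS like this:

    <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
      <!-- hypothetical local field "Title" maps to titleInfo/title -->
      <mods:titleInfo>
        <mods:title>Monrovia street scene</mods:title>
      </mods:titleInfo>
      <!-- hypothetical local field "Date taken" maps to originInfo/dateCreated -->
      <mods:originInfo>
        <mods:dateCreated encoding="w3cdtf">1968</mods:dateCreated>
      </mods:originInfo>
    </mods:mods>

The analysis work would come first, of course; the mapping only makes sense once the core fields are settled.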
One nice thing about this project is that I would be working with the Metadata Working Group associated with ICO. This would allow me to have the collaboration I have been missing. As much as I feel dismayed about leaving Schematron behind for now, I think this is a good direction to take things.
Friday, October 12, 2012
Reading #6
Dalmau, M. & Schlosser, M. (2010). Challenges of serials text encoding in the spirit of scholarly communication. Library Hi Tech, 28(3), 345-359.
In 2006, the Digital Library Program at Indiana University received a grant from the state of Indiana to digitize and encode the nearly 100-year run of the Indiana Magazine of History. The project intended to provide full-text and facsimile views, improve metadata for better search and retrieval, and develop a publishing model for the journal. The digitization and encoding were conducted by a combination of in-house and outsourced personnel, in coordination with several quality control guidelines. TEI was chosen for its strength in encoding texts that are literary in nature. TEI's independent header was also seen as a strength for its ability to capture bibliographic metadata. Quality control was handled manually to a small extent, but due to the limited time and budget, most of it was automated. The experience provided a few lessons for the future. It was the opinion of the program that it is best to perform semantically or structurally difficult encoding in-house. In addition, the more manual quality control is performed in advance, the more smoothly the subsequent automated process will run. The paper suggests future emphasis on guidelines and consistent communication with any outside vendors.
Thursday, October 11, 2012
Week 8: Moving On...
So, last week was tough. But after speaking with Michelle, we have decided that I should move on to the conceptual mapping for the TEI body. The process is essentially the same, starting with the spreadsheet, although this time I'm skipping the 'fluffy' version and going straight to the structured version. Michelle had told me that this section would in some ways be easier, but more difficult in others.
I think she's right. There seem to be fewer things that need to be selected and validated, but it is harder to create the logic for those that do exist. For example, there needs to be a way to check that if a note spans pages, the note should be collapsed onto one page for readability. While the easy way would be just to write <report test="tei:note"/>, all that would do is check whether a note exists; it wouldn't actually check whether the note spans pages. So, this is the kind of logic that I have to play around with (a first attempt is sketched below). There is also another encoding rule requiring the encoder to remove all end-of-line hyphens for word splits. However, sometimes a split traverses pages, and the logic for that is going to be tricky to figure out.
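Here is that first attempt at the page-spanning check, assuming (as seems to be the convention) that page breaks are encoded as tei:pb elements, so that a note containing a page break is one that spans pages:

    <pattern xmlns="http://purl.oclc.org/dsdl/schematron">
      <rule context="tei:text//tei:note">
        <!-- a note containing a page break spans pages; flag it -->
        <report test="descendant::tei:pb">
          This note spans a page break and should be collapsed onto one page.
        </report>
      </rule>
    </pattern>

The hyphenation check would presumably need similar logic against the text nodes on either side of each tei:pb, which is the part I still have to work out.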
So far, I have completed the conceptual mapping for the TEI body and have begun writing the Schematron, with 10 asserts so far. So, yes, for now: moving on.
Saturday, October 6, 2012
Reading #5
XPath Tutorial. (n.d.). Retrieved October 3, 2012, from W3Schools: http://www.w3schools.com/xpath/default.asp
This W3Schools tutorial was helpful in providing a concise yet comprehensive review of XPath. Given my difficulties with Schematron this past week, I did a lot of reading to try to figure out the problem. The tutorial begins simply, by first describing what XPath is. A Venn diagram demonstrates the relationship that XPath has with other XML technologies, such as XQuery, XPointer, XLink and XSLT. A bulleted list also aids understanding; for example, "XPath uses path expressions to navigate in XML documents." The reading then points out that these path expressions are used to select nodes or node-sets in an XML document, and that they look and behave much like file system paths (a few examples follow below).
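A few of the path expressions from the tutorial's bookstore example document, to show what I mean:

    /bookstore/book     selects all book elements that are children of bookstore
    //book              selects all book elements anywhere in the document
    bookstore//book     selects all book descendants of bookstore
    //@lang             selects all attributes named lang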
The reading then moves on to discuss the relationships between nodes. Parent, child, sibling, ancestor, and descendant nodes are explained. Even though I have a pretty good grasp of node relationships, I really appreciated that examples were included in the reading. I especially liked that the examples were consistent: if I were new to this concept, it would have really helped me get used to the XML example document, instead of it changing just as soon as I started understanding the idea.
I was hoping that the next section, on selecting nodes, would help me figure out what was wrong with my Schematron. Because no errors were being found in the XML documents I validated against the Schematron, I have a feeling that the Schematron is not latching on to the right places in the document. However, I still don't really understand why the direct XPath works but the validation run through Schematron does not. I reviewed the section on selecting nodes, but I didn't find anything that helped. I feel like I am back at square one now. At the very least, I had a good review of XPath.
Friday, October 5, 2012
Week 7: Bump in the Road
This week began well but quickly became frustrating. I finished much of the Schematron for the TEI Header and was ready to test what I had done against a couple of VWWP legacy and new texts. I was nervous, but hoping for the best. I felt like I had taken so much time with the two different kinds of conceptual mappings, and then the writing process had taken a while, so I wanted to be successful on the first try.
I pulled up one legacy and one new text in Oxygen, along with the editor version of the Schematron. I decided to try the new text first. I selected the correct version of the Schematron against which to validate the TEI document. Immediately, I got an error stating that there was a problem with the TEI namespace. I spent about four hours searching online, Googling as much as I could to figure out what could possibly be wrong with the way I had declared the namespace. I came across a post on the TEI boards where some other people were having issues with the TEI namespace. Someone had posted another version of the declaration, so I tried it. Essentially, it seemed like the namespace had to be declared twice. It looked strange to me, but Oxygen no longer complained about it. (I ended up changing the namespace yet again a bit later, after Michelle told me that Professor Walsh uses a different version.) I ran the validation again, and much to my surprise, the TEI document validated. I was in shock. I immediately knew that something was wrong, but I couldn't figure it out. I tried Googling and looking in all of the readings that Michelle had given me, but there was nothing I could find that addressed the problem of a Schematron not catching any errors.
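For what it's worth, my understanding is that in ISO Schematron the TEI namespace is normally bound just once, with a single ns element at the top of the schema, like this (a minimal sketch, not the exact declaration from my file):

    <schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
      <!-- bind the tei: prefix used in rule contexts and tests -->
      <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    </schema>

So I still don't understand why the doubled declaration was the only thing Oxygen would accept.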
I ended up meeting with Michelle later in the week, and we tried playing with the different options available within Oxygen. I clicked through what seemed an endless number of options, in endless combinations, and nothing worked. I thought that maybe I needed to change the version of XPath I was using, but that did not work either. I also tried typing the XPath directly into Oxygen's XPath search, and that worked perfectly. So, Michelle and I are thinking that my XPath is written fine, and we really don't know what the problem could be. She has promised to speak with Professor Walsh and to write to an acquaintance of hers at Mulberry Technologies. This was definitely a frustrating week, and I feel powerless.