Thursday, August 23, 2012

Week 1: What is Schematron?

So, this week I began my internship at the Digital Library Program here at Indiana University.  After meeting with my supervisor, Michelle Dalmau, it was decided that I would undertake the Schematron project to improve the Victorian Women Writers Project text-encoding workflow.  Schematron is a second level of validation that is used to check the quality of XML data.  Mainly, it checks for the presence or absence of patterns in the XML document and double-checks the validity of encoder-entered data.  One thing I will have to keep reminding myself is that Schematron does not need to check what the schema already checks.

I had my first orientation meeting with Michelle and then I was put to work learning about Schematron.  She gave me a binder called Hands On Schematron and some other readings that would help acquaint me with this validation language.  Everything was a bit overwhelming at first.  I feel like I am so new to this world of what I used to call "techy things" that I'm still in a bit of disbelief that I'm even doing these things.  However, this is important to me.  This feels right, so I knew I just had to dive right in.

A few things immediately stood out to me about Schematron that began to answer the question "What is Schematron?"  It is an XML vocabulary, so a Schematron schema is itself an XML document.  The assertions (assert and report elements) are XML elements.  Unlike a DTD, though, it is not designed to describe the full structure of a document, and it does not manipulate data in any way.  The beauty of Schematron is that it can express constraints that grammar-based schema languages cannot.  For example, a validator such as the W3C validator can assert that the list element must contain an item element when listing LCSH or MLA keywords, but it cannot assert that the encoder must actually enter at least one subject heading or term in their document.  Schematron can check for this.
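
To make the keyword example concrete for myself, here is a rough sketch of what such a rule might look like.  I have not looked at the VWWP encoding guidelines yet, so the element names (textClass, keywords, list, item), the TEI namespace and the message wording are all my own assumptions, not the project's actual rules.  The assert element complains when its test is false, and the report element complains when its test is true.

```xml
<!-- Hypothetical sketch: element names and messages are assumptions,
     not taken from the VWWP encoding guidelines. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- Bind a prefix so the XPath tests can reach namespaced TEI elements -->
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <pattern>
    <!-- Presence check: fires once for every textClass element -->
    <rule context="tei:textClass">
      <!-- assert: the message is shown when the test is FALSE,
           i.e. when no item with actual text content is found -->
      <assert test="tei:keywords/tei:list/tei:item[normalize-space()]">
        The encoder must enter at least one subject heading or term.
      </assert>
    </rule>

    <!-- Quality check: fires once for every keyword item -->
    <rule context="tei:keywords/tei:list/tei:item">
      <!-- report: the message is shown when the test is TRUE,
           i.e. when the item is empty or whitespace only -->
      <report test="not(normalize-space())">
        This keyword item is empty.
      </report>
    </rule>
  </pattern>
</schema>
```

The schema can already guarantee that a list contains an item, so the sketch leaves that alone and only adds the part the schema cannot say: that at least one item has something typed into it.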

So, this week was basically just me sitting at my desk reading over these manuals and papers.  It's hard for me to picture what work I'll be doing next, because I need to look at the VWWP encoding guidelines and some completed XML documents in order to know where to start.  I'm looking forward to mapping out all of the elements, attributes, values and patterns that need to be checked.
