Friday, September 7, 2012

Week 3: Building Schematron - The Foundation 2.0

This week I began transferring all of my work from the 'fluffy' version of the conceptual mapping to the structured version. This was pretty straightforward now that I knew the elements and attributes so well and knew exactly what I would be testing for each one.

I liked this part because, in addition to the context (XPath), I began writing the tests. This is where I felt like I really began to create and control the kinds of conditions I would like Schematron to check for. It was also at this point that I realized how varied the things Schematron will check for are.

There are tests that check that an encoder has replaced a template value with an actual value. Example: tei:name[@xml:id='encoderusername']. This matches a name element whose xml:id is still the template value, so Schematron can flag that the encoder has not yet replaced it with his or her actual username. There are other template values that the encoder could miss, and Schematron needs to check for them too. Example: tei:title='$Title of introduction'. This checks whether the encoder has replaced '$Title of introduction' with the actual title of the introduction. These are all very important checks that the other schemas cannot perform.
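As a minimal sketch, the two template-value checks above might look like this as Schematron rules. The rule contexts, the pattern id and the message wording are my assumptions for illustration, and I am assuming the standard TEI P5 namespace; a real schema would narrow each context to the exact place in the header where the template value lives.

```xml
<!-- Sketch only: contexts, ids and messages are assumptions, not the project's schema. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <sch:pattern id="template-values">
    <!-- A report fires when its test is TRUE, i.e. the template value is still there. -->
    <sch:rule context="tei:name[@xml:id]">
      <sch:report test="@xml:id = 'encoderusername'">
        The xml:id on this name is still the template value 'encoderusername'.
        Replace it with your actual username.
      </sch:report>
    </sch:rule>

    <sch:rule context="tei:title">
      <!-- normalize-space() ignores stray whitespace and line breaks around the text. -->
      <sch:report test="normalize-space(.) = '$Title of introduction'">
        This title is still the template text. Replace it with the actual
        title of the introduction.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

I used report rather than assert here because the test describes the problem itself (the placeholder is still present) rather than the condition that should hold.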

The Schematron will also check for patterns and consistency. For example, the publication statement (<publicationStmt>) includes important information about the encoding, such as which institution completed the encoding, the year of the encoding and a short paragraph about copyright. Part of the publication statement is the element <idno>, an identifier for an object, in this case the particular XML document that is being encoded. The same identifier appears as the xml:id of the TEI root element, and it must match the <idno> in the publication statement. So, Schematron needs to alert the encoder or editor if the two values do not match. Example: tei:idno = /tei:TEI/@xml:id. This tests that the idno value matches the TEI root xml:id value.
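The idno consistency check could be sketched along these lines. Again, the context path and the message are assumptions on my part; I am assuming the document root is a single tei:TEI element and that the idno to compare is the one directly inside publicationStmt.

```xml
<!-- Sketch only: assumes a tei:TEI root and an idno directly inside publicationStmt. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <sch:pattern id="idno-consistency">
    <sch:rule context="tei:publicationStmt/tei:idno">
      <!-- An assert fires when its test is FALSE, i.e. the two values differ. -->
      <sch:assert test="normalize-space(.) = /tei:TEI/@xml:id">
        The idno "<sch:value-of select="normalize-space(.)"/>" does not match
        the xml:id on the TEI root ("<sch:value-of select="/tei:TEI/@xml:id"/>").
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Note that the right-hand side is a path to the attribute, not a quoted string. Putting the path in quotes would compare the idno against the literal text of the path instead of the attribute's value.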

During our weekly meeting on Wednesday, September 5th, Michelle and I discussed the possibility of writing two Schematrons: one for the encoder and one for the editor. There are several checks that only need to happen on the editor's side. One very obvious editor check will be making sure the editor has entered his or her name and set that element's xml:id to his or her username. Example: tei:name='$Editor's First and Last Name' and tei:name[@xml:id='editorusername']. There is no reason for this check to be performed while the encoder is still working, so it makes sense to create another Schematron that only the editor will need to use. In this meeting Michelle and I also discussed the possibility that the pseudonym check and the prosopography check should be part of the editor-only version of Schematron. We'll both think about it over the next week and go from there.
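A separate editor-only schema might start out like the sketch below. The context is deliberately broad and is an assumption; the real schema would point at the specific name element reserved for the editor. One detail worth noting: the template text contains an apostrophe, so the XPath string has to be wrapped in double quotes, written as &quot; inside the attribute.

```xml
<!-- Sketch only: an editor-side schema; context and messages are assumptions. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <sch:pattern id="editor-template-values">
    <sch:rule context="tei:name">
      <!-- &quot; delimits the XPath string because the text itself contains an apostrophe. -->
      <sch:report test="normalize-space(.) = &quot;$Editor's First and Last Name&quot;">
        The editor's name is still the template text. Enter your first and last name.
      </sch:report>
      <sch:report test="@xml:id = 'editorusername'">
        The xml:id on this name is still the template value 'editorusername'.
        Replace it with your actual username.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Keeping these rules in their own file means the encoder never sees editor messages while the encoding is still in progress.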

This week's reading was an older article from 1998 called 'XML and the Future of Digital Libraries'. It was fascinating to read because it was filled with wonder, excitement and also apprehension about this new language. Yes, at one point I had no idea what XML was, and learning the very basics was exciting but scary. But I always accepted it, because by the time I started learning it, it was already so established. I enjoyed this article and think it was useful for putting the metalanguage into context.
