Friday, September 28, 2012

Week 4-6: Authoring Schematron

Earlier in the week, I wrapped up the process of transferring the data from the 'fluffy' version to the structured version. After confirming that we wanted to create two separate Schematrons, one for encoders and one for editors, I began marking in blue on the spreadsheet those fields intended for the editor Schematron. I now have 25 asserts intended for the editor Schematron and 36 intended for the encoder Schematron. Many of the editor asserts are simple checks for elements such as tei:author or tei:date. These are just for the editor to make sure that the encoder has not accidentally erased important information.

Then, on Wednesday, I started the process of authoring the Schematron for the TEI Header. I began writing in Oxygen as an XML file, but saved the document as a .sch, for Schematron. I wrote the encoder Schematron first, since in my mind it would be the first step in the workflow of checking the VWWP TEI-encoded documents. I started by declaring the namespaces, one for the Schematron (<schema xmlns="http://purl.oclc.org/dsdl/schematron">) and one for the TEI (<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:ns="http://www.tei-c.org/ns/1.0">). Next, I included the title and a one-line description of the intention (This schema will test the legacy and new TEI-encoded texts of the Victorian Women Writers Project).
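
For readers who have not seen one, here is a minimal sketch of how the opening of an ISO Schematron file of this kind could look. It is not copied from the actual VWWP schema: the wording of the title, the "tei" prefix, and the placeholder pattern are assumptions made for illustration. In ISO Schematron, the usual way to make the TEI namespace available to the XPath expressions is an ns element that binds a prefix to the namespace URI.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: title text, the "tei" prefix, and the pattern are illustrative assumptions -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>VWWP TEI Header: encoder checks</title>

  <!-- Bind a prefix to the TEI namespace so contexts and tests can refer to tei:* elements -->
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <p>This schema will test the legacy and new TEI-encoded texts of the Victorian Women Writers Project.</p>

  <!-- Placeholder pattern (hypothetical) so the skeleton is a complete schema -->
  <pattern id="teiHeader-present">
    <rule context="tei:TEI">
      <assert test="tei:teiHeader">The document should contain a teiHeader.</assert>
    </rule>
  </pattern>
</schema>
```

With the prefix bound this way, every later rule can address header elements as tei:author, tei:date, and so on.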

Then, I began authoring the Schematron. Taking what I had learned from all of the readings, I began by declaring the pattern. The pattern is a set of related rules that are used to test the XML document. It is also important to assign an identifier to each pattern. I named each pattern identifier based on its function. So, if the pattern contains a rule that checks for the template value of the responsibility statement, then I would assign an abbreviated name as that pattern's identifier (respStmttemplate-valuecheck). Next, I included a short paragraph that explained the purpose of the assert. I did this partly so I could remember later on the purpose of each assert, especially if it wasn't written out completely. I also did it so that other people could understand my intentions after I have finished the project.

Next, I wrote the rule. In Schematron, the rule contains one or more tests (called asserts or reports) that apply in a given context. The context is very important here. It is the XPath that determines at which element the test or series of tests should be performed. Hands On Schematron by Mulberry Technologies explains, "For every element in the document described as the context of a rule, the rule's tests will be made with that element as context."

So, once I had written the part of the Schematron that declared the context, I needed to write the tests. The tests can be either asserts or reports. Asserts are useful when you want to be told that something is not true. For example, I want there to be an author element in this context. If it is true, fine. If it is not true, let me know. Reports can be used to locate elements of interest or to check for the existence of things of interest. If you are looking for '2010' as the year of encoding and write the check properly, you will get a message whenever that year does appear as a year of encoding in your document. This part took me a little while to figure out, but Hands On Schematron helped clarify the differences. They write, "report means 'ho hum, show me where this is true' and assert means 'it better be true, or else'". While I kind of disagree with the "ho hum", it did give me a better idea of the positive versus negative aspects of the different kinds of tests. Another way to think about it is that reports are more like warnings for the user, whereas asserts are more like errors.
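
To make the assert and report distinction concrete, here is a small sketch in ISO Schematron. Everything specific in it is an assumption made for illustration and is not taken from the VWWP schemas: the pattern identifiers, the "tei" prefix, the choice of titleStmt and publicationStmt as contexts, and the idea that the year of encoding sits in a tei:date inside the publication statement.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Assert versus report (illustrative sketch)</title>
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <!-- Pattern ids and XPaths below are hypothetical, chosen only for illustration -->
  <pattern id="titleStmt-author-check">
    <p>Checks that the title statement still names an author.</p>
    <!-- The context selects the elements at which the tests are run -->
    <rule context="tei:teiHeader/tei:fileDesc/tei:titleStmt">
      <!-- assert: the message is shown when the test is FALSE -->
      <assert test="tei:author">The title statement has no tei:author element.</assert>
    </rule>
  </pattern>

  <pattern id="encoding-year-report">
    <p>Points out headers whose publication date is 2010.</p>
    <rule context="tei:teiHeader/tei:fileDesc/tei:publicationStmt">
      <!-- report: the message is shown when the test is TRUE -->
      <report test="tei:date[normalize-space(.) = '2010']">Found 2010 as the date in the publication statement.</report>
    </rule>
  </pattern>
</schema>
```

Run against a header with no author, the first pattern produces its message. Run against a header dated 2010, the second one does. A document that has an author and a different date passes silently.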

Once I completed the encoder version, I moved on to the editor version.  This process was essentially the same, except the Schematron was shorter because there were fewer checks to make.  For both versions, I began keeping a list of questions I had for Michelle and would either email them to her or make sure to ask her during our meetings.  Also, I realized during this process that all of the work with the spreadsheets really helped later on.  The structured asserts spreadsheet helped especially in creating the logical structure in my mind.
