norma is currently used to transform "a variety of inputs into normalized, tagged, XHTML (with embedded/linked SVG and PNG where appropriate)." I want to discuss to what extent this should be done. At one far end is only changing XML soup into tagged XHTML, the current behaviour; at the other far end is probably something like processing the meaning of the text itself.
I've created a scale with use cases below to give a better view of where we should draw the line. When is it changing the content, and when is it changing markup? That's the main point: not to talk about technical difficulties and impossibilities, but to see what things norma should be able to do.
- Far end: only changing XML/XHTML tags etc. (current behaviour)
- Having reference lists as `<ol>`, even when the article doesn't have one (issue #48)
- Unshortening species names (§1)
- Normalising table layout (§2)
- Other far end: normalising grammar and correcting typos and grammar mistakes

§1: For example, changing "P. abies" into "Picea abies". This must be done in the context of the entire document; otherwise it is impossible to know which genus starting with "P" has a species named "abies" in it (there are at least 4, which is abnormally few). It would take a lot of work with custom programs to insert ami2-species results back into the XHTML.

§2: Example: when tables are cut apart, stuck together, or changed in other ways specifically for layout purposes (not issue #57; that's about preserving things stated explicitly in the XML).
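To make §1 concrete, here is a minimal sketch (in Python, not norma's actual code) of what context-based unshortening could look like: collect full binomials seen anywhere in the document, then expand abbreviations that match one of them. A real implementation would restrict candidates to taxon names recognised by something like ami2-species; this naive version treats any "Capitalised lowercase" pair as a potential binomial.

```python
import re

def expand_species(text):
    # Collect full binomials ("Picea abies") seen in the document,
    # mapping their abbreviated form ("P. abies") to the full name.
    # NOTE: this over-matches ordinary prose ("We studied" -> "W. studied");
    # a real tool would filter against a taxon dictionary.
    abbrev_map = {}
    for genus, species in re.findall(r'\b([A-Z][a-z]+) ([a-z]+)\b', text):
        abbrev_map[f'{genus[0]}. {species}'] = f'{genus} {species}'
    # Replace each abbreviation with the full binomial found in context;
    # abbreviations with no match in the document are left untouched.
    return re.sub(r'\b[A-Z]\. [a-z]+\b',
                  lambda m: abbrev_map.get(m.group(0), m.group(0)),
                  text)

doc = "We studied Picea abies in detail. P. abies showed growth."
print(expand_species(doc))
```

Even this toy version shows why the operation needs whole-document context: "P. abies" is only resolvable because "Picea abies" occurs earlier in the same text.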