James Tauber: MorphGNT

MorphGNT grew out of seeing the functional annotation of the Friberg AGNT, and the CCAT data at UPenn in Beta Code. An early realization in working with CCAT was that the data would be much more usable if regularized into one lemma per line, since you can then use Unix command-line utilities. Further investigation revealed thousands of errors (though some were systematic and hence easily fixed), including some deeper analytic ones. Building systems to generate the data from scratch has been an important part of the process of identifying errors.

Much of the content of Mounce’s book (with reference to morphology) could be replicated with a single awk command on the MorphGNT data.
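
As a rough illustration of the kind of query that one-record-per-line data makes easy, here is a minimal sketch in Python (rather than awk, purely for readability). It assumes a hypothetical whitespace-separated layout with five columns (reference, part of speech, parsing code, form, lemma); the actual MorphGNT layout may differ. The sample lines and the function name `paradigm_table` are hand-written for this example and are not taken from the data files.

```python
from collections import defaultdict

# Hand-written sample in an ASSUMED layout (not necessarily MorphGNT's own):
# reference  part-of-speech  parsing-code  form  lemma
SAMPLE = """\
040101 P- -------- Ἐν ἐν
040101 N- ----DSF- ἀρχῇ ἀρχή
040101 V- 3IAI-S-- ἦν εἰμί
040101 RA ----NSM- ὁ ὁ
040101 N- ----NSM- λόγος λόγος
040101 RA ----ASM- τὸν ὁ
040101 N- ----ASM- θεόν θεός
"""


def paradigm_table(lines):
    """Group attested forms by lemma, then by parsing code."""
    table = defaultdict(lambda: defaultdict(set))
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _ref, _pos, parse, form, lemma = fields
        table[lemma][parse].add(form)
    return table


table = paradigm_table(SAMPLE.splitlines())

# List every attested form of the article, one parsing code per line.
for parse, forms in sorted(table["ὁ"].items()):
    print(parse, ", ".join(sorted(forms)))
```

Run over a whole corpus, a grouping like this yields attested paradigms per lemma, which is roughly the sort of morphological summary the awk remark has in mind.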

Helped start the Electronic New Testament Manuscripts Project in 1996, but it was too early: people didn’t understand what it meant to put things on the web.

Much of the early challenge was simply putting Greek on the web. This led to GreekGIF, a series of images of Greek letters that enabled more readable representations. “I’m relieved to say this is no longer necessary!”

Early involvement with XML put PhD plans on hold. Around 2002, started working on automatically generating inflected forms (initially driven by Mounce’s classes). In 2004, released v5 of MorphGNT, now with Unicode. zhubert.com was the next major development in the use of MorphGNT, and a milestone event. Since then, been working on other corrections (which haven’t yet led to a new version), started PhD studies, and also started collaborating with Ulrik Sandborg-Petersen. A current interest is splitting the text from the analysis: you really need an additional field to identify the analysis for a particular form to eliminate all ambiguity. Also working on splitting off the lexicon: morphological analysis, semantic domain, and other attributes, as well as standardizing lexical representations.
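
One way to picture the split between text and analysis is sketched below. This is only an assumed design for illustration, not the project's actual data model: the names `Analysis`, `ANALYSES`, `TEXT`, and `resolve`, the placeholder references, and the parsing codes are all hypothetical. The point is the extra index field: the form τὸ can be nominative or accusative neuter singular, so the form alone does not pick out one analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str
    parse: str


# Analysis table, kept separate from the text. The key is (form, index);
# the index is the additional field that disambiguates a form.
ANALYSES = {
    ("τὸ", 1): Analysis(lemma="ὁ", parse="----NSN-"),  # nominative reading
    ("τὸ", 2): Analysis(lemma="ὁ", parse="----ASN-"),  # accusative reading
}

# The text itself stores only a reference, the form, and the index.
# The references are placeholders, not real verse numbers.
TEXT = [
    ("ref-1", "τὸ", 1),
    ("ref-2", "τὸ", 2),
]


def resolve(token):
    """Look up the analysis that a text token points to."""
    _ref, form, index = token
    return ANALYSES[(form, index)]


for token in TEXT:
    print(token[0], token[1], resolve(token).parse)
```

With this arrangement a correction to an analysis is made once in the table, and the text never has to repeat the lemma or parsing code.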

The myth of vocabulary coverage: “the 100 most common words account for 66% of the text”. But these words typically carry little information content, and you really need to know about 95% of the words in a verse to understand it (according to learning theorists). Really, we need a new kind of graded reader that’s optimized for early comprehension, clause-based, form-based (rather than lexeme-based), and gives context in English. Progressive substitution of Greek phrases into an English text helps provide a gradual transition.
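
The gap between token coverage and verse-level comprehension can be made concrete with a small sketch. It uses toy English placeholder tokens rather than real Greek data, and the function names `coverage_of_top_n` and `readable_verses` are invented for this example; the 95% threshold is the figure quoted above.

```python
from collections import Counter


def coverage_of_top_n(tokens, n):
    """Fraction of running tokens accounted for by the n most frequent items."""
    counts = Counter(tokens)
    covered = sum(count for _item, count in counts.most_common(n))
    return covered / len(tokens)


def readable_verses(verses, known, threshold=0.95):
    """Return references of verses where at least `threshold` of tokens are known."""
    result = []
    for ref, tokens in verses.items():
        known_count = sum(1 for token in tokens if token in known)
        if tokens and known_count / len(tokens) >= threshold:
            result.append(ref)
    return result


# Toy data: placeholder tokens standing in for a tokenized text.
VERSES = {
    "v1": ["the", "word", "was", "with", "the", "god"],
    "v2": ["the", "light", "shines", "in", "the", "darkness"],
    "v3": ["the", "word", "was", "the", "light"],
}

all_tokens = [token for tokens in VERSES.values() for token in tokens]
top = {item for item, _count in Counter(all_tokens).most_common(3)}

print(coverage_of_top_n(all_tokens, 3))  # share of tokens covered by the top 3
print(readable_verses(VERSES, top))      # verses readable with only those 3
```

In this toy text the three most frequent items cover well over half of the running tokens, yet no verse reaches the 95% threshold, which is the pattern the coverage "myth" glosses over.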

Web site: http://morphgnt.org.