Wu/Tan: Tree-based Approaches to Biblical Texts

They spoke last year about creating trees; the focus this year is on how to use them. They are doing tree alignment to provide a tool that supports translation, and have just released the print version of a new Chinese NT. Once you’ve done tree alignment, you can use it as a metric of how dynamic a translation is: the higher in the tree the links are, the less word-for-word the translation. This linked data also supports other applications.
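One way to picture the "how dynamic" metric is the sketch below. It rests on my own assumptions rather than anything shown in the talk: that each alignment link attaches to a node of the source tree, that a link's "height" is that node's height above the words (0 for a word), and that the score is the mean height. The `Node` class, the `dynamism` function, and the toy tree are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical syntax-tree node; leaves stand for single words."""
    label: str
    children: list = field(default_factory=list)

    def height(self):
        """0 for a leaf (word); otherwise 1 + the height of the tallest child."""
        if not self.children:
            return 0
        return 1 + max(child.height() for child in self.children)


def dynamism(linked_nodes):
    """Mean height of the source nodes that carry alignment links.

    Assumed scoring: 0.0 means every link is word-to-word; larger values
    mean more links sit at phrase or clause level.
    """
    if not linked_nodes:
        return 0.0
    return sum(node.height() for node in linked_nodes) / len(linked_nodes)


# Toy tree, invented for illustration: clause -> (phrase -> w1, w2), w3
w1, w2, w3 = Node("w1"), Node("w2"), Node("w3")
phrase = Node("phrase", [w1, w2])
clause = Node("clause", [phrase, w3])

literal_links = [w1, w2, w3]   # every link at word level
freer_links = [phrase, w3]     # one link only possible at phrase level

print(dynamism(literal_links), dynamism(freer_links))
```

Under this reading, a translation that can only be linked at phrase or clause level scores higher than one where every word has its own counterpart.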

Translation memory is typically word-based: with aligned trees, it can be chunk-based (word, phrase, or clause) or relation-based (pairs of words in head-modifier relations). This translation memory gives translators access to how particular phrases have previously been translated, along with concordances showing how they’re used in context.
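A minimal sketch of what a chunk-based and relation-based translation memory could look like as a data structure. The class name, method names, and placeholder entries are hypothetical, chosen for illustration; the talk did not describe the actual storage format.

```python
from collections import Counter, defaultdict


class TranslationMemory:
    """Hypothetical store of aligned chunks and head-modifier relations."""

    def __init__(self):
        # source chunk -> Counter of target renderings
        self.chunks = defaultdict(Counter)
        # source chunk -> list of (reference, target rendering), for concordances
        self.contexts = defaultdict(list)
        # (source head, source modifier) -> Counter of (target head, target modifier)
        self.relations = defaultdict(Counter)

    def add_chunk(self, ref, source, target):
        """Record one aligned chunk (word, phrase, or clause) seen at `ref`."""
        self.chunks[source][target] += 1
        self.contexts[source].append((ref, target))

    def add_relation(self, source_pair, target_pair):
        """Record one aligned head-modifier pair."""
        self.relations[source_pair][target_pair] += 1

    def lookup(self, source):
        """Previous renderings of a chunk, most frequent first."""
        return self.chunks.get(source, Counter()).most_common()

    def concordance(self, source):
        """Every recorded occurrence of a chunk with its reference."""
        return list(self.contexts.get(source, []))


# Placeholder entries, invented for illustration only.
tm = TranslationMemory()
tm.add_chunk("ref-1", "source phrase", "rendering A")
tm.add_chunk("ref-2", "source phrase", "rendering B")
tm.add_chunk("ref-3", "source phrase", "rendering A")
tm.add_relation(("head", "modifier"), ("target head", "target modifier"))

print(tm.lookup("source phrase"))
print(tm.concordance("source phrase"))
```

Keying the chunk table on the source string keeps word, phrase, and clause entries in one lookup path; the relation table is kept separate because its keys are pairs rather than contiguous spans.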

Probabilistic Hebrew synonym finder: existing synonym dictionaries are incomplete, and the whole notion of synonymy is continuous, not discrete. Two words are synonymous when their semantic spaces overlap. The aligned trees define an equivalency space: all the words that are used to translate a word are semantically similar. Degree of synonymy is basically the intersection of the sets divided by the union of the sets, scored using joint probability.
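The intersection-over-union idea can be sketched as follows. The plain set version is the Jaccard index over each word's translation set. The talk did not spell out how the joint-probability scoring works, so the weighted version here swaps in a weighted Jaccard (sum of minimum probabilities over sum of maximum probabilities) as a stand-in, not as the speakers' method. Function names and the toy alignment pairs are hypothetical.

```python
from collections import Counter


def translation_distribution(alignments, source_word):
    """Relative frequency of each target rendering of `source_word`."""
    counts = Counter(tgt for src, tgt in alignments if src == source_word)
    total = sum(counts.values())
    return {tgt: n / total for tgt, n in counts.items()} if total else {}


def set_synonymy(alignments, a, b):
    """Plain Jaccard index: |T(a) & T(b)| / |T(a) | T(b)|."""
    ta = set(translation_distribution(alignments, a))
    tb = set(translation_distribution(alignments, b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def weighted_synonymy(alignments, a, b):
    """Weighted Jaccard (an assumed stand-in for the probabilistic scoring)."""
    pa = translation_distribution(alignments, a)
    pb = translation_distribution(alignments, b)
    keys = set(pa) | set(pb)
    shared = sum(min(pa.get(k, 0.0), pb.get(k, 0.0)) for k in keys)
    total = sum(max(pa.get(k, 0.0), pb.get(k, 0.0)) for k in keys)
    return shared / total if total else 0.0


# Toy (source word, target rendering) links, invented for illustration.
ALIGNMENTS = [
    ("word_a", "see"), ("word_a", "see"), ("word_a", "look"), ("word_a", "behold"),
    ("word_b", "see"), ("word_b", "look"), ("word_b", "look"), ("word_b", "gaze"),
    ("word_c", "walk"),
]

print(set_synonymy(ALIGNMENTS, "word_a", "word_b"))
print(weighted_synonymy(ALIGNMENTS, "word_a", "word_b"))
```

Both scores fall between 0 and 1, which fits the point that synonymy is a matter of degree rather than a yes/no property.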

You can also look for similar verses: those containing clauses which aren’t identical but have more or less the same meaning. Clause-level similarity is the most useful view.