Here's the algorithm for the idea i sketched in Using Strongs Numbers for Reducing Lexical Ambiguity. There are three sources of information:
- The Greek original of a verse, which relates a set of Greek words to a unique index (the verse reference). This also provides Strong's numbers for them, which has the additional benefit of getting you past the morphology, since every inflected form of a word shares one number.
- An English translation of the same verse, which in turn relates a set of English words to the same unique index. The verse index provides a coarse-grained, many-to-many alignment, and shrinks the search space to only a dozen or so possible translations. In practice only the content-bearing words are likely to be of interest, not the function words.
- The Strong's reference itself, which provides English translations for the Greek words.
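To make the role of the verse reference concrete, here is a minimal Python sketch of the coarse, many-to-many alignment it provides. Everything in it is an assumption for illustration: the `verse-A` key, the English words, the pairing of numbers 1551 and 1561 in one verse, and the function name `candidate_pairs` are all hypothetical, not real verse data.

```python
from itertools import product

# Hypothetical toy data, keyed by an invented verse reference.
# Real data would come from a tagged Greek text and an English translation.
strongs_by_verse = {"verse-A": [1551, 1561]}
english_by_verse = {"verse-A": ["fearful", "expectation"]}

def candidate_pairs(ref):
    """Every (English word, Strong's number) pair sharing one verse reference."""
    english = english_by_verse.get(ref, [])
    greek = strongs_by_verse.get(ref, [])
    return list(product(english, greek))

pairs = candidate_pairs("verse-A")
for word, number in pairs:
    print(word, number)
```

The point of the sketch is only that the candidate set is the cross product within a single verse, so its size depends on the verse length rather than on the size of the whole lexicon.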
The version of Strong's available from Crosswire (which distributes a great Windows Bible program, by the way) has 5624 entries. Here is the one for word #1561, generally translated as "expectation":
1561 ekdoche ek-dokh-ay'
from 1551; expectation:--looking for.
Crosswalk's version has a fuller definition, but i haven't found the sources yet (maybe they're not publicly available):
ekdoche: the act or manner of receiving from
- expectation, waiting
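As a rough sketch, a two-line entry in the Crosswire layout shown above (number, transliteration, and pronunciation on the first line; derivation and definition, then `:--` and the translation glosses on the second) could be split apart like this in Python. The `StrongsEntry` class and `parse_entry` function are hypothetical names, and the layout is inferred from the single 1561 entry, so other entries may well need more care.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StrongsEntry:
    number: int
    lemma: str
    pronunciation: str
    definition: str
    glosses: List[str]

def parse_entry(text: str) -> StrongsEntry:
    """Parse one entry, assuming the two-line layout of the 1561 example."""
    head, body = text.strip().split("\n", 1)
    number, lemma, pronunciation = head.split(None, 2)
    # ":--" separates the definition from the translation glosses.
    definition, _, renderings = body.partition(":--")
    glosses = [g.strip(" .") for g in renderings.split(",") if g.strip(" .")]
    return StrongsEntry(int(number), lemma, pronunciation,
                        " ".join(definition.split()), glosses)

entry = parse_entry("1561 ekdoche ek-dokh-ay'\nfrom 1551; expectation:--looking for.")
print(entry)
```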
The approach is to go through an English translation of a verse and identify the content terms (call these TE). Then find the Strong's definitions for all the Greek words in the same verse (TS). Using (insert magic algorithm here), choose for each term in TE the element of TS whose definition best matches it. In the simplest case, the English word itself will occur in only one definition in TS. If it occurs in multiple definitions, i'll need either a voting scheme (if it matches more than once in some definition, it's more likely to be that one) or a way to weight the Strong's definitions by their a priori probability. This wouldn't work at all without constraints (for example, how would you rule out paradoxos, meaning "contrary to expectation"?), but the verse reference gives you that. You don't have to compare against all of Strong's, only the dozen or so entries for that specific verse.
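One possible stand-in for the "magic algorithm" is plain word counting, sketched below in Python. This is an assumption, not a tested method: the stopword list, the function names, the sample phrase, and the entry keyed 0 are all invented for illustration; only the 1561 definition text comes from the example above. Matching is exact, so inflected forms would need stemming or similar to match.

```python
import re

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "from", "or"}

def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def content_terms(verse_text):
    """TE: the English words of a verse minus function words."""
    return [w for w in tokenize(verse_text) if w not in STOPWORDS]

def best_match(word, candidates, priors=None):
    """Pick the Strong's number whose definition mentions `word` most often.

    `candidates` (TS) maps Strong's number -> definition text, restricted to
    the entries for one verse. `priors` optionally maps number -> weight.
    Returns None when no definition contains the word; ties go to the
    candidate listed first.
    """
    scores = {}
    for number, definition in candidates.items():
        hits = tokenize(definition).count(word)
        if hits:
            weight = priors.get(number, 1.0) if priors else 1.0
            scores[number] = hits * weight
    return max(scores, key=scores.get) if scores else None

def align_verse(verse_text, candidates, priors=None):
    """Map each content term of the verse to its best-matching entry."""
    return {w: best_match(w, candidates, priors)
            for w in content_terms(verse_text)}

candidates = {
    1561: "the act or manner of receiving from; expectation, waiting",
    0: "placeholder gloss: judgment, decision",  # invented entry, not real Strong's
}
result = align_verse("a fearful expectation of judgment", candidates)
print(result)
```

Because `candidates` only ever holds the entries for one verse, a word like paradoxos never enters the comparison unless it actually occurs there.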
This all looks good on paper, but The Proof Is In The Programming (i should trademark that!).
Copyright 2004 sean boisen