Thursday, June 19, 2003

Here's the algorithm for the idea i sketched in Using Strongs Numbers for Reducing Lexical Ambiguity. There are three sources of information:

  1. The Greek original of a verse, which relates a set of Greek words to a unique index (the verse reference). This also provides Strongs numbers for them, which has the additional benefit of getting you past the morphology.
  2. An English translation of the same verse, which in turn relates a set of English words to the same unique index. The verse index provides coarse-grained, many-to-many alignment, and limits the search space significantly to only a dozen or so possible translations. In practice only the subset of these words that bear significant content are likely to be of interest (not function words).
  3. The Strongs reference itself, which provides English translations for the Greek words.
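As a rough sketch of how these three sources hang together, the verse reference can act as the shared key, with the lexicon looked up only for the Strongs numbers that actually occur in a verse. Everything named here (`Verse`, `LEXICON`, `candidate_definitions`, the reference "Example 1:1", and the placeholder entries 9001 and 9002) is made up for illustration; only entry 1561 and its gloss come from this post.

```python
from dataclasses import dataclass


@dataclass
class Verse:
    ref: str        # the shared index (the verse reference)
    strongs: list   # source 1: Strongs numbers of the Greek words
    english: list   # source 2: words of the English translation


# Source 3: Strongs number -> English definition text.
LEXICON = {
    1561: "expectation: looking for",    # the entry quoted in this post
    9001: "placeholder definition one",  # hypothetical, not a real entry
    9002: "placeholder definition two",  # hypothetical, not a real entry
}


def candidate_definitions(verse, lexicon):
    """Return only the lexicon entries for Greek words in this verse."""
    return {n: lexicon[n] for n in verse.strongs if n in lexicon}


# Hypothetical verse: the reference and word lists are illustrative only.
verse = Verse(
    ref="Example 1:1",
    strongs=[1561, 9001],
    english=["a", "fearful", "expectation"],
)
print(candidate_definitions(verse, LEXICON))
```

The point of the lookup is the restriction: entry 9002 is in the lexicon but not in the verse, so it never becomes a candidate.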

The version of Strongs available from Crosswire (which distributes a great Windows Bible program, by the way) has 5624 entries, like this word (#1561) generally translated as "expectation":

1561  ekdoche  ek-dokh-ay'
from 1551; expectation:--looking for
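A minimal sketch of pulling such an entry apart, assuming the two-line layout shown above (number, transliteration, and pronunciation on the first line; derivation and gloss on the second) and reading the rendering as "looking for". It also assumes the usual Strongs convention that the text after ":--" lists the translation renderings. The function name `parse_entry` and the dictionary keys are hypothetical, and the real Crosswire file may well be laid out differently.

```python
import re


def parse_entry(header, body):
    """Split one two-line Strongs entry into its fields."""
    # Assumes exactly three whitespace-separated fields on the header line.
    number, lemma, pronunciation = header.split()
    # Text before ":--" is the definition, text after it the renderings.
    definition, _, renderings = body.partition(":--")
    # Not every entry has to start with a "from NNNN;" derivation.
    match = re.match(r"from (\d+);\s*(.*)", definition)
    derived_from = int(match.group(1)) if match else None
    gloss = match.group(2) if match else definition
    return {
        "number": int(number),
        "lemma": lemma,
        "pronunciation": pronunciation,
        "derived_from": derived_from,
        "gloss": gloss.strip(),
        "renderings": [r.strip() for r in renderings.split(",") if r.strip()],
    }


entry = parse_entry("1561  ekdoche  ek-dokh-ay'",
                    "from 1551; expectation:--looking for")
print(entry["number"], entry["gloss"], entry["renderings"])
```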

Crosswalk's version has something more complete, but i haven't found the sources yet (maybe they're not publicly available):

ekdoche: the act or manner of receiving from
  1. reception
  2. succession
  3. interpretation
  4. expectation, waiting

The approach is to go through an English translation of a verse and identify the content terms (call these TE). Then find the Strongs definitions for all the Greek words in the same verse (TS). Using (insert magic algorithm here), choose for each term in TE the element of TS whose definition best matches it. In the simplest case, the English word itself will occur in only one definition in TS. If it occurs in multiple definitions, i'll need a voting scheme (if it matches more than once in some definition, it's more likely to be that one), or a way to weight the Strongs definitions by their a priori probability. This wouldn't work at all without constraints (for example, how do you rule out paradoxos, meaning "contrary to expectation"?), but the verse reference gives you that: you don't have to compare against all of Strongs, only the dozen or so entries for that specific verse.
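One way the "magic algorithm" could start out is the plain voting scheme: for each content term, count how often it appears in each candidate definition for the verse and take the definition with the most hits. This is only a sketch under stated assumptions, not a worked-out method. The stopword list, the function names, and the placeholder entry 9001 are hypothetical; matching is exact-string only (no stemming, so "expect" would not match "expectation"); and ties simply go to the first candidate in verse order.

```python
import re

# Hypothetical, deliberately tiny list of function words to skip.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "is", "or"}


def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def content_terms(english_verse):
    """TE: the English words that are not function words."""
    return [w for w in tokenize(english_verse) if w not in STOPWORDS]


def align(english_verse, verse_strongs, lexicon):
    """Map each content term to the Strongs number (from this verse only)
    whose definition mentions it most often, or None if none mentions it."""
    # TS: definitions for just the Strongs numbers in this verse.
    ts = {n: tokenize(lexicon[n]) for n in verse_strongs}
    result = {}
    for term in content_terms(english_verse):
        votes = {n: words.count(term) for n, words in ts.items()}
        best = max(votes, key=votes.get, default=None)
        result[term] = best if best is not None and votes[best] > 0 else None
    return result


lexicon = {
    1561: "expectation: looking for",  # the entry quoted in this post
    9001: "judgment, decision",        # hypothetical placeholder entry
}
print(align("a fearful expectation of judgment", [1561, 9001], lexicon))
```

Weighting by a priori probability would slot in where the votes are counted, for example by multiplying each count by a per-entry prior.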

This all looks good on paper, but The Proof Is In The Programming (i should trademark that!).


11:55:00 PM
NYTimes Technology section has A Blogger's Big-Fish Fantasy on bloggers and their desires to boost their readership. I got my own small bounce last month from a chance mention ("Off Topic, But Too Good to Miss") by Tim Bray who stumbled across my site through Technorati: my normal daily visits (a modest 15 or so these days) spiked up to 90. I thought a little about other schemes to try to gather more eyeballs, but i've settled back to writing mostly about things that interest me. If they interest other folks too, that's great: nobody likes to write to the ether. But i'm not going to focus on building readership for its own sake. I do make a point of reciprocal blogrolling for blogs in my general areas of interest, so let me know if you want to share.
11:21:49 PM