I’ve finally started some real work on the idea in “An Algorithm for Mapping to Strongs Numbers”. The first thing needed is a structured version of Strong’s, indexed by number, with the English definitions accessible. I was able to get the text out of Crosswire’s excellent Sword Project, which i highly recommend, using their mod2imp utility. Now i just have to write a little Perl to parse that output and produce the XML, then i’ll be ready for some more integration.
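To make the parsing step concrete, here is a minimal sketch, written in Python rather than the Perl mentioned above. It assumes the mod2imp output follows the usual imp layout, where each entry starts with a `$$$` line carrying the key and the entry text follows until the next key. The function names, the `strongs` and `entry` element names, and the sample text are all hypothetical placeholders, and the real entry bodies may contain markup that needs further handling.

```python
import xml.etree.ElementTree as ET


def parse_imp(text):
    """Split imp-style text into a dict of key -> entry body.

    Assumes each entry begins with a line starting with '$$$' followed
    by the key, and that everything up to the next '$$$' line is the body.
    """
    entries = {}
    key = None
    body = []
    for line in text.splitlines():
        if line.startswith("$$$"):
            # Close the previous entry before starting a new one.
            if key is not None:
                entries[key] = "\n".join(body).strip()
            key = line[3:].strip()
            body = []
        elif key is not None:
            body.append(line)
    # Flush the final entry.
    if key is not None:
        entries[key] = "\n".join(body).strip()
    return entries


def to_xml(entries):
    """Serialize entries as XML, one <entry number="..."> per key.

    The element and attribute names are illustrative, not a fixed schema.
    """
    root = ET.Element("strongs")
    for number, definition in entries.items():
        entry = ET.SubElement(root, "entry", number=number)
        entry.text = definition
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder input, not real lexicon text.
    sample = "$$$00001\nfirst placeholder definition\n$$$00002\nsecond placeholder definition\n"
    print(to_xml(parse_imp(sample)))
```

Keeping the parse and the serialization separate means the number-indexed dict can be reused directly for the later integration work without going back through the XML.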
An additional idea this morning was to weight alternative mappings by 1/frequency of terms in the NT corpus: this Zipfian maneuver gives more strength to infrequent terms, on the intuition that they’re likely to be more distinctive and hence less confusable.
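A rough sketch of that weighting, in Python, might look like the following. It is one possible reading of the idea, not a settled design: the function names are hypothetical, the corpus is assumed to be already tokenized into terms, a candidate mapping is scored by summing the weights of the terms that support it, and terms absent from the corpus are assumed to contribute nothing.

```python
from collections import Counter


def inverse_frequency_weights(tokens):
    """Return a weight of 1/count for every term in the token list."""
    counts = Counter(tokens)
    return {term: 1.0 / n for term, n in counts.items()}


def score_mapping(terms, weights):
    """Score a candidate mapping as the sum of its terms' weights.

    Terms not seen in the corpus get weight 0.0 (an assumption; they
    could just as reasonably be given the maximum weight instead).
    """
    return sum(weights.get(term, 0.0) for term in terms)


if __name__ == "__main__":
    # Toy token list standing in for the NT corpus.
    corpus = ["love", "love", "love", "love", "propitiation"]
    weights = inverse_frequency_weights(corpus)
    # The rare term counts four times as much as the common one.
    print(weights["love"], weights["propitiation"])
```

With this scheme a single match on a rare term can outweigh several matches on common ones, which is the intended effect; whether raw 1/frequency is too aggressive compared with something like a log-scaled weight is left open.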