God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
April 26th, 2007

In Memoriam: Karen Spärck Jones

I note with sadness the passing earlier this month of Dr. Karen Spärck Jones, a British computer scientist and research pioneer at the intersection of information retrieval and natural language processing.

In the mid-90s, i was a younger researcher at BBN Technologies doing government-funded work on information extraction. At that time, DARPA’s TIPSTER program was the key American funding vehicle for much research in both information retrieval and information extraction. Consequently, many important figures in the field came to TIPSTER conferences, either to report on funded research or as observers. I recall several positive technical interactions with Prof. Spärck Jones, who graciously treated me, not as the bit player i actually was, but as a potential colleague with ideas to share. Her accessibility despite her well-established (and well-deserved) reputation made an impression on me at the time.

April 26th, 2007

Continuing Web Site Issues

If you can read this, you’re doing better than most. Frustratingly, my web site and blog are still fouled up three weeks after a mistake by my hosting service. I’ve been holding off on more blog posts waiting for them to resolve things, but i’m losing hope. I wish i could say the worst is past, but now i’m a skeptic.

I’ve had good experiences for the past few years with Lunarpages, my provider, but this is just maddening: tech support says “it’s a sys admin issue”, and i apparently have no recourse other than waiting for them to fix things according to whatever timetable they choose. I’d jump ship at this point, but rebuilding elsewhere and administering the databases and third-party applications is more work than i can undertake right now, so i guess they own me for the time being.

Consider this a big negative recommendation for Lunarpages as a hosting service.

April 19th, 2007

Name Weights for Biblical Characters, Take 3

(Originally posted 2007-04-02, but then a victim of repeated web site hosting problems; i’m trying again …)
Looking further at the numbers i previously discussed for estimating the ranked importance of Biblical characters by how often and where they’re mentioned, there’s a refinement of the dispersion factor that i like better. It came from comparing the rank of Ishmael.1 (Abraham’s son by Hagar) to Ishmael.2 (who assassinated Gedaliah, the Babylonian-appointed ruler of Judah discussed in 2 Kings 25 and Jer 40-41). In my first ranking, Ishmael.2 (who i didn’t even remember) was ranked slightly higher than Ishmael.1, contrary to my intuitions (and those of every Bible dictionary i’ve checked, measured by the number of sentences used to describe each).

Quantifying your ideas gives you a way to measure how they match your intuitions, and, when they don’t, think about why. In this case, it was immediately obvious: though Ishmael.2 is mentioned a few more times, those mentions are highly concentrated, in a total of 3 chapters (across 2 books). Ishmael.1 is also mentioned in two books, but in 6 different chapters. By incorporating the number of distinct chapters a name occurs in (just a more fine-grained measure of dispersion), their rank comes out more like what i’d expect. Specifically, given weights of

  • .6 for frequency (as before)
  • .2 for chapter dispersion
  • .2 for book dispersion (so the total dispersion weight is still .4, just refined a bit)

Ishmael.1 comes out at #257, versus #285 for Ishmael.2. Here’s the top 50 chart using this metric:

Top 50 Biblical Characters by Frequency and Dispersion (medium size), Take 3
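
The weighting described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the chart: the post gives the weights but not how each factor is scaled, so dividing each factor by the largest value seen is an assumption, as are the names NameStats, score_names and rank_names. The chapter and book counts for the two Ishmaels come from the text; the mention counts are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameStats:
    mentions: int  # total mentions of the name
    chapters: int  # distinct chapters containing the name
    books: int     # distinct books containing the name


def score_names(stats, w_freq=0.6, w_chapter=0.2, w_book=0.2):
    """Weighted sum of the three factors, each scaled by its maximum (an assumption)."""
    max_m = max(s.mentions for s in stats.values())
    max_c = max(s.chapters for s in stats.values())
    max_b = max(s.books for s in stats.values())
    return {
        name: w_freq * s.mentions / max_m
        + w_chapter * s.chapters / max_c
        + w_book * s.books / max_b
        for name, s in stats.items()
    }


def rank_names(stats, **weights):
    """Return names ordered from highest to lowest score."""
    scores = score_names(stats, **weights)
    return sorted(scores, key=scores.get, reverse=True)


# Chapter and book counts are from the post; the mention counts are hypothetical.
stats = {
    "Ishmael.1": NameStats(mentions=20, chapters=6, books=2),
    "Ishmael.2": NameStats(mentions=23, chapters=3, books=2),
}

print(rank_names(stats))  # refined weights: .6 / .2 / .2
print(rank_names(stats, w_freq=0.6, w_chapter=0.0, w_book=0.4))  # book dispersion only
```

With these made-up mention counts, the refined weights put Ishmael.1 first, while the book-only weighting puts Ishmael.2 first, which is the same reversal described above.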

Here’s a graphic to show more clearly how the rankings change with this metric. Red markers above the blue line are names that have moved up in rank with the revised metric: for example, John the Baptist (John.1) moves from #50 to #30, which seems appropriate. Those below the line are ranked lower under the new metric (e.g. Jesse.1, who moves from #18 down to #24).

Biblical People weights, with and without chapter dispersion (medium size)

Some other factors that might improve the estimate even further (and remember, this is just an estimate):

  • As suggested above, external sources (like Bible dictionaries) are a rich and quantifiable source of judgments about importance: the more words or sentences used to describe an individual, the more important they’re likely to be. By using several dictionaries, you’re not held captive to the biases of an individual work or editorial slant. The key challenge here is making the connection between the described individual (often in a numbered paragraph) and the Biblical character, but given a map from individuals to passages (which we have), that ought to be possible with a bit of programming at better than 90% accuracy.
  • Though the whole of Scripture is authoritative and inspired, there’s a sense in which certain sections are broader in their implications. For example, anyone mentioned in the first chapters of Genesis should probably get an extra measure of importance: these are the foundational stories of Hebrew and Christian history (and this is another way in which Ishmael.1 is surely the most important one).
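
One way the dictionary-linking step in the first suggestion might work is to match each dictionary entry to the individual whose known passages overlap most with the references the entry cites. The sketch below assumes chapter-level references as plain strings; the function name link_entry and the chapter lists are illustrative, not taken from any real data set.

```python
def link_entry(entry_refs, passages_by_person):
    """Return the person whose known passages overlap most with the
    references cited in a dictionary entry, or None if nothing overlaps."""
    best_id, best_overlap = None, 0
    for person_id, passages in passages_by_person.items():
        overlap = len(entry_refs & passages)
        if overlap > best_overlap:
            best_id, best_overlap = person_id, overlap
    return best_id


# Illustrative chapter-level map from individuals to passages (hypothetical data).
passages_by_person = {
    "Ishmael.1": {"Gen 16", "Gen 17", "Gen 21", "Gen 25", "Gen 28", "1 Chr 1"},
    "Ishmael.2": {"2 Kgs 25", "Jer 40", "Jer 41"},
}

# References cited by a hypothetical numbered paragraph in a dictionary entry.
entry_refs = {"2 Kgs 25", "Jer 41"}
print(link_entry(entry_refs, passages_by_person))
```

Once entries are linked this way, the word or sentence count of each entry could be averaged across several dictionaries and folded into the score as one more weighted factor.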

Postscript: after drafting the above, but before publishing (making and uploading the charts is still a bit painful), i saw this post at OpenBible.info, suggesting some alternative approaches (thanks for playing!). The first (rank some chapters higher than others) is similar in spirit to my second suggestion above: we both agree, as i’m sure do others, that some parts of the Bible ought to add more weight to this metric. The second suggestion there also proposes a valuable refinement, using association with important people (approximated by co-occurrence in a chapter) to lend importance. I’ll look into incorporating that figure as well.
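
As a rough sketch of the co-occurrence idea (not the method used at OpenBible.info, whose details aren't reproduced here), each name could receive a boost equal to the summed base scores of the distinct names it shares a chapter with. The function name, chapter contents and base scores below are all hypothetical.

```python
from collections import defaultdict


def cooccurrence_boost(names_by_chapter, base_score):
    """For each name, sum the base scores of the distinct other names
    that appear with it in at least one chapter."""
    neighbors = defaultdict(set)
    for names in names_by_chapter.values():
        for name in names:
            neighbors[name].update(n for n in names if n != name)
    return {
        name: sum(base_score.get(other, 0.0) for other in others)
        for name, others in neighbors.items()
    }


# Hypothetical chapter contents and base scores, for illustration only.
names_by_chapter = {
    "Gen 16": {"Abraham.1", "Hagar.1", "Ishmael.1"},
    "Jer 41": {"Ishmael.2", "Gedaliah.1"},
}
base_score = {
    "Abraham.1": 1.0,
    "Hagar.1": 0.2,
    "Ishmael.1": 0.3,
    "Ishmael.2": 0.3,
    "Gedaliah.1": 0.2,
}

boost = cooccurrence_boost(names_by_chapter, base_score)
print(boost["Ishmael.1"], boost["Ishmael.2"])
```

In this toy example Ishmael.1 gets the larger boost because he shares a chapter with a highly ranked name; how much weight such a boost should carry in the overall metric is left open.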
