James Strong, IT Heavyweight

The author of Strong’s Exhaustive Concordance was serious about data long before anybody had conceived of Information Technology as an occupation. My print version of Strong’s has 1390 pages of small font, three-column excerpts from Bible verses, indexed by each and every single word in the King James Version. Even function words like “the” are included, though they’re presented in a compressed format that just references the verse (10 pages for “the” in an 18-column format!).

In contrast, the preface is a single page: apparently Strong felt the data spoke for themselves. One additional page provides directions and explanations (I wonder what Tufte would say about the presentation). Everything else in the main concordance is just data.

I’ve been working on converting Strong’s Greek Lexicon of the New Testament to an XML format, starting from the text that’s incorporated in Crosswire’s excellent Sword Project. I’m not quite there yet, but i’ve been impressed at Strong’s intuitive grasp of the value of structured data, even though in a pre-computer era he could only express it via typography.

[read the rest, which didn’t quite turn out the way i had hoped]

Setting up Other Hyper-concordances

One question that came up in our discussion with Mike Perez was, what would it take to create a hyper-concordance for other languages and/or translations? My first response was that it would be pretty simple (programmers are eternal optimists). But even having thought more carefully, i still believe it. The essential elements:

  • a Bible text in some structured format (OSIS preferred, of course) where it’s possible to identify the verses and the words
  • a way to map words back to their dictionary forms. If this is imperfect, or even completely missing, it doesn’t stop you: it just means some forms that ought to be grouped together won’t be. For my RSV hyper-concordance, i just created a text file by hand, mapping e.g. “brethren” to “brother” and “broken” to “break”. For languages with richer morphology than English, you might need a lot more smarts (or a lot more manual effort)
  • a program to create an index mapping each unique word (in its dictionary form) to the verses it occurs in
  • a program to generate an HTML page for each such word=>verses index
  • if you want a master index showing all the words, a simple program to collect all the indexed terms and hyperlink them to their respective pages
  • i left out some words because their pages would have been excessively large: this is just an optional practical matter, though. To do that you need a stopword list: i created mine by sorting all the words by frequency and cutting everything that occurred more than 100 times. Then you use this to filter which words you index.
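The steps above can be sketched in a few dozen lines of Python. This is only a minimal illustration, not the code behind the RSV hyper-concordance: the `VERSES` sample, the function names, and the `word.html` page naming are all assumptions made for the example, and a real version would read the verses from an OSIS file rather than a hard-coded list.

```python
import html
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a parsed Bible text: (verse reference, verse text) pairs.
VERSES = [
    ("John.1.1", "In the beginning was the Word"),
    ("Matt.12.49", "Here are my mother and my brethren"),
    ("Matt.12.50", "whoever does the will of my Father is my brother"),
]

# Hand-made mapping from inflected forms to dictionary forms.
LEMMAS = {"brethren": "brother", "broken": "break"}


def words(text):
    """Lowercase a verse and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def lemma(word):
    """Map a word to its dictionary form, falling back to the word itself."""
    return LEMMAS.get(word, word)


def stopwords(verses, max_count):
    """Return the dictionary forms that occur more than max_count times."""
    counts = Counter(lemma(w) for _, text in verses for w in words(text))
    return {w for w, n in counts.items() if n > max_count}


def build_index(verses, stop):
    """Map each dictionary form (minus stopwords) to the verses it occurs in."""
    index = defaultdict(list)
    for ref, text in verses:
        # Use a set so a verse is listed once even if the word repeats in it.
        for w in sorted({lemma(w) for w in words(text)} - stop):
            index[w].append(ref)
    return dict(index)


def word_page(word, refs):
    """Render a minimal HTML page listing the verses for one word."""
    items = "\n".join(f"<li>{html.escape(r)}</li>" for r in refs)
    return f"<html><body><h1>{html.escape(word)}</h1>\n<ul>\n{items}\n</ul></body></html>"


def master_index(index):
    """Render a master page linking each indexed word to its own page."""
    links = "\n".join(
        f'<li><a href="{html.escape(w)}.html">{html.escape(w)}</a></li>'
        for w in sorted(index)
    )
    return f"<html><body><ul>\n{links}\n</ul></body></html>"
```

With the three sample verses, `stopwords(VERSES, 2)` drops "the" and "my", and `build_index` then groups "brethren" and "brother" under one entry. For a whole Bible the cutoff would be something like the 100 mentioned above.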

Much as i’d love to take credit for my brilliant programming, there are no real difficulties in any of this. But the proof will be to put my money where my mouth is, and create another one. Stay tuned.

Chewing on “Scripture Engagement”

Donna and i had the privilege this week of having lunch with two people deeply involved in using innovative technology to expand use of the Bible. Mike Perez helps lead a technology initiative group of the American Bible Society, including ForMinistry.com, which is a portal site for churches and other ministries. Steve deRose is the chairman of the Bible Technologies Group, which is sponsoring the OSIS initiative.

We had a wide-ranging discussion about things that might help the ABS with their mission of “Scripture engagement.” That’s a nice tight phrase with very broad scope: in particular, what’s different in the Digital Age about how we present God’s Word to people? This is what Blogos is all about, and it’s gotten me thinking in a lot of different directions, many of which will be the focus of subsequent posts. A sampler:

  • in the pre-Internet era, printing bound together the costs and values of content, production, and distribution of Bible information. That’s already changing, and content is becoming a completely separable value from binding and printing. How does that change the ministries of the Bible Societies and their traditional roles in Bible printing and physical distribution?
  • how can weblogs help churches in their ministry? what new opportunities does RSS open up?
  • what would it take to do hyper-concordances for other translations and languages? what new kind of Bible search paradigms might this lead to?
  • what other ways might you present the content of Scripture, moving beyond print to hyperlinked digital media that include sounds, images, and other approaches to organizing information than the traditional book/chapter/verse divisions?

We also talked about my interests in semantic annotation of the Bible, and i’m hoping we’ll find some ways to work together toward this goal. Stay tuned!

Hello to OSIS Visitors

I’m pleased (and a little surprised!) that the Open Scriptural Information Standard (OSIS) folks put a description of the Hyper-concordance on their front page. This is by far the most popular entry point to my site: i guess that means some people find it useful. But it’s always nice to get a wave from people you respect. The OSIS folks are working hard to create a solid standard for XML markup of the Bible and related literature: in a few years, we’ll probably wonder how we ever got along without it.  The Hyper-concordance project would have been a lot harder if i’d had to first figure out how to parse some unstructured version of the New Testament text. Having the RSV available in OSIS format made it the proverbial SMOP.

If you’re visiting from the OSIS site, welcome! And if you have a serious interest in advancing the digital use of Scripture, take a look at the SemANT idea. This is really the core vision behind semanticbible.com. I’m still in the early definition stages, but it’s a big vision that will take enormous effort, so i’ll be looking for help once the plan is a little clearer.

By the way, if you use the Hyper-concordance via a bookmark, please make sure you’ve bookmarked the new version at semanticbible.com (or skip the prose and go straight to the index). The older one at http://www.semanticbible.com/blogos/ will be going away someday.

Images for SemanticBible

I’ve been looking for some images to improve the cosmetics of SemanticBible.com. The conceptual space is enormous. Illuminated manuscripts show the idea of “opening” the text to new understanding, and nicely tie in the textual aspects. There are of course a multitude of Biblical metaphors, and illustrations of these passages abound as well.

Two current favorites: William Holman Hunt’s “The Light of the World” (here’s a closeup of the lantern) is a classic from the 19th century. Ewangely Buoch’s “the Sower” is somewhat odd in its juxtaposition of Jesus and a European agricultural setting around the 16th century, but i like its simple depiction of the profound truth of “the seed is the Word”. Feel free to suggest others.

I found a nice collection of art links, indexed according to Biblical themes and characters, at textweek.com.

Using RDF for Bible Bios

Here’s a possible early use of RDF to aid Bible study. My Life Application Bible (NIV, Tyndale) has about 100 “personality profiles”, each a brief description of a particular Bible personality. There are a few paragraphs of prose, some bullets on strengths and accomplishments, lessons from their life, vital statistics, and key verses. The prose also incorporates references to other people and places.

It wouldn’t take a lot of work to code some of these up in RDF in a way that would link them together through people and places, and tie them back to the verses they’re mentioned in, producing a small-scale Semantic Bible Web. Would this be useful? That’s the $64k RDF question …
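To make the idea a little more concrete, here is a rough sketch of what such linked profiles might look like, using plain Python tuples as (subject, predicate, object) triples rather than a real RDF library. Everything in it is an assumption for illustration: the `person:`, `place:`, `verse:` and `bio:` names are an invented vocabulary, and the sample facts are typed in by hand, not taken from the Life Application Bible profiles.

```python
# Each triple is (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("person:Abraham", "rdf:type", "bio:Person"),
    ("person:Abraham", "bio:spouse", "person:Sarah"),
    ("person:Abraham", "bio:livedIn", "place:Ur"),
    ("person:Abraham", "bio:keyVerse", "verse:Gen.12.1"),
    ("person:Sarah", "rdf:type", "bio:Person"),
    ("person:Sarah", "bio:keyVerse", "verse:Gen.21.1"),
    ("person:Lot", "rdf:type", "bio:Person"),
    ("person:Lot", "bio:livedIn", "place:Ur"),
]


def objects(triples, subject, predicate):
    """Return every object linked from a subject by a predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """Return every subject linked to an object by a predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]
```

Because people, places, and verses are shared identifiers, the profiles link up without any extra work: `subjects(TRIPLES, "bio:livedIn", "place:Ur")` finds everyone tied to Ur, and `objects(TRIPLES, "person:Abraham", "bio:keyVerse")` leads back to the text.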
