God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
January 25th, 2010

BibleTech:2010 Talk – The Logos Controlled Vocabulary

The program for BibleTech:2010 has been up for a couple of weeks now, and i’ve been delinquent in pointing that out. We’ve got a full roster of really interesting talks that run the gamut from friendly warm technology to hard-core geekishness: Bible translation, social media, Biblical linguistics, mobile computing, preaching, publishing, tweeting, and more. And this year, it’s in San Jose, CA: i’m hoping that will open up attendance to some folks who have the misfortune to not live in the beautiful Pacific NW. The dates are March 26-27, 2010.

I’ll be giving two talks this year: here’s my abstract for the first one, on the Logos Controlled Vocabulary.


Dozens of books provide terminology from the field of Biblical studies, principally Bible dictionaries, encyclopedias, and other subject-oriented reference works. However, the terminology used varies between books, authors, and publishers, and doesn’t always include all the terms a user might employ to find information.

The Logos Controlled Vocabulary (LCV) organizes content from multiple Bible dictionaries to integrate information across the Logos library. As a controlled vocabulary, the LCV identifies, organizes, and systematizes a specific set of terms for indexing content, capturing inter-term relationships, and expressing term hierarchies. Like other kinds of metadata, this infrastructure then supports applications in search, discovery, and general knowledge management. The initial version of the LCV (shipping now with Logos 4) comprises some 11,100 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future. This includes some interesting new capabilities for machine learning from existing prose content. For example:

  • what are the prototypical Bible references, names, or phrases used to discuss a topic?
  • can we learn anything about the importance of topics by looking at how much is written about them, how many dictionaries cover them, and other kinds of automated analysis?
  • what knowledge can be gleaned from the topology of terminology linkage (what links to what)?

Update: we’ve decided in general to retire the “Libronix” name for Logos technologies, so i’m trying to get on board by starting to call this the Logos Controlled Vocabulary.

November 2nd, 2009

Logos 4 Launches Today

I’m thrilled to announce that we’re releasing Logos Bible Software 4 today. This is a complete rewrite from the ground up of the best Bible study software on the planet, so that makes this an exciting day in my book.

Logos 4 sports an entirely new interface to make it easier than ever to find what you’re looking for and keep your study space organized and effective. There’s a wealth of new, visually oriented resources, and better controls for working through the enormous space of resources Logos makes available. There’s even an iPhone app for no extra charge!

That’s the marketing view (and i stand behind it). But this means much more to me on a very personal level. It’s been almost 3 years since i came to Logos, and this will be the first time most of my work has seen the light of day. Specifically, Logos 4 contains the work of my colleagues and me in several new areas:

  • Biblical People, which organizes information about the 3300 individuals, groups of people, and deities named in the Biblical text. It includes a comprehensive list of references, their family relationships, links to dictionary articles, and links to related items. It also includes family tree and story-based diagrams. And everything is hyperlinked.
  • Biblical Places includes all the same kinds of information for 1200 named places from the Bible: cities, regions, even geographic features like rivers and mountains. Along with the data, there are 60 new high-resolution maps commissioned by Logos and covering the major Biblical events, as well as a mega-map that shows all the places together.
  • Biblical Things describes the physical objects of the Bible: animals, plants, body parts, clothing, food and drink, and much more, as well as specific items like Noah’s ark and Goliath’s sword and weights and measures. There are more than 1000 objects here, which also bring together thousands of images from across the library.
  • There’s also a new collection of high-resolution infographics illustrating different aspects of the Biblical world (and i’m extra proud that the bulk of this work was managed by my wife Donna)
  • In addition to regular word search (which is much faster than ever), under the hood is the Libronix Controlled Vocabulary (LCV), working to organize 11,000 different subjects in the Biblical studies literature and coordinating information across the library.

So if you’ve been following my posts on the Bible Knowledgebase … well, now it’s here. I can’t overstate how important i think this is: this is quite literally the first time in the centuries-old history of Biblical studies that this information has been made available in this way. The LCV isn’t quite as visible (yet), but it’s also an important organizing feature that will continue to grow in power going forward.

I hope you’re catching my sense of excitement about these new resources (and this says nothing about all the hard work of my dozens of colleagues in other areas). I hope i’ve piqued your interest to learn more about Logos 4. It really is a watershed event in Bible software.

Obligatory disclaimer: i work for Logos and highly value what i do there. So i’m not the least bit objective about this. (more detailed disclosures)

November 1st, 2009

Bible Chatbots

Suppose you had a database listing authorship and reported speech in the Bible, so that, for each set of words, you know who said or wrote them (the ESV folks did this using Amazon’s Mechanical Turk a few years back, and Jim Albright’s Dramatizer has similar data embedded in it). I assume the speakers have standardized identifiers.

Now imagine a matching algorithm (there are lots of candidates out there) that, when provided with either a question or a list of words, and optionally a speaker, retrieves passages that best match the input.

Example: “why does God allow evil?” might return

  • Eliphaz the Temanite: Job 15:14-16
  • the woman of Tekoa: 2 Sam 14:14
  • the apostle John: 1 John 3:11-17

Querying about “what about God and evil?” with speaker=Jesus might (in the best case) give answers like

  • Matt 5:45
  • Matt 12:35

Apart from how accurate such answers might be (that depends on the sophistication of the matching algorithm), you’ve now got the engine for a chatbot that gives Biblical “answers”. Aside from perhaps being an interesting hack, would this be useful? Lazyweb, are you listening?
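A minimal sketch of such an engine, assuming the simplest possible matcher: counting shared content words. The Passage type, the stopword list, the speaker identifiers, and the passage texts (abbreviated paraphrases, for illustration only) are all hypothetical; a real system would draw on the speaker database described above and a much better matching algorithm.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    speaker: str     # standardized speaker identifier (hypothetical values)
    reference: str   # human-readable Bible reference
    text: str        # abbreviated paraphrase, for illustration only


# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "about", "and", "does", "is", "of", "the", "to", "what", "why"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def best_matches(query, passages, speaker=None, limit=3):
    """Rank passages by the number of content words shared with the query."""
    query_words = content_words(query)
    scored = []
    for passage in passages:
        if speaker is not None and passage.speaker != speaker:
            continue  # optional speaker filter
        overlap = len(query_words & content_words(passage.text))
        if overlap:
            scored.append((overlap, passage))
    # The sort is stable, so equally good matches keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:limit]]


PASSAGES = [
    Passage("Jesus", "Matt 5:45", "he makes his sun rise on the evil and on the good"),
    Passage("Jesus", "Matt 12:35", "the evil person out of his evil treasure brings forth evil"),
    Passage("Eliphaz", "Job 15:14-16", "what is man, that he can be pure"),
    Passage("Jesus", "Matt 6:9", "our Father in heaven, hallowed be your name"),
]

answers = best_matches("what about God and evil?", PASSAGES, speaker="Jesus")
print([p.reference for p in answers])
```

Word overlap is about the crudest matcher available, which is the point: the speaker filter and the retrieval loop stay the same no matter how sophisticated the scoring becomes.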

June 16th, 2009

http://ref.ly for Bible References

My colleagues at Logos have launched http://ref.ly, a URL shortening service for Bible references: see this blog post. It provides the convenience of TinyURL (turning long unreadable URLs into something much more manageable), but unlike that service also provides readable, understandable content. Once you get past the prefix, you won’t have any trouble figuring out what verse http://ref.ly/Mk4.9 is referring to.

If you’re a Twitter person trying to shoehorn your message into 140-character tweets, you’ll like the fact that this gives you a brief and unambiguous way to both specify a Bible reference and link to the content behind it (the references resolve to the actual verse text at bible.logos.com). Since addressability matters, this is a good thing.

But it has precisely the same utility even if you’re not a Twitterhead (i’m not):

  • it clearly marks a string of characters as a Bible reference
  • it also normalizes the reference into a form that can be automatically processed

While it’s not quite a microformat, it’s really only a small step away from things like bibleref. In particular, if lots of people start using ref.ly references, it will be possible to process that content and understand things like what verses are most popular.

What’s more, editors that recognize and automatically link URLs (like MS Outlook for HTML-based email, and MS Word) will now automatically make Bible links for you (like RefTagger does for blog posts), as long as you’re willing to tack on “http://ref.ly/” and live with the slightly non-traditional format. You don’t need to know anything about how to make a hyperlink in HTML: just a little extra syntax (14 characters, to be precise) moves these references toward much greater usefulness.
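To illustrate the "automatically processed" point, here is a small sketch that pulls ref.ly references out of free text and counts them. The URL pattern is an assumption generalized from the single example http://ref.ly/Mk4.9 (book abbreviation, chapter, optional verse); the service's full syntax may well differ, and the sample posts are invented.

```python
import re
from collections import Counter

# Assumed URL shape, generalized from the one example http://ref.ly/Mk4.9:
# a book abbreviation, a chapter number, and an optional ".verse".
REFLY_PATTERN = re.compile(
    r"http://ref\.ly/(?P<book>[1-3]?[A-Za-z]+)(?P<chapter>\d+)(?:\.(?P<verse>\d+))?"
)


def find_references(text):
    """Return a (book, chapter, verse) tuple for each ref.ly link in the text."""
    found = []
    for match in REFLY_PATTERN.finditer(text):
        verse = match.group("verse")
        found.append((
            match.group("book"),
            int(match.group("chapter")),
            int(verse) if verse is not None else None,
        ))
    return found


# Invented sample posts, for illustration only.
posts = [
    "Listen up: http://ref.ly/Mk4.9 is worth a second look.",
    "Reading http://ref.ly/Mk4.9 again today.",
]

# Tally how often each reference is mentioned across all posts.
popularity = Counter(ref for post in posts for ref in find_references(post))
print(popularity.most_common(1))
```

Because the prefix marks the string unambiguously as a Bible reference, a single regular expression is enough here; no guessing about whether "Mk 4" is a verse or a model number.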

May 22nd, 2009

The Most Important Verses? It Depends What You Mean

The title of this post is a deliberate take-off from a recent post at OpenBible.info entitled “What Are the Most Popular Verses in the Bible? It Depends Whom You Ask”. That post combines data from an earlier ESV analysis of search results, TopVerses.com, a BibleGateway (internal) study, and OpenBible data to present a list of 278 verses, all of which occur in at least one source’s “top 100” list. It’s interesting to see both how much disparity there is (only 13% occur in at least three of the four lists), but also how uneven the distribution is. As one commenter points out, it’s somewhat surprising that there are no verses from Revelation, and Old Testament narrative in particular is largely absent except for Genesis. John’s gospel has about as many popular verses as all the other gospels combined: there are only four verses from Mark (two of them from the often-questioned ending). Less surprisingly, perhaps, there are none from the shortest NT books (Philemon, Jude, 2-3 John). Altogether it’s an interesting study.

The larger question this raises for me is how we might come up with a comprehensive, global score for verses to indicate their importance for a variety of purposes. As the OpenBible post suggests, this depends both on what the source of the data is, but also on what your purpose is and what you mean by “important” (which is certainly different from “popular”, though not completely unrelated).

One useful purpose is ranking verses to present them in response to searches: TopVerses.com is explicitly organized this way, as indicated in this news article about the site. They don’t go into much detail about how they gathered their data, though the scope (37M references scoured from the web) is impressive. But there’s a subtle disparity here: their data is based on counting mentions (citations) in published web pages, but their use case is prioritizing search results, and these may be out of sync. The fact that a given verse is frequently published on the web doesn’t necessarily mean it’s the one you want at the top of the list when you’re doing a word-based search, for example. The other three sources seem perhaps better matched to ranking search results, since they’re derived from searches themselves.

Another key hitch in these endeavors is how to handle range references, both in processing source data and (for search purposes) in handling queries. For example, many Bible dictionaries frequently reference ranges of verses, sometimes extensive, multi-chapter ones. If you’re going to count these, you need to think carefully about how you do the counting so you don’t introduce bias (or, better, you select the bias that’s best suited to your purposes).

For example, in the TopVerses.com ranking John 3.1 is #26, despite the rather plain descriptive content with little obvious spiritual impact.

Now there was a Pharisee, a man named Nicodemus who was a member of the Jewish ruling council. (John 3.1, NIV)

While i can’t be sure, i strongly suspect this high rank is an unintended consequence of dis-aggregating ranges and whole chapter references from John 3. In fact, scanning top verses by chapter from John, the first verse in each chapter is very often the highest or second-highest ranked, and nearly always among the top ten. This probably says more about the counting methodology than the significance of those verses in particular. The Bible Gateway study focuses on ranges of no more than three verses to explicitly mitigate this problem.
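The effect of the counting method is easy to see in a toy example. The sketch below compares three policies: full credit to every verse in a cited range, fractional credit (each citation is worth one point split across its verses), and a cap on range length like the three-verse limit mentioned above. The citation data is invented, verses are plain integers within a single chapter, and none of this is claimed to be how any of the sites actually count.

```python
from collections import Counter


def count_citations(citations, weighting="full", max_span=None):
    """Tally per-verse credit from (start_verse, end_verse) citations.

    weighting="full" gives each verse in a range one full point;
    weighting="fractional" splits one point evenly across the range.
    Ranges longer than max_span (if given) are ignored entirely.
    """
    counts = Counter()
    for start, end in citations:
        span = end - start + 1
        if max_span is not None and span > max_span:
            continue
        credit = 1.0 if weighting == "full" else 1.0 / span
        for verse in range(start, end + 1):
            counts[verse] += credit
    return counts


# Invented citations within one chapter: two direct citations of verse 16,
# plus four ranges that all happen to start at verse 1.
citations = [(16, 16), (16, 16), (1, 36), (1, 15), (1, 8), (1, 2)]

full = count_citations(citations, weighting="full")
fractional = count_citations(citations, weighting="fractional")
capped = count_citations(citations, weighting="full", max_span=3)

print(full[1], full[16])            # verse 1 outranks verse 16
print(fractional[1], fractional[16])  # verse 16 outranks verse 1
print(capped[1], capped[16])        # verse 16 outranks verse 1
```

With full credit, the first verse wins simply because every range starts there. Either of the other two policies puts the directly cited verse back on top, though each introduces its own bias.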

Other Measures of Importance

Moving from popularity to importance, i can imagine several different factors that might be combined to produce a more general importance score:

  • citation frequency (based on some corpus). In the TopVerses.com approach, these are web pages, which provides a very large set of observations. A number of other digital text collections would also suit this purpose, and even allow segmentation by genre: for example, you get a very different ranking from the Anchor Yale Bible Dictionary compared to Easton’s (and neither has John 3.16 at the top of the list). See below for more about this.
  • search frequency, the basis for the other three sources in the OpenBible.info post. This could be refined further given data on follow-up activities. For example, depending on your application, verse searches whose results are then expanded into a chapter view or followed to the next verse might get a boost compared to those with no further action (this seems like a variant of “click through” rates used in search engine advertising).
  • content analysis (context-independent): this could have several different flavors.
    • word count: though John 11:35 gets mentioned more than you’d expect precisely because it’s the shortest verse in the (English) Bible, in general longer verses are more likely to be important. This could be refined further given a metric for important words (but now we’ve introduced a new problem: where does that data come from?), which could be used for weighting the counts.
    • We could do even better if, instead of counting words, we count concepts (and weight them). Assuming we think the concept of HUMILITY is important, we’d want verses expressing that concept to rank more highly, regardless of whether they used a more common word like “humility”, or a less common one like “lowly”. Converting words to concepts is a difficult challenge, however.
    • Connections to other data also affect importance. In some sense, every verse that reports words of Jesus is probably more important to a Christian than one whose importance is otherwise comparable, which is why we have the convention of printing Bibles with the words of Christ in red (a binary system for visualizing importance).
    • We might even consider negative factors: a lower rank for unfamiliar, hard-to-pronounce names, or “taboo” words.
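One simple way to combine factors like these is a weighted average. The sketch below assumes each factor has already been scaled to the range 0 to 1; the factor names, weights, and values are all hypothetical, and choosing the weights is exactly the "what do you mean by important" question.

```python
def importance(factors, weights):
    """Weighted average of per-verse factor scores, each assumed to be in [0, 1].

    Factors missing for a verse are treated as 0.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * factors.get(name, 0.0) for name in weights)
    return weighted_sum / total_weight


# Hypothetical weights: citation frequency counts double.
weights = {"citation": 2.0, "search": 1.0, "words_of_jesus": 1.0}

# Hypothetical factor values for a single verse.
verse_factors = {"citation": 0.5, "search": 1.0, "words_of_jesus": 1.0}

score = importance(verse_factors, weights)
print(score)
```

Dividing by the total weight keeps the result between 0 and 1, so scores stay comparable when factors are added or reweighted. A negative factor could be handled by scoring it as a penalty (1 minus the factor) before averaging.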

Unlike TopVerses.com, i don’t see a particular need to provide a unique rank for each verse. If each verse has a score (to simplify the math, a decimal between 0 and 1 is a common approach), you can simply pick the top n verses that fit your purpose, and then order any ties canonically.
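Picking the top n with canonical tie-breaking is a one-line sort. In this sketch, a verse is keyed by a hypothetical (book number, chapter, verse) tuple, where the book number is its position in the Protestant canon, so that ordinary tuple comparison gives canonical order. The scores are made up.

```python
def top_verses(scores, n):
    """Return the n highest-scoring references, with ties in canonical order."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [reference for reference, _ in ranked[:n]]


# Made-up scores between 0 and 1, keyed by (book number, chapter, verse).
scores = {
    (45, 8, 28): 0.90,  # Romans 8:28
    (43, 3, 16): 0.98,  # John 3:16
    (43, 3, 1): 0.40,   # John 3:1
    (1, 1, 1): 0.90,    # Genesis 1:1
}

print(top_verses(scores, 3))
```

Sorting on the negated score first and the reference second means the two verses tied at 0.90 come out with Genesis before Romans, regardless of the order they were stored in.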

Comparing Dictionary Reference Citations

I did a small experiment to compare the most frequent reference citations in seven Bible dictionaries that are incorporated in Logos’s software (so this is citation frequency, not search frequency). I extracted and counted all the references, and then aggregated the counts across all seven: the top 20 references are shown below, along with how many “votes” they received in the OpenBible.info list. In the case of whole chapter references (four of the top ten), i’ve indicated with yes/no whether any verse from that chapter occurs in the OpenBible list.

There’s relatively little overlap between the two lists: only seven of these are in the OpenBible list. Many of these make sense given the different purposes of reference works: for example, Is 61.1 is a key messianic text. The high rank for 2 Ki 15.29 is initially puzzling, but probably results from being commonly cited in discussions of the conquests of Tiglath-Pileser and the Assyrian exile. Overall, this is probably much too small a sample to show the correspondences: i presume we’d find much more overlap in the top few hundred.

Reference     Aggregate Count   Count in OpenBible List
Jn 1:14       169.5             1
2 Ki 15:29    165.2             0
Is 61:1       159.8             0
Ac 1:13       151.7             0
Ge 1          150.0             yes
Ac 15         143.0             no
Ge 2:7        142.3             no
Ge 46:21      139.3             no
Jn 3:16       137.8             4
Ge 1:26       135.2             3
Is 7:14       134.3             1
Mt 28:19      130.2             3
Da 7:13       130.0             0
Ps 2:7        129.8             0
1 Pe 2:9      126.3             0
Ac 20:4       124.3             0
Lk 3:1        123.8             0
Mk 10:45      123.7             0
1 Sa 1:1      121.5             0
Ac 1:8        120.8             3

Details:

Conclusions

None of this is meant as criticism of the particular sites mentioned above. I strongly believe that any user-oriented, empirically-based data set is better than nothing, and in most endeavors like this, “the best data is more data”. * But with more data comes more complexity, and i’ve only scratched the surface here in considering some of the different factors.

The key point is this: if we want to measure something, we need to be clear up front about exactly what it is, and also what purpose we hope it will serve. I never stop being amazed at how often “obvious” approaches to data problems produce surprising results.


* In my recollection, this quote is attributed to Bob Mercer, a leading researcher in statistical language processing who was part of the IBM research group in the 1990s. I haven’t been able to verify a real source, however.

January 31st, 2009

Search Engine Optimization for Blogs and Non-profits

I listen to as many podcasts as i can, usually as a way to keep my mind engaged while my body is otherwise occupied with things like vacuuming, exercising, or taking long drives. I’m a glutton for ideas, so for me it’s a great way to spark creativity and explore new interests, usually in the realm of new technology. Some of my favorite feeds:

A recent IT Conversations podcast was on Search Engine Marketing, a discussion with Mike Moran and Bill Hunt, authors of the book Search Engine Marketing Inc. A lot of their discussion focuses on companies whose web presence provides real revenue, and who therefore have a strong financial motivation to think hard about Search Engine Optimization (SEO). They’ve got some good advice: focus on content, check your description, write an article that solves a real problem (so others can link to it and build your web rank). But there are still plenty of us producing blogs (like Blogos) and open resource websites (like SemanticBible) whose motivation may be different: and SEO still matters for us.

If you’re reading this, in marketing terms, you’re a potential customer of my “brand”, and each web page or blog post i create involves, at one level, a marketing activity directed at you. I don’t get any revenue from my readers: my only half-hearted attempt at this is when i remember to put my Amazon Associates tag in a book recommendation (and to my knowledge, that’s never paid off). I don’t do ads either. I do, however, get something less tangible, but perhaps more important: blogging enhances my digital identity, including my reputation. If you’re in a high tech field, your on-line identity is becoming as important a representation of you to prospective employers as your resume. In my case, my unpaid activities of blogging, conference speaking, and web site development led pretty directly to my current work at Logos.

Of course, given the wide-open nature of web search, there are plenty of people who get to my blog for unrelated reasons. While i don’t want to repeat them here and perpetuate the problem, at one point a popular set of keywords leading people to Blogos had to do with my quoting some news story about home-manufactured, uh, pharm-a-sue-tickles. While it’s possible some of those misdirected searchers found some higher knowledge, most of them probably spent one second’s attention before clicking away. Moran and Hunt make a really good point here: these people are not “good customers”, and you’re not helping them or yourself by trying to attract them. Instead, they recommend you think carefully about what makes your site or blog distinctive: what are the target keywords you want to attract? Then determine a strategy for “owning” (to the extent possible) the search results for those keywords.

Example: with Google, i’m #1 among the 750k results for “semantic bible” (entered without quotes). I’m #3 for “hyperconcordance” (a modest achievement given there are only 5000 results). A two-year-old post is #4 for blogos (but not the home page?? i must be doing something wrong): since blogging has become more popular, so has the name (though i was there first). But i’m not even in the top 50 for “digital bible”, even though those are important keywords related to my content. Given all the competition in that space, it would take enormous effort to achieve a high ranking there. In this case, my efforts are probably better spent elsewhere.

There are plenty of free resources out there: Moran’s Skinflint Search Marketing is a good place to begin, and Google Analytics already provides far more capability than i know how to take advantage of. Which brings me back to the real challenge of doing SEO for non-profit sites: deciding how much effort is really worth it. But if nothing else, thinking about SEO gets you thinking about what your site is for in the first place, and that’s always a good thing to keep in focus.

January 19th, 2009

Semantic Search in the Gospels

Cognition uses “Semantic NLP, the Company’s patented linguistic meaning-based text processing technology” to process natural language text and make the information in it searchable by meaning rather than simply by word. They’ve recently released a demo based on the Gospels and associated notes from the NET Bible.

Dr. Kathleen Dahlgren, their founder and CTO, has been working in the field of NLP for a long time, so this is not some newly-launched startup with more hype than substance. Their underlying technology represents an enormous investment in the linguistic data required for actually understanding language. Having worked in closely-related fields for most of my pre-Logos career (and having thought quite a bit about things like this for Bible study and search), i was very curious to take it for a spin and see how well it does. While they correctly claim that there’s a lot of figurative language in the Gospels, there’s also plenty of plain narrative description that ought to be understandable.

Not surprisingly, the examples on their demo page look reasonably good (that’s what you do when you put together a demo, after all). “Who double-crossed the Lamb of God?” is a clever way to show off their ability to recognize double-cross as a synonym for betray, and Lamb of God as an alternate designation for Jesus. I might quibble with “blessed are the pure in heart” (Matt 5:8) as a hit for “blessed are the innocent”, but it’s clearly on the right track.

But they also allow you to try your own queries, which is where you can really see whether this approach helps or not. Some queries i tried:

  • “a valuable pearl” comes up empty. Just searching for “pearl” finds Matt 13:45-46, but not finding “a pearl of great value” as a valuable pearl seems like a definite lack of understanding. Just searching for “valuable” finds a great many hits (remember this includes the NET Bible notes as well as the text), but some of the senses it retrieves don’t seem like a good fit for “valuable”: for instance, “a major category of meaning”, “an aorist main verb”, “is redundant” (?), “is not being critical of”. I understand why some of these matched, but they don’t convince me that there’s deep understanding going on.
  • “good soil” also comes up empty, even though this phrase occurs verbatim in Luke 8.15.
  • “a herd of swine” gets in the neighborhood: it apparently bridges the gap between swine and pig, and finds Matt 8.31 (apparently getting to “drive” from “herd”?), and some other notes related to “herdsmen”. But surprisingly it misses Mark 5.11 which has “a herd of pigs”.
  • “Peter’s brother” first tries the interpretation of “brother” as “member of a religious order” (!), but there’s a nice interface where you can choose alternate senses. After selecting the “sibling” sense, it does better, though the results aren’t always appropriate (e.g. Matt 17.1).
  • You can try questions like “Where did Jesus live?”, though the responses look like it’s merely searching on individual content words, not the semantics of the proposition. “Where did Herod live?” brings back a few interesting results where “live” has been connected to “palace”, which then results in helpful information because his palace was in Jerusalem.

Finding a use case for this particular demo comes down to finding an interesting intersection of several requirements: how many queries are there that

  • you’d actually want to look for
  • you couldn’t easily find based on the words alone
  • don’t require synthesis or reasoning (that’s really asking too much of this technology)

It was harder than i thought to come up with cases like this, and for most of them, the results still left something to be desired. But all critique aside, kudos to Cognition for being brave enough to put their technology out there and letting the results speak for themselves. Real understanding of text is an extremely difficult task: it looks to me like Cognition has made substantial progress, though the problem is still far from solved.

October 24th, 2008

BibleTech 2009

Things have been silent at Blogos for several months now: i needed to take a break and focus more intensely on moving along some of our major data projects at Logos (like the Bible Knowledgebase).

But i’m ready to get back to a more regular blogging schedule, and nothing gets the creative juices flowing like the prospects of another BibleTech conference! The first BibleTech (this past January) was one of the highlights of my year: here’s a list of 2008 speakers, including two presentations by me (you can find links to the slides here, and there’s an MP3 for the Zoomable Bible talk here, though be warned that it’s 150Mb and non-streaming). So i’m really looking forward to the next one, March 28-29 in Seattle.

The call for presentations has gone out, and so i face the dilemma of choosing among lots of different ideas and topics, and deciding what to propose. So many smart people attended the last conference that i’d love to just sit around and talk tech for several days straight, but i probably have to focus on just one or two topics.

So here’s your chance to give me some feedback (and for me to learn whether anybody’s still listening!). I’m planning to blog about some of my presentation ideas in subsequent posts, and i’d love to hear your comments about them. Does the topic make sense? Would you want to hear about it? Is it compelling, relevant, important, “cool”? Is it too obscure, too far out there, too geeky? What can i improve from last year (if you attended one of my talks)? It would really help me to have some feedback on these questions, especially from those who attended last year and therefore have a good feel for what the conference is all about (but i’ll take any comments i can get).


If you’re on Facebook, please join the BibleTech group.

Maybe you should be presenting at BibleTech 2009 too! The call for participation is open until Nov 3, and describes what we’re looking for, so get those abstracts in. And if i happen to mention a topic that you’re interested in presenting on, let me know and then go for it! There’s no shortage of things i’d like to talk about …

May 12th, 2008

Collective Intelligence Applied to Biblical Studies

Collective intelligence is a broad term covering many cases where intelligence or novel information results from the collaborative activities of many individuals. Recent and well-known examples include sites like

  • Wikipedia, where people work together to create encyclopedia-like content
  • del.icio.us: i label (or ‘tag’) web page content, and others can look at my tags, or lots of people’s tags, to find things of interest.
  • slashdot, digg, reddit, and similar sites that collect votes on the interest of web pages and then rank the pages by popularity

Though more popular perhaps in the last few years, these kinds of approaches have been around for some time. Google’s dominance of web search, arguably the current “killer app” on the internet (along with email), comes from a kind of collective intelligence. Their PageRank algorithm uses the number of links to a page from other web sites to estimate how important the page is, and assign its rank in the results you get back from a web search.
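The link-counting idea can be sketched as a small power iteration. This is the simplified textbook form of PageRank, not Google's production system; the toy link graph, the damping value of 0.85, the iteration count, and the function name are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: pages "a" and "b" both link to "c", and "c" links back to "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Note that a link is worth more when it comes from a page that is itself highly ranked, which is why "a" (linked only from the popular "c") ends up well ahead of "b" (linked from nobody). That recursion is what makes this collective intelligence rather than a simple tally.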

The interesting question to me is how collective intelligence might be usefully applied to Biblical studies. There have been a few projects in this area, though i think it’s fair to say they haven’t yielded too much yet. I’ve written a few posts (here, and almost 2 years ago here) about applying “Web 2.0” ideas to Bible study. YouVersion is perhaps the most promising of that bunch, but it still doesn’t collect nearly enough intelligence to really be different (meaning that the scale is too small, not that the comments are too stupid :-) ).

Another interesting set of data comes from the ESV Bible Blog, where they analyzed their web searches to identify the most popular verses in the Bible. This provides some well-grounded analysis of people’s actual behavior (which is always better than guessing what they do). But as such it’s still just data, not information or knowledge (more about that in this rather conceptual post about the difference between data, information, and knowledge). In other words, how do we apply this data to do something new and different when it comes to Bible study?

Here’s one example collective intelligence project i’ve pondered (though i haven’t yet found time to actually construct it): identifying parables in the Gospels. We have numerous sayings of Jesus throughout the Gospels that use stories, allegories, or other metaphorical language to make a point. Some of these are explicitly described in their context as parables: for example, Mark 4:2 tells us

And he was teaching them many things in parables, and in his teaching he said to them …

We conventionally refer to the story that follows in Mark 4:3-8 as “the Parable of the Soils” (or, perhaps less appropriately given the focus of the story, the Parable of the Sower). However, other stories with the same character aren’t explicitly called parables in the text, or are labeled as parables in one gospel but not another. In fact, the Greek word parabolē (from which our word parable is a straightforward transliteration) doesn’t occur in the Gospel of John at all, though several of the teachings recorded there have a style similar to parables from the Synoptic Gospels.

If you consult the various Bible reference works, many of which contain lists of the parables of Jesus, you find a great deal of disagreement as to which passages are and are not parables. Not surprisingly, this also reflects divergence of opinion as to what ought to be considered a parable: only those instances where the term parabolē is used? Those as well as parallel stories? Any kind of figurative language? Wilmington’s Book of Bible Lists lists 38 parables of Jesus (several of which occur in multiple Gospels): the Baker Encyclopedia of the Bible lists 40; Harper’s Bible Dictionary has only 26 (plus a few others found only in the Gospel of Thomas).

Here’s a good candidate for applying collective intelligence to a real issue in Biblical studies: what should we list as a parable? You could approach it like this:

  • Identify the entire set of candidate passages that anybody anywhere has considered, or might consider, a parable (and maybe throw in a few others as a control group)
  • Create a web site where people could log in and simply vote up or down on each passage: Parable or Not?
  • Along with their votes, each participant should record their criteria for voting
  • Participants could also log in as proxies for existing reference lists or scholarly authorities and enter (as votes) what Wilmington, Dodd or Jeremias called a parable.

I’d think at least 100 participants would be required to make this exercise in distributed Biblical scholarship meaningful, and some might turn their noses up at the thought of letting unwashed masses have an equal say with the scholars. But wouldn’t this be an interesting exercise? In particular, rather than “the list” of parables, it would give us the basis for a distribution of opinions: for example, 95% might agree that Mark 4:3-8 is a parable, while perhaps only 10% would label Jesus’ saying about the vine and vinedresser (John 15:1-17) that way. And the criteria might provide some interesting clusters of votes. I’d love to add this kind of data to the Composite Gospel. In fact, that’s what started the idea: i sat down to label the parables, and quickly realized this wasn’t a straightforward task.
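As a rough illustration of the tallying step, here is a minimal Python sketch that turns up-or-down votes into a per-passage distribution and groups "parable" votes by stated criterion. Everything in it is an assumption made for illustration: the participant names, the sample votes, the criterion labels, and the function names `parable_shares` and `yes_votes_by_criterion` are all hypothetical, and no such site or data set exists yet.

```python
from collections import defaultdict

# Hypothetical sample votes: (participant, passage, is_parable, criterion).
# A proxy vote for a reference work would just use the work's name as participant.
votes = [
    ("alice", "Mark 4:3-8", True, "explicitly labeled"),
    ("bob", "Mark 4:3-8", True, "narrative story"),
    ("carol", "Mark 4:3-8", True, "narrative story"),
    ("alice", "John 15:1-17", False, "explicitly labeled"),
    ("bob", "John 15:1-17", False, "narrative story"),
    ("carol", "John 15:1-17", True, "figurative language"),
]


def parable_shares(votes):
    """Return {passage: fraction of voters who called it a parable}."""
    yes = defaultdict(int)
    total = defaultdict(int)
    for _participant, passage, is_parable, _criterion in votes:
        total[passage] += 1
        if is_parable:
            yes[passage] += 1
    return {passage: yes[passage] / total[passage] for passage in total}


def yes_votes_by_criterion(votes):
    """Return {criterion: sorted passages that received a 'parable' vote under it}."""
    clusters = defaultdict(set)
    for _participant, passage, is_parable, criterion in votes:
        if is_parable:
            clusters[criterion].add(passage)
    return {criterion: sorted(passages) for criterion, passages in clusters.items()}


if __name__ == "__main__":
    for passage, share in parable_shares(votes).items():
        print(f"{passage}: {share:.0%} say parable")
    for criterion, passages in yes_votes_by_criterion(votes).items():
        print(f"{criterion}: {passages}")
```

Keeping the raw votes as flat records, rather than storing only the totals, is what would let the criteria be clustered later and would let proxy votes for published lists sit alongside individual ones.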

Additional resource: The Horizon Project is one product of the New Media Consortium that “charts the landscape of emerging technologies for teaching, learning and creative expression”. In my view, seminary education as well as pastoral preaching and teaching belong among this target audience. The Horizon Project produces an annual report on what’s here now, coming soon in the mid-term, and on the far-term horizon (3-5 years). Collective intelligence is one of their far-term horizon technologies: you can read more about it in the Horizon Report.

April 24th, 2008

The Semantic Web as Data + Intelligence

Talking with Talis is rapidly becoming my favorite podcast source: Paul Miller has a lot of really interesting guests addressing topics at the intersection of libraries and the Semantic Web.

Today i listened to an interview with Dr. Jim Hendler, now at Rensselaer Polytechnic Institute, but previously at University of Maryland and a key figure in the establishment of OWL during his tenure at DARPA. My comments here are really just a rehash of some things he said much better, and with much more authority (given his history in the field) — but blame me, not him, for what i say below.

The concept of the Semantic Web brings together two different communities, along with their respective priorities and technologies. Many of the disagreements within what looks like a single community are just two sets of people talking about different things (but using similar terminology). The “semantic” part is mostly represented by the Artificial Intelligence community, with interests in careful ontology development, deep reasoning, theoretical correctness, and academic activities. The “web” community has been out there for more than a decade, building the World Wide Web with HTML and lots and lots of data, and is now looking for ways to make it more useful, connected, and extensible.

You can represent these two concerns as two axes on a graph, and many different endeavors tend strongly toward one side or the other, depending on whether they emphasize the “intelligence” dimension, or the “data” dimension. Just a few examples on the data side (that could be multiplied many times over):

  • Yahoo plans to start indexing RDFa content (i discussed this a bit in my post about Bibleref and RDFa). Since Yahoo is one of the major web players, this adds just a little more intelligence to a lot of data (potentially: users still have to create RDFa markup).
  • Freebase is harvesting data from Wikipedia and other sources, and then adding a modest amount of structured relations.
  • Talis has their own set of data from a long history of library applications.

On the “intelligence” side would be big ontology development efforts, and academics working on reasoning: Hendler also called out pharmaceutical companies as tending toward this dimension. Hendler’s own bet is that progress is more likely to come from data-side approaches than the hard-core intelligence side (and i think he’s right). He sees SPARQL and persistent identifiers as two recent developments that, in combination, are likely to move the field ahead: these are things i’m looking at closely as well in Bible Knowledgebase development (more on the second one to come soon).