God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
February 26th, 2010

LCV Talk at Semantic Technology Conference

I’ll be giving a talk at the Semantic Technology Conference, June 23 from 7:30am to 8:20am (ouch!), in San Francisco, CA. The talk title is “Using a Controlled Vocabulary for Managing a Digital Library Platform”: no talk page yet, but the abstract follows. If you’re there, come by and say hello!

(Astute readers will note some similarities between this and my upcoming BibleTech talk. But the audiences are quite different, so the content will be too. This talk will provide “a practical case study on semantically organizing reference material to support search and navigation, using a controlled vocabulary.”)

Abstract

Encyclopedias and other subject-oriented reference books frequently present the same content using different names: and users often look for this information using other names altogether.

The Logos Controlled Vocabulary (LCV) organizes parallel but distinct content in the domain of Biblical studies to integrate reference information and support search, discovery, and knowledge management. The LCV captures

  • preferred and alternate terminology
  • inter-term relationships
  • term hierarchy
  • linkage to other semantic information

The initial version of the LCV (now shipping in the Logos digital library platform) comprises some 11,000 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions to terminology and content.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future.


February 22nd, 2010

Building an Architecture of Participation in Bible Study

The Cornucopia of the Commons

Some time back, Tim O’Reilly (The Architecture of Participation) echoed and applied some observations from Dan Bricklin (the Cornucopia of the Commons) about the architecture of Napster and  other significant web-based systems. The individual details are well worth reading, but here’s the summary form. There are several common models for how to build large datasets that are valuable to people:

  1. Pay people to build it (Bricklin calls this “Organized Manual”). Examples include the original Yahoo! directory of the web, and the Encyclopedia Britannica. There’s a variant that represents smart algorithms rather than just human effort (Bricklin: “Organized Mechanical”): this is how Google has built its indexes. But it still represents a significant monetary investment by somebody who probably expects something in return.
  2. Get volunteers (Bricklin’s “Volunteer Manual”): Wikipedia is the preeminent example here, along with Linux, the Open Directory Project, and a great many open source projects. People do this work because they value the end result, and the project coordinates and magnifies those efforts.
  3. Architect in such a way that individual self-interest creates collective value.

Napster (the original peer-to-peer version) was proposed by Bricklin as a prime example of the third model: simply by listening to your music (within the Napster ecosystem), the default settings meant you were also sharing that music with everybody else. Quoting Bricklin:

What we see here is that increasing the value of the database by adding more information is a natural by-product of using the tool for your own benefit. No altruistic sharing motives need be present, especially since sharing is the default.

This is Bricklin’s Cornucopia of the Commons (an allusion to Garrett Hardin’s Tragedy of the Commons): a system designed in such a way that use brings overflowing abundance.

(You might think blogging and twittering are like this, but they’re not. Nobody tweets because it has direct, inherent value to them: instead, it’s an outgrowth of a narcissistic, self-centered open, generous belief that what i say might have value to others. Few of us would do it if nobody else was listening. )

Models for Data Creation In Biblical Studies

All that (and Napster!) is now history, and i don’t want to get distracted by the peer-to-peer model that made Napster so powerful (Bricklin argues that’s not the reason it succeeded), or the legal issues that led to its demise. Instead, i want to reflect here on how these principles apply to Biblical studies and software.

With Logos 4, we’ve launched a major expansion of our Biblical Knowledge, by expanding Biblical People, adding Places and Things, and building around the large set of concepts we call the Logos Controlled Vocabulary. This was accomplished through the Organized Manual method: we paid a bunch of people (me included) to architect and populate this data, in a major development effort that stretched over several years. You could view the vast network of links that make Logos more than just a collection of texts as an extension of the same principle (though the resulting software program doesn’t look so much like a database). It represents literally hundreds of thousands of hours of effort in book markup and design, along with lots of “Organized Mechanical” algorithmic work.

There are also lots of examples of Volunteer Manual projects related to the Bible. The Sword Project is like Linux for Bible software. e-Sword has a smaller group of developers, but the same framework of a volunteer effort which is given away. Open Scriptures is building a platform and API for others to use in building Bible-based applications. Web 2.0 efforts like YouVersion let people tie their reflections directly to the Biblical text, and numerous projects have sprung from the Wikipedia mold like Theopedia. My own SemanticBible projects are much more limited, but in a similar spirit.

Logos has been active with the Volunteer Manual approach as well. The Logos Topics website combines our Organized Manual data and architecture of topics with user-contributed extensions of additional terminology, links within Logos, and even links to other websites. This lets us do some neat things like extending the desktop application content through user contributions on the web. Like Wikipedia, these are altruistic contributions from people who want to share their knowledge with others.

Sermons.logos.com works in a similar fashion: if you’re a pastor who writes down your sermon, and you’re willing to upload and share it, lots of others (both on the web and in Logos software) can benefit from what you’ve created. This is closer to the Cornucopia of the Commons model, but it’s still a voluntary and indirect process: my sermon doesn’t get shared as a natural by-product of my preparation activity.

The Cornucopia and Bible Study

The interesting question to me is how to achieve the third model, where my own use of a tool provides a direct benefit to others through a network, not because i’m behaving altruistically but simply because the system is architected to work that way. This is closely related to the whole Web2.0 meme (can it really have been five years already?!?) of “software that gets better the more it gets used.”

One thought: lots of web sites use RefTagger to provide a nice pop-up of Bible text for their readers, a benefit that enriches the experience of visitors to their site. Twitter users can similarly use ref.ly to shorten Bible references, which, like RefTagger links,  in turn resolve to references on Bible.Logos.com.   Could those links be converted into data indicating, for example, the relative popularity of different verses, and then displayed back to users?

Aggregating users’ operation of Logos software (in a suitably anonymized fashion, of course) could also provide data on the most popular resources, searches, and topics, which could then be turned around into recommendations (“Looking for a Bible dictionary article on ‘marriage’? Here are the ones our users have found most useful ….”).

But none of these seem to me to accomplish the full promise of the Cornucopia of the Commons. There has to be more here than simply harnessing popularity (though sites like Digg and del.icio.us have shown how useful that can be). I’m still trying to imagine what data sets could be created by people who are already committed to Bible study, as a normal outgrowth of what they do anyway. Any thoughts? Please share a comment.

January 25th, 2010

BibleTech:2010 Talk – The Logos Controlled Vocabulary

The program for BibleTech:2010 has been up for a couple of weeks now, and i’ve been delinquent in failing to point that out. We’ve got a full roster of really interesting talks that span the gamut from friendly warm technology to hard-core geekishness: Bible translation, social media, Biblical linguistics, mobile computing, preaching, publishing, tweeting, and more. And this year, it’s in San Jose, CA: i’m hoping that will open up attendance to some folks who have the misfortune to not live in the beautiful Pacific NW. The dates are March 26-27, 2010.

I’ll be giving two talks this year: here’s my abstract for the first one, on the Logos Controlled Vocabulary.


Dozens of books provide terminology from the field of Biblical studies, principally Bible dictionaries, encyclopedias, and other subject-oriented reference works. However, the terminology used varies between books, authors, and publishers, and doesn’t always include all the terms a user might employ to find information.

The Logos Controlled Vocabulary (LCV) organizes content from multiple Bible dictionaries to integrate information across the Logos library. As a controlled vocabulary, the LCV identifies, organizes, and systematizes a specific set of terms for indexing content, capturing inter-term relationships, and expressing term hierarchies. Like other kinds of metadata, this infrastructure then supports applications in search, discovery, and general knowledge management. The initial version of the LCV (shipping now with Logos 4) comprises some 11,100 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions.
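To make the idea of a controlled vocabulary concrete, here is a minimal sketch in Python of a term record with preferred and alternate labels, a broader-term link for the hierarchy, and related-term links. It is not the LCV's actual data model: the class names, field names, and the sample terms are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One vocabulary entry (field names are illustrative assumptions)."""
    preferred: str                                      # the preferred label
    alternates: set[str] = field(default_factory=set)   # other names users might type
    broader: str | None = None                          # parent term in the hierarchy
    related: set[str] = field(default_factory=set)      # non-hierarchical links


class Vocabulary:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self._index: dict[str, str] = {}  # lowercased label -> preferred label

    def add(self, term: Term) -> None:
        self.terms[term.preferred] = term
        for label in {term.preferred, *term.alternates}:
            self._index[label.lower()] = term.preferred

    def resolve(self, label: str) -> Term | None:
        """Map any known label, preferred or alternate, to its term."""
        preferred = self._index.get(label.lower())
        return self.terms.get(preferred) if preferred else None

    def ancestors(self, label: str) -> list[str]:
        """Walk the broader-term chain upward, stopping if it loops."""
        chain: list[str] = []
        term = self.resolve(label)
        while term and term.broader and term.broader not in chain:
            chain.append(term.broader)
            term = self.terms.get(term.broader)
        return chain


# Hypothetical sample terms, not taken from the LCV itself.
vocab = Vocabulary()
vocab.add(Term("Family"))
vocab.add(Term("Marriage", alternates={"Wedding", "Matrimony"}, broader="Family"))

print(vocab.resolve("wedding").preferred)
print(vocab.ancestors("Matrimony"))
```

The point of the label index is that a user can search with any name they happen to know and still land on the one term that the reference works are indexed under.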

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future. This includes some interesting new capabilities for machine learning from existing prose content. For example:

  • what are the prototypical Bible references, names, or phrases used to discuss a topic?
  • can we learn anything about the importance of topics by looking at how much is written about them, how many dictionaries cover them, and other kinds of automated analysis?
  • what knowledge can be gleaned from the topology of terminology linkage (what links to what)?

Update: we’ve decided in general to retire the “Libronix” name for Logos technologies, so i’m trying to get on board by starting to call this the Logos Controlled Vocabulary.

November 2nd, 2009

Logos 4 Launches Today

I’m thrilled to announce that we’re releasing Logos Bible Software 4 today. This is a complete rewrite from the ground up of the best Bible study software on the planet, so that makes this an exciting day in my book.

Logos 4 sports an entirely new interface to make it easier than ever to find what you’re looking for and keep your study space organized and effective. There’s a wealth of new, visually oriented resources, and better controls for working through the enormous space of resources Logos makes available. There’s even an iPhone app for no extra charge!

That’s the marketing view (and i stand behind it). But this means much more to me on a very personal level. It’s been almost 3 years since i came to Logos, and this will be the first time most of my work has seen the light of day. Specifically, Logos 4 contains the work of my colleagues and me in several new areas:

  • Biblical People, which organizes information about the 3300 individuals, groups of people, and deities named in the Biblical text. It includes a comprehensive list of references, their family relationships, links to dictionary articles, and links to related items. It also includes family tree and story-based diagrams. And everything is hyperlinked.
  • Biblical Places includes all the same kinds of information for 1200 named places from the Bible: cities, regions, even geographic features like rivers and mountains. Along with the data, there are 60 new high-resolution maps commissioned by Logos and covering the major Biblical events, as well as a mega-map that shows all the places together.
  • Biblical Things describes the physical objects of the Bible: animals, plants, body parts, clothing, food and drink, and much more, as well as specific items like Noah’s ark and Goliath’s sword and weights and measures. There are more than 1000 objects here, which also bring together thousands of images from across the library.
  • There’s also a new collection of high-resolution infographics illustrating different aspects of the Biblical world (and i’m extra proud that the bulk of this work was managed by my wife Donna)
  • In addition to regular word search (which is much faster than ever), under the hood is the Libronix Controlled Vocabulary (LCV), working to organize 11,000 different subjects in the Biblical studies literature and coordinating information across the library.

So if you’ve been following my posts on the Bible Knowledgebase … well, now it’s here. I can’t overstate how important i think this is: this is quite literally the first time in the centuries-old history of Biblical studies that this information has been made available in this way. The LCV isn’t quite as visible (yet), but it’s also an important organizing feature that will continue to grow in power going forward.

I hope you’re catching my sense of excitement about these new resources (and this says nothing about all the hard work of my dozens of colleagues in other areas). I hope i’ve piqued your interest to learn more about Logos 4. It really is a watershed event in Bible software.

Obligatory disclaimer: i work for Logos and highly value what i do there. So i’m not the least bit objective about this. (more detailed disclosures)

November 1st, 2009

Bible Chatbots

Suppose you had a database listing authorship and reported speech in the Bible, so that, for each set of words, you know who said or wrote them (the ESV folks did this using Amazon’s Mechanical Turk a few years back, and Jim Albright’s Dramatizer has similar data embedded in it). I assume the speakers have standardized identifiers.

Now imagine a matching algorithm (there are lots of candidates out there) that, when provided with either a question or a list of words, and optionally a speaker, retrieves passages that best match the input.

Example: “why does God allow evil?” might return

  • Eliphaz the Temanite: Job 15:14-16
  • the woman of Tekoa: 2 Sam 14:14
  • the apostle John: 1 John 3:11-17

Querying about “what about God and evil?” with speaker=Jesus might (in the best case) give answers like

  • Matt 5:45
  • Matt 12:35

Apart from how accurate such answers might be (that depends on the sophistication of the matching algorithm), you’ve now got the engine for a chatbot that gives Biblical “answers”. Aside from perhaps being an interesting hack, would this be useful? Lazyweb, are you listening?
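Here is a rough sketch of what the simplest possible version of that engine might look like: a speaker-tagged list of passages, and a matcher that ranks them by how many content words they share with the query. Everything here is an assumption for illustration: the stopword list, the scoring, the function names, and especially the passage texts, which are loose paraphrases rather than quotations from any translation. A real matcher would need something far more sophisticated than word overlap.

```python
import re
from dataclasses import dataclass

# A tiny, hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "about", "and", "does", "is", "of", "on", "the", "to", "what", "why"}


@dataclass(frozen=True)
class Passage:
    speaker: str    # standardized speaker identifier
    reference: str  # Bible reference for the passage
    text: str       # the words spoken or written


def content_words(text):
    """Lowercase the text and keep only non-stopword tokens."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def best_matches(query, passages, speaker=None, limit=3):
    """Rank passages by shared content words, optionally for one speaker."""
    wanted = content_words(query)
    scored = []
    for passage in passages:
        if speaker is not None and passage.speaker != speaker:
            continue
        overlap = len(wanted & content_words(passage.text))
        if overlap:
            scored.append((overlap, passage))
    # sort() is stable, so equal scores keep their original order
    scored.sort(key=lambda pair: -pair[0])
    return [passage for _, passage in scored[:limit]]


# Loose paraphrases for illustration only, not quotations.
PASSAGES = [
    Passage("woman of Tekoa", "2 Sam 14:14", "God devises ways so the banished one is not cast out"),
    Passage("Jesus", "Matt 5:45", "the sun rises on the evil and on the good"),
    Passage("Jesus", "Matt 12:35", "the good person brings out good, the evil person brings out evil"),
]

for hit in best_matches("what about God and evil?", PASSAGES, speaker="Jesus"):
    print(hit.reference)
```

Swapping in a better scoring function is the obvious place to extend this: the speaker filter and the passage data stay the same.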

June 16th, 2009

http://ref.ly for Bible References

My colleagues at Logos have launched http://ref.ly, a URL shortening service for Bible references: see this blog post. It provides the convenience of TinyURL (turning long unreadable URLs into something much more manageable), but unlike that service also provides readable, understandable content. Once you get past the prefix, you won’t have any trouble figuring out what verse http://ref.ly/Mk4.9 is referring to.

If you’re a Twitter person trying to shoehorn your message into 140-character tweets, you’ll like the fact that this gives you a brief and unambiguous way to both specify a Bible reference and link to the content behind it (the references resolve to the actual verse text at bible.logos.com). Since addressability matters, this is a good thing.

But it has precisely the same utility even if you’re not a Twitterhead (i’m not):

  • it clearly marks a string of characters as a Bible reference
  • it also normalizes the reference into a form that can be automatically processed

While it’s not quite a microformat, it’s really only a small step away from things like bibleref. In particular, if lots of people start using ref.ly references, it will be possible to process that content and understand things like what verses are most popular.
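As a sketch of what that processing could look like, the Python below pulls ref.ly links out of free text, normalizes each one into a (book, chapter, verse) tuple, and counts them. The URL pattern is inferred only from the http://ref.ly/Mk4.9 example above: how the real service handles ranges, numbered books, or other forms is an assumption here, and the sample posts are invented.

```python
import re
from collections import Counter

# Pattern inferred from the single example http://ref.ly/Mk4.9:
# a book abbreviation, a chapter number, and an optional ".verse".
REFLY = re.compile(r"https?://ref\.ly/([1-3]?[A-Za-z]+)(\d+)(?:\.(\d+))?")


def extract_references(text):
    """Return (book, chapter, verse) tuples; verse is None for whole chapters."""
    refs = []
    for book, chapter, verse in REFLY.findall(text):
        refs.append((book, int(chapter), int(verse) if verse else None))
    return refs


# Invented sample posts for illustration.
posts = [
    "Reading http://ref.ly/Mk4.9 this morning",
    "Back to http://ref.ly/Jn3.16 again",
    "Still thinking about http://ref.ly/Jn3.16.",
]

counts = Counter(ref for post in posts for ref in extract_references(post))
print(counts.most_common(1))
```

Because the reference is already marked and normalized by the URL, none of the usual guesswork about whether "Mark 4" is a Bible reference or a person's name is needed.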

What’s more, editors that recognize and automatically link URLs (like MS Outlook for HTML-based email, and MS Word) will now automatically make Bible links for you (like RefTagger does for blog posts), as long as you’re willing to tack on “http://ref.ly/” and live with the slightly non-traditional format. You don’t need to know anything about how to make a hyperlink in HTML: just a little extra syntax (14 characters, to be precise) moves these references toward much greater usefulness.

May 22nd, 2009

The Most Important Verses? It Depends What You Mean

The title of this post is a deliberate take-off from a recent post at OpenBible.info entitled “What Are the Most Popular Verses in the Bible? It Depends Whom You Ask”. That post combines data from an earlier ESV analysis of search results, TopVerses.com, a BibleGateway (internal) study, and OpenBible data to present a list of 278 verses, all of which occur in at least one source’s “top 100” list. It’s interesting to see both how much disparity there is (only 13% occur in at least three of the four lists), but also how uneven the distribution is. As one commenter points out, it’s somewhat surprising that there are no verses from Revelation, and Old Testament narrative in particular is largely absent except for Genesis. John’s gospel has about as many popular verses as all the other gospels combined: there are only four verses from Mark (two of them from the often-questioned ending). Less surprisingly, perhaps, there are none from the shortest NT books (Philemon, Jude, 2-3 John). Altogether it’s an interesting study.

The larger question this raises for me is how we might come up with a comprehensive, global score for verses to indicate their importance for a variety of purposes. As the OpenBible post suggests, this depends both on what the source of the data is, but also on what your purpose is and what you mean by “important” (which is certainly different from “popular”, though not completely unrelated).

One useful purpose is ranking verses to present them in response to searches: TopVerses.com is explicitly organized this way, as indicated in this news article about the site. They don’t go into much detail about how they gathered their data, though the scope (37M references scoured from the web) is impressive. But there’s a subtle disparity here: their data is based on counting mentions (citations) in published web pages, but their use case is prioritizing search results, and these may be out of sync. The fact that a given verse is frequently published on the web doesn’t necessarily mean it’s the one you want at the top of the list when you’re doing a word-based search, for example. The other three sources seem perhaps better matched to ranking search results, since they’re derived from searches themselves.

Another key hitch in these endeavors is how to handle range references, both in processing source data and (for search purposes) in handling queries. For example, many Bible dictionaries frequently reference ranges of verses, sometimes extensive, multi-chapter ones. If you’re going to count these, you need to think carefully about how you do the counting so you don’t introduce bias (or, better, you select the bias that’s best suited to your purposes).

For example, in the TopVerses.com ranking John 3.1 is #26, despite the rather plain descriptive content with little obvious spiritual impact.

Now there was a Pharisee, a man named Nicodemus who was a member of the Jewish ruling council. (John 3.1, NIV)

While i can’t be sure, i strongly suspect this high rank is an unintended consequence of dis-aggregating ranges and whole chapter references from John 3. In fact, scanning top verses by chapter from John, the first verse in each chapter is very often the highest or second-highest ranked, and nearly always among the top ten. This probably says more about the counting methodology than the significance of those verses in particular. The Bible Gateway study focuses on ranges of no more than three verses to explicitly mitigate this problem.
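A small sketch shows how the counting choice can produce this effect. It compares two ways of dis-aggregating ranges: giving every verse in a range a full count, versus splitting one count evenly across the range. Neither is claimed to be what TopVerses.com or anyone else actually does, and the citations are invented for illustration (several ranges that all happen to start at verse 1, plus one direct citation of John 3:16).

```python
from collections import Counter


def verse_range(chapter, first, last):
    """Expand a range like John 3:1-8 into individual verse labels."""
    return [f"{chapter}:{v}" for v in range(first, last + 1)]


def count_full(citations):
    """Every verse in a cited range gets a full count of 1."""
    counts = Counter()
    for verses in citations:
        for verse in verses:
            counts[verse] += 1
    return counts


def count_fractional(citations):
    """Each citation is worth 1 in total, split evenly across its verses."""
    counts = Counter()
    for verses in citations:
        share = 1 / len(verses)
        for verse in verses:
            counts[verse] += share
    return counts


# Invented citations: four ranges starting at verse 1, one single verse.
citations = [
    verse_range("John 3", 1, 8),
    verse_range("John 3", 1, 15),
    verse_range("John 3", 1, 21),
    verse_range("John 3", 1, 36),   # the whole chapter
    ["John 3:16"],
]

full = count_full(citations)
frac = count_fractional(citations)
print(full["John 3:1"], full["John 3:16"])
print(round(frac["John 3:1"], 3), round(frac["John 3:16"], 3))
```

With full counts, John 3:1 outranks John 3:16 simply because every range begins there. With fractional counts the directly cited verse comes out ahead. Both are biases: the question is which one suits your purpose.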

Other Measures of Importance

Moving from popularity to importance, i can imagine several different factors that might be combined to produce a more general importance score:

  • citation frequency (based on some corpus). In the TopVerses.com approach, these are web pages, which provides a very large set of observations. A number of other digital text collections would also suit this purpose, and even allow segmentation by genre: for example, you get a very different ranking from the Anchor Yale Bible Dictionary compared to Easton’s (and neither have John 3.16 at the top of the list). See below for more about this.
  • search frequency, the basis for the other three sources in the OpenBible.info post. This could be refined further given data on follow-up activities. For example, depending on your application, verse searches whose results are then expanded into a chapter view or followed to the next verse might get a boost compared to those with no further action (this seems like a variant of “click through” rates used in search engine advertising)
  • content analysis (context-independent): this could have several different flavors.
    • word count: though John 11:35 gets mentioned more than you’d expect precisely because it’s the shortest verse in the (English) Bible, in general longer verses are more likely to be important. This could be refined further given a metric for important words (but now we’ve introduced a new problem: where does that data come from?), which could be used for weighting the counts.
    • We could do even better if, instead of counting words, we count concepts (and weight them). Assuming we think the concept of HUMILITY is important, we’d want verses expressing that concept to rank more highly, regardless of whether they used a more common word like “humility”, or a less common one like “lowly”. Converting words to concepts is a difficult challenge, however.
    • Connections to other data also affect importance. In some sense, every verse that reports words of Jesus is probably more important to a Christian than one whose importance is otherwise comparable, which is why we have the convention of printing Bibles with the words of Christ in red (a binary system for visualizing importance).
    • We might even consider negative factors: a lower rank for unfamiliar, hard-to-pronounce names, or “taboo” words.

Unlike TopVerses.com, i don’t see a particular need to provide a unique rank for each verse. If each verse has a score (to simplify the math, a decimal between 0 and 1 is a common approach), you can simply pick the top n verses that fit your purpose, and then order any ties canonically.
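A minimal sketch of that scoring scheme follows, assuming each factor has already been normalized to a value between 0 and 1. The factor names, weights, and sample scores are all hypothetical: choosing and calibrating them is the hard part, and this only shows the mechanics of combining them and breaking ties canonically.

```python
def importance(factors, weights):
    """Weighted average of factor values, each assumed to lie in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[name] * factors.get(name, 0.0) for name in weights) / total


def top_verses(scores, canonical_order, n):
    """Pick the n highest-scoring verses, ordering ties canonically."""
    ranked = sorted(scores, key=lambda ref: (-scores[ref], canonical_order[ref]))
    return ranked[:n]


# Hypothetical weights and factor values for a single verse.
weights = {"citation": 2, "search": 1, "words_of_jesus": 1}
factors = {"citation": 0.5, "search": 1.0, "words_of_jesus": 0.0}
print(importance(factors, weights))

# Hypothetical scores; canonical_order maps each verse to its Bible position.
scores = {"John 3:16": 0.9, "Gen 1:1": 0.7, "Ps 23:1": 0.7}
canonical_order = {"Gen 1:1": 0, "Ps 23:1": 1, "John 3:16": 2}
print(top_verses(scores, canonical_order, 2))
```

Changing the weights gives a different ranking for a different purpose without touching the underlying factor data.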

Comparing Dictionary Reference Citations

I did a small experiment to compare the most frequent reference citations in seven Bible dictionaries that are incorporated in Logos’s software (so this is citation frequency, not search frequency). I extracted and counted all the references, and then aggregated the counts across all seven: the top 20 references are shown below, along with how many “votes” they received in the OpenBible.info list. In the case of whole chapter references (four of the top ten), i’ve indicated with yes/no whether any verse from that chapter occurs in the OpenBible list.

There’s relatively little overlap between the two lists: only seven of these are in the OpenBible list. Many of these make sense given the different purposes of reference works: for example, Is 61.1 is a key messianic text. The high rank for 2 Ki 15.29 is initially puzzling, but probably results from being commonly cited in discussions of the conquests of Tiglath-Pileser and the Babylonian exile. Overall, this is probably much too small a sample to show the correspondences: i presume we’d find much more overlap in the top few hundred.

Reference     Aggregate Count    Count in OpenBible List
Jn 1:14       169.5              1
2 Ki 15:29    165.2              0
Is 61:1       159.8              0
Ac 1:13       151.7              0
Ge 1          150.0              yes
Ac 15         143.0              no
Ge 2:7        142.3              no
Ge 46:21      139.3              no
Jn 3:16       137.8              4
Ge 1:26       135.2              3
Is 7:14       134.3              1
Mt 28:19      130.2              3
Da 7:13       130.0              0
Ps 2:7        129.8              0
1 Pe 2:9      126.3              0
Ac 20:4       124.3              0
Lk 3:1        123.8              0
Mk 10:45      123.7              0
1 Sa 1:1      121.5              0
Ac 1:8        120.8              3

Details:

Conclusions

None of this is meant as criticism of the particular sites mentioned above. I strongly believe that any user-oriented, empirically-based data set is better than nothing, and in most endeavors like this, “the best data is more data”. * But with more data comes more complexity, and i’ve only scratched the surface here in considering some of the different factors.

The key point is this: if we want to measure something, we need to be clear up front about exactly what it is, and also what purpose we hope it will serve. I never stop being amazed at how often “obvious” approaches to data problems produce surprising results.


* In my recollection, this quote is attributed to Bob Mercer, a leading researcher in statistical language processing who was part of the IBM research group in the 1990s. I haven’t been able to verify a real source, however.

January 31st, 2009

Search Engine Optimization for Blogs and Non-profits

I listen to as many podcasts as i can, usually as a way to keep my mind engaged while my body is otherwise occupied with things like vacuuming, exercising, or taking long drives. I’m a glutton for ideas, so for me it’s a great way to spark creativity and explore new interests, usually in the realm of new technology. Some of my favorite feeds:

A recent IT Conversations podcast was on Search Engine Marketing, a discussion with Mike Moran and Bill Hunt, authors of the book Search Engine Marketing Inc. A lot of their discussion focuses on companies whose web presence provides real revenue, and who therefore have a strong financial motivation to think hard about Search Engine Optimization (SEO). They’ve got some good advice: focus on content, check your description, write an article that solves a real problem (so others can link to it and build your web rank). But there are still plenty of us producing blogs (like Blogos) and open resource websites (like SemanticBible) whose motivation may be different: and SEO still matters for us.

If you’re reading this, in marketing terms, you’re a potential customer of my “brand”, and each web page or blog post i create involves, at one level, a marketing activity directed at you. I don’t get any revenue from my readers: my only half-hearted attempt at this is when i remember to put my Amazon Associates tag in a book recommendation (and to my knowledge, that’s never paid off). I don’t do ads either. I do, however, get something less tangible, but perhaps more important: blogging enhances my digital identity, including my reputation. If you’re in a high tech field, your on-line identity is becoming as important a representation of you to prospective employers as your resume. In my case, my unpaid activities of blogging, conference speaking, and web site development led pretty directly to my current work at Logos.

Of course, given the wide-open nature of web search, there are plenty of people who get to my blog for unrelated reasons. While i don’t want to repeat them here and perpetuate the problem, at one point a popular set of keywords leading people to Blogos had to do with my quoting some news story about home-manufactured, uh, pharm-a-sue-tickles. While it’s possible some of those misdirected searchers found some higher knowledge, most of them probably spent one second’s attention before clicking away. Moran and Hunt make a really good point here: these people are not “good customers”, and you’re not helping them or yourself by trying to attract them. Instead, they recommend you think carefully about what makes your site or blog distinctive: what are the target keywords you want to attract? Then determine a strategy for “owning” (to the extent possible) the search results for those keywords.

Example: with Google, i’m #1 among the 750k results for “semantic bible” (entered without quotes). I’m #3 for “hyperconcordance” (a modest achievement given there are only 5000 results). A two-year old post is #4 for blogos (but not the home page?? i must be doing something wrong): since blogging has become more popular, so has the name (though i was there first).  But i’m not even in the top 50 for “digital bible”, even though those are important keywords related to my content. Given all the competition in that space, it would take enormous effort to achieve a high ranking there. In this case, my efforts are probably better spent elsewhere.

There are plenty of free resources out there: Moran’s Skinflint Search Marketing is a good place to begin, and Google Analytics already provide far more capability than i know how to take advantage of. Which brings me back to the real challenge of doing SEO for non-profit sites: deciding how much effort is really worth it. But if nothing else, thinking about SEO gets you thinking about what your site is for in the first place, and that’s always a good thing to keep in focus.

January 19th, 2009

Semantic Search in the Gospels

Cognition uses “Semantic NLP, the Company’s patented linguistic meaning-based text processing technology” to process natural language text and make the information in it searchable by meaning rather than simply by word. They’ve recently released a demo based on the Gospels and associated notes from the NET Bible.

Dr. Kathleen Dahlgren, their founder and CTO, has been working in the field of NLP for a long time, so this is not some newly-launched startup with more hype than substance. Their underlying technology represents an enormous investment in the linguistic data required for actually understanding language. Having worked in closely-related fields for most of my pre-Logos career (and having thought quite a bit about things like this for Bible study and search), i was very curious to take it for a spin and see how well it does. While they correctly claim that there’s a lot of figurative language in the Gospels, there’s also plenty of plain narrative description that ought to be understandable.

Not surprisingly, the examples on their demo page look reasonably good (that’s what you do when you put together a demo, after all). “Who double-crossed the Lamb of God?” is a clever way to show off their ability to recognize double-cross as a synonym for betray, and Lamb of God as an alternate designation for Jesus. I might quibble with “blessed are the pure in heart” (Matt 5:8) as a hit for “blessed are the innocent”, but it’s clearly on the right track.

But they also allow you to try your own queries, which is where you can really see whether this approach helps or not. Some queries i tried:

  • “a valuable pearl” comes up empty. Just searching for “pearl” finds Matt 13:45-46, but not finding “a pearl of great value” as a valuable pearl seems like a definite lack of understanding. Just searching for “valuable” finds a great many hits (remember this includes the NET Bible notes as well as the text), but some of the senses it retrieves don’t seem like a good fit for “valuable”: for instance, “a major category of meaning”, “an aorist main verb”, “is redundant” (?), “is not being critical of”. I understand why some of these matched, but they don’t convince me that there’s deep understanding going on.
  • “good soil” also comes up empty, even though this phrase occurs verbatim in Luke 8.15.
  • “a herd of swine” gets in the neighborhood: it apparently bridges the gap between swine and pig, and finds Matt 8.31 (apparently getting to “drive” from “herd”?), and some other notes related to “herdsmen”. But surprisingly it misses Mark 5.11 which has “a herd of pigs”.
  • “Peter’s brother” first tries the interpretation of “brother” as “member of a religious order” (!), but there’s a nice interface where you can choose alternate senses. After selecting the “sibling” sense, it does better, though the results aren’t always appropriate (e.g. Matt 17.1).
  • You can try questions like “Where did Jesus live?”, though the responses look like it’s merely searching on individual content words, not the semantics of the proposition. “Where did Herod live?” brings back a few interesting results where “live” has been connected to “palace”, which then results in helpful information because his palace was in Jerusalem.
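To make the gap between word-based and meaning-based matching concrete, here is a toy sketch in Python. It is emphatically not how Cognition's system works: the synonym table, the verse snippets (paraphrased, not quoted), and the function names are all made up for illustration, and it only handles single-word synonyms.

```python
import re

# Hypothetical, hand-built synonym table (an assumption for this sketch);
# a real system would draw on a far larger lexical resource.
SYNONYMS = {
    "swine": {"pig", "pigs"},
    "double-crossed": {"betrayed"},
}

# Paraphrased snippets for illustration only, not actual NET Bible text.
VERSES = {
    "Mark 5.11": "a large herd of pigs was feeding there",
    "Matt 13.46": "he found a pearl of great value",
}


def tokens(text):
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def word_search(query, verses):
    """Return references whose text contains every query word literally."""
    wanted = tokens(query)
    return [ref for ref, text in verses.items() if wanted <= tokens(text)]


def expanded_search(query, verses, synonyms):
    """Like word_search, but a query word also matches any listed synonym."""
    hits = []
    for ref, text in verses.items():
        words = tokens(text)
        if all(
            word in words or (synonyms.get(word, set()) & words)
            for word in tokens(query)
        ):
            hits.append(ref)
    return hits


print(word_search("a herd of swine", VERSES))
print(expanded_search("a herd of swine", VERSES, SYNONYMS))
```

The literal search misses the “herd of pigs” snippet, while the expanded one finds it. Nothing this simple would connect “a valuable pearl” to “a pearl of great value”, though: that takes phrase-level paraphrase, which is where real semantic processing has to earn its keep.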

Finding a use case for this particular demo comes down to finding an interesting intersection of several requirements: how many queries are there that

  • you’d actually want to look for
  • you couldn’t easily find based on the words alone
  • don’t require synthesis or reasoning (that’s really asking too much of this technology)

It was harder than i thought to come up with cases like this, and for most of them, the results still left something to be desired. But all critique aside, kudos to Cognition for being brave enough to put their technology out there and let the results speak for themselves. Real understanding of text is an extremely difficult task: it looks to me like Cognition has made substantial progress, though the problem is still far from solved.

October 24th, 2008

BibleTech 2009

Things have been silent at Blogos for several months now: i needed to take a break and focus more intensely on moving along some of our major data projects at Logos (like the Bible Knowledgebase).

But i’m ready to get back to a more regular blogging schedule, and nothing gets the creative juices flowing like the prospect of another BibleTech conference! The first BibleTech (this past January) was one of the highlights of my year: here’s a list of 2008 speakers, including two presentations by me (you can find links to the slides here, and there’s an MP3 for the Zoomable Bible talk here, though be warned that it’s 150MB and non-streaming). So i’m really looking forward to the next one, March 28-29 in Seattle.

The call for presentations has gone out, and so i face the dilemma of choosing among lots of different ideas and topics, and deciding what to propose. So many smart people attended the last conference that i’d love to just sit around and talk tech for several days straight, but i probably have to focus on just one or two topics.

So here’s your chance to give me some feedback (and for me to learn whether anybody’s still listening!). I’m planning to blog about some of my presentation ideas in subsequent posts, and i’d love to hear your comments about them. Does the topic make sense? Would you want to hear about it? Is it compelling, relevant, important, “cool”? Is it too obscure, too far out there, too geeky? What can i improve from last year (if you attended one of my talks)? It would really help me to have some feedback on these questions, especially from those who attended last year and therefore have a good feel for what the conference is all about (but i’ll take any comments i can get).


If you’re on Facebook, please join the BibleTech group.

Maybe you should be presenting at BibleTech 2009 too! The call for participation is open until Nov 3, and describes what we’re looking for, so get those abstracts in. And if i happen to mention a topic that you’re interested in presenting on, let me know and then go for it! There’s no shortage of things i’d like to talk about …