God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
March 18th, 2013

BibleTech talk: Automatically Learning Topical Content

I’ve posted the slides from my BibleTech 2013 talk. Here’s the abstract:

Continued work on the Logos Controlled Vocabulary (BibleTech 2010, “A Controlled Vocabulary for Biblical Studies”) has produced a unique collection of topic-aligned content across more than 50 different Bible dictionaries, encyclopedias, and topical indexes in both English and Spanish. This presentation will describe the information we’re learning automatically from this content, including:

  • determining concept importance
  • associating concepts with Bible references
  • extracting and associating names and descriptive terms for concepts
  • relating concepts to each other

You can see the other talks at the BibleTech website. I’ve had a number of positive comments on the talk, which is always gratifying. Slowly but surely, we’re climbing up the data stack …


3/21/2013 update: I’ve used the Slidy framework for presentations for several years because i like the way it puts the whole content out on the web in HTML. However, @JohnRGentry pointed out that my slides don’t work on the iPad because moving forward and backward requires keys. There’s a newer version of Slidy which does support swiping to move through slides, though my experience with it on both Safari and Chrome on iOS hasn’t been great: it’s not easy to register a swipe, and title text sometimes gets lost. I assume these are issues with JavaScript support on iOS, though i’m not really sure. I’ll try to update my slides to the newest version of Slidy, which will help a little, but i’ll also look for another framework that’s more tablet friendly.

April 25th, 2011

BibleTech Talk Slides: Using the Bible Knowledgebase For Information Integration

Finally got my slides posted from BibleTech:2011 on Using the Bible Knowledgebase for Information Integration. Since i listened to good advice and went a little more toward graphics than bullet points, they’re not completely self-explanatory (but that’s why you should have come, right?).

Audio will show up too at some point, probably at http://www.bibletechconference.com/speakers.

As i’ve told a few of my colleagues since: giving the talk helped convince me even more strongly that Biblical Events will be a really important database for Bible study. Looking forward to getting it all put together.

March 26th, 2011

BibleTech 2011

I had to miss the first day because of another commitment, but today i’m here at BibleTech:2011 and looking forward to a great day of talks. Hopefully mine will be one of them: here’s my abstract.

Using the Bible Knowledgebase for Information Integration

In 2009 I reported on the Bible Knowledgebase (BK), a machine-readable collection of semantically-organized data about people, places, and things in the Bible. This talk will describe how the BK now functions as an essential information resource for Logos, tying together information across the software. In addition, I’ll discuss the continued work on the data over the last two years, including:

  • building a database of Biblical Events
  • adding unnamed entities to the database
  • coordinating information about these entities with the Logos Controlled Vocabulary

I’ll also present prototypes for visualizing BK data to enhance discovery and exploration in the Biblical text.

I’ll be live-blogging a few talks during the day to give a quick-take on the subject for those who can’t be here. You can also follow on Twitter via #BibleTech.

April 2nd, 2010

A Python Interface for api.Biblia.com

Last week Logos announced a public API for their new website, Biblia.com, at BibleTech. Of course, i want to wave the flag for my employer. But i’m also interested as somebody who’s dabbled in Bible web services in the past, most notably the excellent ESV Bible web service (many aspects of which are mirrored in the Biblia API: some previous posts around this can be found here at Blogos in the Web Services category). Dabblers like me often face a perennial problem: the translations people most want to read are typically not the most accessible via API, or have various other limitations.

So i’m happy with the other announcement from BibleTech last week: Logos is making the Lexham English Bible available under very generous terms (details here). The LEB is in the family of “essentially literal” translations, which makes it a good choice for tasks where the precise wording matters. And the LEB is available through the API (unlike most other versions you’re likely to want, at least until we resolve some other licensing issues).

I don’t want to do a review of the entire API here (and it will probably continue to evolve). But here are a couple of things about it that excite me:

  • The most obvious one is the ability to retrieve Bible text given a reference (the content service). Of the currently available Bible versions, the LEB is the one that interests me the most here (i hope we’ll have others in the future).
  • Another exciting aspect for me is the tag service. You provide text which may include Bible references: the service identifies any references embedded in it, and then inserts hyperlinks for them to enrich the text. So this is like RefTagger on demand (not just embedded in your website template). You can also supply a URL and tag the text that’s retrieved from it. One caveat with this latter functionality: if you want to run this on HTML, you should plan to do some pre-processing first, rather than treating it all as one big string. Otherwise random things (like “XHTML 1.0” in a DOCTYPE declaration) wind up getting tagged in strange ways (like <a href="http://ref.ly/Mal1">ML 1.0</a>).
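One way to do that pre-processing is to pull out only the visible text nodes and send those to the tag service, so that declarations, markup, scripts, and styles never reach it. What follows is a minimal sketch using only the Python standard library; the class and function names are made up for illustration and are not part of the Biblia API or of the code in the zip file.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect visible text nodes; ignore markup, DOCTYPE, script, and style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # DOCTYPE declarations go to handle_decl, so they never arrive here.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_chunks(html):
    """Return the list of text fragments that are safe to hand to a tagger."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each fragment can then be tagged separately and substituted back into the page, which keeps strings like the DOCTYPE out of the tagger's reach.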

I’ve just started working through the Biblia API today, but since i’m a Pythonista, developing a Python interface seemed like the way to go. This is still very much a work in progress, but you can download the code from this zip file and give it a whirl. Caveats abound:

  • I’ve only implemented three of the services so far: content() (retrieves Bible content for a reference), find() (lists available Bibles and metadata), and tag() (finds references in text and enhances it with hyperlinks). And even with these three services, i haven’t supported all the parameters (maybe i will, maybe i won’t).
  • This is my first stab at creating a Python interface to an API, so there may be many stylistic shortcomings.
  • Testing has also gotten very little attention, and bugs doubtless remain.
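For readers who want a feel for the shape of such an interface without downloading anything, here is a rough sketch of a thin client covering the same three services. It is not the code from the zip file. The base URL, the endpoint paths, and the parameter names (passage, text, key) are all assumptions for illustration and should be checked against the API documentation before use.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL; verify against the API documentation.
BASE_URL = "https://api.biblia.com/v1/bible"


class BibliaClient:
    """Hypothetical thin wrapper: builds request URLs and fetches them."""

    def __init__(self, api_key, base_url=BASE_URL):
        self.api_key = api_key
        self.base_url = base_url

    def _url(self, path, **params):
        # Every request carries the API key; parameters are sorted so the
        # resulting URL is deterministic.
        params["key"] = self.api_key
        query = urlencode(sorted(params.items()))
        return "{}/{}?{}".format(self.base_url, path, query)

    def content_url(self, passage, bible="LEB", fmt="txt"):
        """URL for the text of a passage (path layout is an assumption)."""
        return self._url("content/{}.{}".format(bible, fmt), passage=passage)

    def find_url(self):
        """URL listing available Bibles and metadata (assumed path)."""
        return self._url("find")

    def tag_url(self, text):
        """URL asking the service to tag references in text (assumed path)."""
        return self._url("tag", text=text)

    def fetch(self, url):
        """Perform the HTTP GET and return the decoded response body."""
        with urlopen(url) as response:
            return response.read().decode("utf-8")
```

Keeping URL construction separate from the network call makes the request-building logic easy to test without an API key or a connection.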

If you’re interested and want to play along, let me know: we can probably set up a Google group or something for those who want to improve this code further.

March 30th, 2010

BibleTech:2010 Debrief

The BibleTech conference is an annual highlight for those of us who work at the intersection of Bible stuff and technology, and last week’s meeting in San Jose was no exception. This was the third BibleTech — i’ve been fortunate to have attended (and presented at) them all — and there’s always a great mix of new ideas, updates on ongoing projects, and lots of interesting people to talk to. (some other reviews: Rick Brannan, Mike Aubrey, Trey Gourley)

Some of the talks i liked best this year:

  • I was already interested in Pinax before hearing James Tauber’s talk on Using Django and Pinax for Collaborative Linguistics: now i’m itching to get started!
  • Stephen Smith had a nice analysis of the most frequently tweeted Bible passages (though the evidence of vast swaths of Scripture that get very little attention was perhaps a bit depressing).
  • Neil Rees showed Concordance Builder, a program that lets you use a Swahili concordance to bootstrap one for Welsh (or any other pair of languages) with no linguistic knowledge. Building on the Paratext tool, it leverages the verse indexes along with approximate string matching and statistical glossing (technical paper by J D Riding) to produce results that are about 90-95% correct out of the box. This can reduce concordance development to a matter of weeks rather than years.
  • There were several talks related to semantics in addition to mine: Randall Tan talked about more automated methods and fleshed them out relative to the higher-level structure of Galatians, and Andi Wu gave what looked like a really interesting presentation on semantic search based on syntax and cross-language correspondence (alas, i missed it).
  • Weston Ruter talked about APIs they’re developing at OpenScriptures.org (and brought in the Linked Data idea). Logos also unveiled their new API for Biblia.

I felt my talks went well and i got some good feedback. My slides are now posted (if you wrote down URLs at the conference, i didn’t get them quite right :-( but here they’re correct):

(As with some previous talks, i did my presentation with Slidy (previous post): i feel like it’s going a little more smoothly each time.)

February 26th, 2010

LCV Talk at Semantic Technology Conference

I’ll be giving a talk at the Semantic Technology Conference, June 23 from 7:30am to 8:20am (ouch!), in San Francisco, CA. The talk title is “Using a Controlled Vocabulary for Managing a Digital Library Platform”: no talk page yet, but the abstract follows. If you’re there, come by and say hello!

(Astute readers will note some similarities between this and my upcoming BibleTech talk. But the audiences are quite different, so the content will be too. This talk will provide “a practical case study on semantically organizing reference material to support search and navigation, using a controlled vocabulary.”)

Abstract

Encyclopedias and other subject-oriented reference books frequently present the same content using different names: and users often look for this information using other names altogether.

The Logos Controlled Vocabulary (LCV) organizes parallel but distinct content in the domain of Biblical studies to integrate reference information and support search, discovery, and knowledge management. The LCV captures

  • preferred and alternate terminology
  • inter-term relationships
  • term hierarchy
  • linkage to other semantic information

The initial version of the LCV (now shipping in the Logos digital library platform) comprises some 11,000 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions to terminology and content.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future.
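To make the idea of preferred and alternate terminology plus a term hierarchy a little more concrete, here is a toy sketch of the kind of lookup a controlled vocabulary supports. The class, its methods, and the sample terms are all invented for illustration; this is not how the LCV itself is stored or implemented.

```python
class ControlledVocabulary:
    """Toy vocabulary: alternate names resolve to one preferred term."""

    def __init__(self):
        self._preferred = {}  # normalized name -> preferred term
        self._broader = {}    # preferred term -> broader preferred term

    @staticmethod
    def _normalize(name):
        # Case-insensitive, whitespace-insensitive matching.
        return " ".join(name.lower().split())

    def add_term(self, preferred, alternates=(), broader=None):
        for name in (preferred, *alternates):
            self._preferred[self._normalize(name)] = preferred
        if broader is not None:
            self._broader[preferred] = broader

    def resolve(self, name):
        """Return the preferred term for any known name, else None."""
        return self._preferred.get(self._normalize(name))

    def ancestors(self, name):
        """Walk up the hierarchy from a term, guarding against cycles."""
        term = self.resolve(name)
        chain = []
        while term in self._broader and self._broader[term] not in chain:
            term = self._broader[term]
            chain.append(term)
        return chain


cv = ControlledVocabulary()
cv.add_term("Worship")
cv.add_term("Lord's Supper", alternates=["Communion", "Eucharist"],
            broader="Worship")
print(cv.resolve("eucharist"))    # the preferred term for an alternate name
print(cv.ancestors("Communion"))  # broader terms, nearest first
```

The point of the structure is that a user searching under any of the alternate names lands on the same entry, whatever heading a given dictionary happens to use.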


February 22nd, 2010

Building an Architecture of Participation in Bible Study

The Cornucopia of the Commons

Some time back, Tim O’Reilly (The Architecture of Participation) echoed and applied some observations from Dan Bricklin (the Cornucopia of the Commons) about the architecture of Napster and other significant web-based systems. The individual details are well worth reading, but here’s the summary form. There are several common models for how to build large datasets that are valuable to people:

  1. Pay people to build it (Bricklin calls this “Organized Manual”). Examples include the original Yahoo! directory of the web, and the Encyclopedia Britannica. There’s a variant that represents smart algorithms rather than just human effort (Bricklin: “Organized Mechanical”): this is how Google has built its indexes. But it still represents a significant monetary investment by somebody who probably expects something in return.
  2. Get volunteers (Bricklin’s “Volunteer Manual”): Wikipedia is the preeminent example here, along with Linux, the Open Directory Project, and a great many open source projects. People do this work because they value the end result, and the project coordinates and magnifies those efforts.
  3. Architect in such a way that individual self-interest creates collective value.

Napster (the original peer-to-peer version) was proposed by Bricklin as a prime example of the third model: simply by listening to your music (within the Napster ecosystem), the default settings meant you were also sharing that music with everybody else. Quoting Bricklin:

What we see here is that increasing the value of the database by adding more information is a natural by-product of using the tool for your own benefit. No altruistic sharing motives need be present, especially since sharing is the default.

This is Bricklin’s Cornucopia of the Commons (an allusion to Garrett Hardin’s Tragedy of the Commons): a system designed in such a way that use brings overflowing abundance.

(You might think blogging and twittering are like this, but they’re not. Nobody tweets because it has direct, inherent value to them: instead, it’s an outgrowth of a narcissistic, self-centered open, generous belief that what i say might have value to others. Few of us would do it if nobody else was listening.)

Models for Data Creation In Biblical Studies

All that (and Napster!) is now history, and i don’t want to get distracted by the peer-to-peer model that made Napster so powerful (Bricklin argues that’s not the reason it succeeded), or the legal issues that led to its demise. Instead, i want to reflect here on how these principles apply to Biblical studies and software.

With Logos 4, we’ve launched a major expansion of our Biblical Knowledge, by expanding Biblical People, adding Places and Things, and building around the large set of concepts we call the Logos Controlled Vocabulary. This was accomplished through the Organized Manual method: we paid a bunch of people (me included) to architect and populate this data, in a major development effort that stretched over several years. You could view the vast network of links that make Logos more than just a collection of texts as an extension of the same principle (though the resulting software program doesn’t look so much like a database). It represents literally hundreds of thousands of hours of effort in book markup and design, along with lots of “Organized Mechanical” algorithmic work.

There are also lots of examples of Volunteer Manual projects related to the Bible. The Sword Project is like Linux for Bible software. e-Sword has a smaller group of developers, but the same framework of a volunteer effort which is given away. Open Scriptures is building a platform and API for others to use in building Bible-based applications. Web 2.0 efforts like YouVersion let people tie their reflections directly to the Biblical text, and numerous projects have sprung from the Wikipedia mold like Theopedia. My own SemanticBible projects are much more limited, but in a similar spirit.

Logos has been active with the Volunteer Manual approach as well. The Logos Topics website combines our Organized Manual data and architecture of topics with user-contributed extensions of additional terminology, links within Logos, and even links to other websites. This lets us do some neat things like extending the desktop application content through user contributions on the web. Like Wikipedia, these are altruistic contributions from people who want to share their knowledge with others.

Sermons.logos.com works in a similar fashion: if you’re a pastor who writes down your sermon, and you’re willing to upload and share it, lots of others (both on the web and in Logos software) can benefit from what you’ve created. This is closer to the Cornucopia of the Commons model, but it’s still a voluntary and indirect process: my sermon doesn’t get shared as a natural by-product of my preparation activity.

The Cornucopia and Bible Study

The interesting question to me is how to achieve the third model, where my own use of a tool provides a direct benefit to others through a network, not because i’m behaving altruistically but simply because the system is architected to work that way. This is closely related to the whole Web2.0 meme (can it really have been five years already?!?) of “software that gets better the more it gets used.”

One thought: lots of web sites use RefTagger to provide a nice pop-up of Bible text for their readers, a benefit that enriches the experience of visitors to their site. Twitter users can similarly use ref.ly to shorten Bible references, which, like RefTagger links, in turn resolve to references on Bible.Logos.com. Could those links be converted into data indicating, for example, the relative popularity of different verses, and then displayed back to users?
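As a rough illustration of how little machinery the counting step would need, here is a sketch that tallies references from a batch of ref.ly links. It assumes the reference is simply the path portion of the URL (as in http://ref.ly/Mal1); the other sample links and the function names are made up, and nothing here reflects how Logos actually logs or processes these requests.

```python
from collections import Counter
from urllib.parse import unquote, urlparse


def reference_from_link(url):
    """Return the reference part of a ref.ly link, or None for other hosts."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "ref.ly":
        return None
    ref = unquote(parsed.path.strip("/"))
    return ref or None


def popularity(links):
    """Count how often each reference appears across a collection of links."""
    refs = (reference_from_link(link) for link in links)
    return Counter(ref for ref in refs if ref)


links = [
    "http://ref.ly/Mal1",
    "http://ref.ly/Jn3.16",   # hypothetical link format
    "http://ref.ly/Jn3.16",
    "http://example.com/Jn3.16",  # not a ref.ly link, so ignored
]
for ref, count in popularity(links).most_common():
    print(ref, count)
```

A real version would also need to normalize references, since different spellings of the same verse would otherwise be counted separately.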

Aggregating users’ operation of Logos software (in a suitably anonymized fashion, of course) could also provide data on the most popular resources, searches, and topics, which could then be turned around into recommendations (“Looking for a Bible dictionary article on ‘marriage’? Here are the ones our users have found most useful ….”).

But none of these seem to me to accomplish the full promise of the Cornucopia of the Commons. There has to be more here than simply harnessing popularity (though sites like Digg and del.icio.us have shown how useful that can be). I’m still trying to imagine what data sets could be created by people who are already committed to Bible study, as a normal outgrowth of what they do anyway. Any thoughts? Please share a comment.

January 25th, 2010

BibleTech:2010 Talk – The Logos Controlled Vocabulary

The program for BibleTech:2010 has been up for a couple of weeks now, and i’ve been delinquent in failing to point that out. We’ve got a full roster of really interesting talks that span the gamut from friendly warm technology to hard-core geekishness: Bible translation, social media, Biblical linguistics, mobile computing, preaching, publishing, tweeting, and more. And this year, it’s in San Jose, CA: i’m hoping that will open up attendance to some folks who have the misfortune to not live in the beautiful Pacific NW. The dates are March 26-27, 2010.

I’ll be giving two talks this year: here’s my abstract for the first one, on the Logos Controlled Vocabulary.


Dozens of books provide terminology from the field of Biblical studies, principally Bible dictionaries, encyclopedias, and other subject-oriented reference works. However, the terminology used varies between books, authors, and publishers, and doesn’t always include all the terms a user might employ to find information.

The Logos Controlled Vocabulary (LCV) organizes content from multiple Bible dictionaries to integrate information across the Logos library. As a controlled vocabulary, the LCV identifies, organizes, and systematizes a specific set of terms for indexing content, capturing inter-term relationships, and expressing term hierarchies. Like other kinds of metadata, this infrastructure then supports applications in search, discovery, and general knowledge management. The initial version of the LCV (shipping now with Logos 4) comprises some 11,100 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future. This includes some interesting new capabilities for machine learning from existing prose content. For example:

  • what are the prototypical Bible references, names, or phrases used to discuss a topic?
  • can we learn anything about the importance of topics by looking at how much is written about them, how many dictionaries cover them, and other kinds of automated analysis?
  • what knowledge can be gleaned from the topology of terminology linkage (what links to what)?
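As one purely illustrative take on the second question, a topic's importance could be approximated from how many distinct dictionaries cover it and how much they write about it. The scoring formula below is an arbitrary assumption chosen for simplicity, not the method used in the talk, and the sample data is invented.

```python
import math


def importance(articles):
    """Score a topic from (dictionary_name, word_count) pairs.

    Coverage (number of distinct dictionaries) is multiplied by the log of
    the total word count, so breadth of coverage counts for more than the
    sheer length of any single article.
    """
    if not articles:
        return 0.0
    coverage = len({name for name, _ in articles})
    total_words = sum(count for _, count in articles)
    return coverage * math.log1p(total_words)


# Invented sample data: topic -> articles found for it.
topics = {
    "Covenant": [("Dictionary A", 2400), ("Dictionary B", 1800),
                 ("Dictionary C", 900)],
    "Hyssop": [("Dictionary A", 150)],
}
ranked = sorted(topics, key=lambda t: importance(topics[t]), reverse=True)
print(ranked)
```

Any real measure would need tuning against human judgments of which topics matter, but even a crude score like this gives a starting order to refine.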

Update: we’ve decided in general to retire the “Libronix” name for Logos technologies, so i’m trying to get on board by starting to call this the Logos Controlled Vocabulary.

November 2nd, 2009

Logos 4 Launches Today

I’m thrilled to announce that we’re releasing Logos Bible Software 4 today. This is a complete rewrite from the ground up of the best Bible study software on the planet, so that makes this an exciting day in my book.

Logos 4 sports an entirely new interface to make it easier than ever to find what you’re looking for and keep your study space organized and effective. There’s a wealth of new, visually oriented resources, and better controls for working through the enormous space of resources Logos makes available. There’s even an iPhone app for no extra charge!

That’s the marketing view (and i stand behind it). But this means much more to me on a very personal level. It’s been almost 3 years since i came to Logos, and this will be the first time most of my work has seen the light of day. Specifically, Logos 4 contains the work of my colleagues and me in several new areas:

  • Biblical People, which organizes information about the 3300 individuals, groups of people, and deities named in the Biblical text. It includes a comprehensive list of references, their family relationships, links to dictionary articles, and links to related items. It also includes family tree and story-based diagrams. And everything is hyperlinked.
  • Biblical Places includes all the same kinds of information for 1200 named places from the Bible: cities, regions, even geographic features like rivers and mountains. Along with the data, there are 60 new high-resolution maps commissioned by Logos and covering the major Biblical events, as well as a mega-map that shows all the places together.
  • Biblical Things describes the physical objects of the Bible: animals, plants, body parts, clothing, food and drink, and much more, as well as specific items like Noah’s ark and Goliath’s sword and weights and measures. There are more than 1000 objects here, which also bring together thousands of images from across the library.
  • There’s also a new collection of high-resolution infographics illustrating different aspects of the Biblical world (and i’m extra proud that the bulk of this work was managed by my wife Donna)
  • In addition to regular word search (which is much faster than ever), under the hood is the Libronix Controlled Vocabulary (LCV), working to organize 11,000 different subjects in the Biblical studies literature and coordinating information across the library.

So if you’ve been following my posts on the Bible Knowledgebase … well, now it’s here. I can’t overstate how important i think this is: this is quite literally the first time in the centuries-old history of Biblical studies that this information has been made available in this way. The LCV isn’t quite as visible (yet), but it’s also an important organizing feature that will continue to grow in power going forward.

I hope you’re catching my sense of excitement about these new resources (and this says nothing about all the hard work of my dozens of colleagues in other areas). I hope i’ve piqued your interest to learn more about Logos 4. It really is a watershed event in Bible software.

Obligatory disclaimer: i work for Logos and highly value what i do there. So i’m not the least bit objective about this. (more detailed disclosures)

March 30th, 2009

BibleTech:2009 Postlude

BibleTech:2009 is past now, and (just like last year) was a great opportunity both to hear new ideas about Bible and technology and to meet and talk with many others with common interests. The few scattered thoughts i jotted down as i was live-blogging talks certainly don’t do justice to the richness of many of the presentations: so don’t judge the quality of their talks by my quick-take notes.

I’ve got slides from my talk on the Bible Knowledgebase posted now on SemanticBible: the navigational structure above them isn’t in place yet, but you should be able to follow the link directly to get there. Once again, i’ve used Slidy for the presentation, and that process went a little more smoothly this time (which probably just means i’ve gotten better at it). View the source if you want to see how it works.

[Important note: if you were at my talk and wrote down the URL for the slides, i had it wrong. The correct URL is:

http://semanticbible.com/other/talks/2009/bibletech/BK.html

Yes, i know that Cool URIs don’t change, which is why i wanted to make this one adjustment before publishing them, so i won’t have to change it in the future.]

At some point there should be audio from the talk posted on the BibleTech site (probably on the BibleTech speakers page, which has links to talks from last year and audio where available). Future Blogos posts on the Bible Knowledgebase will go in my WordPress category of that name (RSS feed here), and will also be tagged with bk if you want to follow along.