BibleTech talk: Automatically Learning Topical Content

I’ve posted the slides from my BibleTech 2013 talk. Here’s the abstract:

Continued work on the Logos Controlled Vocabulary (BibleTech 2010, “A Controlled Vocabulary for Biblical Studies”) has produced a unique collection of topic-aligned content across more than 50 different Bible dictionaries, encyclopedias, and topical indexes in both English and Spanish. This presentation will describe the information we’re learning automatically from this content, including:

  • determining concept importance
  • associating concepts with Bible references
  • extracting and associating names and descriptive terms for concepts
  • relating concepts to each other

You can see the other talks at the BibleTech website. I’ve had a number of positive comments on the talk, which is always gratifying. Slowly but surely, we’re climbing up the data stack …


3/21/2013 update: I’ve used the Slidy framework for presentations for several years because i like the way it puts the whole content out on the web in HTML. However, @JohnRGentry pointed out that my slides don’t work on the iPad because moving forward and backward requires keys. There’s a newer version of Slidy which does support swiping to move through slides, though my experience with it on both Safari and Chrome on iOS hasn’t been great: it’s not easy to register a swipe, and title text sometimes gets lost. I assume these are issues with JavaScript support on iOS, though i’m not really sure. I’ll try to update my slides to the newest version of Slidy, which will help a little, but i’ll also look for another framework that’s more tablet-friendly.

BibleTech Talk Slides: Using the Bible Knowledgebase For Information Integration

Finally got my slides posted from BibleTech:2011 on Using the Bible Knowledgebase for Information Integration. Since i listened to good advice and went a little more toward graphics than bullet points, they’re not completely self-explanatory (but that’s why you should have come, right?).

Audio will show up too at some point, probably at http://www.bibletechconference.com/speakers.

As i’ve told a few of my colleagues since: giving the talk helped convince me even more strongly that Biblical Events will be a really important database for Bible study. Looking forward to getting it all put together.

Aaron Marshall: User Adoption Strategy

It really changes things when the CEO gets on board with digital literacy. Recommended book: User Adoption Strategies by Michael Sampson.

No adoption = no value: you have to plan for adoption. Rogers’ bell curve: perceived utility and ease of use matter a lot, which comes back to design. Tip: establish a glossary. “It’s really hard to sit behind someone using your software and not tell them what to do.” “Ideas are cheap, but they still feel like my heart.” “Analytics is the one area I’ve neglected most.” Everything BIG started small. Progressive disclosure: give people a slow introduction to features, don’t overwhelm them up front.

Some interesting sites for augmented reality:

  • stickybits: attach comments to physical objects with barcodes.
  • Greengoose.com: temperature/sound/vibration sensors. Instrumentation of everything.
  • GE smart grid
  • Layar: find people who tweeted nearby, wikipedia articles. You can create your own.

Steven Cummings: Bringing the Power of Search to Mobile

The mobile revolution means the goal of software now must be to reach users wherever they are.

BibleReader 5 is their application: showed it on iPad. Originally used Evernote for note synchronization, but it wasn’t a good fit. Resource Guide is a new addition that pulls in everything in their library that relates to the passage you’re reading. Other library integration around lexicon entries.

Aaron Linne: the Road to MyStudyBible.com

Core goals:

  • create an environment for studying the Bible
  • maintain feature set of bible.lifeway.com
  • raise awareness of the HCSB
  • sampling strategy for HCSB study notes

Bad timing: started with Silverlight, then moved to HTML 4 (with a little Flash). “We get more compliments over how we presented our Strong’s data … ” Development from features to community and awareness. Using MSB.to for URL shortening. Windows Phone 7 is more valuable because of its connection to Xbox (in response to a tweet from @BobPritchett). My Notes tab is coming soon.

Questions:

  • Feedback mechanisms? “we consider our feedback link our most important feature”
  • Other feedback channels?

Aaron is @linne on Twitter.

Jim Albright: Publishing using CSS

Given the explosion of devices, how can we write once and then publish to all the different formats and devices? Jim’s current assignment: figure out how to go from the basic translation programs to other formats using markup languages.

Some deeper details about how to use CSS for specific formatting purposes.

BibleTech 2011

I had to miss the first day because of another commitment, but today i’m here at BibleTech:2011 and looking forward to a great day of talks. Hopefully mine will be one of them: here’s my abstract.

Using the Bible Knowledgebase for Information Integration

In 2009 I reported on the Bible Knowledgebase (BK), a machine-readable collection of semantically-organized data about people, places, and things in the Bible. This talk will describe how the BK now functions as an essential information resource for Logos, tying together information across the software. In addition, I’ll discuss the continued work on the data over the last two years, including:

  • building a database of Biblical Events
  • adding unnamed entities to the database
  • coordinating information about these entities with the Logos Controlled Vocabulary

I’ll also present prototypes for visualizing BK data to enhance discovery and exploration in the Biblical text.

I’ll be live-blogging a few talks during the day to give a quick-take on the subject for those who can’t be here. You can also follow on Twitter via #BibleTech.

A Python Interface for api.Biblia.com

Last week Logos announced a public API for their new website, Biblia.com, at BibleTech. Of course, i want to wave the flag for my employer. But i’m also interested as somebody who’s dabbled in Bible web services in the past, most notably the excellent ESV Bible web service (many aspects of which are mirrored in the Biblia API: some previous posts around this can be found here at Blogos in the Web Services category). Dabblers like me often face a perennial problem: the translations people most want to read are typically not the most accessible via API, or have various other limitations.

So i’m happy with the other announcement from BibleTech last week: Logos is making the Lexham English Bible available under very generous terms (details here). The LEB is in the family of “essentially literal” translations, which makes it a good choice for tasks where the precise wording matters. And the LEB is available through the API (unlike most other versions you’re likely to want, at least until we resolve some other licensing issues).

I don’t want to do a review of the entire API here (and it will probably continue to evolve). But here are a couple of things about it that excite me:

  • The most obvious one is the ability to retrieve Bible text given a reference (the content service). Of the currently available Bible versions, the LEB is the one that interests me the most here (i hope we’ll have others in the future).
  • Another exciting aspect for me is the tag service. You provide text which may include Bible references: the service identifies any references embedded in it, and then inserts hyperlinks for them to enrich the text. So this is like RefTagger on demand (not just embedded in your website template). You can also supply a URL and tag the text that’s retrieved from it. One caveat with this latter functionality: if you want to run this on HTML, you should plan to do some pre-processing first, rather than treating it all as one big string. Otherwise random things (like “XHTML 1.0” in a DOCTYPE declaration) wind up getting tagged in strange ways (like <a href="http://ref.ly/Mal1">ML 1.0</a>).
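
The pre-processing caveat above can be sketched roughly like this: walk the HTML with a parser, hand only the text nodes to the tagger, and pass declarations, tags, comments, and the contents of script, style, and existing links through untouched. This is a minimal sketch using only the Python standard library. `toy_tagger` and its example.org link format are hypothetical stand-ins for a real tagging call, not the Biblia tag service itself.

```python
import re
from html.parser import HTMLParser


class TextOnlyTagger(HTMLParser):
    """Rebuild an HTML document, passing only text nodes to a tagging callback."""

    # Text inside these elements is copied through unchanged.
    SKIP = {"script", "style", "a"}

    def __init__(self, tag_text):
        super().__init__(convert_charrefs=False)
        self.tag_text = tag_text   # callable: str -> str
        self.out = []
        self.skip_depth = 0        # > 0 while inside a SKIP element

    def handle_decl(self, decl):
        # DOCTYPE and friends are copied verbatim, never tagged.
        self.out.append("<!%s>" % decl)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_data(self, data):
        self.out.append(data if self.skip_depth else self.tag_text(data))


def tag_html(html, tag_text):
    """Apply tag_text to the text nodes of html and return the rebuilt markup."""
    parser = TextOnlyTagger(tag_text)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Hypothetical stand-in for a real tagger: links simple "Book C:V" references.
REF = re.compile(r"\b([A-Z][a-z]+) (\d+):(\d+)\b")


def toy_tagger(text):
    return REF.sub(r'<a href="https://example.org/ref/\1\2.\3">\1 \2:\3</a>', text)


doc = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">'
       '<p>See John 3:16.</p>')
print(tag_html(doc, toy_tagger))
```

In real use, the callback would send each text node (or a batch of them) to the tagging service; the point of the sketch is only that the DOCTYPE and other markup never reach the tagger.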

I’ve just started working through the Biblia API today, but since i’m a Pythonista, developing a Python interface seemed like the way to go. This is still very much a work in progress, but you can download the code from this zip file and give it a whirl. Caveats abound:

  • I’ve only implemented three of the services so far: content() (retrieves Bible content for a reference), find() (lists available Bibles and metadata), and tag() (finds references in text and enhances it with hyperlinks). And even with these three services, i haven’t supported all the parameters (maybe i will, maybe i won’t).
  • This is my first stab at creating a Python interface to an API, so there may be many stylistic shortcomings.
  • Testing has also gotten very little attention, and bugs doubtless remain.
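
For readers who want a feel for the shape of such a wrapper without downloading anything, here is a minimal sketch. It is not the code from the zip file. The class name `BibliaClient`, its method names, and the URL layout (a `/v1/bible` base, a `content/<BIBLE>.<format>` path, and `passage`, `query`, and `key` parameters) are assumptions that should be checked against the current API documentation before use.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


class BibliaClient:
    """Hypothetical minimal wrapper: builds request URLs and fetches them."""

    def __init__(self, api_key, base_url="https://api.biblia.com/v1/bible"):
        self.api_key = api_key
        self.base_url = base_url  # assumed endpoint root

    def _url(self, path, **params):
        # Drop unset parameters, add the key, and keep a stable parameter order.
        params = {k: v for k, v in params.items() if v is not None}
        params["key"] = self.api_key
        return "%s/%s?%s" % (self.base_url, path, urlencode(sorted(params.items())))

    def content_url(self, passage, bible="LEB", fmt="txt"):
        """URL for the text of a passage in one Bible (assumed layout)."""
        return self._url("content/%s.%s" % (bible, fmt), passage=passage)

    def find_url(self, query=None):
        """URL for listing available Bibles (assumed layout)."""
        return self._url("find", query=query)

    def fetch(self, url):
        """Perform the request and return the response body as text."""
        with urlopen(url) as response:
            return response.read().decode("utf-8")


client = BibliaClient("YOUR_API_KEY")
print(client.content_url("John3.16"))
```

Separating URL construction from fetching keeps the wrapper easy to test without network access, which helps given how little testing a first pass usually gets.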

If you’re interested and want to play along, let me know: we can probably set up a Google group or something for those who want to improve this code further.

BibleTech:2010 Debrief

The BibleTech conference is an annual highlight for those of us who work at the intersection of Bible stuff and technology, and last week’s meeting in San Jose was no exception. This was the third BibleTech — i’ve been fortunate to have attended (and presented at) them all — and there’s always a great mix of new ideas, updates on ongoing projects, and lots of interesting people to talk to. (some other reviews: Rick Brannan, Mike Aubrey, Trey Gourley)

Some of the talks i liked best this year:

  • I was already interested in Pinax before hearing James Tauber’s talk on Using Django and Pinax for Collaborative Linguistics: now i’m itching to get started!
  • Stephen Smith had a nice analysis of the most frequently tweeted Bible passages (though the evidence of vast swaths of Scripture that get very little attention was perhaps a bit depressing).
  • Neil Rees showed Concordance Builder, a program that lets you use a Swahili concordance to bootstrap one for Welsh (or any other pair of languages) with no linguistic knowledge. Building on the Paratext tool, it leverages the verse indexes along with approximate string matching and statistical glossing (technical paper by J D Riding) to produce results that are about 90-95% correct out of the box. This can reduce concordance development to a matter of weeks rather than years.
  • There were several talks related to semantics in addition to mine: Randall Tan talked about more automated methods and fleshed them out relative to the higher-level structure of Galatians, and Andi Wu gave what looked like a really interesting presentation on semantic search based on syntax and cross-language correspondence (alas, i missed it).
  • Weston Ruter talked about APIs they’re developing at OpenScriptures.org (and brought in the Linked Data idea). Logos also unveiled their new API for Biblia.

I felt my talks went well and i got some good feedback. My slides are now posted (if you wrote down URLs at the conference, i didn’t get them quite right 🙁 but here they’re correct):

(As with some previous talks, i did my presentation with Slidy (previous post): i feel like it’s going a little more smoothly each time.)

BibleTech:2010 Talk – The Logos Controlled Vocabulary

The program for BibleTech:2010 has been up for a couple of weeks now, and i’ve been delinquent in failing to point that out. We’ve got a full roster of really interesting talks that span the gamut from friendly warm technology to hard-core geekishness: Bible translation, social media, Biblical linguistics, mobile computing, preaching, publishing, tweeting, and more. And this year, it’s in San Jose, CA: i’m hoping that will open up attendance to some folks who have the misfortune to not live in the beautiful Pacific NW. The dates are March 26-27, 2010.

I’ll be giving two talks this year: here’s my abstract for the first one, on the Logos (formerly Libronix) Controlled Vocabulary.


Dozens of books provide terminology from the field of Biblical studies, principally Bible dictionaries, encyclopedias, and other subject-oriented reference works. However, the terminology used varies between books, authors, and publishers, and doesn’t always include all the terms a user might employ to find information.

The Logos Controlled Vocabulary (LCV) organizes content from multiple Bible dictionaries to integrate information across the Logos library. As a controlled vocabulary, the LCV identifies, organizes, and systematizes a specific set of terms for indexing content, capturing inter-term relationships, and expressing term hierarchies. Like other kinds of metadata, this infrastructure then supports applications in search, discovery, and general knowledge management. The initial version of the LCV (shipping now with Logos 4) comprises some 11,100 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future. This includes some interesting new capabilities for machine learning from existing prose content. For example:

  • what are the prototypical Bible references, names, or phrases used to discuss a topic?
  • can we learn anything about the importance of topics by looking at how much is written about them, how many dictionaries cover them, and other kinds of automated analysis?
  • what knowledge can be gleaned from the topology of terminology linkage (what links to what)?
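
As a rough illustration of the second question above, one simple way to turn "how much is written" and "how many dictionaries cover it" into a single number is to multiply coverage by a dampened word count. This is a toy sketch under stated assumptions, not the method used for the LCV: the function name, the scoring formula, and the sample data are all invented for illustration.

```python
import math
from collections import defaultdict


def topic_importance(articles):
    """Score topics from (topic, dictionary_name, word_count) records.

    score = (share of dictionaries covering the topic) * log(1 + total words)
    The formula is an illustrative assumption, not an established measure.
    """
    words = defaultdict(int)
    sources = defaultdict(set)
    for topic, dictionary, word_count in articles:
        words[topic] += word_count
        sources[topic].add(dictionary)

    all_dictionaries = {d for covered in sources.values() for d in covered}
    scores = {}
    for topic in words:
        coverage = len(sources[topic]) / len(all_dictionaries)
        volume = math.log1p(words[topic])  # dampen very long articles
        scores[topic] = coverage * volume
    return scores


# Invented sample data: three hypothetical dictionaries, two topics.
sample = [
    ("Moses", "dict_a", 4200),
    ("Moses", "dict_b", 3100),
    ("Moses", "dict_c", 5000),
    ("Jabez", "dict_a", 150),
]
for topic, score in sorted(topic_importance(sample).items(), key=lambda kv: -kv[1]):
    print(topic, round(score, 2))
```

Taking the logarithm of the word count keeps one very long article from swamping the coverage signal, so a topic treated briefly in many dictionaries can still outrank one treated at length in a single source.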

Update: we’ve decided in general to retire the “Libronix” name for Logos technologies, so i’m trying to get on board by starting to call this the Logos Controlled Vocabulary.