A Python Interface for api.Biblia.com

Last week Logos announced a public API for their new website, Biblia.com, at BibleTech. Of course, i want to wave the flag for my employer. But i’m also interested as somebody who’s dabbled in Bible web services in the past, most notably the excellent ESV Bible web service (many aspects of which are mirrored in the Biblia API: some previous posts around this can be found here at Blogos in the Web Services category). Dabblers like me often face a perennial problem: the translations people most want to read are typically not the most accessible via API, or have various other limitations.

So i’m happy with the other announcement from BibleTech last week: Logos is making the Lexham English Bible available under very generous terms (details here). The LEB is in the family of “essentially literal” translations, which makes it a good choice for tasks where the precise wording matters. And the LEB is available through the API (unlike most other versions you’re likely to want, at least until we resolve some other licensing issues).

I don’t want to do a review of the entire API here (and it will probably continue to evolve). But here are a couple of things about it that excite me:

  • The most obvious one is the ability to retrieve Bible text given a reference (the content service). Of the currently available Bible versions, the LEB is the one that interests me the most here (i hope we’ll have others in the future).
  • Another exciting aspect for me is the tag service. You provide text which may include Bible references: the service identifies any references embedded in it, and then inserts hyperlinks for them to enrich the text. So this is like RefTagger on demand (not just embedded in your website template). You can also supply a URL and tag the text that’s retrieved from it. One caveat with this latter functionality: if you want to run this on HTML, you should plan to do some pre-processing first, rather than treating it all as one big string. Otherwise random things (like “XHTML 1.0” in a DOCTYPE declaration) wind up getting tagged in strange ways (like <a href="http://ref.ly/Mal1">ML 1.0</a>).
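The pre-processing step mentioned in the caveat above could look something like the following. This is only a minimal sketch using Python's standard `html.parser` module: the class name `TextChunks` and the helper `text_chunks()` are made up for illustration and are not part of the Biblia API. The idea is to hand the tag service only the visible text nodes, so that markup such as the DOCTYPE declaration never reaches it.

```python
from html.parser import HTMLParser


class TextChunks(HTMLParser):
    """Collect visible text nodes, ignoring tags, the DOCTYPE, scripts and styles."""

    SKIP = {"script", "style"}  # elements whose content is not readable text

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # The DOCTYPE goes to handle_decl(), not here, so it is never collected.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_chunks(html):
    """Return the list of non-empty text nodes found in an HTML string."""
    parser = TextChunks()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each chunk could then be sent to the tag service on its own and the tagged result substituted back into the page. That substitution step is left out here.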

I’ve just started working through the Biblia API today, but since i’m a Pythonista, developing a Python interface seemed like the way to go. This is still very much a work in progress, but you can download the code from this zip file and give it a whirl. Caveats abound:

  • I’ve only implemented three of the services so far: content() (retrieves Bible content for a reference), find() (lists available Bibles and metadata), and tag() (finds references in text and enhances it with hyperlinks). And even with these three services, i haven’t supported all the parameters (maybe i will, maybe i won’t).
  • This is my first stab at creating a Python interface to an API, so there may be many stylistic shortcomings.
  • Testing has also gotten very little attention, and bugs doubtless remain.
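To give a feel for what a wrapper like content() has to do, here is a minimal sketch of the URL-building step only. It is not the code from the zip file. The base URL, the path layout (`content/<bible>.<format>`), and the parameter names `passage` and `key` are assumptions that should be checked against the Biblia API documentation, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# ASSUMPTION: illustrative base URL and path layout; verify against the API docs.
BASE_URL = "https://api.biblia.com/v1/bible"


def build_url(path, api_key, **params):
    """Join a service path and query parameters into a request URL."""
    query = urlencode({**params, "key": api_key})
    return f"{BASE_URL}/{path}?{query}"


def content_url(api_key, passage, bible="LEB", fmt="txt"):
    """URL for retrieving the text of a passage (parameter names are assumed)."""
    return build_url(f"content/{bible}.{fmt}", api_key, passage=passage)
```

Fetching the result would then be a matter of passing the URL to something like `urllib.request.urlopen()` and decoding the response. Keeping URL construction separate makes that part easy to test without touching the network.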

If you’re interested and want to play along, let me know: we can probably set up a Google group or something for those who want to improve this code further.

Bookmarklets Redux

Time spent on the web can be oh-so tedious if you’re constantly cutting things from one page and pasting them elsewhere just to get to another, related page. Someday Linked Data may make this all better, but until then, we all get by with helpful tricks.

Bookmarklets are one essential weapon in the arsenal of the web-info-warrior. Usually they’re little JavaScript programs stored as a bookmark in Firefox, providing one-click access to some simple functionality like looking things up elsewhere, resizing your window, etc. I’ve blogged previously about bookmarklets to find local library sources for a book on an Amazon page (or PaperBackSwap).

I dusted off my bookmarklet skills this past week and came up with some nifty tools that i wanted to share.

First off, imagine you’re looking at a website with Bible references whose benighted author somehow failed to include RefTagger. So rather than a nice pop-up with the text of the reference, or even a helpful link to that text on some Bible site, you’re just looking at an inanimate, unlinked string: boo. The Bible Reference Bookmarklet to the rescue! Simply select the text of the reference, click the bookmarklet, and you’ll be whisked off to that reference at Bible.Logos.com. If you haven’t selected any text first, you get a dialogue box asking for it.

To get this goodie in Firefox, first make sure the Bookmarks Toolbar is showing (View > Toolbars > Bookmarks Toolbar must be checked). I’d love to give you a link to just drag onto the toolbar, but i don’t seem to be able to get the code past WordPress. So go to Bookmarks > Organize Bookmarks, and select Organize > New Bookmark. Give it a useful name like “Bible Reference Lookup”, and paste the code below in the Location field.

javascript:(function(){%20function%20getSearchString%20(promptString)%20{%20s%20=%20null;
if%20(document.selection%20&&%20document.selection.createRange)%20{%20s%20=document.selection.createRange().text;%20}%20
else%20if%20(document.getSelection)%20{%20s=%20document.getSelection();%20}%20
if%20(!%20(s%20&&%20s.length))%20{%20s%20=prompt(promptString,'');%20}
%20return%20s;%20}%20searchString%20=%20getSearchString('Bible%20Reference%20to%20look%20up%20:');%20
if%20(searchString%20!=%20null)%20{%20if(searchString.length)%20{%20location%20='http://bible.logos.com/#ref='+escape(searchString);%20}%20
else%20{%20location%20='http://bible.logos.com/';%20}%20}%20%20})();

After you’ve clicked ok, you should see it on your toolbar.

You can do similar tricks for a wide variety of strings that you just want to look up elsewhere (i discovered one here while writing this post that lets you look up articles on Wikipedia). This isn’t fundamentally different from copying the string into a search box: but sometimes it’s more convenient.

Descending into more esoteric purposes (to give you ideas for your own bookmarklets): as part of an earlier post on Tools for Personal Knowledge Management, i mentioned my use of TiddlyWiki for quick organization of hyperlinked notes. Like other wiki software, TiddlyWiki has its own link syntax, that looks like

[[Link text | URL]]

When linking to lots of other web pages, i was getting tired of copying the URL, pasting that in, then typing the square brackets, link text, vertical bar, and more square brackets, all in the right format. Wouldn’t it be more convenient to just construct this expression from the title of the page and its URL, rather than having to type it myself? YES! and the TiddlyWiki Page Link bookmarklet does just that, putting the result in a little pop-up window where a triple-click selects the whole thing, ready to copy and paste into your tiddlywiki (and tailor as desired: the title isn’t always what you want, but it’s often easier to edit and throw things out rather than type afresh). This one you can just drag to your bookmarks toolbar and use right away.

TiddlyWiki Page Link
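The construction the bookmarklet performs is simple enough to sketch in a few lines. The version below is in Python rather than JavaScript and is not the bookmarklet's own code. The function name is hypothetical, and dropping the padding spaces around the vertical bar is an assumption about what the wiki accepts.

```python
def tiddlywiki_link(title, url):
    """Build a TiddlyWiki-style link expression from a page title and its URL."""
    # A bar or closing brackets inside the title would break the link syntax,
    # so replace them with harmless characters.
    safe_title = title.replace("|", "-").replace("]]", "))").strip()
    return f"[[{safe_title}|{url}]]"
```

For example, `tiddlywiki_link("Blogos | Home", "http://example.com/")` yields `[[Blogos - Home|http://example.com/]]`, ready to paste and then tailor as desired.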

Also, i’ve switched to a much better library lookup bookmarklet (and a service to help you create one for your local library) from WorldCat. Among other things, it generates the list of all the different ISBNs that might exist for a title (which can be very long indeed), and when there are many, it provides links for alternate searches in case the first group comes up empty handed.

Some other cool bookmarklets in my collection include:

  • CiteULike Popup Post and kin to make it easy to add (certain kinds of) articles to your reading list management. Adds more value for sources whose structure it understands.
  • Show del.icio.us citations of the current URL (you can find it there)
  • Resize your browser window to 1024 x 768 (if you want to see how a page will look on a smaller monitor or projector): the bookmarklet follows, just drag to your toolbar. 1024 x 768
  • A CSS validator for the current page: see Pete Freitag’s page.

Hat tips:

http://ref.ly for Bible References

My colleagues at Logos have launched http://ref.ly, a URL shortening service for Bible references: see this blog post. It provides the convenience of TinyURL (turning long unreadable URLs into something much more manageable), but unlike that service also provides readable, understandable content. Once you get past the prefix, you won’t have any trouble figuring out what verse http://ref.ly/Mk4.9 is referring to.

If you’re a Twitter person trying to shoehorn your message into 140-character tweets, you’ll like the fact that this gives you a brief and unambiguous way to both specify a Bible reference and link to the content behind it (the references resolve to the actual verse text at bible.logos.com). Since addressability matters, this is a good thing.

But it has precisely the same utility even if you’re not a Twitterhead (i’m not):

  • it clearly marks a string of characters as a Bible reference
  • it also normalizes the reference into a form that can be automatically processed

While it’s not quite a microformat, it’s really only a small step away from things like bibleref. In particular, if lots of people start using ref.ly references, it will be possible to process that content and understand things like what verses are most popular.
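As a rough illustration of that kind of processing, the sketch below pulls ref.ly references out of a batch of text and tallies them. The regular expression is an assumption that covers only the simple forms seen in this post (a book abbreviation, a chapter, and an optional verse, as in `Mk4.9` or `Mal1`). It does not try to handle ranges or whatever other forms the service may accept.

```python
import re
from collections import Counter

# ASSUMPTION: matches only simple references such as Mk4.9, Jn3.16 or Mal1.
REFLY = re.compile(r"https?://ref\.ly/([1-3]?[A-Za-z]+[0-9]+(?:\.[0-9]+)?)")


def count_refly(texts):
    """Count how often each ref.ly reference appears across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(REFLY.findall(text))
    return counts
```

Calling `count_refly(tweets).most_common(10)` on a collection of tweets would then give a first cut at a popularity list.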

What’s more, editors that recognize and automatically link URLs (like MS Outlook for HTML-based email, and MS Word) will now automatically make Bible links for you (like RefTagger does for blog posts), as long as you’re willing to tack on “http://ref.ly/” and live with the slightly non-traditional format. You don’t need to know anything about how to make a hyperlink in HTML: just a little extra syntax (14 characters, to be precise) moves these references toward much greater usefulness.

The Most Important Verses? It Depends What You Mean

The title of this post is a deliberate take-off from a recent post at OpenBible.info entitled “What Are the Most Popular Verses in the Bible? It Depends Whom You Ask”. That post combines data from an earlier ESV analysis of search results, TopVerses.com, a BibleGateway (internal) study, and OpenBible data to present a list of 278 verses, each of which appears in at least one source’s “top 100” list. It’s interesting to see both how much disparity there is (only 13% occur in at least three of the four lists) and how uneven the distribution is. As one commenter points out, it’s somewhat surprising that there are no verses from Revelation, and Old Testament narrative in particular is largely absent except for Genesis. John’s gospel has about as many popular verses as all the other gospels combined: there are only four verses from Mark (two of them from the often-questioned ending). Less surprisingly, perhaps, there are none from the shortest NT books (Philemon, Jude, 2-3 John). Altogether it’s an interesting study.

The larger question this raises for me is how we might come up with a comprehensive, global score for verses to indicate their importance for a variety of purposes. As the OpenBible post suggests, this depends on the source of the data, on your purpose, and on what you mean by “important” (which is certainly different from “popular”, though not completely unrelated).

One useful purpose is ranking verses to present them in response to searches: TopVerses.com is explicitly organized this way, as indicated in this news article about the site. They don’t go into much detail about how they gathered their data, though the scope (37M references scoured from the web) is impressive. But there’s a subtle disparity here: their data is based on counting mentions (citations) in published web pages, but their use case is prioritizing search results, and these may be out of sync. The fact that a given verse is frequently published on the web doesn’t necessarily mean it’s the one you want at the top of the list when you’re doing a word-based search, for example. The other three sources seem perhaps better matched to ranking search results, since they’re derived from searches themselves.

Another key hitch in these endeavors is how to handle range references, both in processing source data and (for search purposes) in handling queries. For example, many Bible dictionaries frequently reference ranges of verses, sometimes extensive, multi-chapter ones. If you’re going to count these, you need to think carefully about how you do the counting so you don’t introduce bias (or, better, you select the bias that’s best suited to your purposes).

For example, in the TopVerses.com ranking John 3.1 is #26, despite the rather plain descriptive content with little obvious spiritual impact.

Now there was a Pharisee, a man named Nicodemus who was a member of the Jewish ruling council. (John 3.1, NIV)

While i can’t be sure, i strongly suspect this high rank is an unintended consequence of dis-aggregating ranges and whole chapter references from John 3. In fact, scanning top verses by chapter from John, the first verse in each chapter is very often the highest or second-highest ranked, and nearly always among the top ten. This probably says more about the counting methodology than the significance of those verses in particular. The Bible Gateway study focuses on ranges of no more than three verses to explicitly mitigate this problem.
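How much the counting policy matters can be seen in a small sketch. This is not how TopVerses.com or any of the other sites actually count. It simply contrasts two hypothetical policies: giving every verse in a cited range a full count, versus splitting a single count evenly across the range. The function name and the data layout (one list of verse identifiers per citation) are invented for illustration.

```python
from collections import Counter


def count_citations(citations, fractional=False):
    """Tally verses from citations, each given as a list of the verses it covers.

    With fractional=False every verse in a range gets a full count; with
    fractional=True a range of n verses gives each verse 1/n of a count.
    """
    counts = Counter()
    for verses in citations:
        if not verses:
            continue  # nothing to count for an empty citation
        weight = 1 / len(verses) if fractional else 1
        for verse in verses:
            counts[verse] += weight
    return counts
```

Given one citation of John 3.16 alone plus two ranges that both start at John 3.1, the full-count policy ranks John 3.1 above John 3.16, while the fractional policy reverses the order. Neither is “right”: each is a bias to be chosen deliberately.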

Other Measures of Importance

Moving from popularity to importance, i can imagine several different factors that might be combined to produce a more general importance score:

  • citation frequency (based on some corpus). In the TopVerses.com approach, these are web pages, which provides a very large set of observations. A number of other digital text collections would also suit this purpose, and even allow segmentation by genre: for example, you get a very different ranking from the Anchor Yale Bible Dictionary compared to Easton’s (and neither has John 3.16 at the top of the list). See below for more about this.
  • search frequency, the basis for the other three sources in the OpenBible.info post. This could be refined further given data on follow-up activities. For example, depending on your application, verse searches whose results are then expanded into a chapter view or followed to the next verse might get a boost compared to those with no further action (this seems like a variant of “click through” rates used in search engine advertising).
  • content analysis (context-independent): this could have several different flavors.
    • word count: though John 11:35 gets mentioned more than you’d expect precisely because it’s the shortest verse in the (English) Bible, in general longer verses are more likely to be important. This could be refined further given a metric for important words (but now we’ve introduced a new problem: where does that data come from?), which could be used for weighting the counts.
    • We could do even better if, instead of counting words, we count concepts (and weight them). Assuming we think the concept of HUMILITY is important, we’d want verses expressing that concept to rank more highly, regardless of whether they used a more common word like “humility”, or a less common one like “lowly”. Converting words to concepts is a difficult challenge, however.
    • Connections to other data also affect importance. In some sense, every verse that reports words of Jesus is probably more important to a Christian than one whose importance is otherwise comparable, which is why we have the convention of printing Bibles with the words of Christ in red (a binary system for visualizing importance).
    • We might even consider negative factors: a lower rank for unfamiliar, hard-to-pronounce names, or “taboo” words.

Unlike TopVerses.com, i don’t see a particular need to provide a unique rank for each verse. If each verse has a score (to simplify the math, a decimal between 0 and 1 is a common approach), you can simply pick the top n verses that fit your purpose, and then order any ties canonically.
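A minimal sketch of that scoring scheme follows, assuming each factor has already been scaled to a value between 0 and 1. The factor names, the weights, and the use of a simple weighted average are all illustrative assumptions rather than a worked-out proposal.

```python
def importance(factors, weights):
    """Weighted average of factor values (each in [0, 1]); missing factors count as 0."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * factors.get(name, 0.0) for name in weights)
    return weighted / total_weight


def top_n(scores, n, canonical_index):
    """Return the n highest-scoring verses, breaking ties in canonical order."""
    ranked = sorted(scores, key=lambda verse: (-scores[verse], canonical_index[verse]))
    return ranked[:n]
```

Here `canonical_index` maps each verse to its position in the canon, so two verses with the same score come out in Bible order. Because the weights are normalized by their sum, the combined score also stays between 0 and 1.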

Comparing Dictionary Reference Citations

I did a small experiment to compare the most frequent reference citations in seven Bible dictionaries that are incorporated in Logos’s software (so this is citation frequency, not search frequency). I extracted and counted all the references, and then aggregated the counts across all seven: the top 20 references are shown below, along with how many “votes” they received in the OpenBible.info list. In the case of whole chapter references (four of the top ten), i’ve indicated with yes/no whether any verse from that chapter occurs in the OpenBible list.

There’s relatively little overlap between the two lists: only seven of these are in the OpenBible list. Many of these make sense given the different purposes of reference works: for example, Is 61.1 is a key messianic text. The high rank for 2 Ki 15.29 is initially puzzling, but probably results from being commonly cited in discussions of the conquests of Tiglath-Pileser and the Babylonian exile. Overall, this is probably much too small a sample to show the correspondences: i presume we’d find much more overlap in the top few hundred.

Reference     Aggregate Count   Count in OpenBible List
Jn 1:14       169.5             1
2 Ki 15:29    165.2             0
Is 61:1       159.8             0
Ac 1:13       151.7             0
Ge 1          150.0             yes
Ac 15         143.0             no
Ge 2:7        142.3             no
Ge 46:21      139.3             no
Jn 3:16       137.8             4
Ge 1:26       135.2             3
Is 7:14       134.3             1
Mt 28:19      130.2             3
Da 7:13       130.0             0
Ps 2:7        129.8             0
1 Pe 2:9      126.3             0
Ac 20:4       124.3             0
Lk 3:1        123.8             0
Mk 10:45      123.7             0
1 Sa 1:1      121.5             0
Ac 1:8        120.8             3

Details:

Conclusions

None of this is meant as criticism of the particular sites mentioned above. I strongly believe that any user-oriented, empirically-based data set is better than nothing, and in most endeavors like this, “the best data is more data”. * But with more data comes more complexity, and i’ve only scratched the surface here in considering some of the different factors.

The key point is this: if we want to measure something, we need to be clear up front about exactly what it is, and also what purpose we hope it will serve. I never stop being amazed at how often “obvious” approaches to data problems produce surprising results.


* In my recollection, this quote is attributed to Bob Mercer, a leading researcher in statistical language processing who was part of the IBM research group in the 1990s. I haven’t been able to verify a real source, however.