God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
November 26th, 2007

Amazon Kindle: a Bible Study Platform?

Gadget lovers have doubtless already heard that Amazon recently released Kindle, their new e-book technology. Reviews on their site (like reviews in general) tend to be split between those who love it and those who really don’t. I’m interested in a different question, though: what might Kindle mean for the future of digital Bible study?

In general, people tend to like:

  • convenience (you can download new books wirelessly, no cables, long battery life)
  • ease of use (good design, more readable than traditional monitors and PDAs)

Things they don’t like so much:

  • #1 seems to be the cost: $400 is pretty expensive if your main objective is to read best-sellers. Many of the enthusiastic reviews on Amazon’s site are from beta-testers who were given devices to try: even among those who loved it, however, it’s telling that some said they wouldn’t buy one once they had to return theirs, and cost was the main reason. On the other hand, the incremental cost for books ($10 for best sellers, less for others) doesn’t seem high to me compared to the paper versions. Some people find it galling to have to pay again for books they already own in print, though i don’t see any easy way across that particular digital divide (i still have lots of music that i love only on cassettes because i was too cheap to buy it again on CD).
  • Digital rights management: you’re really buying access to books. So you can’t copy them to other formats, or give them to anybody else.
  • It’s not a completely open platform: some fussing about is required for PDF and other file formats, and apparently you have to pay to have content emailed to your Kindle.
  • The wireless coverage is still quite limited, which means the US heartland and other rural areas are mostly out of luck.

But what about Kindle as a Bible study device: would it work, and how might it compare to Logos Bible Software? These seem like the relevant features:

  • There’s no way to beat the convenience of having a library in your pocket (also one of the main selling points for Logos), even more so when you can bookmark pages, write notes on passages, etc.
  • Kindle provides word-based search of your whole Kindle library, another unbeatable feature of digital resources over print. I don’t have a Kindle to try out (but i’d be glad to review it, Amazon, hint hint): but what we’ve learned from a decade of web search is that word-based approaches only get you so far. I’d be interested to know what additional search capabilities it provides. For example, as your library grows, can you search only a subset? How flexible is the search syntax: wildcards? data-type specific searches?
  • Kindle currently provides hypertext links to other resources like a dictionary and Wikipedia. So it’s not hard to imagine providing links to other resources as well.
  • Will third-party vendors be able to provide books in Kindle’s format? In particular, will they be able to enrich them with their own hypertext markup? That’s where these digital formats really shine. As a personal user of Logos software, the ability to hover over a Scripture reference and get the text in a popup has become second nature: now i find myself putting my finger on footnotes and cross-references in print books, waiting for the popup (just kidding, but wouldn’t it be great?).
  • While there are quite a few books by well-known Christian authors (Max Lucado, Rick Warren, etc.), the collection of Bibles is quite small: KJV, NIV, TNIV. Likewise, there are relatively few Bible study resources. Maybe this will change over time, though it may also say something about how little reading most Christians do.

Bottom line: i don’t see Kindle today as any kind of competitor for Bible study software, when so many more specialized resources are available. But it will be interesting to see if it succeeds, and to see how this market changes over time. Certainly the future of reading has to include e-books: while paper will never go away, the advantages of digital resources are simply overwhelming.

November 26th, 2007

Slides from SBL Talks

The Society of Biblical Literature meeting is now over, and we enjoyed a wonderful Thanksgiving weekend following it with our kids in the Poconos. So i’ve been relaxing rather than minding the blog, and have only now posted the slides from my two talks at my presentations page at SemanticBible.

Both talks were well-received, and i especially enjoyed the opportunity to work and present with Steve, and to hang out with some of my Logos colleagues that i don’t see around the office much.

November 20th, 2007

Real-time Information and Air Travel Planning

Today Donna and i flew back to Pennsylvania, after the SBL Conference in San Diego (i hope to post the slides from my talks soon!), to spend Thanksgiving with our kids, all of whom live on the East Coast. This is a challenging time of year for air travel, but i was really dismayed by the experience we had this morning on US Airways.

Before boarding passengers onto the plane, the agents in the terminal announced that the flight was overbooked (as often occurs), and they were soliciting passengers to voluntarily give up their seats. Then they boarded the flight, and everybody got settled in: the usual routine. So we’re all sitting on the plane, seat belts fastened low and tight across our hips and luggage securely stowed, and it’s time to go: now an agent comes on board and says “Folks, i’m begging you: we need nine more people to volunteer to give up their seats. This is a direct flight across the country so we need lots of fuel, people have lots of luggage, so the plane is heavy, and San Diego has a short runway: so we’re going to have to sit here until we get enough volunteers.”

Next thing, a bunch of people dutifully get up and go forward (incentivized by the promise of a free ticket). While the flight was nearly full before (though we were lucky enough to have an empty seat next to us, that was the only one i could see prior to the exodus), now some seats are open, so the flight attendants helpfully re-situate some people: the 6′-6″ guy stuck in the tiny window seat now gets an aisle, the family in the very back row with a small child moves forward to a row that’s become vacant, etc.

More time passes – what’s the delay? – and lo and behold, 7 or 8 of the people who had gotten off the plane now get back on. Apparently the airline got more volunteers than they needed, so these poor folks who had already retrieved their carry-on luggage and trudged off the plane are back again, looking for seats. But of course, the seats they only recently left are now occupied by somebody else! So now there’s a couple wandering up and down looking for two seats together, the 6′-6″ guy goes back to his tiny seat, the family moves to the back of the plane again, and so forth. The attendants in the back are encouraging the returnees to just take any available seat, and then the attendant in the front announces over the loudspeaker that everyone should return to their originally assigned seats (and the attendants in the back groan). More chaos and shuffling of bodies and luggage ensues. When we finally take off, it’s a good half hour later than our scheduled departure, the people who volunteered and then got turned back are dejected, the people who moved only to have their new seats taken away from them are ticked off, and we’re all just shaking our heads (or worse).

What made this all such a mess (and here’s the point of my post) is that they apparently didn’t use the information they already had in a timely manner to avoid all this chaos. In the terminal, they never quantified how many volunteers they needed. I’m hard pressed to believe that this need only appeared after loading the plane: were the 200 or so passengers so much heavier on average that all of a sudden they required an additional 5% reduction in the number of passengers?!? Did the winds suddenly change direction, requiring a drastic change in the fuel requirements? What’s the point in loading a passenger on the plane, only to then cajole him or her into giving up their seat 20 minutes later? I can see this for one or maybe two seats, but nine?!? If they had just communicated clearly (with the appropriate incentives) before loading the plane, most of this could have been avoided.

That was only the first information failure: the second was that anybody (who wasn’t missing fingers) could easily tally up how many people got off the plane as volunteers, and it was considerably more than nine. I don’t begrudge them trying for a free ticket: but after #10, why didn’t the airline personnel say “thanks, we’ve got our volunteers” and stop there? (You might also question whether the attendants should have moved people around if there was a chance others might come back.)

The point of this rant, if there is one: the purpose of information is to help you make better decisions. You’re far better off using the information you have now to prevent a problem, rather than allowing a problem to develop and then trying to fix it later. You can sometimes be excused for not having the information you need to prevent a problem (though to be effective, you have a responsibility to proactively determine what information you need and get it). But there’s no excuse for having information that tells you there’s a problem, and then just ignoring it.

November 15th, 2007

Upcoming Talks at the Society of Biblical Literature Meeting

This weekend is the annual meeting of the Society of Biblical Literature, held this year in San Diego and preceded as usual by the Evangelical Theological Society’s (ETS) National Conference. ETS/SBL is a significant annual event for Logos, both for marketing our product and for presenting scholarly research on a variety of topics. Rick posted previously about several ETS talks by Logos folks (and it’s not too late to catch a couple of them if you’re there).

At SBL this year, i’ll be giving two talks:

“So, Brothers”: Pauline Use of the Vocative
Biblical Greek Language and Linguistics
Saturday (11/17), 9:00 AM to 11:30 AM
Room: Betsy A – GH
Abstract: Use of the vocative by New Testament writers represents a pragmatic choice, yet there is little understanding of what motivates its use, or of its exegetical value. Most descriptions cast it as a structural marker of discourse units, corresponding to paragraph boundaries. However, many vocatives in the Greek New Testament text occur within paragraphs, calling the traditional account into question. This paper will review previous work on vocative use in the Greek New Testament, and briefly describe its discourse function based on its similarity to pragmatic markers in other languages. Representative examples from the Pauline corpus will be examined to demonstrate the exegetical value of careful attention to vocative use.
[This is joint work with my colleague Dr. Steven Runge, Logos scholar-in-residence.]

Integrating Greek and English Digital Resources
Computer Assisted Research Group (CARG)
Monday (11/19), 4:00 PM to 5:30 PM
Room: 22 – CC
Abstract: Common corpora and data standards help advance research, by providing a shared focus for researchers and elevating the baseline from which investigation begins. Several digital resources specifically designed for GNT study (for example, the Louw-Nida lexicon, the OpenText.org Syntactically Analyzed Greek New Testament, and the ESV English-Greek reverse interlinear) can be usefully integrated with other English resources and corpora to provide benefit to English-speaking Bible students as well. Examples include WordNet (a semantic lexicon), PropBank (a corpus annotated with verbal propositions and their arguments), and FrameNet (corpus-supported semantic “frames” for concepts). The resulting hybrids may also be more broadly useful for the study of other Hellenistic corpora, and may point toward the development of new resources for Biblical scholarship. This paper will briefly overview some of these digital resources, and describe ongoing work at Logos Research Systems to integrate them. The paper will also propose several practical steps to facilitate greater integration of resources from the community of biblical scholars with those from the computational linguistics community.

I’m excited about my collaboration with Steve on the first talk, which (without giving too much away in advance) allowed for a nice hybrid of data-driven syntax searching and discourse analysis of vocative function.

Note that, while the abstract for my CARG talk is worded rather broadly (these got submitted quite a while ago), the real focus will be on Louw-Nida and WordNet integration. In particular, i don’t expect to have much to say about PropBank and FrameNet.

Steve will also be giving an additional talk:

Quotation of Isaiah 6:9-10 in the New Testament: With Emphasis on the Quotation in Matthew 13:14-15
Greek Bible
11/17/2007, 1:00 PM to 3:30 PM
Room: Torrey 1 – MM
Abstract: Matthew’s quotation of Isa 6:9-10 in Mt 13:14-15 betrays some intriguing elements which are different from his quotations in other parts. First, though usually deviating from the LXX for the so-called fulfillment quotations, he adopts the verbatim of the LXX in Mt 13:14-15. Second, the fulfillment introductory formula in this passage is different from the one frequently used in the Gospel; it is modified by omitting the conjunction ἵνα (in order that) which clearly points to the purpose of the quoted OT passage. Finally, this is the only instance in which the fulfillment quotation is presented not as coming from his hands but from Jesus’ mouth. Why did Matthew decide to deviate the common way to quote the OT? This study purports to examine this problem by looking at the context of Mt 13:14-15. The study will also compare the quotation of Isa 6:9-10 in Matthew with that in other NT books, especially in Mark 4:12 and Jn 12:40 where the verbatim departs from the LXX. The study will investigate how Matthew, different from other Gospel writers, quotes Isa 6:9-10 according to his own purpose.

I’ve given talks at the two previous meetings, which i particularly enjoyed since i was then very much an amateur and newcomer to the field of academic Biblical studies (not that my employment at Logos makes me a professional scholar!). I still think they represent good work that i hope to extend further.

November 14th, 2007

Fun with XML Parsing in Python

Let’s suppose for a minute that …

  1. You’re a Pythonista wanting to process the content in some really big XML documents (tens of MB or more)
  2. Most of the content isn’t important, so you can just drop it, but …
  3. There are ‘islands’ of XML content that are both the heart of your task, and structurally complex

(Consider that introduction fair warning that there’s heavy programmer talk ahead …)

If you’ve done XML parsing before, #1 and #2 will probably make you think of SAX, the Simple API for XML. Since reading an XML document into memory can take up to 10x the size of the original document, the alternative approach to SAX (DOM processing) isn’t feasible for really big files. And using SAX would be okay, but …

#3 should make you think of XPath, since that’s a much clearer (and more declarative) way to express the semantics of what you want out of that complex XML. However, XPath processing requires you to have the full fragment of interest in memory.
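To make the "declarative" point concrete, here is a small sketch using the limited XPath subset in Python's standard xml.etree.ElementTree (not 4Suite's full XPath engine). The fragment, its element names, and the kind attribute are all made up for illustration; the point is only that one path expression replaces a pile of start/end bookkeeping once the fragment is in memory.

```python
import xml.etree.ElementTree as ET

# Hypothetical "island" fragment; the names and attributes are invented.
FRAGMENT = """<details id="a1">
  <entry kind="primary"><name>First</name></entry>
  <entry kind="secondary"><name>Second</name></entry>
</details>"""

# The whole fragment has to be parsed into memory before it can be queried.
root = ET.fromstring(FRAGMENT)

# One path expression says what we want: names of the primary entries.
primary_names = [n.text for n in root.findall("./entry[@kind='primary']/name")]
print(primary_names)
```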

What you’d really like as a general approach is to process the document with SAX until you find an island of interest, then capture that whole fragment in memory so you can do something structural with it. After that, you can go back to SAX parsing again until another interesting island arises.

How do you get the benefits of both approaches?

Well, if you’re like me, you

  • scratch your head a bit
  • think through, and perhaps try, some home-brewed approaches
  • get frustrated that they’re not general enough
  • do some more head-scratching
  • think to yourself “somebody, somewhere, must have already solved this!”
  • spend some time looking around
  • finally find a Better Way (or two)

The first Better Way is Python’s standard pulldom module (Uche Ogbuji has a nice illustration of its usage here). But, as he himself points out here, the pulldom approach doesn’t offer much to aid the clarity of your code.
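For reference, a minimal pulldom sketch looks something like this. It is my own toy example rather than Uche's: the sample document, the details element name, and the collect_islands helper are assumptions for illustration. The stream is read event by event, and expandNode() pulls just the island of interest into a small DOM subtree.

```python
from xml.dom import pulldom

# Hypothetical sample: mostly filler, with one <details> island.
SAMPLE = """<catalog>
  <filler>ignore me</filler>
  <details id="a1"><name>First</name><tag>x</tag><tag>y</tag></details>
  <filler>ignore me too</filler>
</catalog>"""


def collect_islands(xml_text, island_name="details"):
    """Stream through xml_text, expanding only elements named island_name."""
    islands = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == island_name:
            # Read ahead to the matching end tag and attach the children,
            # so this one element becomes an ordinary in-memory DOM subtree.
            events.expandNode(node)
            islands.append(node)
    return islands


islands = collect_islands(SAMPLE)
for island in islands:
    tags = [t.firstChild.data for t in island.getElementsByTagName("tag")]
    print(island.getAttribute("id"), tags)
```

Everything outside the expanded elements is discarded as the loop moves on, so memory use stays tied to the size of an island rather than the size of the document.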

The Even Better Way, particularly if you’re already using 4Suite’s XML tools (and you should really consider it if you’re not), is to use their Sax.DomBuilder() method. That’s mentioned in their documentation, but with only a single sentence. Here’s a little more detail on how you might use such an approach. I assume you already know how SAX parsing works (there are plenty of other resources out there if you don’t: you might start with this recipe).

Sax.DomBuilder() is meant to essentially mirror the structure of a normal SAX processor. When the usual SAX events fire (like the start of an element), Sax.DomBuilder() turns these into their corresponding activities (like starting a new child element) to build a fresh document from scratch.

So the conceptual challenge is figuring out how to switch mid-stream from one handler to another. It’s not good enough to set up two handlers, and simply switch at the beginning of an island of interest in the stream of SAX events. For my application (and probably most others), you also have to switch back once you’re done (and perhaps repeat the cycle). That means you need an additional level of control above either of the handlers (the “normal” SAX callbacks and the DomBuilder variant).

If you’re thinking in state-machine terms (SAX tends to do that to you), you might add conditionals or flags to your code. I think the approach below is even more better, though. The standard SAX callbacks are defined with a layer of indirection, so the “real” definition for starting an element is in _startElementNS (note the leading underscore). Then when you find the beginning of the content of interest (in this example, the details element and its children), you simply switch the callback definitions for the DomBuilder ones.

The key reason to use this indirection is so you can detect when you’re done and switch back.

(Download the code.)

If you run this example (which is laced with some print statements to make the execution flow clear), you’ll see that the “normal” callbacks fall silent once the details element is passed. Also, note that you have to test for the entry condition at the beginning of startElementNS, but at the end of endElementNS, to make sure the right things do (and don’t) happen. In this example, i only collect one island of interest and print it out at the end: in a real application, you’ll probably update some data structure as you go.
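If you don't have 4Suite handy, the same switching idea can be roughed out with only the standard library. This is a sketch under stated assumptions, not the downloadable code: IslandBuilder is a hypothetical hand-rolled stand-in for Sax.DomBuilder() (built on minidom), and the sample document and the skipped list are invented for illustration. The public callbacks are thin indirections, and the underscore-prefixed attributes get swapped between the "normal" methods and the builder's.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.dom.minidom import Document

SAMPLE = b"""<catalog>
  <filler>skip</filler>
  <details id="a1"><name>First</name></details>
  <filler>skip</filler>
</catalog>"""


class IslandBuilder:
    """Hypothetical stand-in for a DOM-building handler (not 4Suite's API)."""

    def __init__(self):
        self.doc = Document()
        self.current = self.doc

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        elem = self.doc.createElementNS(uri, local)
        for (attr_uri, attr_local), value in attrs.items():
            elem.setAttributeNS(attr_uri, attr_local, value)
        self.current.appendChild(elem)
        self.current = elem

    def endElementNS(self, name, qname):
        self.current = self.current.parentNode

    def characters(self, content):
        self.current.appendChild(self.doc.createTextNode(content))


class SwitchingHandler(ContentHandler):
    def __init__(self, island="details"):
        super().__init__()
        self.island = island
        self.builder = None   # set only while inside an island
        self.depth = 0        # element depth within the current island
        self.islands = []     # captured DOM subtrees
        self.skipped = []     # names seen by the "normal" callbacks
        self._use_normal()

    def _use_normal(self):
        self._startElementNS = self._normal_start
        self._endElementNS = lambda name, qname: None
        self._characters = lambda content: None

    def _use_builder(self):
        self.builder = IslandBuilder()
        self._startElementNS = self.builder.startElementNS
        self._endElementNS = self.builder.endElementNS
        self._characters = self.builder.characters

    def _normal_start(self, name, qname, attrs):
        self.skipped.append(name[1])

    # Public SAX callbacks: indirection around the swappable "real" ones.
    def startElementNS(self, name, qname, attrs):
        # Entry test comes first, so the island's own start tag is captured.
        if self.builder is None and name[1] == self.island:
            self._use_builder()
        if self.builder is not None:
            self.depth += 1
        self._startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self._endElementNS(name, qname)
        # Exit test comes last, so the island's end tag reaches the builder.
        if self.builder is not None:
            self.depth -= 1
            if self.depth == 0:
                self.islands.append(self.builder.doc.documentElement)
                self.builder = None
                self._use_normal()

    def characters(self, content):
        self._characters(content)


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
handler = SwitchingHandler()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(SAMPLE))

print(handler.skipped)
print(handler.islands[0].toxml())
```

The depth counter is there so that a details element nested inside another one doesn't end the island early; each captured island is an ordinary minidom element you can then query however you like.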

Note that Uche’s Amara toolkit offers even more capabilities, and i suspect Amara’s Bindery may make this kind of task even more straightforward. But i haven’t crossed that bridge yet.

(If you’re a Perl programmer, this whole story should make you think of XML::Twig. That’s a fair answer, though Twig’s API is a bit non-standard. However, you may already be working in Python for plenty of other good reasons – I’m not trying to start a language war, though, honest! – which is reason enough to use this approach.)