God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
July 31st, 2007

On the Importance of Semantic Representation

An excerpt from “A.I. Limericks” by Henry Kautz:

If your thesis is utterly vacuous
Use first-order predicate calculus.
With sufficient formality
The sheerest banality
Will be hailed by the critics: “Miraculous!”

If your thesis is quite indefensible
Reach for semantics intensional.
Your committee will stammer
Over Montague grammar
Not admitting it’s incomprehensible.

(Original at http://www.cs.rochester.edu/u/kautz/misc/limericks.html.)

July 31st, 2007

In Praise of Python

In computer science, you have to learn new languages and frameworks on a regular basis, because the field changes so quickly. I’ve learned my share of languages over the years, well over a dozen last time i counted (most of which i don’t use anymore). Perl was the last one that i really invested time in at BBN, and it was my main language for scripting and data processing for the last half dozen years or so (as a manager, this was the only kind of programming i could get away with :-)).

A couple of years ago, i made the decision that my next language would be Python. My reasoning was based on a pretty simple kind of social networking: over and over, the bloggers i read, colleagues i respected, and projects of interest i discovered kept talking about Python. I figured if this many smart people were using it, there must be a good reason. When i started work at Logos this year, i had more time to focus on programming, and finally got to make good on my intention. I haven’t been sorry, and having done a lot of Python coding over the last 6 months (the only way you really learn a language), i’m really loving the language.

Here are some of the things i’m finding that are great about Python:

  • Interactive evaluation: how many times have you embedded print statements inside your Perl code so you can figure out what’s going on? Python lets you short-circuit that because you can evaluate expressions directly, inspect the results, and work on it until you get it right. It has the side-benefit of encouraging modularization of your code, simply because that makes it easier to test interactively. For me, interactive evaluation provides an enormous productivity boost.
  • List processing: I’m not embarrassed to admit that i still think Lisp is one of the best languages i’ve ever coded in. When i did a little recreational programming last year, i went back to my Lisp roots, because some problems just cry out for list-based solutions. Perl has lists, lists of lists, etc. but there’s just enough friction in working with them that it always feels hard to me. Python brings back a rich list-based environment with mapping, filtering, and other useful features. Though some people find lambda functions intimidating (the name doesn’t help), they’re a very powerful feature. Python also has set operations (intersection, union, etc.) which are very helpful for data cleanup.
  • Introspection: you can ask the environment what objects it knows about, and you can ask an object what its methods and attributes are. This enables powerful kinds of meta-programming capabilities.
  • Django, a web application framework (like the popular Ruby on Rails) that makes it very easy to build rich, data-driven, web-based systems. I’ve been using Django to build a thesaurus development interface for in-house use, and i strongly recommend it (i hope to start a side project soon using Django for publishing genealogy information).
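
A minimal sketch of the list-processing, set, and introspection points above. The word lists and the helper name public_methods are invented for illustration; nothing here comes from a real project.

```python
# Toy data, invented for illustration only.
words = ["grace", "truth", "light", "grace", "word", "light"]

# Mapping and filtering with lambda functions.
lengths = list(map(lambda w: len(w), words))
long_words = list(filter(lambda w: len(w) > 4, words))

# The same filter written as a list comprehension.
upper_long = [w.upper() for w in words if len(w) > 4]

# Set operations for data cleanup.
seen = set(words)
expected = {"grace", "truth", "life"}
missing = expected - seen       # expected but never seen
shared = expected & seen        # intersection
everything = expected | seen    # union

# Introspection: ask an object which public methods it has.
def public_methods(obj):
    return [name for name in dir(obj)
            if not name.startswith("_") and callable(getattr(obj, name))]

print(lengths)
print(long_words)
print(sorted(missing), sorted(shared))
print("union" in public_methods(seen))
```

Each of these lines can also be typed one at a time at the interactive prompt, which is where the "evaluate, inspect, adjust" loop described above pays off.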

I’m not trolling for flamewars here (i’ve managed to do so inadvertently in the past), and i still like Perl. But from now on, i’m a Python guy.
Other reading:

  • JoelOnSoftware has a characteristically insightful post about why the question “what’s the best language to use?” isn’t really meaningful. He points out the value of the language’s ecosystem as a key criterion: that was one reason Python made it to the top of my list.
  • I’ve been playing with the Natural Language Toolkit, which is written in Python. It’s still in flux, but has a lot of interesting capabilities, including a good WordNet interface.
July 24th, 2007

BibleTech 2008

If you’re reading this blog, you’re probably going to be interested in BibleTech 2008, January 25-26 in Seattle. Logos is organizing this conference to bring together people who are interested in the intersection of technology and Bible study. This is not focused on Logos’ own software, but on technology and applications for publishing, programming, web site development, markup, blogging, etc. There will be both high- and low-tech tracks, so there will still be lots of interesting and useful information even if you don’t dream in code. You can see a range of topics on the site, and we’re open to suggestions for other topics to cover.

Several speakers are already listed on the website, and i’ll be there along with many of my colleagues. If you have interest and expertise in a relevant technology area, please consider proposing a presentation. It’s the interchange of ideas that makes an event like this worthwhile, and your ideas may be just what somebody else needs to hear (and vice-versa).
(Obligatory disclaimer: i work for Logos because they do cool stuff like this. I guess that means i’m not a disinterested party. Bloggers: please use “bibletech08” as the tag for posts related to the conference.)

July 24th, 2007

Data, Information, Knowledge and Bible Study

Living in the “information age”, we use terms like “data” and “information” all the time, without necessarily being precise in our usage. But it can be helpful to think more carefully about the differences between these terms, and how they impact the way we sift through and apply all the resources that are now available to us.
In this more specific sense, we can distinguish three terms:

Data consists of observations, measurements, or other facts. Data may be incomplete, redundant or irrelevant.
Information is data that is selected and enriched with structure, context and meaning. “Connecting the dots” describes the process of moving beyond mere data to some larger picture, with larger meaning and significance. Information in this sense implies purpose: what’s information to you may not be information to me, at least at this point in time, if i have no use for it.
Knowledge is information in action, applied toward solving a problem, answering a question, or accomplishing some other objective. Knowledge extends beyond a particular set of data and information, providing predictive power, thereby also extending over time from today to tomorrow. It can be individual (my personal view) or shared across a community.

These elements are often visualized in a hierarchical or pyramid structure, showing how information is built upon data, and knowledge builds further upon information. The pyramid also suggests a quantitative relationship: we need relatively more data to derive a smaller amount of information, and similarly more information to derive knowledge. Some would add wisdom, representing knowledge coupled with values or fundamental principles at a higher level, as another layer on top (consequently, this is sometimes called the DIKW model).

Data, Information, Knowledge: a Hierarchy

Data moving toward information is more objective: we can look together at the data and talk about what they mean. Information moving into knowledge is more subjective: i can try to communicate my knowledge to you, but that process is complicated by our different contexts and experience, and we may disagree about what conclusions to draw from the facts we agree on.

In the words of Harvard zoology professor Louis Agassiz: “Facts are stupid things, until brought into connection with some general law.” (In the terms above, his “facts” are data and his “general law” is knowledge.) Samuel H. Scudder’s “The Student, the Fish, and Agassiz”, from which this quote is taken, is a classic account of the value of first-hand observation and inductive learning. The hierarchy, and the importance of moving toward the upper levels, is famously echoed in T.S. Eliot’s poem “The Rock”:

Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

One of the wonders of human intelligence is that we can move easily between the various levels of this model. We leap (sometimes too quickly!) from data through information up to knowledge. Our ability to retain extensive knowledge in our heads can make some otherwise insignificant piece of data suddenly meaningful (that elusive experience factor that schools can’t easily teach, and automated systems still can’t touch). In this way, experience can create a communications barrier between the knowledgeable and the novice. For the experienced, knowledge and information already in context make the progression easy and almost invisible. The novice, however, can’t move between levels so easily, and requires explanation, background information, and other help to move up the stack.

The entire scientific enterprise is a constant process of acquiring data, forging information from it, deriving knowledge, and then using that knowledge to focus the search for additional data to expand our knowledge even further. One consequence of modern “information” technology is a data and information surplus, which naturally leads to an attention deficit, as more and more information stretches our ability to focus and prioritize.

To make this less abstract, here are some examples from the domain of learning about the Bible.

Word Study:
The occurrence of a particular Greek word in a verse of Scripture constitutes data. It is not the subject of debate: once past the manuscript evidence, we all agree on what the words are. Looking at the distribution of words across text moves toward information: we might observe that the words translated ‘tax collector’ and ‘sinner’ frequently occur together in the Gospels (of the 21 verses that include ‘tax collector’ (τελώνης), 9 also include ‘sinner’ (ἁμαρτωλός)), but that John’s Gospel never uses τελώνης at all. We have not yet reached a ‘why’ (knowledge) behind this information: for that, we might look to other resources (the data per se do not contain a direct ‘why’). For example, the New Bible Dictionary suggests the tax collectors’ extortion, support of the ruling power of Rome, and habitual contact with Gentiles were several of the reasons that rabbis taught a good Jew should not eat with tax collectors. This attitude may help explain the linguistic association, and suggests other investigation (requiring new data) to either confirm or deny the conclusion (for example, several verses that don’t include ‘sinner’ do include semantically related terms like ‘prostitute’ or ‘Gentile’).
Like word observations, the particular tense of a verb or case of a noun are observable facts (data). Our knowledge of the significance of e.g. the aorist tense in a given passage comes from our general grammatical information (based on other instances of its usage), and from our conclusions about the deeper meaning of the passage.
We can observe particular syntactic constructions in the Scripture (data): what, if anything, are their deeper functions (a knowledge question which can only be answered based on information)?
Discourse Structure:
An observed phrase (data) may be repeated, or put in a syntactically unusual position like the beginning of a clause (‘unusual’ here assumes we have information about what ‘usual’ is based on observing patterns of syntax). This is only information until we determine what the function of this linguistic device might be: perhaps making a particular detail of the narrative stand out, or introducing new material. This hypothesized knowledge can then be tested against other passages to see whether the data support the conclusion.
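
The word study example above, moving from occurrences (data) to a distribution (information), can be sketched in a few lines of Python. The verse labels and their contents below are made-up placeholders, not real verse data, and the function names are hypothetical; a real study would load a lemmatized text instead.

```python
TAX_COLLECTOR = "τελώνης"
SINNER = "ἁμαρτωλός"

# Placeholder corpus: verse labels mapped to the set of lemmas in each verse.
# These are NOT real verse contents or counts.
verses = {
    "verse-1": {TAX_COLLECTOR, SINNER, "other-a"},
    "verse-2": {TAX_COLLECTOR, "other-b"},
    "verse-3": {SINNER},
    "verse-4": {TAX_COLLECTOR, SINNER},
    "verse-5": {"other-c"},
}

def verses_with(lemma, corpus):
    """Return the set of verse labels whose lemma set contains `lemma`."""
    return {ref for ref, lemmas in corpus.items() if lemma in lemmas}

def cooccurrence(lemma_a, lemma_b, corpus):
    """Count verses containing lemma_a, and how many of those also contain lemma_b."""
    with_a = verses_with(lemma_a, corpus)
    with_both = with_a & verses_with(lemma_b, corpus)
    return len(with_a), len(with_both)

total, both = cooccurrence(TAX_COLLECTOR, SINNER, verses)
print(f"{both} of {total} verses with {TAX_COLLECTOR} also include {SINNER}")
```

The counts are only information: the code can report that two words travel together, but it has nothing to say about why.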

These are all linguistic examples, though the data-information-knowledge framework is not restricted to the domain of language (you could do similar kinds of analysis within the field of systematic theology, for example). Learning to think within this framework can help us more clearly identify what we “know” and where that knowledge comes from (that is, its supporting data and information), as well as helping us more effectively communicate our knowledge to others.
(There are a few more websites on this subject under my del.icio.us tag ‘dikw’.)

July 6th, 2007

I Want a Note Genie …

… some kind of pocket-sized voice recorder that captures brief spoken messages. Then, when i connect it to my computer and download the recordings, the speech-to-text software transcribes them and pops up the results in a text file for save/edit/cut-and-paste. No harder than getting pictures from a digital camera, no fussing with starting up Dragon Dictate or copying files, just a seamless transfer of spoken notes to textual ones. Surely they’ve got this technology by now?
That way i could capture all the brilliant thoughts and turns of phrase that come to me while making coffee, and retain them long enough to get them into my blog …

July 5th, 2007

Bibleref Progress

There’s been some good progress on bibleref, and i’ve made corresponding updates (finally!) to the bibleref pages on SemanticBible: the overview is still the best starting place. There are some new blogging tools, and a validation/processor page where you can check your markup. I’ve also tried to state more precisely what processors should return when they find bibleref markup. The resulting grammar looks a little daunting (and those of you who understand this stuff should check if i’ve got it right): i probably thought too hard about weird cases that aren’t likely to occur often or ever. Maintaining the balance between making it as easy as possible for people to use, but being both precise and flexible enough, is a challenge.
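
To make "what processors should return" a little more concrete, here is a rough sketch of a processor using Python's standard html.parser. It assumes markup of the form `<cite class="bibleref" title="John 3:16">Jn 3:16</cite>`, with the title attribute optional; that shape is an assumption for this example, so check the bibleref overview for the actual rules. The class name BiblerefFinder is hypothetical, and nested bibleref elements are not handled.

```python
from html.parser import HTMLParser

class BiblerefFinder(HTMLParser):
    """Collect (reference, displayed text) pairs from elements
    whose class attribute includes 'bibleref'."""

    def __init__(self):
        super().__init__()
        self.found = []      # completed (reference, text) pairs
        self._open = None    # [tag, title, text parts] for the element in progress

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._open is None and "bibleref" in classes:
            self._open = [tag, attrs.get("title"), []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and self._open[0] == tag:
            _, title, parts = self._open
            text = "".join(parts).strip()
            # Assumption: prefer the title attribute, else fall back to the visible text.
            self.found.append((title or text, text))
            self._open = None

sample = (
    '<p>See <cite class="bibleref" title="John 3:16">Jn 3:16</cite> and '
    '<cite class="bibleref">Luke 4:23</cite>.</p>'
)
finder = BiblerefFinder()
finder.feed(sample)
finder.close()
print(finder.found)
```

A real processor would go on to normalize each reference string, which is where the grammar comes in.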

I’d welcome further comments on the discussion forum: there’s a new topic for this version of the specification. I found the discussion format really helpful for the last round: Chris Roberts in particular helped me not go off the deep end on several points, and having an implementation really makes the discussion concrete.

After i’ve collected and digested comments on this draft, i’ll start button-holing some well-known bibliobloggers to see who i can persuade to adopt bibleref (Rico, you’re in my crosshairs). This should also help broaden the feedback in case there are still wrinkles to iron out.

By the way, “Creating Standards is Altruistic” by

July 4th, 2007

Why Presenting Bible Text is Hard

I happened to stumble across this discussion from a web developer who wants to figure out how to present Bible text (not references) on a website. There are a lot of reasons why this is a lot harder than one might think at first glance. Most Blogos readers will already know all this, but i thought it might be helpful to summarize some of the issues here for the next person who stops to think and google.

  1. The Good Book comprises a number of different books, each with their own special requirements and challenges. One size doesn’t usually fit all.
  2. Chapter, paragraph, and verse boundaries don’t always line up, so while a neatly nested hierarchy is tempting, it doesn’t work out well in practice.
  3. There are a lot of special formatting challenges in Scripture: here’s a list that’s almost certainly incomplete.
    • Poetry
    • Dialogue (including quotes inside quotes, like Jesus repeating a proverb in Luke 4:23)
    • Chapter and verse numbers
    • Many Bible versions include headings like “the Beatitudes” for Matt 5:2-12. While they’re not part of the text, they are part of the version.
    • The convention of displaying the words of Christ in red (which presumes markup that distinguishes them)
    • Footnotes (common in many Bible versions)
    • The Psalms have some special requirements: there are actually five “books” (collections of Psalms) within the one Book of Psalms, and many of the Psalms have additional header material (e.g. Ps 73, which identifies Asaph as the psalm’s author)
    • Certain passages (John 7:53-8:11 is a good example) are usually distinguished typographically along with an explanatory note, since they’re not included in all manuscripts.
  4. No matter what you think the One True Markup Style might be, it doesn’t do you any good unless you actually have a Bible text in that format, and that takes a lot of hard work.
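
Point 2 in the list above is easy to see with a small sketch. Treat verses and paragraphs as spans over the same text; whenever a paragraph break falls inside a verse, the two spans overlap without either one containing the other, so no strictly nested tree can hold both. The labels and offsets below are invented for illustration, not taken from any real Bible text.

```python
# Hypothetical spans: (label, start, end) as word offsets into one text.
verses = [("v1", 0, 10), ("v2", 10, 25), ("v3", 25, 40)]
paragraphs = [("p1", 0, 18), ("p2", 18, 40)]

def crossing(a, b):
    """True if spans a and b overlap without one containing the other."""
    _, a_start, a_end = a
    _, b_start, b_end = b
    overlap = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return overlap and not (a_in_b or b_in_a)

# Every (verse, paragraph) pair that cannot be expressed by simple nesting.
conflicts = [(v[0], p[0]) for v in verses for p in paragraphs if crossing(v, p)]
print(conflicts)
```

One common workaround in markup generally is to keep one hierarchy as real containers and record the other as empty marker elements at its boundaries.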

So what’s a web developer to do? While i can’t solve the problem for you, i can point to some existing standards. Of course, different desktop applications (like Logos Bible Software) have their own (typically proprietary) internal markup systems (disclaimer: even though i work for Logos, i couldn’t begin to explain all the intricacies of their markup!).

OSIS is a very thorough (hence fairly heavy-weight) XML standard for Scripture markup. While they’ve covered most all the bases and then some, i don’t know of any complete Bible versions that are freely available in OSIS format. On top of that, the OSIS group doesn’t seem all that active these days.

The ESV website provides a web service with a rich API that also illustrates a number of the issues mentioned above. While their format is not a “standard” (in the “committee-approved” sense), it’s certainly a well-thought-out approach, with the additional benefit that you can actually retrieve text that’s already formatted coherently, rather than formatting it yourself.

PS: notice the bibleref markup here? I’ll bet you didn’t, but someday a smart web spider might! I typed it in by hand, just for the experience (apparently WordPress lets you enter HTML as text that gets converted to markup when you save), though a plugin makes it even easier.