God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
May 4th, 2007

The Importance of Being Ambiguous

(with apologies to Oscar Wilde)

My colleague Rick Brannan has written some recent posts on the Logos Blog about ambiguities in James 4:5-6. Given the work i’m doing on knowledge representation applied to Bible information, his latest post got me thinking about the general problem, and what best practices to use for representing ambiguity.

Ambiguity is a bad thing, and we should always try to get rid of it, right? Well, it’s always nice to have a clear understanding of things, and a lot of information technology, from language compilers to air traffic controllers to traffic lights, only functions well in its absence (though we’d lose a lot of our humor by banishing ambiguity altogether). But i’d claim the following (which i’ll glorify With Capital Letters) as the Best Practices for Representing Ambiguity:

  • sort out and resolve any ambiguity when you can
  • don’t create any spurious ambiguity when you can avoid it
  • where there’s genuine ambiguity, preserve it

It’s this last practice that i want to address here: don’t guess, don’t arbitrarily pick one of several alternatives, instead find a way to represent ambiguity when it’s real.

One major knowledge representation project that provides a good model here is the Penn Treebank, a million-word corpus of English annotated with grammatical analysis. Early on, the creators recognized that syntactic theories are all over the map, and by tying the Treebank too closely to one theory or its conclusions, they’d risk being ignored by the others. So they adopted some basic practices to ensure a theory-neutral representation, including tags specifically designed to indicate “there’s an ambiguity here, and we’re not resolving it”.

Things are never black-and-white in the absence of complete information: how to decide the tricky cases? A fair rule of thumb would be to imagine you and several equally brilliant friends reviewing the case (imagine an even number of friends, so you can be the tiebreaker). Would the decision about how to resolve a given ambiguity be unanimous? Then go with the consensus and consider it settled (even though you might imagine some nameless Bible scholar somewhere disagreeing about it). Would the vote be close? Then see if there’s a way to capture the other possibility, rather than forcing a decision.

Here’s one way this works out in practice for creating a detailed knowledgebase about people in the Bible. One of the fundamental differences between semantic search and word-based search is that you have the chance to resolve ambiguity by deciding which references to people are the same, and which are different. Take Gaius as an example: this name occurs 5 times in the New Testament, twice in narrative passages in Acts (Acts.19.29 and Acts.20.4), Rom.16.23, 1Cor.1.14, and in the opening address of 3John.1.1. Assigning the mentions in Romans and 1Cor to the same person is pretty solid (this implies Paul was in Corinth when he wrote the letter to the Romans). But otherwise, the contexts leave open the possibility that there might be four different individuals here that just happen to share the same name (which was a common one). For example, Gaius in Acts 19 is Macedonian, but Acts 20 says Gaius of Derbe. While it’s possible some of these other mentions are the same person (and it’s always interesting to speculate on possible connections), in the absence of any solid evidence the best practice is to represent them as separate individuals.

So i’m modeling this in the Bible Knowledgebase as follows:

  • when different mentions seem clearly to be the same individual (e.g. Simon Peter and Cephas), standardize on a single URI for all their properties
  • when there are clearly different individuals, give them each a unique URI
  • when they might be the same but the evidence is weak (for example, the other Gaius mentions), treat them as different individuals (with unique URIs), and then use the property possiblySameAs to represent this hypothetical linkage.
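
The three rules above can be sketched with plain subject-predicate-object triples. This is only an illustration: the namespace, the `mentionedIn` property, the `Gaius.N` identifiers, and the particular pairs linked by `possiblySameAs` are assumptions made for the example, not the actual Bible Knowledgebase data.

```python
from typing import List, NamedTuple, Set

BASE = "http://example.org/bible/person/"   # hypothetical namespace
MENTIONED_IN = "mentionedIn"                 # hypothetical property name
POSSIBLY_SAME_AS = "possiblySameAs"          # property named in the post


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def person(local_name: str) -> str:
    """Build a URI for a person from a local identifier."""
    return BASE + local_name


# Romans and 1 Corinthians are treated as the same person: one shared URI.
gaius_corinth = person("Gaius.1")
# The remaining mentions get their own URIs because the evidence is weak.
gaius_macedonia = person("Gaius.2")
gaius_derbe = person("Gaius.3")
gaius_3john = person("Gaius.4")

TRIPLES: List[Triple] = [
    Triple(gaius_corinth, MENTIONED_IN, "Rom.16.23"),
    Triple(gaius_corinth, MENTIONED_IN, "1Cor.1.14"),
    Triple(gaius_macedonia, MENTIONED_IN, "Acts.19.29"),
    Triple(gaius_derbe, MENTIONED_IN, "Acts.20.4"),
    Triple(gaius_3john, MENTIONED_IN, "3John.1.1"),
    # Which pairs to link is an editorial judgment; these two are examples only.
    Triple(gaius_macedonia, POSSIBLY_SAME_AS, gaius_derbe),
    Triple(gaius_3john, POSSIBLY_SAME_AS, gaius_corinth),
]


def possible_matches(uri: str, triples: List[Triple]) -> Set[str]:
    """Return every URI linked to `uri` by possiblySameAs, in either direction."""
    matches: Set[str] = set()
    for t in triples:
        if t.predicate != POSSIBLY_SAME_AS:
            continue
        if t.subject == uri:
            matches.add(t.obj)
        elif t.obj == uri:
            matches.add(t.subject)
    return matches
```

Five mentions end up as four URIs, and a query such as `possible_matches(gaius_derbe, TRIPLES)` surfaces the hypothetical link without committing to it.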

It’s much easier to join things later if you decide they’re the same: the OWL property owl:sameAs exists to accomplish that with a single assertion that two entities are definitely the same. But splitting involves sorting out all their properties and reassigning them to the correct URI, which can be tricky business. So when in doubt, they’re left separate, with indicators where they might be the same as another.
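
A rough sketch of why joining is cheaper than splitting, using a plain dictionary from URI to properties rather than an OWL reasoner. The `join` and `split` helpers and the `ex:` identifiers are hypothetical names invented for this example. Joining needs only the two URIs, while splitting needs a decision about every single property, passed in here as a caller-supplied function.

```python
from typing import Callable, Dict, List, Tuple

Property = Tuple[str, str]            # (predicate, value)
Store = Dict[str, List[Property]]     # URI -> list of properties


def join(store: Store, keep: str, drop: str) -> Store:
    """Merge `drop` into `keep`: one decision, applied mechanically."""
    result = {uri: list(props) for uri, props in store.items() if uri != drop}
    result.setdefault(keep, []).extend(store.get(drop, []))
    return result


def split(
    store: Store,
    uri: str,
    new_uri: str,
    belongs_to_new: Callable[[Property], bool],
) -> Store:
    """Split `uri` in two: every property needs its own judgment call."""
    result = {u: list(props) for u, props in store.items() if u != uri}
    original = store.get(uri, [])
    result[uri] = [p for p in original if not belongs_to_new(p)]
    result.setdefault(new_uri, []).extend(p for p in original if belongs_to_new(p))
    return result


# Two separate individuals (hypothetical identifiers).
store: Store = {
    "ex:Gaius.2": [("mentionedIn", "Acts.19.29")],
    "ex:Gaius.3": [("mentionedIn", "Acts.20.4")],
}

# Joining: name the two URIs and the rest follows.
joined = join(store, keep="ex:Gaius.2", drop="ex:Gaius.3")

# Splitting again: someone has to say which properties go where.
separated = split(
    joined,
    uri="ex:Gaius.2",
    new_uri="ex:Gaius.3",
    belongs_to_new=lambda prop: prop == ("mentionedIn", "Acts.20.4"),
)
```

With only one property per person the split looks trivial, but the `belongs_to_new` decision has to be made for each property, which is where the tricky work lies once an entity has accumulated many.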

There’s another extreme to this general principle, which is to be so afraid of making any commitments that you leave everything ambiguous. Obviously this doesn’t work well either (you wind up saying nothing), and deciding which is which is as much art as science. But as a generalization, i’d always prefer to err on the side of conservatism and joining later rather than risk losing ambiguity when it’s real.

May 4th, 2007

Add Some New Blogs!

Just what you need, more blogs to follow, right? But i have two suggestions for Blogos readers:

  • Many of the issues i blog about are now closely related to my day job at Logos Bible Software. So if you’re interested in Blogos, you really ought to read the Logos blog too, if only because some of my material will now get posted there rather than here. In fact, my inaugural post appeared today (though it’s mostly a repeat of the series here on name weights, which means if you’re reading this, you probably already read that).
  • Several others have pointed to Amazon UnSpun’s blog lists, including one for blogs about Biblical Studies. I read many of these regularly, so you should check out the list and perhaps discover some new perspectives. If nothing else, it’s entertaining to read the titles (i wish i had thought of Sean the Baptist!). While you’re there, you can vote for your favorite blogs (Blogos is currently down around 96, which isn’t too bad given how eclectic my perspective is compared to many of the more traditional biblioblogs).
May 4th, 2007

Blogos Back to Normal?

I finally figured out how to resolve the lingering Blogos issues, which were both a blank page at http://www.semanticbible.com/blogos/ (the version without www has been working for a while), and a number of blog posts that had gone AWOL. The culprit was a malfunctioning WordPress plug-in. When i had the “missing post” problem with something that was only a new draft, i started to suspect the problem wasn’t my hosting service. So i just went through the post and removed different pieces until i found the one whose absence made things normal again, and then played “one of these things is not like the other” until i figured out what the culprit was. For reasons i don’t understand, that fixed the blank home page too.

So at this point, i think all the recent wreckage caused by Lunarpages has been cleaned up (but i’m still upset with them for breaking it in the first place, and for their very slow response to critical problems). If you have problems with either SemanticBible or Blogos, please let me know.

Trust in our information systems is a fragile thing: once something breaks, i naturally tend to assume (sometimes only unconsciously) that any new brokenness is related. Alas, brokenness can be as complex as everything else in life, and i have to question my assumptions about it (as everywhere else) to avoid getting stuck. It’s funny how often that questioning process becomes the portal to improvement.
