Logos 5, Behind the Curtain: Curation

This post is part of Logos 5, Behind the Curtain, a series of blog posts looking at new data sets that are part of the latest Bible study application from Logos Bible Software.


At the heart of Logos’ approach to data is the practice of curation. If you’ve heard this term at all, it’s probably in the context of museums: but we mean something rather different. As usual, Scott Adams’ Dilbert has the pulse of corporate America:

[Image: Dilbert comic strip by Scott Adams]

For Logos, though, curation is not just trendy new jargon, but an essential practice that we’ve been pursuing for quite a few years now (five under my direction, and more before my tenure). It’s a critical part of what makes Logos unique in its market. For example, we have not one but three Lexical Curators working on the Bible Sense Lexicon (one for Greek, two for Hebrew). To my knowledge (and Google’s), these appear to be the only people in the world with this job title. In fact, most people in the Content Innovation department at Logos are doing one kind of curation task or another.

So what is curation in the context of Bible study software, anyway? In the simplest terms, curation means organizing and maintaining a collection of things. In the museum context, that’s artifacts they display. Our kind of curation involves computer-readable data that captures knowledge relevant to biblical studies. You can summarize it briefly with three key practices:

  1. Collect
  2. Correct
  3. Connect

(“Describe” is probably a more apt term than “Correct”, but i just couldn’t resist the awesome alliteration.)

To “Collect” means imposing organization on a complicated and messy world of information, with an eye toward structuring it and making it useful in some way in the Logos system. Often this information is expressed in prose sentences in a reference book: for example, a Bible dictionary might describe a city, and include language about where the city was located, what larger region it was part of, where it’s mentioned in the Bible, etc. Fundamentally, a plan for collection requires deciding what information ought to be collected, and which things belong in the collection and which ones are excluded (in more technical terms, an ontology).

Capturing and formalizing knowledge typically involves some tradeoffs. First of all, we have to find categories and labels that balance and maximize utility and (formal) correctness. When it comes to categorization in particular, ultimately “everything is miscellaneous” (which is also the title of a really important book by David Weinberger), and if you push hard enough, each thing is its own unique category. But data sets are typically more useful if things are grouped in some way. So we categorize the following as “people” in our data set:

  • individuals (whether named or not)
  • groups, whether defined by residence (Greeks), common ancestry (Levites), belief systems (Pharisees and Sadducees), etc.
  • supernatural entities (including those that most Bible readers would accept like the God of Israel, and those that biblical authors argue are not real at all, like Baal, Asherah, or Zeus)

To “Correct” or “Describe” means that we choose and populate particular attributes for the items we’re curating. In the case of people, that includes things like names, gender, and roles. In addition, we create three special attributes for nearly every data set:

  • a unique identifier: though you’ll probably know who i mean when i use a biblical name like “David”, you won’t for “John” because there are five different individuals known by that name. This kind of ambiguity, and other variations (“John”, “John the Baptist”, “John Baptizer”, etc.) means names are usually poor identifiers. Instead, we create a symbol like “John.4” that uniquely identifies one particular item in our collection (in this case, a member of the Sanhedrin mentioned in Acts 4:6). Since we don’t show these to users, we don’t have to worry about people understanding them.
  • a label: since our data is for people to look at and understand, we need user-friendly ways to display an entity. Labels are typically brief (less than 20-25 characters), and also unique, so that in a drop-down list showing names that match “John”, i can distinguish “John (the Baptist)” from “John (Ac 4:6)”. Since the label is also unique, we could use it as the identifier: but since data stability is a primary goal, we separate the two, so that we can change the label if necessary. Since identifiers (not labels) are the means by which we connect data, we (almost) never change them, since that risks breaking the integrity of the data set.
  • a description: labels are brief so they take up minimum space, but consequently, they can’t carry much information. So we often provide a longer prose description, perhaps a sentence or two, that helps identify the entity and its most basic information. You could compare this to the leading sentence in a Wikipedia article. In the case of John.4, that’s “a member of the family of high priests in Jerusalem following Jesus’ ascension.”, which is probably enough to help you decide whether this is a John you want to know more about or not.
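The three attributes above might be sketched as a small record type. This is only an illustration, not Logos’ actual schema: the `Entity` class, its field names, the `relabel` helper, and the replacement label are all hypothetical. Only the “John.4” identifier, its label, and its description come from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one curated item (not the real Logos schema)."""
    identifier: str   # stable, never shown to users, e.g. "John.4"
    label: str        # short, unique, user-facing
    description: str  # a sentence or two of prose


def relabel(entity: Entity, new_label: str) -> Entity:
    """Return a copy with a new label; the identifier is left untouched."""
    return replace(entity, label=new_label)


john_4 = Entity(
    identifier="John.4",
    label="John (Ac 4:6)",
    description="a member of the family of high priests in Jerusalem "
                "following Jesus' ascension.",
)

# Other data points at the identifier, so changing the label breaks nothing.
renamed = relabel(john_4, "John (Sanhedrin)")
```

Keeping the identifier separate from the label is what lets the display text change while every link to “John.4” stays valid.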

To “Connect” means linking entities to other entities (or other data sets). For people, family relationships are an important way that people connect to each other. We also label those relationships (father, mother, sister), and, for biblical information, capture the textual sources that support each relationship (more technically, the provenance of the data).
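A labeled relationship with its provenance could be modeled roughly like this. The `Relationship` type, the `related` helper, and the identifiers “Jesse.1”, “David.1”, and “Solomon.1” are assumptions made up for the example; the real data set may look quite different.

```python
from typing import List, NamedTuple, Tuple


class Relationship(NamedTuple):
    """Hypothetical labeled link between two entity identifiers."""
    source: str      # identifier of the first entity
    relation: str    # label such as "father", "mother", "sister"
    target: str      # identifier of the second entity
    provenance: str  # Bible reference supporting the link


# Illustrative sample data; identifiers are invented for this sketch.
relationships = [
    Relationship("Jesse.1", "father", "David.1", "Ruth 4:22"),
    Relationship("David.1", "father", "Solomon.1", "2 Sa 12:24"),
]


def related(entity_id: str, relation: str) -> List[Tuple[str, str]]:
    """Return (target, provenance) pairs where entity_id is the `relation` of target."""
    return [
        (r.target, r.provenance)
        for r in relationships
        if r.source == entity_id and r.relation == relation
    ]
```

Here `related("David.1", "father")` returns the entities David is recorded as father of, each paired with the reference that supports the claim.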

Connecting information is one of the most important aspects of curation for Logos. While it may be interesting to learn that King David was also a shepherd, that’s an isolated fact. But if you can get a list of other individuals in the Bible who were also shepherds (or kings, or musicians), now you’re discovering new information. You might not have started out looking for this, or known how to find it for yourself.
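As a rough sketch of that kind of discovery, assume each person carries a set of roles. The `roles` table and the `others_with_role` helper below are hypothetical, and the sample uses plain names where a real data set would use identifiers.

```python
from collections import defaultdict

# Hypothetical role data keyed by name; a real data set would use identifiers.
roles = {
    "David": {"king", "shepherd", "musician"},
    "Moses": {"shepherd", "prophet"},
    "Solomon": {"king"},
    "Amos": {"shepherd", "prophet"},
}

# Invert the table: role -> set of people holding that role.
people_by_role = defaultdict(set)
for person, person_roles in roles.items():
    for role in person_roles:
        people_by_role[role].add(person)


def others_with_role(person, role):
    """Return, in alphabetical order, everyone else who shares the given role."""
    return sorted(people_by_role.get(role, set()) - {person})
```

Starting from the isolated fact that David was a shepherd, `others_with_role("David", "shepherd")` turns it into a list of people you may not have set out to find.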

Question: which part of Logos’ curation process (Collect, Correct, Connect) do you find the most interesting or appealing? Please leave me a comment.

(Edit: saw a good piece today by John Chambers of Cisco about the power of connection. http://www.wired.com/insights/2012/12/the-internet-of-everything/)

BibleTech 2011

I had to miss the first day because of another commitment, but today i’m here at BibleTech:2011 and looking forward to a great day of talks. Hopefully mine will be one of them: here’s my abstract.

Using the Bible Knowledgebase for Information Integration

In 2009 I reported on the Bible Knowledgebase (BK), a machine-readable collection of semantically-organized data about people, places, and things in the Bible. This talk will describe how the BK now functions as an essential information resource for Logos, tying together information across the software. In addition, I’ll discuss the continued work on the data over the last two years, including:

  • building a database of Biblical Events
  • adding unnamed entities to the database
  • coordinating information about these entities with the Logos Controlled Vocabulary

I’ll also present prototypes for visualizing BK data to enhance discovery and exploration in the Biblical text.

I’ll be live-blogging a few talks during the day to give a quick-take on the subject for those who can’t be here. You can also follow on Twitter via #BibleTech.

LCV Talk at Semantic Technology Conference

I’ll be giving a talk at the Semantic Technology Conference, June 23 from 7:30am to 8:20am (ouch!), in San Francisco, CA. The talk title is “Using a Controlled Vocabulary for Managing a Digital Library Platform”: no talk page yet, but the abstract follows. If you’re there, come by and say hello!

(Astute readers will note some similarities between this and my upcoming BibleTech talk. But the audiences are quite different, so the content will be too. This talk will provide “a practical case study on semantically organizing reference material to support search and navigation, using a controlled vocabulary.”)

Abstract

Encyclopedias and other subject-oriented reference books frequently present the same content using different names: and users often look for this information using other names altogether.

The Logos Controlled Vocabulary (LCV) organizes parallel but distinct content in the domain of Biblical studies to integrate reference information and support search, discovery, and knowledge management. The LCV captures

  • preferred and alternate terminology
  • inter-term relationships
  • term hierarchy
  • linkage to other semantic information

The initial version of the LCV (now shipping in the Logos digital library platform) comprises some 11,000 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions to terminology and content.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future.
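To make the abstract’s list concrete, here is a minimal sketch of a controlled vocabulary with preferred terms, alternate terms, and a simple hierarchy. The terms, the dictionary layout, and the `resolve` helper are illustrative assumptions, not the LCV’s actual structure.

```python
# Hypothetical mini-vocabulary; terms and structure are illustrative only.
vocabulary = {
    "Ark of the Covenant": {
        "alternates": ["Ark of the Testimony", "Ark of God"],
        "broader": "Tabernacle Furnishings",
    },
    "Mercy Seat": {
        "alternates": ["Atonement Cover"],
        "broader": "Ark of the Covenant",
    },
}

# Index every name (preferred or alternate) to its preferred term.
preferred_for = {}
for preferred, entry in vocabulary.items():
    preferred_for[preferred.lower()] = preferred
    for alternate in entry["alternates"]:
        preferred_for[alternate.lower()] = preferred


def resolve(term):
    """Map whatever name a user typed to the preferred term, or None if unknown."""
    return preferred_for.get(term.strip().lower())
```

With this in place, a search for “ark of the testimony” and a dictionary article titled “Ark of the Covenant” both land on the same preferred term.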


Building an Architecture of Participation in Bible Study

The Cornucopia of the Commons

Some time back, Tim O’Reilly (The Architecture of Participation) echoed and applied some observations from Dan Bricklin (the Cornucopia of the Commons) about the architecture of Napster and other significant web-based systems. The individual details are well worth reading, but here’s the summary form. There are several common models for how to build large datasets that are valuable to people:

  1. Pay people to build it (Bricklin calls this “Organized Manual”). Examples include the original Yahoo! directory of the web, and the Encyclopedia Britannica. There’s a variant that represents smart algorithms rather than just human effort (Bricklin: “Organized Mechanical”): this is how Google has built its indexes. But it still represents a significant monetary investment by somebody who probably expects something in return.
  2. Get volunteers (Bricklin’s “Volunteer Manual”): Wikipedia is the preeminent example here, along with Linux, the Open Directory Project, and a great many open source projects. People do this work because they value the end result, and the project coordinates and magnifies those efforts.
  3. Architect in such a way that individual self-interest creates collective value.

Napster (the original peer-to-peer version) was proposed by Bricklin as a prime example of the third model: simply by listening to your music (within the Napster ecosystem), the default settings meant you were also sharing that music with everybody else. Quoting Bricklin:

What we see here is that increasing the value of the database by adding more information is a natural by-product of using the tool for your own benefit. No altruistic sharing motives need be present, especially since sharing is the default.

This is Bricklin’s Cornucopia of the Commons (an allusion to Garrett Hardin’s Tragedy of the Commons): a system designed in such a way that use brings overflowing abundance.

(You might think blogging and twittering are like this, but they’re not. Nobody tweets because it has direct, inherent value to them: instead, it’s an outgrowth of a narcissistic, self-centered open, generous belief that what i say might have value to others. Few of us would do it if nobody else was listening.)

Models for Data Creation In Biblical Studies

All that (and Napster!) is now history, and i don’t want to get distracted by the peer-to-peer model that made Napster so powerful (Bricklin argues that’s not the reason it succeeded), or the legal issues that led to its demise. Instead, i want to reflect here on how these principles apply to Biblical studies and software.

With Logos 4, we’ve launched a major expansion of our Biblical Knowledge, by expanding Biblical People, adding Places and Things, and building around the large set of concepts we call the Logos Controlled Vocabulary. This was accomplished through the Organized Manual method: we paid a bunch of people (me included) to architect and populate this data, in a major development effort that stretched over several years. You could view the vast network of links that make Logos more than just a collection of texts as an extension of the same principle (though the resulting software program doesn’t look so much like a database). It represents literally hundreds of thousands of hours of effort in book markup and design, along with lots of “Organized Mechanical” algorithmic work.

There are also lots of examples of Volunteer Manual projects related to the Bible. The Sword Project is like Linux for Bible software. e-Sword has a smaller group of developers, but the same framework of a volunteer effort which is given away. Open Scriptures is building a platform and API for others to use in building Bible-based applications. Web 2.0 efforts like YouVersion let people tie their reflections directly to the Biblical text, and numerous projects have sprung from the Wikipedia mold like Theopedia. My own SemanticBible projects are much more limited, but in a similar spirit.

Logos has been active with the Volunteer Manual approach as well. The Logos Topics website combines our Organized Manual data and architecture of topics with user-contributed extensions of additional terminology, links within Logos, and even links to other websites. This lets us do some neat things like extending the desktop application content through user contributions on the web. Like Wikipedia, these are altruistic contributions from people who want to share their knowledge with others.

Sermons.logos.com works in a similar fashion: if you’re a pastor who writes down your sermon, and you’re willing to upload and share it, lots of others (both on the web and in Logos software) can benefit from what you’ve created. This is closer to the Cornucopia of the Commons model, but it’s still a voluntary and indirect process: my sermon doesn’t get shared as a natural by-product of my preparation activity.

The Cornucopia and Bible Study

The interesting question to me is how to achieve the third model, where my own use of a tool provides a direct benefit to others through a network, not because i’m behaving altruistically but simply because the system is architected to work that way. This is closely related to the whole Web2.0 meme (can it really have been five years already?!?) of “software that gets better the more it gets used.”

One thought: lots of web sites use RefTagger to provide a nice pop-up of Bible text for their readers, a benefit that enriches the experience of visitors to their site. Twitter users can similarly use ref.ly to shorten Bible references, which, like RefTagger links, in turn resolve to references on Bible.Logos.com. Could those links be converted into data indicating, for example, the relative popularity of different verses, and then displayed back to users?
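A back-of-the-envelope version of that idea might look like the following. It assumes the resolved references are already available as a simple list of strings; the sample data and the `verse_popularity` helper are made up for illustration, and nothing here reflects how RefTagger or ref.ly actually log requests.

```python
from collections import Counter

# Hypothetical log of references that links resolved to (made-up sample data).
resolved_references = [
    "John 3:16", "Psalm 23:1", "John 3:16", "Romans 8:28",
    "John 3:16", "Psalm 23:1",
]


def verse_popularity(references, top_n):
    """Count how often each reference appears and return the top_n most common."""
    return Counter(references).most_common(top_n)


most_popular = verse_popularity(resolved_references, 2)
```

Nobody contributing a link has to do anything extra: the counts fall out of people using the tool for their own purposes.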

Aggregating users’ operation of Logos software (in a suitably anonymized fashion, of course) could also provide data on the most popular resources, searches, and topics, which could then be turned around into recommendations (“Looking for a Bible dictionary article on ‘marriage’? Here are the ones our users have found most useful ….”).

But none of these seem to me to accomplish the full promise of the Cornucopia of the Commons. There has to be more here than simply harnessing popularity (though sites like Digg and del.icio.us have shown how useful that can be). I’m still trying to imagine what data sets could be created by people who are already committed to Bible study, as a normal outgrowth of what they do anyway. Any thoughts? Please share a comment.

BibleTech:2010 Talk – The Logos Controlled Vocabulary

The program for BibleTech:2010 has been up for a couple of weeks now, and i’ve been delinquent in failing to point that out. We’ve got a full roster of really interesting talks that span the gamut from friendly warm technology to hard-core geekishness: Bible translation, social media, Biblical linguistics, mobile computing, preaching, publishing, tweeting, and more. And this year, it’s in San Jose, CA: i’m hoping that will open up attendance to some folks who have the misfortune to not live in the beautiful Pacific NW. The dates are March 26-27, 2010.

I’ll be giving two talks this year: here’s my abstract for the first one, on the Logos Controlled Vocabulary.


Dozens of books provide terminology from the field of Biblical studies, principally Bible dictionaries, encyclopedias, and other subject-oriented reference works. However, the terminology used varies between books, authors, and publishers, and doesn’t always include all the terms a user might employ to find information.

The Logos Controlled Vocabulary (LCV) organizes content from multiple Bible dictionaries to integrate information across the Logos library. As a controlled vocabulary, the LCV identifies, organizes, and systematizes a specific set of terms for indexing content, capturing inter-term relationships, and expressing term hierarchies. Like other kinds of metadata, this infrastructure then supports applications in search, discovery, and general knowledge management. The initial version of the LCV (shipping now with Logos 4) comprises some 11,100 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future. This includes some interesting new capabilities for machine learning from existing prose content. For example:

  • what are the prototypical Bible references, names, or phrases used to discuss a topic?
  • can we learn anything about the importance of topics by looking at how much is written about them, how many dictionaries cover them, and other kinds of automated analysis?
  • what knowledge can be gleaned from the topology of terminology linkage (what links to what)?
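As one very simple take on the last question, counting inbound links gives a crude signal of which topics other topics point toward. The link list and the `inbound_counts` helper below are hypothetical; this is a sketch of the general idea, not a description of the analysis planned for the LCV.

```python
from collections import Counter

# Hypothetical "see also" links between topics: (from_topic, to_topic).
links = [
    ("Passover", "Exodus"),
    ("Moses", "Exodus"),
    ("Red Sea", "Exodus"),
    ("Exodus", "Moses"),
    ("Aaron", "Moses"),
]


def inbound_counts(edges):
    """Count how many links point at each topic (its in-degree in the link graph)."""
    return Counter(target for _source, target in edges)


counts = inbound_counts(links)
```

A topic that many others link to is a plausible candidate for “important”, though a real analysis would want more than raw counts.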

Update: we’ve decided in general to retire the “Libronix” name for Logos technologies, so i’m trying to get on board by starting to call this the Logos Controlled Vocabulary.

Survey: the World of the Bible

The Society of Biblical Literature has received a planning grant to

… develop a website, “The World of the Bible: exploring people, places, and passages.” The site is intended for general audiences and will share scholarly views and encourage critical engagement with the Bible, including its ancient contexts and interpretive legacy.

We encourage you to share this survey with people who are not bible scholars—your students, perhaps, or friends and family. The goal is to gain a diverse representation of our intended audience and to assess their current level of familiarity with and interest in the Bible.

Please feel free to post this link in your blog or webpage.

Here’s the link to the survey: if you’re in their target group, i’d encourage you to give them some feedback. I’ve had some discussion with the principals, who know about Logos’ work on the Bible Knowledgebase (but we don’t have any official role in the project). This could become a useful resource for translating some of the scholarly work on Biblical studies to a wider audience.

(Hat tip: Mike Heiser’s Naked Bible blog)

Technology in Scripture

John Dyer points to an effort by Matthew Clarke to catalog references to technology in the Bible at WikiChristian. I really like the idea of looking at the Bible through technology glasses.

If you have Logos 4, you can easily play along using the Biblical Things feature (brief tutorial video), which provides a comprehensive list of references for all the physical, depictable artifacts of technology (though not more abstract things like metal refining techniques).

This kind of broad study across the whole of Scripture can provide new perspectives on things that, in their immediate context, often go right by us.

Logos 4 Launches Today

I’m thrilled to announce that we’re releasing Logos Bible Software 4 today. This is a complete rewrite from the ground up of the best Bible study software on the planet, so that makes this an exciting day in my book.

Logos 4 sports an entirely new interface to make it easier than ever to find what you’re looking for and keep your study space organized and effective. There’s a wealth of new, visually oriented resources, and better controls for working through the enormous space of resources Logos makes available. There’s even an iPhone app for no extra charge!

That’s the marketing view (and i stand behind it). But this means much more to me on a very personal level. It’s been almost 3 years since i came to Logos, and this will be the first time most of my work has seen the light of day. Specifically, Logos 4 contains the work of my colleagues and me in several new areas:

  • Biblical People, which organizes information about the 3300 individuals, groups of people, and deities named in the Biblical text. It includes a comprehensive list of references, their family relationships, links to dictionary articles, and links to related items. It also includes family tree and story-based diagrams. And everything is hyperlinked.
  • Biblical Places includes all the same kinds of information for 1200 named places from the Bible: cities, regions, even geographic features like rivers and mountains. Along with the data, there are 60 new high-resolution maps commissioned by Logos and covering the major Biblical events, as well as a mega-map that shows all the places together.
  • Biblical Things describes the physical objects of the Bible: animals, plants, body parts, clothing, food and drink, and much more, as well as specific items like Noah’s ark and Goliath’s sword and weights and measures. There are more than 1000 objects here, which also bring together thousands of images from across the library.
  • There’s also a new collection of high-resolution infographics illustrating different aspects of the Biblical world (and i’m extra proud that the bulk of this work was managed by my wife Donna)
  • In addition to regular word search (which is much faster than ever), under the hood is the Libronix Controlled Vocabulary (LCV), working to organize 11,000 different subjects in the Biblical studies literature and coordinating information across the library.

So if you’ve been following my posts on the Bible Knowledgebase … well, now it’s here. I can’t overstate how important i think this is: this is quite literally the first time in the centuries-old history of Biblical studies that this information has been made available in this way. The LCV isn’t quite as visible (yet), but it’s also an important organizing feature that will continue to grow in power going forward.

I hope you’re catching my sense of excitement about these new resources (and this says nothing about all the hard work of my dozens of colleagues in other areas). I hope i’ve piqued your interest to learn more about Logos 4. It really is a watershed event in Bible software.

Obligatory disclaimer: i work for Logos and highly value what i do there. So i’m not the least bit objective about this. (more detailed disclosures)

BibleTech:2009 Postlude

BibleTech:2009 is past now, and (just like last year) was a great opportunity both to hear new ideas about Bible and technology and to meet and talk with many others with common interests. The few scattered thoughts i jotted down as i was live-blogging talks certainly don’t do justice to the richness of many of the presentations: so don’t judge the quality of their talks by my quick-take notes.

I’ve got slides from my talk on the Bible Knowledgebase posted now on SemanticBible: the navigational structure above them isn’t in place yet, but you should be able to follow the link directly to get there. Once again, i’ve used Slidy for the presentation, and that process went a little more smoothly this time (which probably just means i’ve gotten better at it). View the source if you want to see how it works.

[Important note: if you were at my talk and wrote down the URL for the slides, i had it wrong. The correct URL is:

http://semanticbible.com/other/talks/2009/bibletech/BK.html

Yes, i know that Cool URIs don’t change, which is why i wanted to make this one adjustment before publishing them, so i won’t have to change it in the future.]

At some point there should be audio from the talk posted on the BibleTech site (probably on the BibleTech speakers page, which has links to talks from last year and audio where available). Future Blogos posts on the Bible Knowledgebase will go in my WordPress category of that name (RSS feed here), and will also be tagged with bk if you want to follow along.

BibleTech 2009 Topic: the Bible Knowledgebase

My most significant activity at Logos over the last year and a half has been building a database of people, places, and things i call the Bible Knowledgebase (BK). I’ve posted on numerous aspects of this project before (collected in this category), and thanks to lots of hard work by a number of individuals, we’re closing in on a relatively complete internal version. This won’t be released until the next major version of Logos software, so its public debut is still some ways off.

So one strong candidate for a BibleTech talk is a review of the BK, a machine-readable knowledge base of semantically-organized Bible data that is linked to Biblical texts to support search, navigation, and visualization. The thousands of entities in the BK (people, places, and things, along with their names) have a variety of attributes that are appropriate to their type: people have family relationships, places have geo-coordinates, etc. Relationships between entities support discovery and exploration.
Unlike knowledge expressed in prose (like Bible dictionaries), BK data provides reusable content that can serve a variety of purposes. It also provides an important integration framework for Libronix resources, in the general spirit of Tim Berners-Lee’s Linked Data idea.

Some other topics the talk might address:

  • visualizing and learning from the graph of relationships
  • BK as an information architecture for other Libronix resources
  • challenges in building and using BK
  • some specific tools that have proved useful in managing BK development
  • a possible future for community participation in BK extension

So now, the audience participation portion of our program:

  • would you be interested in hearing a talk like this at BibleTech 2009?
  • what aspects are most/least interesting to you?

I’d encourage you to post a comment with your responses.