Next in my experiment to gather feedback on possible BibleTech 2009 topics: the Libronix Controlled Vocabulary. This is the second of my two major activities over the last year (the other was described in my previous post), and therefore a pretty strong contender for a BibleTech presentation.

Unlike the Bible Knowledgebase, which is about real-world entities in the Biblical text, the Libronix Controlled Vocabulary (LCV) organizes terminology from the field of Biblical studies, drawn principally from Bible dictionaries, encyclopedias, and other subject-oriented reference works. A controlled vocabulary identifies, organizes, and systematizes a specific set of terms for indexing content, capturing relationships between terms, and expressing term hierarchies. Like other kinds of metadata, this infrastructure then supports applications in search, discovery, and general knowledge management. The initial version of the LCV was built by merging content from seven of the most important Bible dictionaries in Libronix and currently comprises some 11,000 terms; I expect it will eventually grow to 15,000 or more.
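To make those three functions concrete (preferred terms for indexing, relationships between terms, and a hierarchy), here is a minimal Python sketch. It is not the LCV's actual data model, which isn't described here: the class and method names (`ControlledVocabulary`, `add_variant`, `resolve`, `ancestors`) are hypothetical, and the sample hierarchy placing Heresy and Orthodoxy under Doctrine is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A preferred term (hypothetical schema, not the actual LCV format)."""
    label: str
    broader: Optional[str] = None                   # parent term in the hierarchy
    related: set[str] = field(default_factory=set)  # non-hierarchical links


class ControlledVocabulary:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        # Maps lowercased article titles and synonyms to their preferred term.
        self.variants: dict[str, str] = {}

    def add_term(self, label: str, broader: Optional[str] = None) -> None:
        self.terms[label] = Term(label, broader)
        self.variants[label.lower()] = label

    def add_variant(self, variant: str, preferred: str) -> None:
        self.variants[variant.lower()] = preferred

    def relate(self, a: str, b: str) -> None:
        # Related-term links are stored symmetrically.
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)

    def resolve(self, label: str) -> Optional[str]:
        """Return the preferred term for a title or synonym, or None if unknown."""
        return self.variants.get(label.lower())

    def ancestors(self, label: str) -> list[str]:
        """Walk up the hierarchy from a term toward its root."""
        chain = []
        parent = self.terms[label].broader
        while parent is not None:
            chain.append(parent)
            parent = self.terms[parent].broader
        return chain


# Illustrative data only: this hierarchy is an assumption for the example.
vocab = ControlledVocabulary()
vocab.add_term("Doctrine")
vocab.add_term("Heresy", broader="Doctrine")
vocab.add_term("Orthodoxy", broader="Doctrine")
vocab.relate("Heresy", "Orthodoxy")
vocab.add_variant("Heresy and Orthodoxy in the NT", "Heresy")

print(vocab.resolve("Heresy and Orthodoxy in the NT"))  # Heresy
print(vocab.ancestors("Heresy"))                        # ['Doctrine']
```

The variant table is what lets differently titled articles from separate dictionaries be indexed under one shared term; a fuller design might let a single article map to several terms.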

One interesting aspect of working in the specific domain of Biblical studies is that there is a core set of subjects common to many or most Bible dictionaries. This includes named individuals and places in the Bible, but also subjects like Heaven or Heresy. But while one dictionary has an article titled simply “Heresy” (the New Bible Dictionary, or Easton’s), another might title its article “Heresy and Orthodoxy in the NT” (the Anchor Bible Dictionary). These articles may share common content yet also differ significantly, reflecting their intended audiences (scholarly vs. popular), theological orientation, comprehensiveness, and so on. The LCV provides a way to capture some of these similarities, and it also enables some interesting new capabilities for machine learning from existing prose content. For example:

  • what are the prototypical Bible references, names, or phrases used to discuss a topic?
  • can we learn anything about the importance of topics by looking at how much is written about them, how many dictionaries cover them, and other kinds of automated analysis?
  • what knowledge can be gleaned from the topology of terminology linkage (what links to what)?
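Once articles from different dictionaries are indexed to shared terms, questions like these start with simple counting. The Python sketch below assumes a hypothetical `Article` record (dictionary, indexed term, word count, cited Bible references, cross-reference links); the dictionary names, word counts, and links in the sample data are invented for illustration and are not measurements from Libronix. It counts how many articles on a term cite each reference, and reports per-term dictionary coverage, total words, and incoming links.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Article(NamedTuple):
    """A dictionary article already indexed to a preferred term (hypothetical)."""
    dictionary: str
    term: str               # preferred term the article is indexed under
    word_count: int
    refs: tuple[str, ...]   # Bible references cited in the article
    links: tuple[str, ...]  # preferred terms the article cross-references


def prototypical_refs(articles, term, n=3):
    """Most widely cited references for a term, counted at most once per article."""
    counts = Counter()
    for a in articles:
        if a.term == term:
            counts.update(set(a.refs))
    return counts.most_common(n)


def term_statistics(articles):
    """Per-term dictionary coverage, total words, and incoming cross-references."""
    coverage = defaultdict(set)  # term -> dictionaries with an article on it
    words = Counter()            # term -> total words written about it
    in_links = Counter()         # term -> number of other articles linking to it
    for a in articles:
        coverage[a.term].add(a.dictionary)
        words[a.term] += a.word_count
        for target in set(a.links):
            if target != a.term:  # ignore self-links
                in_links[target] += 1
    # Only terms with at least one article are reported.
    return {
        t: {"dictionaries": len(coverage[t]), "words": words[t], "in_links": in_links[t]}
        for t in coverage
    }


# Invented sample data for illustration only.
articles = [
    Article("Dictionary A", "Heresy", 800,
            ("Titus 3:10", "2 Pet 2:1"), ("Orthodoxy", "Gnosticism")),
    Article("Dictionary B", "Heresy", 1500,
            ("2 Pet 2:1", "1 Cor 11:19"), ("Orthodoxy",)),
    Article("Dictionary C", "Orthodoxy", 400,
            ("1 Tim 1:3",), ("Heresy",)),
]

stats = term_statistics(articles)
# One possible importance proxy: coverage first, then volume of prose.
ranking = sorted(stats, key=lambda t: (stats[t]["dictionaries"], stats[t]["words"]),
                 reverse=True)

print(prototypical_refs(articles, "Heresy", 1))  # [('2 Pet 2:1', 2)]
print(ranking)                                   # ['Heresy', 'Orthodoxy']
```

Sorting by coverage and then by word count is only one plausible proxy for a topic's importance; the in-link counts offer a separate signal drawn from the linkage topology.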

I’m not sure I’ve provided enough information here to give a clear sense of what such a talk might cover, but I welcome feedback from potential BibleTech attendees (or others) on whether this sounds interesting and which aspects of it you’d most like to learn about.