Thursday, September 01, 2005

Part of what makes the Hyper-concordance interesting is that it maps inflected words back to their base forms ("gave" => give, "men" => man, etc.). This is done with a simple map that i constructed semi-automatically and then corrected by hand. This would be a major project for all of English, but the vocabulary of the English NT is limited enough to make it feasible.
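A minimal sketch of that kind of lookup, for illustration only. It is not the Hyper-concordance's actual code: the `BASE_FORMS` entries and the function names are hypothetical, and the tokenizing is deliberately naive.

```python
# Hypothetical sample entries; a real map would cover the whole
# vocabulary of the translation and be corrected by hand.
BASE_FORMS = {
    "gave": "give",
    "given": "give",
    "gives": "give",
    "men": "man",
}


def base_form(word: str) -> str:
    """Return the base form of an inflected word, or the lowercased word if unmapped."""
    key = word.lower()
    return BASE_FORMS.get(key, key)


def concordance_keys(text: str) -> list[str]:
    """Split on whitespace, strip surrounding punctuation, map each token to its base form."""
    tokens = (t.strip(".,;:!?\"'()") for t in text.split())
    return [base_form(t) for t in tokens if t]
```

Falling back to the word itself for unmapped tokens means only the irregular or inflected forms need entries, which is part of what keeps a hand-corrected map manageable.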

Having started down this road, though, i find myself wanting more (of course). For example:

  • distinguishing content terms (nouns, verbs, adjectives, adverbs) from others, to enable term counting
  • frequency analysis (which is relative to particular translations: the current version was derived from the RSV, but i'd like to include the ESV as well)
  • co-indexing against Strong's numbers
  • capturing collocations ("come out" really functions like a unit, not two separate words)
  • co-indexing against WordNet to take advantage of its hierarchy and sense information
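
Two of these wishes, content-term counting and collocations, can be sketched together. This is only an illustration under stated assumptions: the `POS` and `COLLOCATIONS` tables are hypothetical sample data, the input is assumed to be already reduced to base forms, and only two-word collocations are handled.

```python
from collections import Counter

# Parts of speech treated as content terms (per the first item in the list).
CONTENT_POS = {"noun", "verb", "adjective", "adverb"}

# Hypothetical sample data: base form -> part of speech.
POS = {
    "come": "verb",
    "come out": "verb",
    "man": "noun",
    "house": "noun",
    "the": "determiner",
    "of": "preposition",
}

# Hypothetical two-word collocations that should count as one unit.
COLLOCATIONS = {("come", "out"): "come out"}


def merge_collocations(lemmas):
    """Replace adjacent pairs listed in COLLOCATIONS with a single unit."""
    merged = []
    i = 0
    while i < len(lemmas):
        pair = tuple(lemmas[i:i + 2])
        if pair in COLLOCATIONS:
            merged.append(COLLOCATIONS[pair])
            i += 2  # consume both words of the collocation
        else:
            merged.append(lemmas[i])
            i += 1
    return merged


def count_content_terms(lemmas):
    """Count only units whose part of speech is a content category."""
    units = merge_collocations(lemmas)
    return Counter(u for u in units if POS.get(u) in CONTENT_POS)
```

Merging before counting is the point of the design: "come out" is tallied once as a verb, rather than as "come" plus a stray "out".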

So i'm starting to think about how to set this up as a structured XML resource, sharable (of course) with others. If you know of existing resources (that are digital and freely sharable), and/or if you're interested in working on a project like this, let me know!
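
One possible shape for such a resource, sketched with Python's standard `xml.etree.ElementTree`. Everything about the layout is an assumption made for illustration: the element and attribute names (`lexicon`, `entry`, `form`, `strongs`) are invented, not an existing schema, and no real Strong's numbers are filled in.

```python
import xml.etree.ElementTree as ET


def build_entry(lemma, pos, forms, strongs=None):
    """Build one hypothetical <entry> element for a base form and its inflections."""
    entry = ET.Element("entry", {"lemma": lemma, "pos": pos})
    for form in forms:
        ET.SubElement(entry, "form").text = form
    if strongs:
        # Optional cross-reference; omitted unless a number is supplied.
        ET.SubElement(entry, "strongs", {"ref": strongs})
    return entry


# The source attribute records which translation the forms were drawn from.
lexicon = ET.Element("lexicon", {"source": "RSV"})
lexicon.append(build_entry("give", "verb", ["gave", "given", "gives"]))
lexicon.append(build_entry("man", "noun", ["men"]))

xml_text = ET.tostring(lexicon, encoding="unicode")
```

Keeping the translation on the root element leaves room for a parallel ESV-derived file with the same entry structure.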


9:01:21 AM

Things have been pretty quiet at Blogos this summer. Though i might wish that were a reflection of time spent at the beach or otherwise relaxing, my skin tone is as pasty-white as ever. In actuality, my leisure time in the early part of the summer was focused on a class in distance education (i learned a lot, though my impressions of on-line learning are mixed: still haven't done a post on that). Then my mother had a stroke in mid-July, and that's changed a lot of our priorities and activities as we've been spending time with her in the hospital and preparing for her to come back home.

But i find myself still bubbling up thoughts and ideas about how to move forward on SemanticBible.org, and i expect to get back to some blogging, in part as i prepare for a presentation at the Society of Biblical Literature meeting in Philadelphia in November (any Blogos readers, consider this an invitation to introduce yourself if you're there). I'm in the session on Computer Assisted Research, Monday Nov 21 from 9-11 AM in Room 411 & 412. This being my first SBL meeting, i'm both honored to have an opportunity to present (since i'm not exactly a card-carrying Biblical scholar) and excited about hearing many of the other presentations.

Just to whet your appetites, here's my abstract:

Information visualization is an established computer technique for providing rich, typically interactive, visual presentations of complex multivariate data. In this paper we present several visualizations of the Gospel texts, focusing on the length and overlap (or lack thereof) of their various accounts. The fundamental data comes from the Composite Gospel Index (http://www.semanticbible.org/cgi/cgi-overview.html), a unified index and alignment of the pericopes in the four canonical Gospels, expressed in the Resource Description Framework (RDF), an XML-based language for representing meta-data. The Composite Gospel Index as the underlying data source will be briefly introduced, followed by several live visualization examples based primarily on treemaps, a "space-filling visualization" that uses size and color to effectively show complex relationships, developed by Ben Shneiderman of the Human-Computer Interaction Laboratory at the University of Maryland. Our claim is that treemaps are a novel and useful tool for investigating textual overlap within the Gospels.

8:34:11 AM