The background of this talk: Zack Hubert’s talk from the last BibleTech. Zack developed a very useful website that ultimately failed because he couldn’t maintain it and couldn’t get other developers to pitch in and help.

The vision: an open web repository for integrated scriptural data and a platform for building scripture applications (OpenScriptures.org). What kinds of data? Manuscripts, translations, versification systems, morphosyntactic parsings, and user tags, annotations, and cross-references. But getting started with all this data takes a lot of effort: each dataset typically comes in its own format and is not linked to the others.

Linked data principles (from Tim Berners-Lee):

  • use URIs as names for things
  • use HTTP URIs so that people can look up those names
  • provide useful information behind the URIs
  • and links to other URIs so they can discover more things

“… the more things you have to connect together, the more powerful it is.” Can we connect these things through a unified manuscript that links semantic units (words, phrases, clauses)?
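
To make the linked data principles concrete, here is a minimal sketch of one word in a unified manuscript, named by an HTTP URI and carrying links to other URIs. The `Token` class, the `describe()` method, and every `example.org` URI are hypothetical illustrations, not Open Scriptures’ actual data model or URI scheme.

```python
from dataclasses import dataclass, field

# Hypothetical base URI for the unified manuscript; not Open Scriptures' real scheme.
BASE_URI = "http://example.org/unified"


@dataclass
class Token:
    """One semantic unit (here, a single word) of the unified manuscript."""

    position: int
    form: str
    # URIs of related resources: readings in other manuscripts, parsings, notes, ...
    links: list[str] = field(default_factory=list)

    @property
    def uri(self) -> str:
        # Principles 1 and 2: name the token with an HTTP URI that could be looked up.
        return f"{BASE_URI}/token/{self.position}"

    def describe(self) -> dict:
        # Principles 3 and 4: useful information plus links to other URIs.
        return {"uri": self.uri, "form": self.form, "links": list(self.links)}


word = Token(position=1, form="εν")
word.links.append("http://example.org/manuscripts/A/1")  # hypothetical URI
print(word.describe())
```

A real service would return this description when someone dereferences the token’s URI, so following `links` is how a client discovers more things.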

Manuscript unification: normalize the first manuscript (lowercase it and remove diacritics; there is no spelling normalization yet), then insert its words into the unified manuscript and save the links. For each additional manuscript, normalize it, merge its links into the unified manuscript, and save them. The result links all attested readings together, so the unified manuscript effectively has an automated critical apparatus. [demo here of the manuscript comparator]
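
The unification steps can be sketched in a few lines. Normalization follows the description above (lowercase, strip diacritics, no spelling normalization). The merge step is an assumption: it aligns each new manuscript against the unified one with Python’s `difflib.SequenceMatcher`, which is a stand-in alignment technique and not necessarily what Open Scriptures used. The slot representation (one dict per position, mapping a hypothetical manuscript id to its reading) is also hypothetical. The sketch further assumes whitespace-separated words with punctuation already removed.

```python
import difflib
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase and strip diacritics (no spelling normalization); split into words."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).split()


def merge(unified: list[dict[str, str]], ms_id: str, words: list[str]) -> list[dict[str, str]]:
    """Align a manuscript's words against the unified slots and return the merged slots.

    Each slot maps manuscript id -> the reading that manuscript attests there.
    """
    # Representative form per slot: the first reading recorded for it.
    base = [next(iter(slot.values())) for slot in unified]
    matcher = difflib.SequenceMatcher(a=base, b=words, autojunk=False)
    merged: list[dict[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = [dict(slot) for slot in unified[i1:i2]]
        new = words[j1:j2]
        if tag == "equal" or (tag == "replace" and len(old) == len(new)):
            # Same slots: identical readings, or word-for-word variants.
            for slot, word in zip(old, new):
                slot[ms_id] = word
            merged.extend(old)
        else:
            # 'delete': the manuscript omits these slots; 'insert': it adds new ones;
            # unequal 'replace': keep the old slots and add the new words as new slots.
            merged.extend(old)
            merged.extend({ms_id: word} for word in new)
    return merged


def apparatus(unified: list[dict[str, str]], ms_ids: list[str]) -> list[tuple[int, dict[str, str]]]:
    """List the 1-based slots where the witnesses disagree; 'om.' marks an omission."""
    entries = []
    for n, slot in enumerate(unified, start=1):
        readings = {ms: slot.get(ms, "om.") for ms in ms_ids}
        if len(set(readings.values())) > 1:
            entries.append((n, readings))
    return entries


unified = merge([], "A", normalize("Ἐν ἀρχῇ ἦν ὁ λόγος"))
unified = merge(unified, "B", normalize("ἐν ἀρχῇ ἦν λόγος"))
print(apparatus(unified, ["A", "B"]))  # [(4, {'A': 'ο', 'B': 'om.'})]
```

The first manuscript goes through the same `merge` call as every later one: against an empty unified manuscript, every word is an insertion.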

Potential applications include:

  • translation comparator (can also help with the versification problem)
  • comprehensive concordance
  • translation-independent cross-references (e.g. NT quotations of the OT)
  • interlinear/bilingual editions

You can automatically link manuscripts in the same language, but not manuscripts in different languages. Instead, use collective intelligence to capture semantic links between languages, for example by gathering links with the “games with a purpose” (GWAP) approach.
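
How a game might turn player answers into cross-language links can be sketched with “output agreement,” the mechanic used by the ESP Game: two players who cannot see each other’s answers each pick a translation, and only matching answers count. The round format, the short word ids (`grc:logos`, `en:word`), and the `min_agreements` threshold are hypothetical choices for illustration, not a documented Open Scriptures design.

```python
from collections import Counter


def agreed_links(rounds: list[tuple[str, str, str]], min_agreements: int = 2) -> set[tuple[str, str]]:
    """Accept a cross-language link once enough independent player pairs agree on it.

    Each round is (source_word, answer_1, answer_2): two players, who cannot see
    each other's answers, pick the target-language word they think translates source_word.
    """
    votes: Counter = Counter()
    for source, answer_1, answer_2 in rounds:
        if answer_1 == answer_2:  # output agreement: only matching answers count
            votes[(source, answer_1)] += 1
    return {link for link, n in votes.items() if n >= min_agreements}


rounds = [
    ("grc:logos", "en:word", "en:word"),
    ("grc:logos", "en:word", "en:reason"),  # disagreement: ignored
    ("grc:logos", "en:word", "en:word"),
    ("grc:arche", "en:beginning", "en:beginning"),  # only one agreement so far
]
print(agreed_links(rounds))  # {('grc:logos', 'en:word')}
```

Requiring agreement from several independent pairs trades speed for reliability; the threshold would need tuning against real player data.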

Copyright is a major challenge: you can’t link texts together if you can’t access them, and you can’t share them if they’re not open. Recently, MorphGNT texts were taken down from several sites because they are not freely shareable. If the key benefit is the connections between data, then data (including texts) should be more valuable when they are shareable and connected. One solution is an Open Scriptures Platform that connects content owners, developers, and end users: passionate developers could build applications on content licensed to Open Scriptures (acting as a proxy), and Open Scriptures would make sure that end users provide revenue to content owners.