Talking with Talis is rapidly becoming my favorite podcast source: Paul Miller has a lot of really interesting guests addressing topics at the intersection of libraries and the Semantic Web.
Today i listened to an interview with Dr. Jim Hendler, now at Rensselaer Polytechnic Institute, but previously at the University of Maryland and a key figure in the establishment of OWL during his tenure at DARPA. My comments here are really just a rehash of some things he said much better, and with much more authority (given his history in the field), but blame me, not him, for what i say below.
The concept of the Semantic Web brings together two different communities, along with their respective priorities and technologies. Many of the disagreements within what looks like a single community are really just two sets of people talking about different things (but using similar terminology). The “semantic” part is mostly represented by the Artificial Intelligence community, with interests in careful ontology development, deep reasoning, theoretical correctness, and academic activities. The “web” community has been out there for more than a decade, building the World Wide Web with HTML and lots and lots of data, and is now looking for ways to make it more useful, connected, and extensible.
You can represent these two concerns as two axes on a graph, and many different endeavors tend strongly toward one side or the other, depending on whether they emphasize the “intelligence” dimension, or the “data” dimension. Just a few examples on the data side (that could be multiplied many times over):
- Yahoo plans to start indexing RDFa content (i discussed this a bit in my post about Bibleref and RDFa). Since Yahoo is one of the major web players, this adds just a little more intelligence to a lot of data (potentially, that is: users still have to create the RDFa markup).
- Freebase is harvesting data from Wikipedia and other sources, and then adding a modest amount of structured relations.
- Talis has their own set of data from a long history of library applications.
On the “intelligence” side would be big ontology development efforts, and academics working on reasoning. Hendler also called out pharmaceutical companies as tending toward this dimension. Hendler’s own bet is that progress is more likely to come from data-side approaches than the hard-core intelligence side (and i think he’s right). He sees SPARQL and persistent identifiers as two recent developments whose combination is likely to move the field ahead. These are things i’m looking at closely as well in Bible Knowledgebase development (more on the second one to come soon).