Information Moving To The Web

We all know there’s a massive shift of information onto the Internet, with Google Books scanning whole libraries, more content being born digital, the transformation of digital libraries, and tera-peta-exa-zetta-yotta-yadayadayada-bytes of data going online. But somehow, those abstract notions don’t have quite the same tangible impact as actual physical artifacts (like books) with their connections to our personal histories. Here’s how this hit home for me today.

I first got interested in computational linguistics around 1979, when i was finishing up my degree at Occidental College (an independent major combining linguistics and anthropology) and playing around with computers. Later, as a graduate student in linguistics at UCLA, i attended my first academic conference in the field: COLING 84 at Stanford, a combined gathering of the 10th International Conference on Computational Linguistics and the 22nd Annual Meeting of the Association for Computational Linguistics. It was a pretty heady experience for this young man: i still remember playing with the bit-mapped graphics on what i think was a Xerox Star, one of the earliest commercial systems with many of the display and interface innovations that are commonplace today.

I brought back the proceedings, a hefty volume about 3cm thick. Later i joined the Association for Computational Linguistics, which included getting the journal Computational Linguistics, and over the course of my 19 years with BBN Technologies i attended many annual meetings and other workshops, collecting proceedings all the time (they started distributing them on CDs around 2000). I have close to a complete collection of the journal for many years (dozens of volumes). I count 16 proceedings volumes, typically several cm each. All told, these were taking up about a meter of shelf space in my office, as they have for the last 10 years or so (the last one i have is from 2000, which is about when i got more involved in management and had a harder time justifying these kinds of technical conferences).

Today, casting about for a place to put some new books i’d acquired, i looked at these journals and proceedings, and had an epiphany. I googled a few articles: sure enough, they were all on-line. In fact, the journal became open access in 2009, and they’ve put all the back issues on the web as well. The ACL Anthology hosts thousands of computational linguistics papers, and they’ve provided digital versions of all the proceedings i have (and many many others). So all of a sudden, i realized i had a meter of useless paper volumes on my bookshelf.

You might wonder what took me so long. I do too: I guess one answer is simply inertia. I’ve had these volumes on my shelves for so long i hadn’t gotten around to reconsidering whether i really needed them. I’m also an information omnivore, so i’ve always been reluctant to just give them up (though i couldn’t tell you the last time i actually cracked the cover on one). In part, I suppose another reason is that having a shelf of professional journals and proceedings makes me feel smarter (silly though that sounds when said out loud): it’s evidence of many years of commitment to the field. In the digital age, these markers of industriousness are becoming as scarce as the artifacts themselves.

Some of these volumes have moved with me many times, from Los Angeles to Massachusetts when i took my first research position with BBN (1987), through various office moves there, when we moved to Maryland in 2000 (and more internal moves there), and when we moved to the northwest to work for Logos in 2007. That first COLING volume has been on my office bookshelf as long as i’ve had an office with bookshelves! But, with ever more information on-line (and much more findable and useful there), new books that need to find a home, and doubtless other office moves ahead … it’s time to let go and continue the march into the digital future.

Digital Journals for Biblical Studies

John Hobbins over at the Ancient Hebrew Poetry blog has been musing about this question:

What do you think a state-of-the-art electronic journal in biblical studies would look like?

This question lives right where so many interesting discussions are currently taking place around topics like

It’s still too early to know the answers, but here are a few areas of interest to me:

  1. The value of search, hyperlinked information, and other digital conveniences seems indisputable.
  2. There’s a lot of momentum from openness so far. Wikipedia has clearly won the day against the Encyclopedia Britannica, through its combination of free access, timely update of content, and tremendous scope – and despite criticisms of its lack of authoritativeness and editorial control (a caution to those who want peer review to be a control gate). But clearly part of Wikipedia’s real success is its ability to motivate and manage an enormous community of volunteers: it remains to be seen how easily others can replicate that feat. Hobbins rightly questions how this will all work with databases that are behind pay walls.
  3. In the five years of Web 2.0, we’ve all learned the value of having a community that can tag, rate, and comment on content. But the network effects here require a certain critical mass to pay off: how would that be accomplished in a field like Biblical studies? How will authors feel having others leave comments directly on their articles (including those of a contrary nature)?
  4. Can such a thing really work out on the open web, or does it need a rich community of resources like Logos to really thrive?

The technical issues aren’t likely to prove stumbling blocks: there are plenty of solutions there. I expect the tough problems will have a lot more to do with community building, rethinking scholarship and publication, clarifying the value propositions and business issues, and gaining traction.

Bob’s Talk at TOC

I blogged a funny story last week about Logos CEO Bob Pritchett’s attendance at the O’Reilly Tools of Change for Publishing (TOC) conference. But here’s a serious comment from Mark Coker of the Huffington Post that warrants quoting (italics are mine):

The Best Presentation at TOC

My favorite presentation of the conference was from Bob Pritchett of Logos Bible Software, in a session titled, Network Effects Support Premium Pricing. I remember attending his presentation four years ago at the first TOC in San Jose, so I knew I didn’t want to miss his presentation this time. They’re doing amazing stuff at Logos. They face an interesting challenge, one that every author and publisher faces: How do you compete against free? In their case, they sell about 10,000 bible study ebooks. How much has the bible changed over the last two hundred years? Not much. But what Logos excels at is making this information more accessible than ever before. They take a database-centric view of their vast and ever-growing library of content.

When you purchase a book from them, you’re not just getting a static ebook, you’re buying into a dynamic, integrated online application environment that becomes richer with each new publication, and with each new member to their community. Even if Bible study isn’t your thing, check them out for future-of-publishing inspiration. I can’t do them justice here.

High praise indeed from somebody who isn’t necessarily into Bible study, but recognizes that what Logos is doing is really quite unique in the entire publishing industry. Our “database-centric views” are only getting stronger, so you can expect to hear more about this in the months to come.

LCV Talk at Semantic Technology Conference

I’ll be giving a talk at the Semantic Technology Conference, June 23 from 7:30am to 8:20am (ouch!), in San Francisco, CA. The talk title is “Using a Controlled Vocabulary for Managing a Digital Library Platform”: no talk page yet, but the abstract follows. If you’re there, come by and say hello!

(Astute readers will note some similarities between this and my upcoming BibleTech talk. But the audiences are quite different, so the content will be too. This talk will provide “a practical case study on semantically organizing reference material to support search and navigation, using a controlled vocabulary.”)


Encyclopedias and other subject-oriented reference books frequently present the same content using different names: and users often look for this information using other names altogether.

The Logos Controlled Vocabulary (LCV) organizes parallel but distinct content in the domain of Biblical studies to integrate reference information and support search, discovery, and knowledge management. The LCV captures

  • preferred and alternate terminology
  • inter-term relationships
  • term hierarchy
  • linkage to other semantic information
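
As a rough illustration of the four kinds of information listed above, here is a minimal Python sketch of how a controlled vocabulary entry might be modeled. This is not the actual LCV design: the `Term` and `Vocabulary` classes, their field names, and the sample entries are all hypothetical, chosen only to show alternate names resolving to one preferred term and a simple broader-term hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One vocabulary entry (hypothetical structure, not the real LCV schema)."""
    preferred: str                                         # preferred terminology
    alternates: list[str] = field(default_factory=list)    # alternate terminology
    broader: str | None = None                             # term hierarchy (parent)
    related: list[str] = field(default_factory=list)       # inter-term relationships
    links: dict[str, str] = field(default_factory=dict)    # other semantic information


class Vocabulary:
    def __init__(self) -> None:
        self._terms: dict[str, Term] = {}
        self._index: dict[str, str] = {}   # any known name -> preferred name

    def add(self, term: Term) -> None:
        self._terms[term.preferred] = term
        for name in [term.preferred, *term.alternates]:
            self._index[name.casefold()] = term.preferred

    def resolve(self, name: str) -> Term | None:
        """Map whatever name a user typed to the term it belongs to."""
        preferred = self._index.get(name.casefold())
        return self._terms[preferred] if preferred is not None else None

    def ancestors(self, name: str) -> list[str]:
        """Walk the broader-term chain upward from the given name."""
        term = self.resolve(name)
        chain: list[str] = []
        while term and term.broader and term.broader not in chain:
            chain.append(term.broader)
            term = self._terms.get(term.broader)
        return chain


# Illustrative entries only; not taken from the LCV itself.
vocab = Vocabulary()
vocab.add(Term("People"))
vocab.add(Term("Apostles", alternates=["The Twelve"], broader="People"))
vocab.add(Term("Simon Peter", alternates=["Peter", "Cephas"],
               broader="Apostles", related=["Andrew"]))
```

With a structure like this, a search for "Cephas" and a search for "Peter" land on the same entry, which is the kind of integration across differently named reference content that the abstract describes.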

The initial version of the LCV (now shipping in the Logos digital library platform) comprises some 11,000 terms, and continues to grow as more reference works are added. It also provides the backbone of, a website for user contributions to terminology and content.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future.

Dynamic Textbooks

New York Times article: “Macmillan … is introducing software called DynamicBooks, which will allow college instructors to edit digital editions of textbooks and customize them for their individual classes.” That includes rewriting and deleting individual paragraphs. The effort is hosted at DynamicBooks.

This is yet another step in what Nicholas Carr has called “the Great Unbundling”, freeing the smaller bits of content embedded in print objects like newspapers and books to live their own independent digital lives.

It raises all kinds of interesting questions, some of which are addressed in the NYT article:

  • who controls the changes? (in Macmillan’s case, they claim to not control it, but also that they will “rely on students, parents and other instructors to help monitor changes” and remove inappropriate changes. And how do they decide exactly who qualifies as an instructor?)
  • how does this affect style? (from the article: “there’s a flow to books, and there’s voice to them”)
  • what about divergent points of view? (from the article: “if an instructor decided to rewrite paragraphs about the origins of the universe from a religious rather than an evolutionary perspective, <an astronomy author> said, “I would absolutely, positively be livid.””)

Macmillan’s choice to really put this out in the open is bold: i’m not sure i’d go that far. But i have no doubt that blurring the line of who owns the content is the direction of the future.