God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
January 31st, 2009

Search Engine Optimization for Blogs and Non-profits

I listen to as many podcasts as i can, usually as a way to keep my mind engaged while my body is otherwise occupied with things like vacuuming, exercising, or taking long drives. I’m a glutton for ideas, so for me it’s a great way to spark creativity and explore new interests, usually in the realm of new technology. Some of my favorite feeds:

A recent IT Conversations podcast was on Search Engine Marketing, a discussion with Mike Moran and Bill Hunt, authors of the book Search Engine Marketing, Inc. A lot of their discussion focuses on companies whose web presence provides real revenue, and who therefore have a strong financial motivation to think hard about Search Engine Optimization (SEO). They’ve got some good advice: focus on content, check your description, and write articles that solve real problems (so others link to them and build your search rank). But there are still plenty of us producing blogs (like Blogos) and open resource websites (like SemanticBible) whose motivations are different, and SEO still matters for us too.
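To make the “check your description” advice concrete (my illustration, not theirs): the HTML meta description is often what a search engine displays as the snippet under your result, so it’s worth making it actually describe your content. Something like:

<head>
  <title>Blogos: meaning, communication, and technology</title>
  <!-- search engines often use this as the snippet shown in results -->
  <meta name="description"
        content="Notes on meaning, communication, and technology from Sean Boisen." />
</head>

Leave it empty or generic, and the search engine picks its own excerpt, which may not say what you want.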

If you’re reading this, in marketing terms you’re a potential customer of my “brand”, and each web page or blog post i create is, at one level, a marketing activity directed at you. I don’t get any revenue from my readers: my only half-hearted attempt at that is when i remember to put my Amazon Associates tag in a book recommendation (and to my knowledge, that’s never paid off). I don’t do ads either. I do, however, get something less tangible but perhaps more important: blogging enhances my digital identity, including my reputation. If you’re in a high-tech field, your online identity is becoming as important a representation of you to prospective employers as your resume. In my case, my unpaid activities of blogging, conference speaking, and website development led pretty directly to my current work at Logos.

Of course, given the wide-open nature of web search, there are plenty of people who get to my blog for unrelated reasons. While i don’t want to repeat them here and perpetuate the problem, at one point a popular set of keywords leading people to Blogos had to do with my quoting some news story about home-manufactured, uh, pharm-a-sue-tickles. While it’s possible some of those misdirected searchers found some higher knowledge, most of them probably spent one second’s attention before clicking away. Moran and Hunt make a really good point here: these people are not “good customers”, and you’re not helping them or yourself by trying to attract them. Instead, they recommend you think carefully about what makes your site or blog distinctive: what are the target keywords you want to attract? Then determine a strategy for “owning” (to the extent possible) the search results for those keywords.

Example: with Google, i’m #1 among the 750k results for “semantic bible” (entered without quotes). I’m #3 for “hyperconcordance” (a modest achievement, given there are only 5000 results). A two-year-old post is #4 for “blogos” (but not the home page?? i must be doing something wrong): as blogging has become more popular, so has the name (though i was there first). But i’m not even in the top 50 for “digital bible”, even though those are important keywords related to my content. Given all the competition in that space, it would take enormous effort to achieve a high ranking there, so my efforts are probably better spent elsewhere.

There are plenty of free resources out there: Moran’s Skinflint Search Marketing is a good place to begin, and Google Analytics already provides far more capability than i know how to take advantage of. Which brings me back to the real challenge of doing SEO for non-profit sites: deciding how much effort it’s really worth. But if nothing else, thinking about SEO gets you thinking about what your site is for in the first place, and that’s always a good thing to keep in focus.

January 23rd, 2009

Tools for Personal Knowledge Management

If you make your living as a carpenter, you’ve probably invested a lot of money in professional-grade tools you depend on every day. If you’re a knowledge worker, you have the same need for professional-grade tools, but it’s not as simple as going down to Home Depot to find what you need.

For one thing, the kinds of knowledge you have to manage may be widely varied, and a tool that’s good for one thing may not work well for others. More tools become available all the time, and it’s not always easy to separate the hype from the practical benefit. Though geek types like me are often attracted to high-tech solutions, sometimes “right tech” (or even, horrors, low-tech) approaches provide most of the benefit at a fraction of the cost (which is usually measured in time and effort rather than money).

I had a conversation today with someone who had learned about my experience with some Semantic Web technologies, and wondered what i thought about different approaches to mapping out complex knowledge. As we talked, i reflected on several different kinds of knowledge that i manage almost daily:

  • Bookmarking web sites in your browser doesn’t scale well beyond a hundred items or so: nearly everybody who’s organizing web knowledge these days has moved up to something like del.icio.us (here are my tags, which i see recently topped 1000!), Ma.gnolia, or something else with dots in its name.
  • For organizing notes on projects, i like to have small chunks of content with hyperlinks in a wiki. I’ve tried a couple of systems, but TiddlyWiki is still the winner: it’s lightweight, it lives in my browser, and it’s got a great bang-to-buck ratio. I use this blog in some similar ways: even if nobody else reads my posts, writing them helps me organize my thoughts, and the blog provides a more persistent record of my thinking and wanderings across the web.
  • Bibliographic references: i use CiteULike for academic work, though it’s much easier to capture references than it is to find time to read them! A major benefit is CiteULike’s ability to easily import reference details rather than requiring me to type them. Similar systems include Zotero, Connotea, and Citeline (which is cool because it harnesses some of the technology from MIT’s Simile project).
  • Things that take longer to read than a simple web page, but have a shorter life span than academic publications, get queued up in Instapaper.
  • If you want a social bookmarking service with Semantic Web and natural language technology under the hood, Twine is an interesting new player. I haven’t had time to review its capabilities carefully yet, but i’m intrigued.
  • If you already know that you really want data structured as triples, and you already think that way, Turtle is a much easier way to author data by hand than the RDF/XML syntax (see the short example after this list). But be warned that you’ll also need to invest considerable time in learning tools for parsing, storing, etc.: the geek threshold for Semantic Web technologies is still quite high.
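To illustrate the Turtle point, here’s a tiny sketch recording three triples about one subject (the example.org URIs are made up for illustration, though Dublin Core is a real vocabulary):

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ex: <http://example.org/> .

# one subject, three statements: a creator and two subject keywords
ex:SemanticBible dc:creator "Sean Boisen" ;
    dc:subject "Bible" , "Semantic Web" .

The same data in RDF/XML requires an rdf:RDF wrapper, namespace declarations, and nested rdf:Description elements: several times the typing, and much harder to read and write by hand.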

A few more general thoughts:

  • If your data and knowledge will have a long lifespan, beware of lock-in. It’s not just the ease of entering knowledge, but what you can do with it afterward, including taking it someplace different altogether, that counts. People often regret their time investment in Facebook when they realize they can’t just pack up their data and move elsewhere (some comments by Dave Winer are typically on-target here). These systems are like a roach motel: data checks in, but it doesn’t check out.
  • You need to think like a carpenter about your knowledge management tools: what do you need them to do, how much investment is appropriate, etc.? The right tools can provide an enormous boost in your productivity, but they can also become masters rather than servants if you’re not careful.


January 19th, 2009

Semantic Search in the Gospels

Cognition uses “Semantic NLP, the Company’s patented linguistic meaning-based text processing technology” to process natural language text and make the information in it searchable by meaning rather than simply by word. They’ve recently released a demo based on the Gospels and associated notes from the NET Bible.

Dr. Kathleen Dahlgren, their founder and CTO, has been working in the field of NLP for a long time, so this is not some newly-launched startup with more hype than substance. Their underlying technology represents an enormous investment in the linguistic data required for actually understanding language. Having worked in closely related fields for most of my pre-Logos career (and having thought quite a bit about things like this for Bible study and search), i was very curious to take it for a spin and see how well it does. While they correctly claim that there’s a lot of figurative language in the Gospels, there’s also plenty of plain narrative description that ought to be understandable.

Not surprisingly, the examples on their demo page look reasonably good (that’s what you do when you put together a demo, after all). “Who double-crossed the Lamb of God?” is a clever way to show off their ability to recognize double-cross as a synonym for betray, and Lamb of God as an alternate designation for Jesus. I might quibble with “blessed are the pure in heart” (Matt 5:8) as a hit for “blessed are the innocent”, but it’s clearly on the right track.

But they also allow you to try your own queries, which is where you can really see whether this approach helps or not. Some queries i tried:

  • “a valuable pearl” comes up empty. Just searching for “pearl” finds Matt 13.45-46, but failing to recognize “a pearl of great value” as a valuable pearl seems like a definite lack of understanding. Searching for “valuable” alone finds a great many hits (remember this includes the NET Bible notes as well as the text), but some of the senses it retrieves don’t seem like a good fit for “valuable”: for instance, “a major category of meaning”, “an aorist main verb”, “is redundant” (?), “is not being critical of”. I understand why some of these matched, but they don’t convince me that there’s deep understanding going on.
  • “good soil” also comes up empty, even though this phrase occurs verbatim in Luke 8.15.
  • “a herd of swine” gets in the neighborhood: it apparently bridges the gap between swine and pig, and finds Matt 8.31 (getting to “drive” from “herd”, perhaps?), along with some other notes related to “herdsmen”. But surprisingly, it misses Mark 5.11, which has “a herd of pigs”.
  • “Peter’s brother” first tries the interpretation of “brother” as “member of a religious order” (!), but there’s a nice interface where you can choose alternate senses. After selecting the “sibling” sense, it does better, though the results aren’t always appropriate (e.g. Matt 17.1).
  • You can try questions like “Where did Jesus live?”, though the responses look like it’s merely searching on individual content words, not the semantics of the proposition. “Where did Herod live?” brings back a few interesting results where “live” has been connected to “palace”, which then results in helpful information because his palace was in Jerusalem.

Finding a use case for this particular demo comes down to an interesting intersection of requirements: how many queries are there that

  • you’d actually want to look for
  • you couldn’t easily find based on the words alone
  • don’t require synthesis or reasoning (that’s really asking too much of this technology)

It was harder than i thought to come up with cases like this, and for most of them, the results still left something to be desired. But all critique aside, kudos to Cognition for being brave enough to put their technology out there and letting the results speak for themselves. Real understanding of text is an extremely difficult task: it looks to me like Cognition has made substantial progress, though the problem is still far from solved.

January 14th, 2009

XML Schema with Optional, Unbounded, Unordered Elements

This is so obscure i hesitate to blog about it, except that it took me so long to figure out that i’d love to save somebody else the trouble. You won’t care unless:

  • You’re designing an XML Schema definition (.xsd) to validate an XML file
  • You’re defining an element that can contain plain text or multiple child elements, in any order, zero or more times

Here’s an example: suppose you have a plain text description of events that includes people, places, and Bible references.

Jesus heals Simon’s mother-in-law (Matt 8:14-17; Mark 1:29-34; Luke 4:38-41)

You want to link person references with a Link element, Bible references with a Reference element, and otherwise leave the plain text as is. The result looks something like this (using square brackets, since otherwise WordPress gets confused):

[Link]Jesus[/Link] heals [Link]Simon[/Link]’s mother-in-law ([Reference]Matt 8:14-17[/Reference]; [Reference]Mark 1:29-34[/Reference]; [Reference]Luke 4:38-41[/Reference])

Now imagine several of these in the same element, so potentially you can have any arbitrary sequence of Links, References, and plain text, in any order, any number of times. Describing this with a BNF grammar is trivial:

LinkRef ::= Link | Reference
TextItem ::= ( text | LinkRef )*

A cursory reading of the XML Schema specification (which i’d never actually done before, instead depending on XMLSpy, which generally lets me avoid thinking that hard) might make you think the grouping models sequence, choice, and all, in conjunction with the attributes minOccurs and maxOccurs, would do what you need. But there’s a surprisingly complex set of interactions between these that i still don’t really understand, and what seemed so simple proved surprisingly hard. Here are a few examples of what i tried that XMLSpy’s XSD validation (which i’m assuming is correct) wouldn’t allow:

  • while all is for an unordered group of elements, it’s restricted to maxOccurs=1. So it doesn’t handle unbounded occurrence (though it does allow minOccurs=0, i.e. optionality). Furthermore, it can’t be nested inside other model groups like sequence.
  • choice groupings can be neither optional nor unbounded.
  • trying to specify multiple occurrences of both Link and Reference, each both optional and unbounded, is flagged as an ambiguous model.

The solution i finally discovered (after embarrassingly many other permutations, more by trial and error than anything else) has two parts:

  • define a LinkRef group that allows a sequence of either Link or Reference, both optional and unbounded (zero to many occurrences)
  • the TextItem (enclosing parent) element allows an optional and unbounded sequence of LinkRef groups (a sketch in schema syntax follows).
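Here’s roughly what that solution looks like as XSD, using square brackets again for WordPress’s sake. The xs:string types are my simplification (the real declarations could differ), and mixed="true" is what permits plain text between the child elements:

[xs:group name="LinkRef"]
  [xs:sequence]
    [xs:element name="Link" type="xs:string" minOccurs="0" maxOccurs="unbounded"/]
    [xs:element name="Reference" type="xs:string" minOccurs="0" maxOccurs="unbounded"/]
  [/xs:sequence]
[/xs:group]

[xs:element name="TextItem"]
  [!-- mixed="true" allows plain text between the Link and Reference elements --]
  [xs:complexType mixed="true"]
    [xs:sequence]
      [xs:group ref="LinkRef" minOccurs="0" maxOccurs="unbounded"/]
    [/xs:sequence]
  [/xs:complexType]
[/xs:element]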

For the more visually oriented, here’s how it looks in XMLSpy:

[Image: TextItem and LinkRef grouping in XMLSpy]

January 8th, 2009

Addressability Matters

Ever since Adam named the beasts (Gen 2:19-20), labels have mattered to humanity: it’s pretty hard to hold a conversation if you have to start with “you know that really big gray beast with the cute little ears that sits in the river all day with just its eyes showing?”, instead of just “hippopotamus”.

Information on the web works the same way. Most (but not all!) web pages have the equivalent of a name, their Uniform Resource Locator (URL), which tells your browser how to bring up the page. But too many conversations about web pages are still like the hippopotamus conversation: “just go to www.frooble.com, then type ‘shebang’ in the search box, and look about half-way down the page on the left side …”. In other words, that little tidbit of information isn’t addressable: i can’t give you a name for it, i can only tell you to travel over the river, through the woods, and then turn left at the 3rd oak tree.

Though there’s usually no good technical reason for it, this is still all too often true across our web-enabled world. For example, i admit to my chagrin that i only just now figured out the URL for my Facebook profile, even though i had looked for it (half-heartedly) several times before. (I happened to stumble over somebody else’s, saw the pattern, and then plugged my own name and ID into the URL instead.) Having a URL that’s both explicit and understandable enables this kind of URL hacking, which is a really powerful technique.

Here’s a small example (combined with a shameless plug). The HTML designers for the upcoming Bible Tech conference have added page targets for speakers to the Speakers page. So even though there’s one long list, you can get to just the right spot on the list by following the link to my talk. And if i show you the URL

http://www.bibletechconference.com/speakers.htm#SeanBoisen-2009

and explain the schema ([baseURL]#[FirstName][LastName]-[year]), you can get to my talks from last year too. That’s a nice bit of design, and part of a much larger and more important architectural practice called Representational State Transfer, or REST. As another example, you can probably figure out how to change this URL

http://bible.logos.com/passage/NIV/Ge 2.19-20

to get you to Mark 4.1-12 in the ESV instead (though you might stumble if you use a colon instead of a period to separate chapter and verse).

A lot of important things only become possible once you start to provide names for your resources. That’s a big part of the justification for the complex tangle of ideas called the Semantic Web, or if that’s too high-falutin’ for you, just call it smarter web design for information integration.

PS: i realized later it wasn’t just that i couldn’t figure out how to construct a Facebook URL: you have to make a badge first to get an addressable URL, which seems pretty non-obvious!
