God’s Word | our words
meaning, communication, & technology
following Jesus, the Word made flesh
June 16th, 2009

http://ref.ly for Bible References

My colleagues at Logos have launched http://ref.ly, a URL shortening service for Bible references: see this blog post. It provides the convenience of TinyURL (turning long unreadable URLs into something much more manageable), but unlike that service also provides readable, understandable content. Once you get past the prefix, you won’t have any trouble figuring out what verse http://ref.ly/Mk4.9 is referring to.

If you’re a Twitter person trying to shoehorn your message into 140-character tweets, you’ll like the fact that this gives you a brief and unambiguous way to both specify a Bible reference and link to the content behind it (the references resolve to the actual verse text at bible.logos.com). Since addressability matters, this is a good thing.

But it has precisely the same utility even if you’re not a Twitterhead (i’m not):

  • it clearly marks a string of characters as a Bible reference
  • it also normalizes the reference into a form that can be automatically processed

While it’s not quite a microformat, it’s really only a small step away from things like bibleref. In particular, if lots of people start using ref.ly references, it will be possible to process that content and understand things like which verses are most popular.
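As a rough sketch of that kind of processing, assuming ref.ly links always look like the one example above (a book abbreviation, a chapter, a dot, and a verse), the following pulls references out of some text and counts them. The pattern, the function names, and the sample posts are illustrative assumptions, not ref.ly’s documented format:

```python
import re
from collections import Counter

# Assumed pattern, inferred only from the single example "http://ref.ly/Mk4.9":
# an optional leading digit, a book abbreviation, a chapter, a dot, a verse.
REFLY_PATTERN = re.compile(r"http://ref\.ly/([1-3]?[A-Za-z]+)(\d+)\.(\d+)")


def find_references(text):
    """Return a (book, chapter, verse) tuple for each ref.ly link in text."""
    return [(book, int(chapter), int(verse))
            for book, chapter, verse in REFLY_PATTERN.findall(text)]


def most_popular(texts, n=3):
    """Count references across several texts and return the n most common."""
    counts = Counter()
    for text in texts:
        counts.update(find_references(text))
    return counts.most_common(n)


# Made-up sample posts, purely for illustration.
posts = [
    "He who has ears: http://ref.ly/Mk4.9",
    "Reading http://ref.ly/Jn3.16 this morning",
    "Again, http://ref.ly/Mk4.9 is worth rereading",
]
print(most_popular(posts, n=1))
```

Because the prefix marks each reference unambiguously, a single regular expression is enough here; free-text references like “Mark 4:9” or “Mk 4.9” would need much fuzzier matching.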

What’s more, editors that recognize and automatically link URLs (like MS Outlook for HTML-based email, and MS Word) will now automatically make Bible links for you (like RefTagger does for blog posts), as long as you’re willing to tack on “http://ref.ly/” and live with the slightly non-traditional format. You don’t need to know anything about how to make a hyperlink in HTML: just a little extra syntax (14 characters, to be precise) moves these references toward much greater usefulness.

June 12th, 2009

Reading Tab-Delimited Data in Python with csv

I had a head-slapper this morning when i realized i’d been using custom code for a long time to do something that’s in a standard Python module. Here’s the sorry tale, in hopes of saving others from a similar fate.

I regularly use tab-delimited files for data wrangling: it’s a nice, lightweight format for table-structured data, and Excel makes a good enough editor for non-programmers to change things without messing up the format. Here’s a simple example, with a set of identifiers in the first column: a typical use case would be that somebody is editing the second column so you can map old identifiers to new ones.

Old New
Aphek1 AphekOfAsher
Aphek2 AphekOfSharon
Aphek3 AphekOfAram

It’s also very easy to read and write this kind of data in Python:

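with open('somefile.txt') as infile:
    next(infile)  # skip the header row (Old, New)
    for row in infile:
        # drop the trailing newline, then split on the tab
        old, new = row.rstrip('\n').split('\t')
        # do something useful here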

So i have a little utility reader module doing only a little more than this, stripping out comment lines, returning a list or a dict, etc., and i use this code all over the place. Then i recently needed to read some CSV (comma separated values) files, and stopped to ask The Question, which every programmer should ask before writing new code:

Hasn’t somebody else solved this problem already?

In the case of reading and writing CSV files, the answer was a quick and clear “yes”: there’s a standard Python module called csv that does just that, and nicely. So, reformatting the earlier data example as CSV would look like this:

"Old","New"
"Aphek1","AphekOfAsher"
"Aphek2","AphekOfSharon"
"Aphek3","AphekOfAram"

and there’s a nice DictReader class that (assuming your columns are unique and your first row identifies them) makes working with this data even easier.

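import csv
# Python 3: open in text mode; newline='' lets csv handle line endings itself
with open('somefile.csv', newline='') as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        # do something more useful here
        print(row.get('New'))  # keys match the header row exactly, including case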

If the first row doesn’t contain column headers, you can supply them to DictReader through its fieldnames argument. This looks like overkill for this simple problem, but once you have multiple columns, need to check values or map them onto something else, or add other logic and processing, life is just much easier with a dictionary structure (for one thing, you get rid of meaningless mystery indexes and stop asking “what the heck is in row[1]?”).
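A minimal sketch of the no-header case: when column names are passed in explicitly, DictReader treats the first row as data. The column names 'old' and 'new' and the sample rows are made up for illustration, and io.StringIO stands in for a real file:

```python
import csv
import io

# Illustrative data with no header row; io.StringIO stands in for an open file.
data = io.StringIO('"Aphek1","AphekOfAsher"\n"Aphek2","AphekOfSharon"\n')

# Column names are supplied explicitly, so the first row is read as data.
reader = csv.DictReader(data, fieldnames=['old', 'new'])

# Build the old-to-new identifier mapping by column name, not by position.
mapping = {row['old']: row['new'] for row in reader}
print(mapping)
```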

Now comes the embarrassing part: i quickly breezed through the documentation, accomplished my immediate task, and moved on, missing one important detail that i just now (a month later!) figured out. A tab-delimited file is just a special case of a CSV file. My original, tab-delimited file works just the same way, once i construct the reader with tabs (rather than the default of commas) as the delimiter.

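import csv
# Python 3: open in text mode; newline='' lets csv handle line endings itself
with open('somefile.txt', newline='') as infile:
    reader = csv.DictReader(infile, delimiter='\t')
    for row in reader:
        # do something more useful here
        print(row.get('New'))  # keys match the header row exactly, including case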
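The same delimiter argument works in the other direction. As a sketch (assuming Python 3, with an in-memory buffer standing in for a file and made-up rows), csv.DictWriter can produce tab-delimited output that DictReader then reads back:

```python
import csv
import io

# Made-up rows, keyed by the same column names as the header row.
rows = [
    {'Old': 'Aphek1', 'New': 'AphekOfAsher'},
    {'Old': 'Aphek2', 'New': 'AphekOfSharon'},
]

# Write tab-delimited text into an in-memory buffer (a stand-in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['Old', 'New'], delimiter='\t')
writer.writeheader()
writer.writerows(rows)

# Read it back with the same delimiter to confirm the round trip.
buffer.seek(0)
round_trip = list(csv.DictReader(buffer, delimiter='\t'))
print(round_trip)
```

Using the module for both reading and writing means quoting and embedded delimiters are handled the same way on each side, which a bare split('\t') does not do.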

There are a few other gotchas, the most important of which for me is that csv doesn’t handle Unicode in Python 2 (the Python 3 module reads text strings, so it does). So if you have to read Unicode data under Python 2, you’re back to reading the data directly, splitting lines on tabs, etc.
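On Python 3, decoding happens when the file is opened, so csv only ever sees text. A minimal sketch, assuming Python 3 and a UTF-8 encoded file; the temporary file and its contents are made up for illustration:

```python
import csv
import os
import tempfile

# Write a small UTF-8, tab-delimited file to a temporary location (made-up data).
with tempfile.NamedTemporaryFile('w', suffix='.txt', encoding='utf-8',
                                 newline='', delete=False) as tmp:
    tmp.write('Old\tNew\nEphesus\tἜφεσος\n')
    path = tmp.name

# Opening in text mode with an explicit encoding hands csv decoded strings.
with open(path, encoding='utf-8', newline='') as infile:
    rows = list(csv.DictReader(infile, delimiter='\t'))

os.remove(path)  # clean up the temporary file
print(rows[0]['New'])
```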

The best code is usually the code you didn’t write and don’t have to maintain. No matter how many times i stop and ask The Question, i still don’t do it enough.

June 7th, 2009

Blog Echoes for 2009-06-06

An anthology of interesting posts that passed through my reader this week:

June 2nd, 2009

Social Communication in the Enterprise

These days all the cool kids are tweeting, twittering and telecasting their every thought and activity using services like Twitter and Facebook. While i’m usually willing to give new technologies a try, i’ve been hesitant to take the plunge into Twitter. It’s not like i need additional sources of distraction: i’m perfectly capable of losing my focus all by myself.

In some respects, Twitter seems like another replay of several earlier IT pyramid schemes: first bulletin boards, then websites, then blogs. The early adopters get the lion’s share of attention, and the people who follow them get the initial benefit of more information and (maybe) more access. But as the pyramid grows over time, the volume of communication becomes unmanageable. Then we evolve new schemes for managing the flood: bookmarks, feeds and feed readers, tag spaces (hash tags for Twitter), and other meta-schemes. At some point we ought to stop and ask whether the new communications services and the additional complexity and overhead they impose provide enough benefit to justify their cost.

I’m humble enough to acknowledge that many smart and productive people say they get a lot of benefit from Twitter, so maybe i really am missing something by remaining doggedly tweet-free. But the latest issue of eWeek (you may need to download their reader to access it) has a nice review of several enterprise-oriented social communication services (SocialCast, Socialtext, and Huddle) that’s making me rethink my position. These have several distinct differences from Twitter and Facebook that make them more interesting to me:

  • Narrower scope of communication: While broad services like Facebook and Twitter provide access to a universe of information, that can become overwhelming, particularly when the universe is talking back at you. Limiting the conversation to what’s happening in our company has a lot of appeal.
  • Better business information: getting the right mix between quantity and focus about what’s going on in your company has always been a difficult challenge in my business career. The people who have the most to say are also very busy, and of course they already know what they know: so there’s asymmetry, with a cost to them in producing information while most of the near-term benefits accrue to their listeners. At the same time, there’s no easy way to predict what information might be useful to whom, so generating lots of it makes sense provided there are effective ways to filter it.
  • Distributed leadership: personnel at all levels have useful things to offer your company, if you can just break them out of the stovepipes of departmental structure and management hierarchy.

Here’s how i might see this playing out at Logos. Our business structure is largely typical for a software company of our size: there’s a sales and marketing division with some people who travel a lot and who (along with inside sales) provide a lot of our revenue, a customer service group that deals with users and their problems, the programmers who make the application work, a smaller group (where i work) that’s developing new features and data sets, a variety of support and infrastructure people who keep IT, finance, etc. running smoothly, and of course a management team that’s steering the corporate ship. All of these groups have front-line access to information that might help the rest of us do a better job, given the right kinds of access.

One of the challenges is to keep the information we all broadcast a little more targeted. While some amount of personal interest makes all the hours we spend at work more fun, too much information about somebody’s disappointment that their favorite team lost, or how much they hate paperwork, the lousy sandwich they had for lunch, etc. would start to make this just like Facebook (and i don’t think having employees reading Facebook during work time is a good strategy for productivity). But i’d love to hear:

  • from front-line sales people: a conference sales pitch that really hit home with people; the reason somebody just gave for purchasing our software (or not purchasing it); a complaint from a loyal customer
  • from customer support: the common problems that get raised over and over, and that would make for a much happier user experience if we fixed them upstream
  • from support staff: the challenges that we can all help with
  • from programmers and from R&D: exciting new discoveries, tips and tricks
  • from management: new things we’re learning and thinking, big picture business issues, where we see our business heading

Many of these things get communicated now, just more formally, and therefore less frequently, and to a more selective audience within the enterprise. Moving toward Twitter-style microcontent, just a little more focussed, might provide the right mix to get me tweeting.