My Django Talk at LinuxFest

Apparently i neglected to let Blogos readers know that i was speaking at LinuxFest Northwest this past weekend: my bad! My talk was a basic practical intro to Django, the Python-based web application framework, entitled “From 0 to Website in 60 Minutes – with Django”. Since Django is touted (rightly in my view) as a highly productive way to do web development, what better way to demonstrate that than to actually build a functioning database-backed website in the course of the talk?

It was a pretty ambitious goal, and i had to take a few shortcuts to pull it off (like starting past the boring stuff, with Python/Django/MySQL already installed, and data ready to go). But i think i can fairly claim to have delivered what i promised. We walked through an application that’s been a side-project for the Whatcom Python Users Group, a web version of Sustainable Connections’ Food and Farm Finder brochure. It’s a nice simple learning example, well-suited to tutorial purposes. I’d say there were at least 40 in attendance, many the kind of beginners i was trying to focus on. And even though the time slot turned out to be only 45 minutes, I finished with several minutes to spare (in retrospect, i could have gone a little slower).

Slides are here, along with the data you need to follow them on the main page for the talk. I have audio of the talk that i’ll post in the next day or two once i’ve cleaned it up a bit: then it will be almost like being there (though without the ability to make sense of the “skeleton” joke). I was glad to have the opportunity to shine a little light on Django and repay a tiny portion of the debt of gratitude i owe its creators, since it’s been a major productivity boost in my work at Logos.

Here’s another reason why i give talks whenever i get the chance: you always learn more when you teach others. As a concrete example, i was reminded while prepping the talk that Django’s template framework, while primarily designed around HTML generation, is quite general and therefore capable of generating other data formats as well. At work, i’d built up an entire module of custom code around serializing Bible Knowledgebase data as XML for internal hand-off to our developers. Re-reading the Django book (The Definitive Guide to Django) gave me the idea of using Django templates to do this instead. In fairly short order, i was able to rewrite my test example, 80 lines of custom code, as a single clean template plus 20 much simpler lines.

A Python Interface for api.Biblia.com

Last week Logos announced a public API for their new website, Biblia.com, at BibleTech. Of course, i want to wave the flag for my employer. But i’m also interested as somebody who’s dabbled in Bible web services in the past, most notably the excellent ESV Bible web service (many aspects of which are mirrored in the Biblia API: some previous posts around this can be found here at Blogos in the Web Services category). Dabblers like me often face a perennial problem: the translations people most want to read are typically not the most accessible via API, or have various other limitations.

So i’m happy with the other announcement from BibleTech last week: Logos is making the Lexham English Bible available under very generous terms (details here). The LEB is in the family of “essentially literal” translations, which makes it a good choice for tasks where the precise wording matters. And the LEB is available through the API (unlike most other versions you’re likely to want, at least until we resolve some other licensing issues).

I don’t want to do a review of the entire API here (and it will probably continue to evolve). But here are a couple of things about it that excite me:

  • The most obvious one is the ability to retrieve Bible text given a reference (the content service). Of the currently available Bible versions, the LEB is the one that interests me the most here (i hope we’ll have others in the future).
  • Another exciting aspect for me is the tag service. You provide text which may include Bible references: the service identifies any references embedded in it, and then inserts hyperlinks for them to enrich the text. So this is like RefTagger on demand (not just embedded in your website template). You can also supply a URL and tag the text that’s retrieved from it. One caveat with this latter functionality: if you want to run this on HTML, you should plan to do some pre-processing first, rather than treating it all as one big string. Otherwise random things (like “XHTML 1.0” in a DOCTYPE declaration) wind up getting tagged in strange ways (like <a href="http://ref.ly/Mal1">ML 1.0</a>).
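A minimal sketch of that kind of pre-processing, assuming Python 3 and its standard html.parser module: it collects only the visible text nodes, so the DOCTYPE, the tags themselves, and script/style contents never reach the tagger. The names TextChunks and text_chunks are made up for illustration, and nothing here calls the Biblia API itself; each chunk would still have to be sent to the tag service separately.

```python
from html.parser import HTMLParser


class TextChunks(HTMLParser):
    """Collect visible text nodes from an HTML document (hypothetical helper)."""

    SKIP = {"script", "style"}  # element contents that should never be tagged

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # The DOCTYPE is routed to handle_decl(), not here,
        # so strings like "XHTML 1.0" are never collected.
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_chunks(html):
    """Return the text nodes worth sending to a reference tagger."""
    parser = TextChunks()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<html><head><style>p { margin: 0 }</style></head>"
    "<body><p>See John 3:16 and Mal 1:1.</p></body></html>"
)
chunks = text_chunks(page)
```

Tagging chunk by chunk means the results have to be stitched back into the markup afterwards, which this sketch leaves out.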

I’ve just started working through the Biblia API today, but since i’m a Pythonista, developing a Python interface seemed like the way to go. This is still very much a work in progress, but you can download the code from this zip file and give it a whirl. Caveats abound:

  • I’ve only implemented three of the services so far: content() (retrieves Bible content for a reference), find() (lists available Bibles and metadata), and tag() (finds references in text and enhances it with hyperlinks). And even with these three services, i haven’t supported all the parameters (maybe i will, maybe i won’t).
  • This is my first stab at creating a Python interface to an API, so there may be many stylistic shortcomings.
  • Testing has also gotten very little attention, and bugs doubtless remain.

If you’re interested and want to play along, let me know: we can probably set up a Google group or something for those who want to improve this code further.

LinuxFest Registration is Open

I’m going to LinuxFest Northwest 2010, April 24-25th. If you’re in the Bellingham area, LinuxFest is coming up, and registration is now open. This is a great opportunity to learn more about Linux, Open Source, and a variety of other technical subjects, and it’s free!

Yours truly is hoping to give a talk on using the Django web-application framework for rapid web site development: “From 0 to Website in 60 minutes – with Django”. Please sign up and attend!

Reading Tab-Delimited Data in Python with csv

I had a head-slapper this morning when i realized i’d been using custom code for a long time to do something that’s in a standard Python module. Here’s the sorry tale, in hopes of saving others from a similar fate.

I regularly use tab-delimited files for data wrangling: it’s a nice, lightweight format for table-structured data, and Excel makes a good enough editor for non-programmers to change things without messing up the format. Here’s a simple example, with a set of identifiers in the first column: a typical use case would be that somebody is editing the second column so you can map old identifiers to new ones.

Old New
Aphek1 AphekOfAsher
Aphek2 AphekOfSharon
Aphek3 AphekOfAram

It’s also very easy to read and write this kind of data in Python:

for row in open('somefile.txt', 'rb'):
    # strip the line ending so it doesn't end up in the second column
    old, new = row.rstrip('\r\n').split('\t')
    # do something useful here

So i have a little utility reader module doing only a little more than this, stripping out comment lines, returning a list or a dict, etc., and i use this code all over the place. Then i recently needed to read some CSV (comma separated values) files, and stopped to ask The Question, which every programmer should ask before writing new code:

Hasn’t somebody else solved this problem already?

In the case of reading and writing CSV files, the answer was a quick and clear “yes”: there’s a standard Python module called csv that does just that, and nicely. So, reformatting the earlier data example as CSV would look like this:

"Old", "New"
"Aphek1", "AphekOfAsher"
"Aphek2", "AphekOfSharon"
"Aphek3", "AphekOfAram"

and there’s a nice DictReader class that (assuming your columns are unique and your first row identifies them) makes working with this data even easier.

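import csv
# skipinitialspace copes with the space after each comma in the sample data
reader = csv.DictReader(open('somefile.csv', 'rb'), skipinitialspace=True)
for row in reader:
    # do something more useful here
    # keys match the header row exactly, so 'New' rather than 'new'
    print row.get('New')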

If the first row doesn’t contain column headers, you can supply them to DictReader. This looks like overkill for this simple problem, but once you have multiple columns, need to check values or map them onto something else, or add other logic and processing, life is just much easier with a dictionary structure (for one thing, you get rid of meaningless mystery indexes and stop asking “what the heck is in row[1]”?).
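For the headerless case mentioned above, here is a small sketch, written for Python 3 (where csv reads text rather than bytes). DictReader takes the column names through its fieldnames argument. The sample rows and the names "old" and "new" are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# Stand-in for a headerless CSV file; real code would pass
# open('somefile.csv', newline='') instead.
data = io.StringIO("Aphek1,AphekOfAsher\nAphek2,AphekOfSharon\n")

# With no header row, DictReader needs the column names up front.
reader = csv.DictReader(data, fieldnames=["old", "new"])

# Build an old-identifier -> new-identifier lookup from the rows.
mapping = {row["old"]: row["new"] for row in reader}
```

The named keys are what make the mapping line readable: no guessing what row[1] holds.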

Now comes the embarrassing part: i quickly breezed through the documentation, accomplished my immediate task, and moved on, missing one important detail that i just now (a month later!) figured out. A tab-delimited file is just a special case of a CSV file. My original, tab-delimited file works just the same way, once i construct the reader with tabs (rather than the default of commas) as the delimiter.

import csv
reader = csv.DictReader(open('somefile.txt', 'rb'), delimiter='\t')
for row in reader:
    # do something more useful here
    # keys match the header row exactly, so 'New' rather than 'new'
    print row.get('New')

There are a few other gotchas, the most important of which for me is that csv (in Python 2, at least) doesn’t handle Unicode. So if you have to read Unicode data, you’re back to reading the data directly, splitting lines on tabs, etc.
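On Python 3 the picture is different: csv works on text rather than bytes, so opening the file with an explicit encoding is enough. The sketch below assumes Python 3 and a UTF-8 encoded file; the file name and the non-ASCII sample value are made up for illustration.

```python
import csv
import os
import tempfile

# Write a small UTF-8, tab-delimited sample file (illustrative data only).
path = os.path.join(tempfile.mkdtemp(), "places.txt")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("Old\tNew\nAphek1\tἈφέκ\n")

# newline="" is what the csv documentation recommends for files
# that are handed to a csv reader or writer.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))
```

Each row comes back keyed by the header names, with the non-ASCII value intact.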

The best code is usually the code you didn’t write and don’t have to maintain. No matter how many times i stop and ask The Question, i still don’t do it enough.