Review: Beautiful Data by Toby Segaran and Jeff Hammerbacher

Segaran (author of the highly-recommended Collective Intelligence) and Hammerbacher have collected together a diverse set of essays on data collection, visualization, processing, and analysis. What interested me most was the wide variety of application areas in which data is the “secret sauce”. The essays range from broadly philosophical to deep in the technical details: so you’re likely to find something at your level of interest (though that also means that much of the book may not hit your level).

Jeff Hammerbacher’s chapter on Information Platforms and the Rise of the Data Scientist is a good example. It discusses Facebook’s history of scaling its data storage and analysis capabilities, starting with custom scripting based on SQL, moving to data warehousing and then beyond to Hadoop and related tools. “More data, simple models” is the processing style that characterizes many such Big Data enterprises today.

Other valuable chapters for me:

  • Data Finds Data (Jeff Jonas and Lisa Sokol)
  • Natural Language Corpus Data (Peter Norvig)
  • Connecting Data (Toby Segaran)

While you’re not likely to find a solution here to specific technical problems, there’s a good chance you’ll find something either to broaden your horizons or give you some new ideas. Definitely recommended.

(Disclosure: I received a free copy of this book through O’Reilly’s Blogger Review program.)

Seasons and Cycles in your Bible Study

Listening to a podcast by Justin Maxwell for CHI Conversations* raised an interesting question. He was talking about how we all have cycles and mood changes in our lives that affect our interaction with software: the lunch time at the gym, the afternoon doldrums. In his previous work with Mint, he saw big differences in people’s interactions around paydays, when there’s both a large inflow of money and a lot of bills to be paid. College students tend to break up more frequently before Thanksgiving and Christmas.

Which leads to the question: what are your cycles and moods with respect to Bible study (whether via software or print)? For most of my life i’ve considered morning the optimum time for personal Bible reading: my mind is fresh, and i can take those thoughts with me into the day. Sunday morning (and maybe Wednesday Night Bible Study for the churches that still have them) are obvious times of higher activity. What about Saturday morning compared to weekday mornings: more or less usage? More Bible reading during Lent?

In the digital age, Bible search engines and programs that talk to the cloud have the potential to identify some of these variations. For example, someone associated with one of the large Bible search sites told me they saw a spike in usage in the late hours of Saturday night (pastors preparing their sermons?). We’ve seen some similar upticks in various websites operated by Logos, though i haven’t been able to do a careful analysis.

So i’d be interested to learn more about the cycles, moods, and seasonal fluctuations of Bible reading and study. I’m interested in your own personal reflections, but even more in any studies or data you might be aware of.
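
For anyone with access to usage logs, here is a minimal sketch of the kind of tally that could surface a spike like the Saturday-night one mentioned above. It assumes the log is just a list of ISO-format local timestamps; the function name and the sample data are made up for illustration and are not from any real site.

```python
from collections import Counter
from datetime import datetime

# Fixed English day names, so the output does not depend on the system locale.
# datetime.weekday() numbers the days with Monday as 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def usage_by_weekday_hour(timestamps):
    """Count events in each (weekday name, hour of day) bucket."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts)
        counts[(DAY_NAMES[moment.weekday()], moment.hour)] += 1
    return counts

# Hypothetical sample data: timestamps of individual search requests.
sample_log = [
    "2010-11-06T22:15:00",  # a Saturday night
    "2010-11-06T22:40:00",
    "2010-11-06T23:05:00",
    "2010-11-07T09:30:00",  # Sunday morning
    "2010-11-09T07:10:00",  # Tuesday morning
]

counts = usage_by_weekday_hour(sample_log)
busiest = counts.most_common(1)[0]
print(busiest)
```

With real data you would want many weeks of logs (and some care about time zones) before reading anything into the busiest buckets.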


*“CHI Conversations covers Computer/Human Interaction, including design, human factors, cognitive psychology, social science, and more.”

Review: The Productive Programmer by Neal Ford

This is a great grab-bag of detailed tips (“Mechanics”) and general approaches/philosophies (“Practice”) for helping serious programmers be more productive (this isn’t a book for the average user). Most programmers know that the difference between an okay developer and a great one isn’t fractional, it’s an order of magnitude or more. The ideas here are part of that body of knowledge that makes for great programmers.

Many readers will find sections where they say “yeah, I know this stuff”: if so, pat yourself on the back as a seasoned developer. But more likely you’ll find at least a few tips worth trying, or be reminded of something you never took the time to try out (but should have: how did I miss multiple desktops for Windows?). Those little gems are worth the price of this book, and you can easily skip the rest. The key to books like this is to set aside a little time each day for improving your craft.

Along the way, Ford’s notes supply zen-like snippets of programmer wisdom:

  • “Search is faster than navigation”
  • “Don’t spend time doing by hand what you can automate”.

and dozens of others. You’ll even learn a little history about Aristotle, Occam, and other subjects. Definitely recommended (if taken as directed).

[Full disclosure: I received a free copy of this book as part of the O’Reilly Blogger Review Program. But i would have read it anyway.]

Weekly Roundup – 2010.11.12

From time to time i find things of interest: blogging them here helps me hang on to the data and conclusions, and might be of interest to others too.

“… the death of the printed book, at least on campus, has been greatly exaggerated …”

According to a study from the National Association of College Stores (not necessarily an unbiased source):

  • only 13% of college students purchased an electronic book of any kind during July-Sept
  • just over half of those were primarily for required class materials
  • 92% of students indicate they don’t own an e-reader
  • of those who had purchased an e-book, 3/4s used it on a laptop or netbook

Google Books “About this book” Feature

This post about the Books in Browsers conference pointed out a feature of Google Books that apparently many folks, including me, haven’t paid much attention to: the “About this book” page. That page for DeSilva’s “An Introduction to the New Testament” includes, in addition to the reviews and related books links (which are also on the main page):

  • a (noisy) contents list with hyperlinks
  • a word cloud of common terms and phrases, which link to full-text search
  • popular passages that appear in other books (here they’re all quotations from the Bible!)
  • References to this book from other books and Google Scholar
  • A Google Map of places mentioned in the book. It clearly has some smarts, but placename extraction and normalization is a very hard problem: for example, “Emmaus” is linked to a city in Pennsylvania, not the appropriate place in Palestine.
  • Links to other books by this author, and with the same subject index terms (e.g. “Religion/Biblical Criticism & Interpretation/New Testament”)
  • Buttons to export the citation in several formats

That’s quite a wealth of information! (Apparently you only get it for books with previews?)

Information Moving To The Web

We all know there’s a massive shift of information onto the Internet, with Google Books scanning whole libraries, more content being born digital, the transformation of digital libraries, and tera-peta-exa-zetta-yotta-yadayadayada-bytes of data going online. But somehow, those abstract notions don’t have quite the same tangible impact as actual physical artifacts (like books) with their connections to our personal histories. Here’s how this hit home for me today.

I first got interested in computational linguistics around 1979, when i was finishing up my degree at Occidental College (an independent major combining linguistics and anthropology) and playing around with computers. Later, as a graduate student in linguistics at UCLA, i attended my first academic conference in the field: COLING 84 at Stanford, a combined gathering of the 10th International Conference on Computational Linguistics and the 22nd Annual Meeting of the Association for Computational Linguistics. It was a pretty heady experience for this young man: i still remember playing with the bit-mapped graphics on what i think was a Xerox Star, one of the earliest commercial systems with many of the display and interface innovations that are commonplace today.

I brought back the proceedings, a hefty volume about 3cm thick. Later i joined the Association for Computational Linguistics, which included getting the journal Computational Linguistics, and over the course of my 19 years with BBN Technologies i attended many annual meetings and other workshops, collecting proceedings all the time (they started distributing them on CDs around 2000). I have close to a complete collection of the journal for many years (dozens of volumes). I count 16 proceedings volumes, typically several cm each. All told, these were taking up about a meter of shelf space in my office, as they have for the last 10 years or so (the last one i have is from 2000, which is about when i got more involved in management and had a harder time justifying these kinds of technical conferences).

Today, casting about for a place to put some new books i’d acquired, i looked at these journals and proceedings, and had an epiphany. I googled a few articles: sure enough, they were all on-line. In fact, the journal became open access in 2009, and they’ve put all the back issues on the web as well. The ACL Anthology hosts thousands of computational linguistics papers, and they’ve provided digital versions of all the proceedings i have (and many many others). So all of a sudden, i realized i had a meter of useless paper volumes on my bookshelf.

You might wonder what took me so long. I do too: I guess one answer is simply inertia. I’ve had these volumes on my shelves for so long i hadn’t gotten around to reconsidering whether i really needed them. I’m also an information omnivore, so i’ve always been reluctant to just give them up (though i couldn’t tell you the last time i actually cracked the cover on one). In part, I suppose another reason is that having a shelf of professional journals and proceedings makes me feel smarter (silly though that sounds when said out loud): it’s evidence of many years of commitment to the field. In the digital age, these markers of industriousness are becoming as scarce as the artifacts themselves.

Some of these volumes have moved with me many times, from Los Angeles to Massachusetts when i took my first research position with BBN (1987), through various office moves there, when we moved to Maryland in 2000 (and more internal moves there), and when we moved to the northwest to work for Logos in 2007. That first COLING volume has been on my office bookshelf as long as i’ve had an office with bookshelves! But, with ever more information on-line (and much more findable and useful there), new books that need to find a home, and doubtless other office moves ahead … it’s time to let go and continue the march into the digital future.

Resources for Distance Education

My colleagues and I met yesterday with some folks from a seminary who are interested in setting up a distance education program. I did a few blog posts about this subject several years back when i was taking some courses toward a Masters in Distance Education through the University of Maryland University College. After moving to Logos, i didn’t continue in the program, but it’s an area i’m still very interested in, and most of those posts aren’t too relevant now (possibly excepting my brief reflections on whether the Apostle Paul counts as an early distance educator).

In our discussions, the question arose: what’s the one book you’d recommend we read to learn more about distance education? I don’t have an authoritative answer, since i haven’t kept up with the literature for several years now: probably there are better resources now that I’m not familiar with. But here’s my answer anyway, in case it’s helpful to others:

At the top of my list would be Distance Education: A Systems View by Michael Moore (no, not that Michael Moore). Chapter 5 is now made mostly irrelevant by the Internet, but otherwise it’s a good overview of the wide variety of issues that go beyond how you distribute content.

There are a few other titles, all with good content, though perhaps more academic and not as easy to read, or less broad.

  • Learning and Teaching in Distance Education (Otto Peters) is by one of the pioneers in the field (and therefore not completely up to date). My recollection is that it focused more on the learning and teaching sides of the process, with less about administration and larger issues.
  • Mega-Universities and Knowledge Media (John Daniel) focuses more on the role of technology in education, and has a good chapter on the economics involved.

Though it’s not about distance education per se, i’d also have to include Brain Rules by John Medina. This is a very approachable overview of some important findings in brain science and their practical application to everyday life: why you should not talk on your cell phone while driving, how we remember and learn, the myth of multi-tasking, and so forth. It’s both engaging and good science, and i’d make it required reading for every professor/pastor/teacher.

Connecting Christian History to Present Issues

Thanks to my scholarly wife, i receive the weekly Christian History Newsletter (at $12/year, it’s a bargain). One of the articles in today’s issue is entitled “Sasquatches, Unicorns, and . . . the History Assignment that Works”. The title alludes to the challenges teachers face in helping students connect their studies of the past to the issues in the church today. Chris Armstrong, the Bethel Seminary professor who authored the article, has found the assignment he describes to consistently produce high-quality reflection from students that helps them integrate their academic learning (in this case, a course surveying church history) with contemporary Christian challenges.

Follow the link above for the details (they’re worth reading), but here’s an abbreviated outline:

  1. “Find a single issue in the church today that concerns you personally.”
  2. “Find a single historical crux—that is, a single document, single event, single person’s idea, etc.—from church history in which some version of that same issue emerges …”
  3. “Study that historical crux (document, event, person’s idea, etc.) by reading a balanced bibliography of primary and secondary sources …”
  4. Write a paper addressing the following three points:
    1. Describe your contemporary issue in detail, “… as if you were writing a brief editorial article for Christianity Today.”
    2. “… write a summary/analysis/interpretation of how that issue played out at your chosen historical crux.” (several important additional details here)
    3. Write a conclusion in “your Christianity Today editorial style”.

Does it seem crazy to suggest you write a paper if you’re not required to by some formal academic program?!? Maybe, but current research in learning theory strongly suggests you learn concepts much better when you write about them: writing for learning. So it’s not really about a grade for a course, it’s about your personal education (e.g. discipleship) in what it means to follow Jesus today, based on knowing more about what’s happened in church history. This kind of writing is one of the reasons i blog: things simply stick better in my head when i take a little time to think them through and communicate them in writing. So you could always blog your response (if so, give it some distinctive tag like christianhistory so it’s more findable).

Richard Baxter on the Need for Personal Study

I’ve been reading J.I. Packer’s A Quest For Godliness (in Logos), his attempt to reacquaint the modern Christian world with the works of the Puritans who have been so influential and are yet so little known.

This morning’s readings included some discussion of how Richard Baxter put knowledge ahead of emotion in his teaching: “first light — then heat.” To the imagined objections of his working-class congregation that ‘We are not learned, and, therefore, God will not require much knowledge at our hands,’ Baxter answers with several arguments (whose language i’ve updated slightly: the selection is by Packer) as to why laypersons have as great a responsibility as scholars to increase their understanding of God and the Christian life.

  1. Every individual should know that they are created by God, and the purpose of their life, as well as the way to individual happiness, as well as a scholar does. Do you not have souls to save or lose, as scholars do?
  2. God has shown His will to you in the Bible; he has provided teachers and many other aids; so you have no excuse if you are ignorant. You must know how to be Christians even if you are not scholars. You may find the way to heaven in English, even if you have no skill in Hebrew or Greek: but in the darkness of ignorance you can never find it.
  3. … if you think, therefore, you can be excused from knowledge, you might as well think you can be excused from love and from all obedience: for there can be none of this without knowledge… If you were as interested in the knowledge of God and heavenly things as you are to know your career or profession, you would have started learning it before today, and you would have spared no cost or pains until you had it. You think seven years little enough to learn your trade, and won’t spend one day in seven diligently learning the matters of your salvation.

and one closing comment:

If heaven is too high for you to think on, and to provide (prepare) for, it will be too high for you ever to possess.

Packer, J. I. (1994). A quest for godliness : The Puritan vision of the Christian life (70). Wheaton, Ill.: Crossway Books.

Review: Saving the World at Work

Tim Sanders declares a “responsibility revolution” in business, where employees can help the companies they work for do a better job of helping individuals, society, and the environment, while staying focused on their business mission. He recounts numerous anecdotes of companies that have improved their contributions to ecology, sustainable business practices, and social welfare, often thanks to the advocacy of individual employees. And his avowed goal for his book: “I want to recruit you, and train you, for the Responsibility Revolution.”

The first third of the book focuses on how business revolutions take place, a five-phase process according to Sanders:

  1. a change in circumstances that dramatically affects our view of the business landscape
  2. a consequential shift in values
  3. the arrival of the innovators, who rush to address these new values with new approaches, leading to
  4. disruption as the old guard either disappear or cave in and adopt the new values
  5. the revolution finally culminates in the New Order, becoming better established and serving new markets

He gives a variety of compelling examples of these five phases as companies begin to adopt quality of life, broadly understood, both locally and globally, and for both current and future generations, as a central business value. Companies like GE, IKEA, SAS, Timberland, Aveda, Patagonia, and even Wal-Mart have made significant changes to how they do business, often helping the bottom line at the same time as they’re being more socially responsible.

The book is chock-full-o’ interesting factoids:

  • In one survey, 50% of MBA students said they’d accept a smaller salary to work at a company that was very socially responsible.
  • Two-thirds of recent college graduates claim they will not work for a company with a poor reputation for social responsibility.
  • Paper (the vast majority of which can be recycled) accounts for one-fourth of the volume of landfill waste. As it breaks down in a landfill, it releases methane, a greenhouse gas about twenty times more potent than carbon dioxide.
  • If the US were to cut annual paper use by 20%, the reduction in greenhouse gas emissions would be equivalent to taking half a million cars off the road for a year.
  • It takes 3 liters of fresh water to make one liter of bottled water (i cringe every time i see the wall of drinks in the kitchen at work and think about how much energy and resources it takes to support the “convenience” of individual servings)

On the negative side: some of Sanders’s attempts at catchy phrases are annoying, like “saver soldier”, a “highly motivated person who leverages work as a platform to help save the world”, or the “Them Generation”, those baby-boomers from the Me Generation who have turned around and are thinking about others now. And while he’s touting the “responsibility revolution” as an accomplished fact, i suspect it’s not quite here yet (if it were, he probably wouldn’t have had to write the book!). But he’s acting as a cheerleader here, and sometimes cheerleaders have to do some goofy moves to get our attention.

This book motivated me to see what i might do at work to make my company a better social citizen.

Reading: Wikinomics

I didn’t get all the way through Wikinomics before i had to return it to the library, but i plan to go back for the second half. So i don’t have it in front of me, and therefore can’t quite do it justice in a review. But it’s an important book that addresses several topics around how cultures of openness and collaboration are changing the nature of business and technology.

Some of the main points discussed include:

  • How advances in technology have brought production within the reach of a much larger group of people than ever before
  • “Ideagoras”, about corporate outsourcing of R&D to bring a much larger pool of ideas to bear on challenging problems
  • “Prosumers”: how customers want to hack, not just passively consume, products
  • How sharing scientific knowledge accelerates progress
  • Open, participative platforms that enable those outside an enterprise to build on its products
  • Wikis in the workplace

While the success of applications like Wikipedia may prove hard to reproduce, it’s clear that they represent some fundamental changes to how knowledge is developed and shared.