Segaran (author of the highly recommended Programming Collective Intelligence) and Hammerbacher have collected a diverse set of essays on data collection, visualization, processing, and analysis. What interested me most was the wide variety of application areas in which data is the “secret sauce”. The essays range from broadly philosophical to deep in the technical details, so you’re likely to find something at your level of interest (though that also means that much of the book may not hit your level).
Jeff Hammerbacher’s chapter on Information Platforms and the Rise of the Data Scientist is a good example. It discusses Facebook’s history of scaling its data storage and analysis capabilities, starting with custom scripting based on SQL, moving to data warehousing and then beyond to Hadoop and related tools. “More data, simple models” is the processing style that characterizes many such Big Data enterprises today.
Other valuable chapters for me:
- Data Finds Data (Jeff Jonas and Lisa Sokol)
- Natural Language Corpus Data (Peter Norvig)
- Connecting Data (Toby Segaran)
While you’re not likely to find a solution here to a specific technical problem, there’s a good chance you’ll find something to either broaden your horizons or give you some new ideas. Definitely recommended.
(Disclosure: I received a free copy of this book through O’Reilly’s Blogger Review program.)