Thursday, August 13, 2009

Book Review: Programming Collective Intelligence : Building Smart Web 2.0 Applications

Programming Collective Intelligence: Building Smart Web 2.0 Applications
By Toby Segaran
O'Reilly 2007
ISBN 978-0-596-52932-1

I bought and read Programming Collective Intelligence about a year ago and it remains one of my favourite books.


According to the author, Collective Intelligence is the "combining of behavior, preferences, or ideas of a group of people to create novel insights." This is a book about approaches and algorithms that allow you to do more with datasets of user-generated or contributed content.

The book takes a task-based approach. After an introductory chapter, each chapter is organised around performing a specific sort of analysis. You'll find chapters on making recommendations, discovering groups, searching and ranking linked content (like Google's PageRank), optimisation, document filtering, modelling with decision trees, price models, classification and genetic programming.

You will not find heavy theoretical discourse in this book. Where required, the book introduces new underlying algorithms, but only in the context of the current task. Those with a background in the material might benefit from knowing that the following algorithms are covered: Bayesian classifiers, decision tree classifiers, neural networks, support vector machines, k-nearest neighbours, clustering, multi-dimensional scaling and non-negative matrix factorisation.
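To give a flavour of how compact these algorithms can be in Python, here is a minimal sketch of k-nearest neighbours classification. It is an illustration only and is not code from the book: the function names (euclidean, knn_classify), the toy training points and the labels are all invented for this example.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k closest training points.

    `training` is a list of (point, label) pairs.
    """
    # Sort the training rows by distance to the query and keep the closest k.
    nearest = sorted(training, key=lambda row: euclidean(row[0], query))[:k]
    # Count the labels of those neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, made up for this sketch.
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.5), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
    ((8.5, 9.0), "large"),
]

print(knn_classify(training, (1.2, 1.4)))
print(knn_classify(training, (8.7, 8.2)))
```

A real application would also need to think about scaling the features and choosing k, which this sketch ignores.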

The book is black and white and is 334 pages long with a mid-sized typeface. The writing style is relaxed and conversational, but you will need to concentrate at times to follow the concepts being explained. Each chapter has many code samples and quite a few data tables and diagrams. The net effect is that where the text, code or diagrams are hard to fully grasp, the others can clarify the meaning and intent.

The code samples are written in Python in a tutorial, incremental style, so you can follow along with the text. You will need to download and install some well-known third-party libraries to recreate most of the examples, and many of the samples either require an internet connection to supply the data or use data sourced from the internet.

A very short Python primer is included in the preface. Knowing Python is not a pre-requisite to reading and understanding the code samples, however - I did not know Python at all when I read it and still managed to follow along. As someone who is now learning Python I can appreciate that the code samples are all very concise and polished, even if the powerful use of list comprehensions sometimes confused me at the time I read it.
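For readers who, like me at the time, have not met list comprehensions before, the sketch below shows the same operation written as an explicit loop and as a comprehension. It is an illustration in the spirit of the book's recommendation examples rather than code taken from it: the ratings data and the function names are invented for this example.

```python
# Hypothetical ratings data, invented for illustration (not from the book).
ratings = {
    "alice": {"Film A": 4.0, "Film B": 2.5, "Film C": 5.0},
    "bob": {"Film A": 3.0, "Film C": 4.5, "Film D": 1.0},
}


def shared_items_loop(prefs, a, b):
    """Items rated by both people, written as an explicit loop."""
    shared = []
    for item in prefs[a]:
        if item in prefs[b]:
            shared.append(item)
    return shared


def shared_items_comprehension(prefs, a, b):
    """The same result as the loop above, as a single list comprehension."""
    return [item for item in prefs[a] if item in prefs[b]]


def squared_differences(prefs, a, b):
    """Squared rating difference for each item both people have rated."""
    return [
        (prefs[a][item] - prefs[b][item]) ** 2
        for item in prefs[a]
        if item in prefs[b]
    ]


print(shared_items_loop(ratings, "alice", "bob"))
print(shared_items_comprehension(ratings, "alice", "bob"))
print(squared_differences(ratings, "alice", "bob"))
```

Read a comprehension from the middle outwards: the "for" clause names the loop, the "if" clause filters it, and the expression at the front is what gets collected.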

I enjoyed this book immensely and it remains one of my favourites. I had only very limited exposure to the concepts prior to reading the book, but I found myself increasingly excited about the possibilities as I read. Even as I page through the book for the purposes of refreshing my memory it is impossible not to earmark certain sections for my current projects.

Who should read the book? I think non-programmers would struggle to get through it, and readers wanting a deeper, more theoretical understanding of the algorithms may be disappointed. Programmers with a background in any language will probably extract a lot of value from the book, however, especially if these concepts are new to them. For that audience I strongly recommend this book. Unlike many other technical books, I think this one will remain relevant and useful for quite a while.

Programming Collective Intelligence : Building Smart Web 2.0 Applications is available from Amazon, Oreilly.com and possibly your local technical bookstore.

(Full disclosure: As an Amazon associate, I will get paid if you purchase this book through the links to Amazon on this page. Use this Amazon link if you do not wish this to happen. Was the review helpful?)
