Sunday, July 5, 2009

Natural Language Processing for Spell Check

I've mentioned on my other blog that I star items in my news feed that I want to take a closer look at. Today I went back to one of them, re-reading this Google Operating System post and watching its YouTube clip about how Google is leveraging its massive store of language data to improve spell check.

This is pretty neat stuff. Dropping the dictionary and relying instead on how words are actually spelled in practice feels like a more organic way to decide whether a spelling is correct. Is this the death of the dictionary? Probably not for formal writing, but it is a quantum leap for the casual, personal conversations conducted millions of times across the web each day.

This is also another demonstration of what becomes possible when you collect a large amount of empirical data: you can stop trying to define things through rules and let experience make the overall system smarter. In this case, Google Search and Google Translate are amassing a huge amount of field knowledge about how people spell words and structure their sentences. Now that Google is pushing ahead with Google Voice (US-only at the moment, sadly), it will likely amass a similar amount of data to improve its speech recognition.
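To make the "let the data decide" idea a bit more concrete, here is a minimal sketch of a corpus-driven spelling corrector in Python. It is not Google's system (its internals aren't public); it is the much simpler frequency-based approach popularized by Peter Norvig's "How to Write a Spelling Corrector" essay. It counts how often each word appears in a body of text, generates every string within one or two edits of the input, and picks the most frequent candidate that actually occurs in that text. No dictionary is involved, only observed usage. The tiny `SAMPLE_CORPUS` and the names `CorpusSpeller` and `edits1` are hypothetical stand-ins for the web-scale data Google has.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real system would use billions of words of text.
SAMPLE_CORPUS = (
    "the quick brown fox jumps over the lazy dog "
    "the dog sleeps while the fox runs "
    "similar words appear in similar places"
)

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


class CorpusSpeller:
    """Illustrative sketch only: corrects words using observed frequencies, not rules."""

    def __init__(self, corpus_text):
        # "Experience" here is simply how often each word was seen.
        self.counts = Counter(words(corpus_text))

    def known(self, candidates):
        """Keep only candidates that actually appear in the corpus."""
        return {w for w in candidates if w in self.counts}

    def correct(self, word):
        word = word.lower()
        # Prefer the word itself, then 1-edit, then 2-edit variants seen in the data;
        # if nothing matches, return the input unchanged.
        candidates = (self.known([word])
                      or self.known(edits1(word))
                      or self.known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                      or {word})
        # Among the candidates, pick the one used most often.
        return max(candidates, key=lambda w: self.counts[w])


if __name__ == "__main__":
    speller = CorpusSpeller(SAMPLE_CORPUS)
    for typo in ["similiar", "teh", "dgo"]:
        print(typo, "->", speller.correct(typo))
```

With the toy corpus, "similiar" becomes "similar", "teh" becomes "the", and "dgo" becomes "dog". The corrector improves simply by feeding it more text, which is the same property that makes Google's approach so appealing.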

In the meantime, is this smart spellcheck capability available to the rest of us through tools or APIs? The spellcheck service in the Google Toolbar seems to leverage it, and some folks have helpfully reverse-engineered an API from it. A search of the Google Code APIs did not turn up any contenders, which raises the question of whether Google is deliberately keeping this functionality close or simply hasn't gotten round to documenting and releasing the API.

I am definitely interested in more information if someone has any.
