The Google iPhone voice app angle that no one is talking about


By Jeff Whatcott - Posted on 18 November 2008

I checked out the new voice-activated Google Mobile App for iPhone this morning. It works pretty well - not perfect - but solid.

I think it’s interesting that no one is talking about the strategy behind the introduction of voice as an interface to Google. This has been on my mind lately as I’ve been researching the web search domain in conjunction with the forthcoming release of Acquia’s hosted search service. As part of that research, I read The Search by John Battelle. John’s penetrating analysis of Google opened the neural pathways that led me to take a deeper second look at their new iPhone app. Let me explain.

Google’s business model is spinning gold from data collection and organization. Tim O’Reilly has been on this for a long time with his “Data is the Intel inside” theme.

With the new iPhone App, Google is now able to collect data about not only what you are looking for, but how you are asking for it, at the waveform level. And this data is correlated with your exact geographic position.

In the last 24 hours, Google has probably collected and stored hundreds of thousands of samples of human speech, and by the end of the year, it will be millions and eventually billions. They’ve probably also captured data about whether their voice recognition algorithms are accurate, because they probably know if you actually click on any of the links they serve up. And they also know what region of the world you are in when you speak, which means that they have the data to start to account for regional accents and dialects.

So on one level, what Google has created here is a massively scaled feedback loop that will likely improve speech recognition algorithms at a blinding pace. Similar to what reCAPTCHA does with text recognition, Google has the opportunity to build a distributed microlearning system that will just get better and better with continued massive use. Good for them.

But that’s not all. They are also getting inside our heads. With the massive data set they are gathering, they may also eventually be able to make educated guesses about our age and gender, our ethnic heritage, our mood, our health, and our honesty by recording and analyzing our voices. They could also be gathering a unique voiceprint which can be stored as a unique identifier for each of us. It’s a pretty reliable multi-factor identification and tracking system in the making. Voice login for Gmail, anyone?

I have no idea if Google is doing any of the above. It may all be science fiction. And even if they are heading down this path, there may be a way to do it that should not cause undue privacy concerns. But it may be worth asking the question, because Google is not a normal company, it falls within their core mission to do this, and they have a profit motive.

What do you think about all this? Do you want Google servers indexing and storing your voice?

Well, it's not entirely true that people haven't talked about this before.
Marissa Mayer actually admitted last year that GOOG-411 was a giant speech recognition phoneme data mining project:


The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we're trying to get the voice out of video, we can do it with high accuracy.

Thanks for the link. I hadn't seen that one before.

You might be interested in following the discussion over here. Apparently the files that the app is sending back to the Google mothership are only 100-300 bytes. Probably too small to be actual waveforms. It looks like some of the speech-to-text (or speech-to-phonemes) conversion may be done on the phone itself.

That's very helpful. Thanks.

Looks like there have been a few updates to that post and Google has confirmed that the recognition is happening in the cloud.