Image: Matt Montagne / Flickr

Google slashes AI Speech Recognition Error Rate


It's good-bye to double-Dutch as Mountain View breaks out from only speaking fluent Yoda

Natural language is the easiest way for people to interact with systems and to communicate with one another.

Error rates of less than 10% approach human levels of understanding and, along with Natural Language Processing and Artificial Intelligence, allow for services such as truly secure transcription where unencrypted voice never leaves the enterprise, meetings that are scheduled automatically while you are still on a call, and automated assistants that can remind you of the meeting agenda or transcribe a note.

Make your point

Effective communication is at the core of everything we do in our daily lives and, despite our heavy reliance on texting and typing, natural spoken language remains the easiest way both to interact with systems and to communicate with others.

Think, then, of how much easier our lives will become when you can simply speak your mind and your devices understand perfectly. It seems we're not far off, if we consider a recent announcement by Google claiming that its Cloud Speech API has cut error rates by more than 30% in the last five years.

Error rates of less than 10% approach human levels of understanding and, along with Natural Language Processing and Artificial Intelligence, allow for services such as secure transcription, automated assistance and sci-fi levels of home automation.

"Hello, HAL. Do you read me HAL?"

The key to voice recognition software is how well the technology can convert a person's voice into a data pattern. The problem is that everybody speaks differently, with subtleties in accent and phrasing that make accurate conversion difficult.

Still, according to a recent VentureBeat article, Google's Jeff Dean announced at this year's AI Frontiers Conference in Santa Clara that Google has smashed error rates – the frequency of words being transcribed incorrectly – by a colossal 30%. This leap has been attributed to the addition of "neural nets", which are biologically inspired programming systems that help computers learn by observation and repetition.
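For readers wondering how an error rate like this is usually scored: the standard word error rate counts the substitutions, deletions and insertions needed to turn the machine's transcript into the correct one, then divides by the number of words actually spoken. The snippet below is a minimal sketch of that textbook calculation, not Google's own evaluation code; the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only (hypothetical name); real evaluations also
    normalise punctuation, numbers and casing in more careful ways.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a spoken word is missing
                curr[j - 1] + 1,     # insertion: an extra word was transcribed
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "remind me of the meeting agenda"
    heard = "remind me of a meeting agenda"
    print(f"{word_error_rate(spoken, heard):.1%}")
```

Here one wrong word in a six-word sentence is a single error against six spoken words, which is why even small slips push the rate up quickly on short commands.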

The race is on…

Back in August, Alex Acero, the senior director of Apple's AI speech recognition software, Siri, told tech site Backchannel that the "error rate has been cut by a factor of two in all the languages". And in September, Microsoft announced that its researchers had reached a speech recognition milestone: a word error rate of just 6.3 percent. If that can be replicated in a consumer use case, it is impressive stuff. If...

Amazon and Google fight to take over your home

Unsurprisingly, both Amazon and Google tend to do extremely well over the Christmas period and, with fourth-quarter revenues due to be announced imminently, the question for the year ahead is: who will win the battle to control your home?

Retail giant Amazon struck gold with its market leader, Echo (and charming digital assistant Alexa), and Google entered the ring with Google Home some time later. Sales of Echo have so far dwarfed Google's efforts, with Morgan Stanley estimating that the Echo is now in 9% of US households. But why should this matter to Google?

Research company Gartner, according to The Guardian, claims that by 2018, 30% of all interactions with devices will be voice-based (because you can speak four times faster than you can type), and, as we can see, voice recognition tech is improving all the time. The ramifications are huge. No more typing into search engines means less time facing the screen, and potentially a hit to one of Google's ad revenue streams.

As voice recognition development continues to improve, the market is poised to make voice-based interaction and communication the standard, potentially making fingers-and-thumbs a thing of the distant past.

"Alexa? What was typing?"
