Voice, the next frontier

Jun 28, 2018 Audio & Broadcast

Smart objects are becoming more intelligent by the day, integrating into our lives like new family members, and even new colleagues. With costs coming down, we can expect to see more technology that recognises faces, images, gestures, speech and even emotions, all while getting better at natural language processing.

Article by: Laura Daquino

This goes beyond the household names of Google, Alexa (Amazon), Siri (Apple), Bixby (Samsung), and Cortana (Microsoft). Audio companies are launching their own voice technology products. Even one of the oldest bathroom suppliers in the world, Kohler, has just rolled out voice-activated bathrooms too.

In Asia especially, there’s a huge case for voice-based IoT products because of the sheer difficulty of typing phonetic characters with speed.

Right now these products tend to take the form of small speaker-like objects. In the future, voice tech is expected to be omnipresent, essentially invisible, as it smoothly integrates into our everyday lives.

The medium of choice?

In 2003, before voice tech was even part of the vernacular, Jakob Nielsen, a global interface expert, holder of 79 US patents, and principal of the Nielsen Norman Group, determined in a paper that “voice interfaces will not replace screens as the medium of choice”.

He did, however, caveat that with a study he conducted in 1986, when modern technology was still in its infancy. Nielsen asked a group of 57 computer professionals which would win the interface war by 2000: voice or graphical. Overwhelmingly, the respondents chose voice.

Mitchell Long, director of strategy at media agency PHD Australia, has a wealth of experience in the field, having worked on the Australian launch of Google Home a few years back.

Nielsen suggested voice would be most beneficial for users with disabilities, those who don’t have access to a keyboard or monitor (very few in 2018), and those in “eyes-busy, hands-busy” situations. Fifteen years on, Long believes voice has merits for everyone who falls into those categories, and everyone else too.

“Once you get past the stigma of looking like an idiot talking to an object, you will become comfortable and natural, because it becomes homogenised,” says Long. “It won’t impact all of our interactions with technology but it does start to break down our reliance on screens in some situations.”

“We’re moving away from us having to go to technology to use it, as technology now comes closer to us.”

He also thinks that, while voice is currently a fixture in an early adopter’s home appliance set, it will be integrated into commercial environments “sooner than we think”.

Seamless utility, not novelty

Long would not go so far as to say voice will replace screens as the medium of choice. Similar to Nielsen’s prophecy of a “multi-modal dialogue” between screens and voice, Long can see this playing out already, with some brands cross-promoting products through devices.

A great branded example, in Long’s view, is Campbell’s Soup talking recipes through Amazon Alexa to encourage seamless purchase of its products. Walmart and Google have also recently partnered on Home-based shopping.

Long believes utility value is what will drive widespread adoption and define the quality of voice tech. That’s regardless of whether voice tech is operating in a commercial or residential environment. Whether it’s advertising a product, offering a suggestion, or simply responding to a command, these products must deliver utility, not novelty.

Google understands this well, and was recently reported to be “throwing cash” at four Google Assistant startups to dominate voice tech in lucrative markets like travel, hospitality, gaming and education. One of those startups, Edwin, is an AI-powered English tutor that helps students prepare for foreign language tests with tailored lessons. It even integrates with Facebook Messenger.

The audio visual game

Perhaps most interesting will be the adoption of voice by the audio visual industry, which has always been on the cutting edge of technology.

Amazon’s Alexa can already engage in teleconferencing, and Amazon is offering the commercial tech market the ability to build custom skills through its Alexa Skills Kit and Alexa for Business APIs.

In a practical sense, this is making collaboration easier in commercial environments, where participants can quickly check in with voice tech to schedule meetings, set reminders, and even bring up data from popular business applications like Salesforce. It goes beyond “turn on the projector” commands.

Commercial is the next frontier for voice. But there are concerns.

For starters, users in a business environment change on a regular basis, which means the technology needs to be more intuitive and adaptive than ever. If not, there’s a risk that voice will be abandoned early on and for good, as users are time-poor and working by the clock.

There are also prevailing privacy concerns. Google hit a speed bump last year when it emerged that it had been secretly recording conversations. In May 2018, Amazon made headlines for a similar reason when Alexa, mishearing a prompt, sent an audio file of a husband and wife’s conversation to one of the husband’s employees.

Alexa has also laughed — quite creepily — at some of its users for no particular reason. That would not go down well in an office environment when the big boss is presenting.

But Mitchell Long from PHD is optimistic. He thinks any glitches will be ironed out sooner than we think. He’s also banking on voice tech to reach “the tipping point” soon, a mark of accuracy that will lead to widespread adoption in business environments.

“What some people are saying, once you reach around 98 per cent accuracy, that’s where the technology essentially skyrockets — you no longer find yourself repeating or find it clunky to use,” says Long.

“It may still seem a little gimmicky at this stage. AI is still quite systemic, in terms of how it responds to things, but then it becomes more like a person.”

By some reports, it already has. Digital expert Mary Meeker, who doubles as a famous venture capitalist, released her much-awaited annual internet trends report at the end of May.

In it, she reported that Google’s machine learning word accuracy had reached 95 per cent, a level labelled the “threshold for human accuracy”.

The future is already here. We just need to learn how to integrate it. And Integrate is helping drive that charge, cultivating the ultimate experience for AV and IT professionals in Australia at INTEGRATE, 22–24 Aug, ICC Sydney. You can also hear from Mitchell Long himself in his dedicated session, “Navigating the voice first revolution”.
