How We Will Talk to Computers in the Future


There's a lot of talk right now about the Amazon Echo, or rather about Alexa, the voice assistant you converse with through the Echo. To a lot of people this is something very novel: you say, "Alexa, could you buy some toilet paper?" and a slightly robotic woman's voice chirps out a recommendation and asks if she should order it for you.

I've had an Amazon Echo for about a year now, and I often use Siri to replace otherwise mundane tapping on my iPhone. Voice assistants really augment my everyday life. Nothing major -- just small touches like turning on music while my hands are full, checking the weather while putting on my shoes, setting alarms, timers, and reminders. But it's easy to imagine voice assistants playing bigger and bigger roles in our lives, and this is one of the many things we think about at Roger.

Voice as an interface won't replace existing interfaces. There's a large cognitive benefit to being able to see and touch an interface. So while it's very convenient that you can reorder your last Domino's pizza via Alexa, placing a custom order is still much easier through Domino's website.

There's been a lot of buzz about the "Conversational UI," but it's not going to be a revolution. While some apps may benefit from having a dialog with the user, it's a very linear, slow medium when compared to graphical and tactile interfaces.

Let's look at a possible future where voice interfaces -- and by extension, assistants -- will excel.

Today, pure voice interfaces work best for short, imperative queries: "what's the weather," "turn off the lights," "call Bob." Dictation is another good use case. Voice can be very useful when your hands are full or you don't want to pick up your phone.
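To see why imperative queries are the easy case, consider that each one maps cleanly to a single intent plus a few parameters. Here's a toy, rule-based sketch of that mapping; real assistants use statistical language understanding, and the intent names and patterns below are purely illustrative assumptions.

```python
import re

# Hypothetical intent patterns for the imperative queries mentioned above.
INTENT_PATTERNS = {
    "weather": re.compile(r"\bwhat'?s the weather\b"),
    "lights_off": re.compile(r"\bturn off the lights?\b"),
    "call": re.compile(r"\bcall (?P<contact>\w+)\b"),
}

def match_intent(utterance):
    """Return (intent_name, slots) for a recognized command, or None."""
    text = utterance.lower().strip()
    for name, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Named groups become the command's parameters ("slots").
            return name, match.groupdict()
    return None

print(match_intent("Call Bob"))          # ('call', {'contact': 'bob'})
print(match_intent("what's the weather"))  # ('weather', {})
```

A detailed, multi-part request ("order a large pizza, half mushroom, half pepperoni, but no olives") has no such one-to-one mapping, which is exactly where this style of interface starts to break down.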

Once you go into detail, or talk about more than one thing, voice interfaces currently break down. Examples that work poorly today include sentences with foreign words, rambling speech, and noisy environments.

The quality of voice interfaces is improving very rapidly. Baidu has built machine learning models that understand Mandarin in noisy environments better than humans do. Honda has also developed a voice recognition system that can understand and identify multiple people speaking at the same time.

The greatest barrier to voice interfaces is public acceptance, for several reasons.

Passive listening is a big deal. There's a device in your home or your pocket that's constantly listening for its name. It sounds eerie, but clearly a lot of big companies are betting on it. The Xbox One turns on when you say, "Xbox on." Your Android phone may light up when you say "Ok Google." The iPhone responds to "Hey Siri," and the Amazon Echo listens for "Alexa." What might have been unthinkable five years ago is now becoming prevalent.
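The control flow behind that passive listening is simple in outline: stay dormant until the wake word appears, then treat only what follows it as a command. A minimal sketch, assuming text transcripts stand in for the on-device audio matching that real products perform:

```python
# Hypothetical wake words, taken from the devices mentioned above.
WAKE_WORDS = ("alexa", "ok google", "hey siri", "xbox")

def extract_command(transcript):
    """If the transcript starts with a wake word, return the command
    that follows it; otherwise stay passive and return None."""
    lowered = transcript.lower().strip()
    for wake in WAKE_WORDS:
        if lowered.startswith(wake):
            # Only speech after the wake word is treated as a command.
            return lowered[len(wake):].lstrip(" ,")
    # No wake word heard: nothing is acted on.
    return None

print(extract_command("Alexa, could you buy some toilet paper?"))
# could you buy some toilet paper?
```

The privacy argument these companies make hinges on that last branch: audio that never matches the wake word is meant to be discarded on the device rather than processed.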

Talking to a voice assistant on the street or around your friends can be awkward. The same was once true of taking calls through headphones, checking social feeds on your phone, and so on. If people start using voice assistants privately, it's pretty safe to assume it will become accepted behavior publicly too.

Voice allows for completely new types of interactions. Unlike burying your head in a phone screen, voice is inherently a more interactive and social medium.

Longer term, voice can become a very natural form of human-computer interaction, and it shouldn't be thought of as just an interface to one or more applications. Much like search engines are getting better at presenting the content most relevant to you, voice assistants will attempt not only to find relevant information, but to perform the actions most relevant to you. The obvious improvement in this area is handling complex orders ("get tickets for Isabel and me for a movie we'd like, tonight by Union Square"), but it really doesn't have to end there.

Voice assistants become tools that act in your interest, and in some cases on your behalf. I'm not talking about AI, and I'm not even talking about simulating a person. If we assume that voice interfaces become publicly accepted, it really wouldn't be that strange if, when I don't hear the phone, my voice assistant picks up instead and says, "Hey Isabel, sorry Blixt couldn't pick up the phone. Can I help in any way?" Isabel could then ask my assistant to remind me to pick up groceries on the way home, or, if something is really urgent, tell it to sound a loud alarm on my phone to get my attention.

The voice assistants we have today are barely scratching the surface of what's possible. But they're increasingly building up your taste profile and learning more about your day-to-day life. They're also getting better at understanding you, and better at making themselves understood.

We're far from a future where software has social status, and much like self-driving cars, there's a lot of legal work involved in allowing computers to make choices on our behalf. But I don't think it's crazy to say that we will get there. Already today, people of all ages love interacting with voice interfaces--it's easy to imagine them becoming more than just another app.

Blixt is the Co-Founder and CTO of Roger, a NYC-based voice technology startup, and a Quora top writer on technology, artificial intelligence and programming.