Daily life is riddled with small misunderstandings. Words are mispronounced, ambiguous terms are misinterpreted, lack of context leads to confusion, and distraction contributes to mistakes. We take these little errors for granted, and when they do not lead to meaningful consequences, we think nothing of them.
Recently, I attempted to count how many of these small errors I witnessed in a single day. I included my own conversations and those I overheard. I stopped counting after only half a day, by which point the number was already in the mid-thirties. That is entirely normal. So why are we so much less tolerant of similar errors when they are committed by our voice-enabled technology?
The dream of fluent verbal communication with machines is so tangibly close that it now seems inevitable. Siri, Cortana and Google's voice recognition allow us to speak to computers in increasingly natural ways. We can ask about everything from sports to weather to movies. We can check calendars, set timers, search the web, post to social networks and even make (or ask for) jokes. And the flexibility with which we can phrase our verbal input continues to grow.
When it works smoothly, verbal interaction with a computer is deeply satisfying because it feels effortless and unmediated. But when we are misheard or misunderstood, we seem to hold the computer in much more contempt than we would a fellow human. There are websites and videos dedicated to pointing out the occasional 'Siri fail', even though Siri and her ilk represent some of our crowning technological accomplishments, and even though those failures are generally no worse than the errors we ourselves make all the time and take for granted.
It appears we are still very much in a transitional period with regard to our fundamental understanding of what computers are. We are gradually moving away from an 'old computer' paradigm and toward an era of 'new computers'. Some of the essential differences between old and new computers relate to our interface with them and likely influence our attitudes toward their errors.
Reliability versus Flexibility
Consider the calculator. Calculators are rigidly and severely constrained devices. Input is limited to a small set of symbols with very narrowly defined meanings. Commands are limited to an even smaller set of equally narrowly defined operations. But calculators are highly reliable. You get the same answer every single time. Old computers are essentially glorified calculators. Absent artificial intelligence, they follow a similar pattern of narrowly constrained input and highly reliable output. These old computers set the expectations of generations of humans, who came to expect extremely high reliability and to find the notion of computers making mistakes ludicrous.
But as we build up the artificial intelligence of computers and endow them with more human-like capabilities, we inevitably lose some of that reliability. A computer that is able to guess is sometimes going to guess wrong. Did you mean 'play the song Baby' or 'play the album with the same name'? Or did you want to launch an app called Baby?
Natural Language versus Computer Commands
Two of my most frequent Siri commands are 'Directions home' and 'Timer 11 minutes' (you guessed it, for pasta). Both of these statements would be extremely rude if I were speaking to another human being. We embellish conversation with politeness and niceties. We repeat ourselves. We introduce questions with the announcement that we are asking a question. Civility requires that we do not merely state information or announce our demands. But we speak differently to computers. 'Timer 11 minutes' feels more right to me when I am speaking to my phone than an alternative like 'Could you do me a favor and please set a timer for 11 minutes'. Why is that? And what are the implications for the computer's ability to understand me correctly?
The why probably has something to do with the old computer model I am used to. Old computers responded correctly only to very specific syntactical instructions. Ever try typing 'Show me the files on drive C' in DOS? Plus, the added embellishments in human conversation require some effort, which I am not inclined to make for my phone. But it is becoming increasingly likely that a computer, like a human being, will actually respond better to less rigidly worded commands. The added words provide clues that can be used to make better guesses about your meaning. It is fascinating to watch Siri parsing and processing words as you speak, because words are constantly corrected based on other words that are spoken later in the sentence. So it is either already true, or soon will be, that by speaking to computers 'like computers' (i.e., like old computers) we are actually hindering rather than aiding their understanding.
Old computers made us feel in control
By 'us' here I mean anyone who was accustomed to computers and learned to take them for granted. The predictability and the obviously limited intelligence of computers put the human being squarely in the superior position. I cannot count how many times I heard it repeated that computers have an IQ akin to that of an earthworm, but that phrase was getting a lot of mileage at one point. Human beings do not like to feel threatened, and when something threatens their self-esteem they tend to react negatively. Who is more likely to call a computer a 'stupid machine', the computer engineer or the person who never got comfortable using one? When we feel stupid we like to poke fun at whoever diminishes us. New computers, with their endless knowledge and apparent intelligence, may seem friendlier, but they are also more intimidating. Siri is witty and glib, but she can also seem rather superior.
There are also more mundane factors that likely contribute to our unforgiving attitudes toward computer error. One may have to do with the speed and ease of verbal communication. Speaking with Siri or Cortana is fast, but it is not quite fast enough. There is still a noticeable pause while our speech is interpreted and a response is generated. Perhaps receiving a mistaken response is that much more irritating when you have to spend those extra seconds waiting for it. Or perhaps it has to do with the fact that our electronic 'assistants' all seem to have female voices, at least by default. Though I do not have any empirical evidence to support this, I am (sadly) willing to believe that we might direct less ridicule at a more masculine-sounding computer interface.
As we move further into the era of new computers, with increasing numbers of devices and functions supported by artificially intelligent networks, we will have to reappraise our expectations of how computers should behave. A prime example is likely to be the driverless vehicle. It is easy to imagine the outrage that will ensue the first time a driverless vehicle kills a human being (I would bet this will occur in the not too distant future). But to be useful, driverless vehicles need only avoid hurting significantly more people than the millions already injured by human drivers every year. We are at the beginning of a new era in computing, and we will not be able to entrust computers with increasingly 'human' tasks while requiring them to maintain their old computer-like infallibility.