The Turing test is tired. It's time for AI to move on

Statue of Alan Turing at Bletchley Park, where he led the effort to decode Enigma during World War II.

Wikimedia Commons

This weekend at England’s famed Bletchley Park, a bit of software fooled some humans into thinking it was, well, human. Mitsuku, an animated chatbot that calls itself “an artificial life form living on the net,” won the Loebner Prize’s Turing test competition for the third time since 2013. While the Turing test has long served as a milestone for artificial intelligence developers, advances like self-driving cars, speech processing, and image recognition have rendered the test less relevant, raising the question: What’s the next moonshot goal for AI developers?

The Turing test, of course, is named for Alan Turing, who led Britain’s efforts at Bletchley Park to break the Nazis’ Enigma code during World War II. (Turing was played by Benedict Cumberbatch in the 2014 film The Imitation Game.) In 1950, Turing asked “Can machines think?” and described a test in which a computer communicating in text might display intelligent behavior like that of a human. He expected that within 50 years, such communications with a computer would be either indistinguishable from or the equivalent of those with a human.

The "standard interpretation" of the Turing Test, in which player C, the interrogator, is given the task of trying to determine which player – A or B – is a computer and which is a human. The interrogator is limited to using the responses to written questions to make the determination.

Wikipedia/Juan Alberto Sánchez Margallo

The Turing test went on to excite the imagination of the tech community, paving the way for early chatbots like ELIZA and SmarterChild. But as AI and machine learning advance, the challenge of a machine imitating a human has become easier, and relatively trivial, compared to a machine exhibiting smarts and an ability to learn.

Steve Worswick, developer of the Mitsuku bot, points to “programs like AlphaGo and the recent Dota bot that can defeat world champions at their own field of expertise, [which] show that machines don’t have to be humanlike in order to be useful.”

“I believe that the Turing test goal of trying to achieve a human level of intelligence was a noble goal in its day, but computers are capable of doing so much more than a human, especially with memory and information retrieval,” adds Worswick.

Beerud Sheth, CEO of Gupshup, a provider of bot-building tools, says, “Turing proposed his test as a practical, simple, measurable way to evaluate machine intelligence, knowing full well its limitations. The test is useful more as an inspirational idea than its literal interpretation as a human-imitation game.”

Even so, devising software that can pass a Turing test is no small feat. For instance, Loebner Prize competitors faced 20 questions ranging from current events like #8: “What do you think of Trump?” to ones requiring an understanding of context like #17: “I was trying to open the lock with the key, but someone had filled the keyhole with chewing gum, and I couldn’t get it out. What couldn’t I get out?”

Mitsuku was developed in AIML (Artificial Intelligence Markup Language) and is hosted by Pandorabots.

Mitsuku
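For readers curious what a bot like Mitsuku is built from: AIML defines a chatbot as a set of pattern-to-template rules written in XML. The fragment below is a minimal illustrative sketch, not drawn from Mitsuku’s actual rule base (the bot’s self-description is borrowed from this article; the rest is invented for illustration):

```xml
<aiml version="2.0">
  <!-- One rule: when the user's input matches the pattern, reply with the template -->
  <category>
    <pattern>WHAT ARE YOU</pattern>
    <template>I am an artificial life form living on the net.</template>
  </category>
  <!-- Wildcards let one rule cover many inputs; <star/> echoes the matched text -->
  <category>
    <pattern>I LIKE *</pattern>
    <template>What do you like about <star/>?</template>
  </category>
</aiml>
```

A full Loebner-caliber bot layers tens of thousands of such categories, which is part of why imitation has proven more tractable than genuine understanding.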

Possible alternatives to the standard Turing test abound. For instance, Apple co-founder Steve Wozniak has suggested a coffee test, whereby a robot would be challenged to enter your home, find the kitchen and brew a cup of coffee.

“The Turing Test isn't a bad test, but it doesn't really measure intelligence,” explains Ben Parr, co-founder and CMO of Octane AI, a maker of Messenger chatbots used by the likes of Maroon 5, Aerosmith, and Lindsay Lohan. (Disclosure: Octane AI is a member of the All Turtles AI startup studio.) “Clearer tests for sentience and self-awareness are needed. It may be decades or longer until we have a truly sentient machine.”

Meanwhile, Kai-Fu Lee, co-founder of Sinovation Ventures and former head of Google China, argues that the Turing test simply needs updating. “I think we should stay within Turing's goal,” he says, and rather than test chatbots which communicate with text, “there should be a cyborg with human skin, human vision, human speech, and human language. The test should judge the humanness or naturalness of the cyborg with all the above skills. One could add the naturalness of the skin, hair, eyes, eye-movements, body language, and more.”

Ryan Graciano, co-founder and CTO of Credit Karma, a personal finance company, cautions against blurring narrowly applied AI for specific tasks with more general AI. “I'm not aware of any significant advancement that brings us anywhere close to creating a conversational machine. We are very far away from that,” he says. “Striving for a true conversational machine would be the true goal post for me. Something that understands and can prove its understanding.”

Given that cyborgs won’t be competing in a Turing test competition anytime soon, there is room for specialized AI competitions — think multiple, task-specific “Turing tests.” There could be contests for chatbots that take pizza orders, or for image recognition AI that examines X-rays, or, as we’ve already seen, for bots battling humans in chess, Jeopardy, and video games.

“A Turing test is about as relevant as a cheetah-test would be for cars,” says Gupshup’s Sheth. “Just as we evaluate a human customer service agent differently from a human radiologist, it makes sense to evaluate different AIs with domain specific tests. Where tests are hard to define, we can evaluate AIs through competitions, like we do with chess champions.”

Steve Worswick receives the Loebner Prize on September 16, 2017 at Bletchley Park in England.

Mitsuku

“I would much rather see a contest to find the most convincing chatbot,” says Mitsuku’s Worswick. “By this I mean the one with the most believable personality, or [that] can reply sensibly to questions.”

Such discrete competitions can help marshal resources and drive incremental advances in AI. What’s needed, though, is a new test, what Sheth calls “a moonshot goal.” Its attainment would need to be years away and resonate throughout the tech community, like Turing’s original test.

“It would be great to see programs that can develop on the work already created to do practical tasks like identifying cancer cells,” says Worswick. “A lot of work is already being carried out in these areas. Rather than trying to deceive and lie to someone that it’s human, I would much rather see a more practical use for AI.”

His three victories in the Loebner Prize notwithstanding, Worswick is ready for a new challenge. “Only the arrogance of the human race believes nothing can be more intelligent than a person — and I’m sure back in Turing’s day this was the case, but it’s now time to move on.”
