Building a Conversational Agent From the Ground Up, Part I

This post was published on the now-closed HuffPost Contributor platform.

Having recently completed Dualism, my latest technothriller, I thought to complement it with an online conversational avatar of one of the book's protagonists, enabling fans to discuss the book's plot with a simulated character drawn from its pages.

The character I chose to emulate was Dualism's resident artificial intelligence, a quantum neural network who styles himself "Nietzsche." In effect, I wanted to create a real-world AI that would impersonate a fictional one. The question was: how to go about it?

What Wouldn't Work
So, there's actually something of a history of treating computers and conversation as a sort of elaborate parlor game. It started back in 1966, with MIT professor Joseph Weizenbaum's ELIZA program. No friend of AI, Weizenbaum set out to see how far he could get without it. The result was a triumph of minimalism: a program that employed no grammar, no lexicon, no semantics -- in short, no thought and no language -- and still managed to fool some of the people all of the time.

The genius was in the setup: ELIZA's best-known, most successful charade was its portrayal of "DOCTOR," a practitioner of Rogerian psychotherapy (of which Weizenbaum was evidently no friend either). In that guise, ELIZA had two things going for it:

  • First, as in all analysis, the "patients" (the humans with whom "DOCTOR" interacts) did most of the talking, and that about their favorite topic, themselves -- so no wonder they found the conversations fascinating.
  • Second, the mode of nondirective psychotherapy practiced by Carl Rogers and his followers is particularly easy to parody, consisting as it does of stock responses and slightly modified reiterations of what the patient has said.

Capitalizing on these features of the "conversation," ELIZA employed a technique called "string-matching" to generate what little in the way of response was required of it. While actual implementations could become arbitrarily elaborate, the logic behind them remained rudimentary enough: simply scan the input strings for the occurrence of keywords (e.g., "MOTHER"), and use them to trigger canned but canny-seeming responses (e.g., "TELL ME MORE ABOUT YOUR FAMILY.").
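
To make the keyword trick concrete, here is a minimal sketch in Python. It is not Weizenbaum's script: the rule table, the "I FEEL" pattern, the pronoun swaps, and the names `RULES`, `REFLECTIONS`, `reflect`, and `respond` are all assumptions made up for illustration. It shows only the two moves described above, a keyword triggering a canned reply and a slightly modified reiteration of what the patient said.

```python
import re

# Hypothetical keyword table (illustrative only, not ELIZA's real script).
RULES = [
    ("MOTHER", "TELL ME MORE ABOUT YOUR FAMILY."),
    ("FATHER", "TELL ME MORE ABOUT YOUR FAMILY."),
    ("DREAM", "WHAT DOES THAT DREAM SUGGEST TO YOU?"),
]

# Swap first- and second-person words so input can be echoed back.
REFLECTIONS = {"I": "YOU", "ME": "YOU", "MY": "YOUR", "AM": "ARE",
               "YOU": "I", "YOUR": "MY"}

DEFAULT_REPLY = "PLEASE GO ON."


def reflect(text):
    """Return the text in upper case with pronouns swapped, punctuation dropped."""
    words = re.findall(r"[A-Z']+", text.upper())
    return " ".join(REFLECTIONS.get(w, w) for w in words)


def respond(user_input):
    """Pick a reply by keyword lookup, falling back to a reflected echo."""
    upper = user_input.upper()
    words = set(re.findall(r"[A-Z']+", upper))
    # 1. Keyword scan: the first matching keyword triggers its canned reply.
    for keyword, reply in RULES:
        if keyword in words:
            return reply
    # 2. Reiteration: echo an "I feel ..." statement back as a question.
    match = re.match(r"\s*I FEEL (.+)", upper)
    if match:
        return "WHY DO YOU FEEL " + reflect(match.group(1)) + "?"
    # 3. Nothing matched: a stock nondirective prompt.
    return DEFAULT_REPLY


print(respond("My mother never listens to me."))
print(respond("I feel ignored by my boss."))
```

Notice that nothing here knows what a mother or a boss is. The program never looks past the surface of the string, which is exactly the limitation at issue.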

    Well, that was maybe good enough for the sixties. The wonder is that even now, fifty years on, the same "chatbot" techniques are still being taken by some as exemplifying the state of the art in computational linguistics, and tend to dominate the entrants in each year's Loebner Prize competition, the annual chatbot bake-off. There's got to be something better by now, but --

    What's the alternative?

    A Garden of Technologies

    The fact of the matter is that, in the nearly five decades since ELIZA's debut, significant progress has been made toward true conversational agents. String-matching has long since been abandoned in favor of full sentence parsing. Moreover, unlike ELIZA, modern-day NLP systems actually try to follow the conversation via techniques of discourse analysis and management (DAM). Even more radical, some of those systems employ knowledge representation and reasoning (KR&R) technologies to give them some idea of what they're talking about!

    We'll discuss each of these in turn, creating a blueprint, a mental model, of our ideal artificial conversationalist. And in this, there is no better place to start than with the technology that has become almost synonymous with natural language processing as a whole: parsing.

    Parsing: Can't See The Forest For The Trees
    The word "parsing" comes from the Latin pars, meaning "part," and refers to the process by which a sentence is taken apart into its constituent "parts of speech" (nouns, verbs, adjectives, etc.) and then reassembled into a "parse tree" -- a structure manifesting the syntactic relationships among those components, with a node for the sentence as a whole at the root, and branches for noun- and verb-phrases leading down to the terminal leaf-nodes containing the literal words.
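
A parse tree of this kind is easy to picture as a data structure. The sketch below is one possible representation, not any particular parser's format: the `Node` class, its field names, and the example sentence "the dog chased the cat" are assumptions chosen for illustration. The root carries the label S, the interior nodes are noun and verb phrases, and the leaves hold the literal words.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                                  # "S", "NP", "VP", or a part of speech
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on terminal leaf nodes

    def leaves(self):
        """Read the sentence back off the tree, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]

    def bracketed(self):
        """Render the tree in the usual bracketed notation."""
        if self.word is not None:
            return f"({self.label} {self.word})"
        inner = " ".join(child.bracketed() for child in self.children)
        return f"({self.label} {inner})"


def leaf(label, word):
    """Convenience constructor for a terminal node."""
    return Node(label, word=word)


# Hand-built tree for a hypothetical example sentence.
tree = Node("S", [
    Node("NP", [leaf("Det", "the"), leaf("N", "dog")]),
    Node("VP", [
        leaf("V", "chased"),
        Node("NP", [leaf("Det", "the"), leaf("N", "cat")]),
    ]),
])

print(tree.bracketed())
print(" ".join(tree.leaves()))
```

The tree here is assembled by hand. Producing it automatically from raw text is the parser's job, and that is where the trouble starts.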

    Nowadays, as evinced by the (rather one-sided) contretemps between Noam Chomsky and Google's Peter Norvig, there's something of a debate over which is better for cranking out these parse trees: a symbolic, or a sub-symbolic (i.e., statistical or machine learning), approach.

    So, since it behooves anyone with an interest in the future of computational linguistics to take a stand on this issue, here's mine: I don't care! Just give me a decent parse, please, and don't tell me how you got it. I don't want to know.

    The trouble is: giving me a decent parse is exactly what both of these approaches have trouble doing. There turns out to be a fly in the ointment, forever preventing a purely syntactic analysis from ever yielding a single definitive interpretation. A fly called "ambiguity."

    Consider, for example (while we're talking about flies), the seemingly altogether unambiguous statement, "Time flies like an arrow." Perfectly obvious, no? After all, it's just an assertion about how some entity ("time") performs some action ("flies") in some manner ("like an arrow").

    Not so fast. Lacking knowledge of the real world, an artificial intelligence might not be able to rule out the possibility that, just as there are "horse flies" and "fruit flies," there might well be "time flies," and that these strange insects might be very fond of archery equipment. In which case, "Time flies like an arrow" results in a whole different parse-tree, with a whole different meaning -- "time flies" now becoming the subject, "like" the action, and "an arrow" its object.

    We're not done yet: In addition to being a noun and (possibly) an adjective, "time" in English is also a verb, meaning "to measure the speed of something with a timing device." So, "Time flies!" could be a command to measure the speed of flies with a stopwatch, and the whole sentence "Time flies like an arrow!" might be read as an answer to the burning question: "What's the best way to time flies?" ("Time them the way you would time an arrow" or maybe even "Time them the way an arrow would time them.")
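
The readings above can be reproduced mechanically. What follows is a minimal sketch, assuming a toy grammar and lexicon invented for this one sentence (`GRAMMAR`, `LEXICON`, and the `parses` function are hypothetical names, not part of any real parsing library). It enumerates by brute force every tree the grammar allows, with no knowledge of the world to prune any of them. The treatment of "time flies" as a noun-noun compound and the bare-verb-phrase rule for commands are simplifying assumptions, and the toy grammar does not attempt the "way an arrow would time them" variant.

```python
import itertools
from functools import lru_cache

# Toy grammar (assumption): each symbol maps to its possible right-hand sides.
# It must not contain unary cycles such as A -> B, B -> A.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],                 # statement, or bare command
    "NP": [["N"], ["N", "N"], ["Det", "N"]],     # "time", "time flies", "an arrow"
    "VP": [["V", "PP"], ["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}

# Toy lexicon (assumption): every part of speech each word may take.
LEXICON = {
    "time": {"N", "V"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "an": {"Det"},
    "arrow": {"N"},
}


def parses(words, grammar, lexicon, start="S"):
    """Return every bracketed parse of `words` licensed by the grammar."""
    words = tuple(w.lower() for w in words)

    def splits(i, j, n):
        """Yield each way to cut the span [i, j) into n non-empty pieces."""
        if n == 1:
            yield [(i, j)]
            return
        for k in range(i + 1, j - n + 2):
            for rest in splits(k, j, n - 1):
                yield [(i, k)] + rest

    @lru_cache(maxsize=None)
    def build(symbol, i, j):
        trees = []
        # A single word may be tagged directly with a part of speech.
        if j - i == 1 and symbol in lexicon.get(words[i], ()):
            trees.append(f"({symbol} {words[i]})")
        # Otherwise try every rule and every way of dividing the span.
        for rhs in grammar.get(symbol, ()):
            if len(rhs) > j - i:
                continue
            for cut in splits(i, j, len(rhs)):
                options = [build(sym, a, b) for sym, (a, b) in zip(rhs, cut)]
                for combo in itertools.product(*options):
                    trees.append(f"({symbol} " + " ".join(combo) + ")")
        return tuple(trees)

    return list(build(start, 0, len(words)))


trees = parses("Time flies like an arrow".split(), GRAMMAR, LEXICON)
for t in trees:
    print(t)
```

With this grammar the sentence comes back with three trees: time as the subject that flies, "time flies" as insects fond of arrows, and the stopwatch command. Every word added to the lexicon or rule added to the grammar can only increase that count.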

    If you think this is bad, though, consider the sign seen hanging in a California town hall -- to wit:

    Persons applying for marriage licenses wearing shorts or pedal pushers will be denied licenses.

    This humble -- and, I submit, instantly comprehensible -- advisory admits of no fewer than forty-three separate parses, including the ones where the marriage licenses are the ones wearing shorts, or the shorts are wearing marriage licenses, or the applicants are magically transformed into licenses in a state of denial ("...will be denied licenses").

    The point in all of this is that, beyond some extremely low threshold of complexity, there is no such thing as a single parse tree for a given input. At best there is always a small forest of alternative parses; at worst, a jungle.

    So, the real challenge for those who would create conversational agents does not end with finding a better parser. Rather, it goes on from there to consider, in hopes of mimicking, the effortlessness with which we humans hack our way through all this syntactic undergrowth, blithely discarding most variant interpretations as if they did not exist at all.

    Part of how we do this is by taking context and connectivity into account. The peripheral becomes central in the pursuit of a perfect parse, as next time we leave the individual sentence behind and enter the broader realm of discourse phenomena as a whole.
