It, Watson
Despite what some may say, Watson, IBM's Jeopardy!-playing supercomputer, isn't a herald of the Singularity, nor the harbinger of Skynet. It's a tool to assist people. It's strong where we are weak, and weak where we are strong. What more could you ask of a tool that complements us and extends our capabilities? Many of the problems we face are highly complex, and a tool like DeepQA, the engine behind Watson's answer generation, may be what's necessary to get a handle on them.
Part of the point of projects like Watson is to better understand our own intelligence. By seeing where computers struggle, and by trying to recreate what we are good at, we learn how our brains work their wonders.
Mechanics
Like Deep Blue's approach to chess, Watson's approach to Jeopardy! is very different from a human's. It's a brute-force method that combines a diverse set of algorithms to extensively analyze a large set of possible results and arrive at an answer. Watson is not capable of reasoning, at least not as we generally define it. Nor does it actually understand the questions. Rather, it looks for connections between the words to gain a 'meta-understanding' of their meanings. Humans also derive answers from their knowledge of connections between words, but we have the added advantage of understanding what the words actually mean. Ken Jennings did say in his piece on Slate that Watson's “techniques for unraveling Jeopardy! clues sounded just like mine”. However, it's doubtful that Watson applies those techniques the same way we do.
As computers tend to be, Watson is excellent when the questions are specific. The first night, it dominated a category about The Beatles that presented song lyrics as clues. This was no doubt because song lyrics tend to be unique, specific combinations of words, especially when qualified by a precise category. They wouldn't be recognizable to humans, either, if they weren't.
However, Watson stumbles when the meaning is unique but the words rely on implication instead of specificity. At one point, the clue was “His victims include Charity Burbage, Mad Eye Moody and Severus Snape; he'd be easier to catch if you'd just name him!”. Watson's top three answers were Harry Potter (37% confidence), Voldemort (20%), and Albus Dumbledore (8%). Since none of the confidence levels passed the threshold, it stayed silent while Brad Rutter easily responded “Who is Voldemort?”. To humans at all familiar with Harry Potter, the phrases “victims” and “if you'd just name him” clearly refer to Voldemort, an evil character known as “He-Who-Must-Not-Be-Named”. Watson's answers demonstrated knowledge of Harry Potter, but because the clue never used the specific phrase describing Voldemort, and never made the implied connection between victims and evil explicit, Watson couldn't zero in on the correct answer.
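That buzz-or-stay-silent behavior is simple to sketch. Here's a minimal illustration in Python, using the candidates and confidences shown on air; the threshold value is a made-up placeholder, not IBM's actual figure:

```python
def choose_answer(candidates, buzz_threshold=0.50):
    """Return the top-ranked candidate only if its confidence clears
    the buzz threshold; otherwise stay silent (return None)."""
    text, confidence = max(candidates, key=lambda c: c[1])
    return text if confidence >= buzz_threshold else None

# Watson's top candidates for the Voldemort clue, as shown on air:
candidates = [
    ("Harry Potter", 0.37),
    ("Voldemort", 0.20),
    ("Albus Dumbledore", 0.08),
]
print(choose_answer(candidates))  # -> None: Watson doesn't buzz in
```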
Machine Intelligence
Obviously, projects like Watson raise questions about the nature of intelligence, and how machine intelligence may differ from human intelligence. If Watson is any indication, machine intelligence may not be too unlike how it has been presented in sci-fi works (though hopefully not too similar). The precision and introspection displayed by the android Data on Star Trek: The Next Generation was eerily similar to the way Watson works. Also, when Watson is wrong, it is often way off, giving answers that seem wacky or bizarre to us. Likewise, Data's attempts at social interaction frequently led to awkward or weird situations that even the most socially inept human could not have brought about. Data's role on Star Trek was to serve as a foil to the human characters, and provide opportunities for the writers to explore the nature of humanity. In many ways, this is Watson's role, as well.
A key question is whether a machine can produce novel output. The researchers said that Watson frequently surprised them, and during the game its behavior was indeed often unexpected. However, this was not because Watson was being creative. Any novel behavior Watson demonstrates is the result of its nature as a complex system, one that is probably impossible for a human to completely wrap their head around. Its answers are the product of a large collection of algorithms operating on a diverse database, answering questions that cannot be pre-programmed and are not structured for the machine. It's no wonder the output is sometimes puzzling.
Humans are just as complicated, if not more so, and we still struggle to understand our own cognitive processes. What is important about Watson's complexity is the precision of its process, and that an integral part of the system is its introspective confidence level. This is where machine intelligence diverges from human intelligence. We cannot say precisely how sure we are of an answer, nor can we necessarily give our complete reasoning behind it. Watson, on the other hand, knows precisely how sure it is, and can show you the exact breakdown of its evidence for any answer it provides. Also, Watson considers every option it can. Whereas humans quickly narrow their focus to something manageable, aided by the meaning of the words, Watson must explore all possibilities in order to rank them and yield a response. This is an issue when faced with ambiguity, but given enough specificity it can be an advantage.
Our power of reasoning is best displayed when we face ambiguity and constraints on information. During the first Final Jeopardy! round, Watson displayed what is probably one of the most talked-about quirks of its approach. Given the category “US Cities” and the clue “Its largest airport was named for a World War II hero; its second largest, for a World War II battle.”, the human contestants answered “What is Chicago?”, while Watson gave “What is Toronto?????”, to everyone's amusement. Welty explained during the panel that Jennings had said after the game that he did not know outright that Chicago was correct. Jennings's expression when his answer was revealed even suggested he was unsure of it. However, he knew enough about the airports of New York City and Chicago to narrow the options to those two. From there, he reasoned that, since La Guardia was not a World War II battle, Chicago was the more likely choice, even without knowing the name of its second airport. Importantly, he also never considered cities outside the United States, since the category had ruled them out.
Watson's trouble with the clue arose from its ambiguity and its reliance on implication to convey meaning. The breakdown of why “Toronto?????” was chosen explains that, had the clue been more specific and started with “This US City's largest airport…”, Watson would have had a better shot at answering correctly. Of course, the Jeopardy! Clue Crew prides itself on exactly the puns and manipulation of natural language that make a game like Jeopardy! so hard for a machine. Another aspect of the problem was, ironically, Watson's ability to learn. It had learned during practice not to lean on the category too heavily, since the category is often a pun itself and rarely helpful. Unfortunately, this time that is exactly where the important information was.
The computer was saved by its wager, a paltry $947 chosen using a sophisticated, game-theory-based analysis of the state of the game and possible outcomes. This precision was displayed again during the second Final Jeopardy! round, when Watson bet exactly the amount necessary to guarantee a win by $1 even if it answered incorrectly and its opponents answered correctly.
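The worst-case arithmetic behind that kind of bet is straightforward, even if Watson's full strategy model was not. A minimal sketch, using made-up scores rather than the actual match totals:

```python
def max_safe_wager(leader: int, challenger: int) -> int:
    """Largest wager the leader can make and still finish at least $1
    ahead in the worst case: the leader answers wrong (losing the
    wager) while the challenger bets everything and answers right
    (doubling their score)."""
    return max(0, leader - 2 * challenger - 1)

# Hypothetical scores, purely for illustration:
print(max_safe_wager(leader=25000, challenger=10000))  # -> 4999
# Worst case: 25000 - 4999 = 20001, one dollar more than the
# challenger's best possible 2 * 10000 = 20000.
```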
Machine intelligence will be one of precision, evidence, and lots of statistics. It will not be one of spontaneity, creativity, or invention. Probably. Dr. Welty mentioned several times just how bad humans are at assigning a precise confidence to their answers. We are not always aware of every factor in our decisions, and many decisions are made based on "gut feelings". These gut feelings are not just arbitrary choices one way or the other. Rather, they are the result of lower-level intuition trained through experience, and they can be spot on even without a conscious understanding or awareness of all the factors.
Computers, by their nature, cannot have gut feelings. Any decision must have a clear basis in evidence. It is important to note that evidence is distinct from fact, and that a decision is not the same as "the right answer". Machines will not be the arbiters of truth, nor will they solve every problem. Instead, machine intelligence will help humans by presenting possibilities, and the evidence for them. We are terrible at exhaustively analyzing every option. It's possible, but we rarely have the time or willpower to do so. Our intelligence is not built to operate that way. That's why we made computers in the first place: to solve certain kinds of problems more quickly. Watson is the seed of a kind of intelligence that will excel at finding and presenting possible answers. But don't expect this intelligence to go looking for questions to solve.
John Seely Brown is quoted in a New York Times piece on Watson, saying “The essence of being human involves asking questions, not answering them”. The answers aren't what pull us in new directions; it's wondering what's around the next corner that gets us there. While answers can reveal things unexpected by the asker, being intrigued by the unexpected and wanting to know more are why we continue to take the next step.
Ken Perkins pointed out to me during a later discussion that, when humans give the wrong answer, or no answer, on Jeopardy!, it's usually because they can't recall the correct answer. By contrast, when Watson got the wrong answer, it was because it didn't fully understand the question. This highlights one of the most fundamental differences between humans and machines. Computers are built for a mathematical world; human brains are built for the fuzziness and messiness of reality. Machines rarely have a problem remembering something. In fact, their problem is often that they remember everything, and the unimportant stuff gets in the way. The challenge is communicating to them what we are looking for. There are many schemes for structuring data, but they are rarely human-friendly. As the "US Cities" clue exemplified, the real challenge is getting the computer to "understand" people, so it can better interact with us. Watson was an amazing step in that direction, but obviously there is a lot of room for improvement.
Applications & Interface
Watson's lack of actual understanding is what will make it possible to plug the engine into different applications. The possible meanings of words are defined by association, not by actual comprehension of the meaning. This meta-understanding derived from connections is powerful, yet generic. The realm of operation is limited only by the knowledge base provided to the QA engine. Its thoroughness, and the lack of bias in the system itself, make it well suited for problems that have numerous, complicated factors and require considering a wide space of possibilities. Medicine and law were some of the fields suggested by the researchers, and DeepQA seems well suited to tasks like differential diagnosis.
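To make that plug-and-play idea concrete, here is a deliberately naive sketch: a ranker that scores candidate answers purely by word association with whatever corpus it is handed. Nothing here resembles DeepQA's actual pipeline, and the corpus and candidates are toy inventions; the point is only that association-based machinery can be pointed at medicine, law, or trivia just by swapping the knowledge base:

```python
def rank_candidates(question, candidates, corpus):
    """Score each candidate by how strongly the passages that mention
    it share words with the question: association, not comprehension."""
    q_words = set(question.lower().split())
    scores = {}
    for candidate in candidates:
        overlap = 0
        for passage in corpus:
            if candidate.lower() in passage.lower():
                overlap += len(q_words & set(passage.lower().split()))
        scores[candidate] = overlap
    return sorted(scores.items(), key=lambda item: -item[1])

# A toy medical "knowledge base"; swapping in case law or trivia
# passages would repurpose the same function unchanged.
corpus = [
    "persistent cough and fever are common symptoms of pneumonia",
    "seasonal allergies cause sneezing and itchy eyes",
]
print(rank_candidates("cough and fever", ["pneumonia", "allergies"], corpus))
# -> [('pneumonia', 3), ('allergies', 1)]
```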
With these applications, the interface to the system becomes extremely important. This is about more than whether we talk to it and what color its avatar is. Watson's avatar changed color and pattern when it was thinking, got a question right, or got a question wrong. While amusing, this was little more than a status light, with some extra "personality" programmed in for entertainment purposes. In order for the machine to help us, we have to trust it.
Sanmay Das pointed out during the RPI panel on day three that we can't really trust people any more than a computer. But we still do. A major component of trust is understanding the process behind the other entity. As humans, we can relate to each other, since we know what we think and can empathize with others. It's much harder to empathize with a rack of servers, something so different on such a fundamental level. We lose trust in people and things when they don't behave as expected. This is where the anthropomorphizing of Watson can help, to a point.
Calling Watson a "he" instead of an "it" and giving it an avatar may make it easier to relate to and accept, but it masks the reality of the machine. Its decisions are based on cold probability and rational processes. In some ways, this should make it easier to trust, since we don't have to worry about it harboring a grudge or deciding it "wants" to answer a different question. However, trust is about information, and the interface needs to enable communication both ways. Some of Watson's mistakes were due to its lack of contextual awareness. It repeated an incorrect answer from Jennings because it had no way of knowing that answer had been declared incorrect. And it kept picking a category it was doing poorly in because it was getting the correct answers but didn't know it was being beaten to the buzzer by the humans.
A doctor using a DeepQA descendant needs to present the machine with all available patient and contextual information, and then get back a thorough explanation of the possible causes. There needs to be a mechanism for back-and-forth, so that the computer can prompt for more information in areas of ambiguity. The clues given in Jeopardy! are a good challenge, but they often rely on exactly the ambiguity that trips up a computer, and obviously a contestant asking for clarification of a clue would defeat the purpose. Likewise, the display of Watson's confidence in each answer, and the color change of the avatar to simulate emotion, were suitable for the format of a TV program, but are too much of a black box when it comes to communicating the process behind an answer. It will be very interesting to see what interfaces the applications teams come up with for the different fields DeepQA goes into.
Not only will we have to invent better ways for a machine to relate to people, through things like better language parsing and voice recognition; we humans will have to learn better ways to relate to the machines. Experienced users of search engines learn how to craft queries that get the best results. Over time, they pick up not just the special operators, but the nuances of word choice and ordering that are most effective in bringing up helpful hits. Humans will have to become more comfortable and familiar with statistics as a field. Adam Lally recounted how someone said “It was 98% certain. How did it get that wrong?!”, demonstrating a misunderstanding of what 98% really means. Without understanding what the machine is telling us, we cannot trust its answers to our questions.
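The arithmetic behind that misunderstanding is worth spelling out. With a hypothetical count of answers:

```python
confidence = 0.98
high_confidence_answers = 200   # hypothetical count over many games

# 98% certain means 2% wrong, so some misses are expected, not anomalous:
expected_wrong = high_confidence_answers * (1 - confidence)
print(expected_wrong)           # -> 4.0 wrong answers, on average
```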
Most importantly, we have to understand the limits of the machine approach to decisions. Also during the last panel, Selmer Bringsjord stated that “probabilities don't cut it for morality and ethics”. Again, science fiction provides a great example. In the film I, Robot, Detective Spooner describes an accident in which a robot intervened to save him while a little girl drowned: “I was the logical choice. It calculated that I had a 45% chance of survival. Sarah only had an 11% chance. That was somebody's baby. 11% is more than enough. A human being would've known that.” Just because we trust the machine's process behind an answer doesn't mean we have to agree with it. The interface between human and machine has to allow for our disagreement. (Helpful hint: don't give it nuclear launch capability.)
Victory
The success of Team Watson in no way devalues the skill of Ken Jennings and Brad Rutter, or any other human Jeopardy! players. Jeopardy! was the benchmark, not the mission. It's not as though Watson will become a regular on the show; it would be silly to keep devoting a resource like Watson to a game that is now, for it, pointless. If anything, this experiment helps us appreciate just how difficult the game of Jeopardy! is. Attempting to develop artificial intelligence is an exercise in understanding what makes us human.
Watson's win is a triumph of human ingenuity. One of the things that makes us human is our use of tools to manipulate our environment. This manipulation has gotten us into trouble, with complex systems that are now beyond our understanding, beyond human scale (e.g., the economy). Tools like DeepQA are good at analyzing a massive evidence base across a broad spectrum of topics. During the panel discussion, Bringsjord suggested a Watson-type system could be used by policy makers to help decide what to do about the federal deficit. It may seem far-fetched, but this broad-yet-deep analytical skill, something humans are not wired for, could be what we need to get a handle on some of the things we've created. Obviously Watson cannot simply suggest a solution to the deficit; it undoubtedly has no system for factoring in political bullshit. Still, it has a place as a valuable research tool.
I wouldn't be surprised if it takes one complex system to figure out another. Here's hoping it doesn't become a case of sending in a cat after a mouse, then a dog after the cat, and so on. The machine doesn't have to be perfect, and cannot be: it is built by fallible humans who are trying to better themselves, or we wouldn't bother making it to begin with. It just has to be that our use of this tool leaves us better off, in the same way that autonomous vehicles don't have to be flawless, just better than the mediocre drivers currently behind the wheel. It's up to us to decide if and how tools like Watson meet this requirement. With proper feedback and tuning, the systems can get darn close to perfect.