Wednesday, February 16, 2011

Watson Taking No Prisoners

Like many of you, I have watched "Jeopardy!" with great interest the last two nights as the two greatest champions in the show's history have taken on IBM's "Watson" computer. I'll watch tonight as well, though the suspense is pretty much over. Watson kicked ass in the first of two matches, and it will be virtually impossible for Ken Jennings and Brad Rutter to catch up.

In the first round, televised Monday, Watson got off to a very fast start and established that it was faster at buzzing in than Jennings and Rutter. But it faltered in a couple of categories, and Rutter was able to pull even with it at 5,000 by the end of the first night. Last night, playing "Double Jeopardy," it was wall-to-wall Watson as the computer started fast again and never let up. At the end of "Double Jeopardy," with Watson sitting comfortably with 35,000 or so, Rutter at 5,400, and Jennings at a paltry 2,400, I said to my wife, "Now Jennings knows what all those other contestants felt like when they played him!" He averaged around 34,000 in his 74-game winning streak, and anyone who got more than a few thousand against him was doing well. I'm sure a lot of them were sitting at home last night going "take that, Jennings!"

The only hope for the humans last night was that Watson would blow the "Final Jeopardy" question and bet a good chunk of its cushion. They got half the wish, as it somehow failed to find "Midway" in its database of airports and blew the question. But it shrewdly bet only 947 and is still about 24,000 ahead of Rutter and almost 30,000 ahead of Jennings. Unless Watson blows a fuse and is incapable of buzzing in tonight, it will perform the "Jeopardy!" equivalent of Deep Blue check-mating Garry Kasparov in seven or eight moves.

Still, it's fascinating to watch, and impressive to see how Watson's creators have programmed it. Its store of knowledge is vast, though in reality Rutter and Jennings have known just as many of the answers. The difference has been in the buzzing, and I've seen some comments about the match (or machine) being "rigged" so that Watson buzzes in first almost every time. Could anything have been done differently? One person suggested letting the three contestants buzz in at any point rather than having to wait until Alex Trebek is through reading the clue and the electrical circuits are opened up for buzzing. That wouldn't work, because both Rutter and Jennings would buzz in as soon as Trebek spoke his first word, trusting that they'd know the answer once they saw the whole clue. And of course, Watson would figure that out and probably beat them to the punch anyway.

A middle road might work: allowing the contestants to buzz in after a fixed delay of a few seconds. The clues would have to be wordy enough to take longer than that to read aloud. But it would be a test not only of whether Watson could match humans in finding the right question, but also of who was faster at finding it. The humans would still buzz in as soon as they could, but Watson might be slowed down a bit, and that would make for a closer game. Of course, I'm guessing that the folks at IBM aren't interested in closer games; they want Watson to do just what it is doing, kicking ass.

I say all this as one of the former "Jeopardy!" contestants who participated in the "sparring" matches against Watson at IBM's research center in Yorktown Heights, NY. We were sworn to secrecy at the time, but that was before the "Jeopardy!" producers had okayed the actual match, before numerous articles were written about the process (the best of which was in the New York Times magazine section last spring), and of course before the actual matches. So I feel free to write a little bit here about my experience with Watson, though I'll still keep some secrets.

If you caught the "Nova" episode last week about artificial intelligence and the development of Watson, you know that the IBM programmers had a lot of glitches to overcome before making Watson the hot-shot player it is today. I was there about one-third of the way through the sparring matches, which is why I put my money on Watson before this week's showdown. Though it made some egregious mistakes, misunderstood some categories and clues, repeated someone's wrong answer (which it did Monday night as well), and totally blew one "Final Jeopardy" question, it was quite formidable. When it was 90% or more certain of the answer, it buzzed in first nearly every time. The only way for the humans to win was to hit the daily doubles, pretend we weren't playing for money, bet everything, and get it right. That's still the only way Watson will lose tonight--if either Rutter or Jennings builds up enough of a stake to double up twice and make up the current deficit. I don't think it will happen.

I played three games against Watson. The first two games I beat Watson--but not the other humans, who hit those daily doubles and cleaned up. I was second all the way in the first game, where Watson struggled, and in the second game I was a distant third until the final question, when Watson bet everything, trying to catch the leader, and completely misread the clue, answering with the name of a magazine rather than the name of the person who was written about in the magazine. So I finished second by default that time.

Our third game was the last of the day, and the programmers told me afterwards that it was one of the best games they'd seen. I jumped off to a big start in the first round, and when that ended I had around 9,000 while Watson was around 3,000. Watson got hot in "Double Jeopardy!" and quickly caught up, then passed me. For the last dozen or so clues, we went back and forth, the lead changing at least half a dozen times. It was just as exhilarating and challenging as playing against a human, and just as frustrating when Watson buzzed in last on the final clue, knew about some lake in Minnesota, and passed me by 100. That was enough to make the difference; we both got the final answer right, and Watson beat me by a freakin' dollar.

I'm smart enough to know that the Watson I faced wasn't nearly as skilled as the Watson that is beating the crap out of Brad Rutter and Ken Jennings. Its buzzing timing, already very sharp, is even faster now, which makes this contest look unfair. But it isn't unfair. Watson has the same amount of time as the humans to find that right question, and it isn't much time, just a few seconds. That is where it has improved. It "gets" more clues and homes in on the certainty that allows it to avoid some of the disastrous mistakes I saw a year ago.

One of them led to the funniest moment of the day. The clue was in the "Funny People" category, about the star of "The Hot Chick" and "The House Bunny". I didn't know at the time that it was Anna Faris. The other human buzzed in first, but he misread the clue and thought it was looking for an actor, not an actress, so he said, "Who is Rob Schneider?" Watson, which seemed to have some problems with pronouns (I'm guessing that in its focus on "key" words, it didn't give priority to pronouns because searching for them wouldn't narrow down the possibilities fast enough), buzzed in next and said, "Who is Rob Schneider?" The host, a funny fellow named Todd Crain who would make a fine successor to Alex Trebek, snickered at that and turned to me. "Do you want to take a shot at it?" I shook my head and replied, "nope--you're not gonna get me to say Rob Schneider!" My wife, sitting in the audience, said that the IBM geeks in the control booth cracked up over that one.

Even though Watson repeated a wrong question the other night, there is nothing funny about the way it is crushing the game's two legends. However sobering it might be to our collective human egos, it is also impressive and inspiring--if IBM can make good on what has been its intention all along, which was not merely to create a machine to win a particular game (as it did with Deep Blue), but rather to create a template for using machines to absorb, understand, and process information that is too massive for humans to process easily. Watson is correlating huge databases, making connections, and deducing the most useful responses. Of the possible uses discussed in the documentary footage accompanying the telecast and in the "Nova" special, the medical applications are the most promising to me. What doctor can assess 20,000 cases of a particularly elusive disease? Watson can, and will. Someday its progeny might save your life, or mine. And then I won't begrudge it beating me by that damn dollar on the last clue, or making chumps out of champions.


Robin said...

An interesting post. I watched this show of course, and I had some mixed feelings about it. I didn't feel like there were many, if any, questions where Brad and Ken did not know the answer. They just couldn't buzz in before Watson. I also thought that Trebek mentioned that the "answers" were being texted to Watson rather than picked up through speech recognition. In a way, speech recognition would add another level of complexity, as Watson would need to understand the words as spoken as well. Adding that touch would make Watson that much more impressive.

I still thought what Watson did was amazing, but I am not sold that he "dominated" the way the score showed. If it had been a format of entirely Double Jeopardy questions, I think the humans would have won in a landslide.

Cliff Blau said...

Thanks for sharing your story, Gabe. I agree Watson's main advantage was its ability to ring in quickly; the human players seemed to do better in that regard the second game, and Ken was able to play it almost even that day.

I'm not convinced that Watson's ability to "understand" natural language is so critical to something like diagnosing diseases; it would seem more critical in an area such as technical support. Be interesting to see if in a few years, all those workers in India who got our outsourced jobs get replaced by computers.