Tuesday, February 15, 2011

Thoughts on Watson, the Jeopardy Robot: Part I



Last night was part 1 of 3 of the competition between IBM's Watson and the two Jeopardy wizards otherwise known as Ken Jennings and Brad Rutter. I've been anticipating it since I first heard about it late last year, but I gotta admit I hadn't really thought that hard about it beyond a standard "holy shit" feeling.



I've long been a proponent of Star Trek-style technology, despite its inherent evilness and its capacity to trap us inside Agatha Christie trains or make us obsessed with gaming technology. I've publicly stated that I'll have the Matrix plug installed as soon as it becomes available (read: covered by insurance), because once that thing exists, who knows what's what in any real sense? So why not jump on the jelly wagon post haste and ride it to freedom (or slavery)? Watson appears to be an important step in this lineage.

Watson takes the clues and, after ringing in on the buzzer, spits out answers in plain (question-oriented) English. I had been under the impression it would be analyzing Alex's actual speech to receive the clue, but apparently that's not the case. Trebek said Watson receives each clue as a text file at the same moment it is revealed to the players. That's slightly less impressive (but still cool), and it raises a couple of questions that weren't answered last night:

Does Watson get the text file at the moment the clue is "revealed" on TV (and presumably to the contestants as well), or only after Alex has finished reading it?

My guess is the former, which slants the game significantly in Watson's favor, since its reading speed has got to be worlds faster than any human's. The time it takes Alex to read through a clue is practically an eternity for a machine this gosh darn sophisticated. But OK, still pretty cool.

Here's another relevant question though:

How does Watson receive the category information?


This one is vaguer, I guess. Watson presumably receives the categories as text files at the same time the contestants see them, but this point was not specified. It matters, I think, because as soon as you hear the categories, your (human) brain starts priming its internal filing cabinet for what might come up in each one. So, assuming Watson gets them at the same time as the humans, I wonder how the programming parses that information specifically, or whether Watson chooses how to move through the board based in part on a perceived wealth or paucity of information in a given category. Maybe tonight's broadcast will reveal more of this type of stuff.

A couple of other observations I'm hoping to explore after seeing more of the game:

I noticed that in one instance, Ken Jennings answered incorrectly (in the "what decade?" category), and then Watson rang in and gave the same incorrect answer directly afterward. Did Watson's developers miss a fairly obvious aspect of the game, or was this an extremely difficult programming obstacle? Or perhaps they are scrambling as I type to fix this glitch.
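To make that question a little more concrete, here's a rough sketch of the buzz-in logic I'd expect Watson to need. It's purely my own guess, not IBM's actual design: it assumes Watson holds a list of candidate answers with confidence scores, and the function name, the made-up scores, and the 0.5 threshold are all hypothetical. The idea is to throw out any answer already ruled wrong on the current clue, then ring in only if the best remaining candidate clears a confidence threshold.

```python
from typing import Optional


# Hypothetical sketch, not Watson's real code: candidates map answer -> confidence (0.0 to 1.0).
def choose_answer(candidates: dict[str, float],
                  ruled_wrong: set[str],
                  threshold: float = 0.5) -> Optional[str]:
    """Return the answer to buzz in with, or None to stay silent."""
    # Drop any answer another contestant already gave incorrectly on this clue.
    remaining = {a: c for a, c in candidates.items() if a not in ruled_wrong}
    if not remaining:
        return None
    best = max(remaining, key=remaining.get)
    # Only ring in when the top remaining candidate clears the (hypothetical) threshold.
    return best if remaining[best] >= threshold else None


# Made-up candidates and scores for a "what decade?" clue.
candidates = {"the 1920s": 0.62, "the 1910s": 0.21, "the 1930s": 0.08}
print(choose_answer(candidates, ruled_wrong=set()))          # the 1920s
print(choose_answer(candidates, ruled_wrong={"the 1920s"}))  # None
```

Under these assumptions, repeating someone else's wrong answer would come down to `ruled_wrong` never getting filled in, which would fit with Watson receiving only the clue text and not hearing the other contestants.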

The visualization of Watson's thought process was pretty intriguing. The concept of a confidence threshold was explained and then shown as a bar graph on the screen. Although Watson was typically quite confident in its answer (and, as it turned out, correct), there were a few instances where it wasn't particularly certain which of its top three answers to select. Here's one thing it made me wonder:

Does Watson's option weighting and/or overall confidence in any given answer correlate with what we humans might call "the overall difficulty" of the question?

Although Jeopardy works hard to weight the difficulty of the clues accurately with a dollar value, it's easy to agree that this is a flawed system. What makes a question difficult (besides not knowing the answer)?
Example: in the "Literary Villains APB" category, the mid-value clue pointed to the villain of the Harry Potter series: Voldemort. Watson incorrectly surmised "Harry Potter," and the confidence bars indicated a degree of confusion (although the threshold was not met). Two things about that are worth thinking about: 1. the clue didn't mention "Harry Potter," and 2. "Voldemort" is (somewhat) rarely referred to by his actual name in the books.
Watson must be using some sort of "fill in the blank" algorithm, which correctly identified other words in the clue (like Hogwarts) as being related to Harry Potter, while also correctly identifying that "Harry Potter" himself did not appear in the clue, which made him a plausible blank to fill. This may be an interesting, somewhat non-intuitive piece of the "what makes a clue difficult" puzzle.
The second point, however, speaks more directly to what it means for a question to be difficult. There is simply less information connecting that particular answer to the clue: less repetition of the specific idea, fewer linguistic bridges.

I'd like to dig further into this line of thought after I watch Double Jeopardy tonight. Right now, Watson is tied for the lead at $5,000. Humanity still has hope, and the Holodeck may still be a ways off, I guess.

Note: I wanted to get my thoughts down first, without reading anyone else's take... but now that I have, here's another article, which mentions an interesting contextual aspect of answering correctly that I didn't recount above. Very interesting stuff, referred to as "leg-gate" in the comments.

2 comments:

  1. We are truly brothers on some level, because I have been analyzing ALL of that stuff too.

    Interesting to find out that they reshot the "leg" sequence, and I think that another issue of Fairness to the Robot™ arises from it.

    Usually, at least in the JEOPARDY! round (1), Alex will give normal, human contestants who aren't specific enough another shot. "Can you be more specific?"

    We were told that Watson's answer of "leg" was wrong, even though, technically, it's right-ish. In a way. So, maybe Watson doesn't have the ability to reconsider or expound on an answer...but what I dislike is that he wasn't even given the OPTION.

    Even if Alex KNOWS that Watson can't give a more-detailed answer, he should've given him the same chance usually afforded to any other contestant. And let Watson get confused or remain silent. Watson was treated differently, and that sort of thing makes this whole experiment more intriguing...but, also, kind of voids it, to me.

    Gotta play by the same rules, and let the COMPUTER deal with situations it's ill-prepared for.

    So, Leg-gate bothers me.

    That Watson wasn't programmed to eliminate its top choice once that choice is proven wrong, and then either recalculate or move on to the second-most-likely answer, was pretty obvious even to the non-JEP nerds watching, I'm sure.

    These flaws make for fascinating, question-packed viewing. They also show that, maybe, Watson deserves another chance. Or, more importantly, HOW did these problems never arise in the many trial games Watson supposedly had?

    A basic familiarity with the quirks of the show should've revealed these potential gamebreakers.

    I'll be watching, though.

  2. Yeah, totally, and while I admit I didn't pick out and remember the "leg-gate" thing, it brings up an extremely interesting point.
    My next question would be: is there, or could there be, a contingency plan for "clarification," given that Watson can't "hear" Alex? It would have to be a standard "clarify" button input, so that no one could be accused of prompting him too much.
    Can't believe they didn't account for the wrong-answer repeat thing... but again, that must speak to the limited amount of information Watson is actually receiving, and to the fact that no one is transcribing the show to Watson via text in real time. It only gets the questions, right?

    Side note: I love Alex and everything... but he can be pretty smug at times, and he couldn't even stop himself from SCOLDING THE ROBOT! ("No, Ken said that" [stupid]). Classic!
