Thursday, February 17, 2011

Thoughts on Watson: the Jeopardy robot Part III



OK. Watson won. He won kinda big time, and I was gonna post a picture of a chain-link-fence-holding Sarah Connor engulfed in flames on Judgment Day, but it was sort of gruesome and anyway I don't really feel that way about this particular robot triumph. That said, burying an arms cache in the Mexican desert and maybe learning a few tips on how to outsmart logic-beholden machines via a re-read of Asimov's I, Robot might not be a terrible idea. (No, watching the movie won't help.)

Before we get into the gloomy excitement of computers getting really, really smart all of a sudden, let's quickly discuss the saving grace for humanity here, which is: Ken Jennings managed a (reasonably funny and certainly appropriate) Simpsons reference within his Final Jeopardy answer!

We're still good at something! Being funny! Take that, machines!

In case you're not a Simpsons nerd, see the classic Kent Brockman clip below.



Maybe IBM's next challenge should be to develop a robot that wins Last Comic Standing. No, seriously, give it a shot, IBM. It's even OK if it uses props, or ventriloquism, or a redneck-y catch phrase, but I think your best bet is to have it write 45 minutes about the quirks of being robotic, then stock the audience with robots. But perhaps I digress.

Back to the show.  What can I say except to ask more questions about how Watson works? I find this stuff sort of endlessly fascinating.

Why does it seem like Watson's so much better at Double Jeopardy than Single?

Does Watson benefit from any sort of momentum-of-confidence factor after a series of correct answers (like a human would)? (Presumably not...so, follow-up question: would that be an inherently bad thing to build into a computer's programming? Discuss.)


How did Watson put together Moldavia and Wallachia to get Bram Stoker but not get the Chicago airport question?


Let me jump into that one.


As a reasonably intelligent human being, I gauged both of these (Final Jeopardy) questions as fairly easy, though I would agree that the Chicago one was a little tougher. The inherent difficulty of any question is debatable, and of course skewed by whether someone happens to know the answer, but I feel my opinion is at least somewhat founded in objective analysis and also supported anecdotally by the fact that both human players got both questions correct.

I'm being presumptuous about how Watson thinks through these questions, but what the hell (bbq). To get to last night's Final Jeopardy solution (clue paraphrase: "So-and-so's published anthropological survey of Moldavia and Wallachia was the inspiration for this author's most famous work"), I feel Watson must have had to throw "Moldavia" and "Wallachia" into the gears and realize (quickly) that they were a reference to Romania. (Wikipedia informs me that these comprise the northern and southern historical regions of what is now modern-day Romania.) From there the path gets a little murky, though. The category was "19th Century Authors". So does Watson filter through a list of authors with whom Romania is associated? Or does he, in his geographical search, come up with the keyword "Transylvania"...which, when filtered through the category title, yields "Bram Stoker", author of Dracula? Perhaps "Dracula" has to occur to Watson first, but I sort of doubt it. This all seems pretty reasonable, except if this type of database list generation and subsequent list cross-referencing is how Watson arrives at answers...
then why not nail that Chicago airport one?

Pull up a list of major airports, filter by US cities (or don't, even) and cross-reference with cities that have two airports. Even without the US filter (or if the US/American filter includes all of the cities in the Americas) this ought to yield a fairly short list. Then filter by historical names, and see which are associated with WWII. As I mentioned, Toronto's airport is Lester Pearson International (YYZ), but he's got no direct association with WWII (that I can see). A good but incorrect guess would have been New York City (because of JFK).
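For fun, here's a toy Python sketch of that list-and-filter idea. The airport table is made up for the example (it has nothing to do with Watson's actual data or methods), but it shows how even a crude generate-then-filter pipeline lands on Chicago:

```python
# Hypothetical sketch of the list-generation-and-filtering approach
# described above. The airport data is a tiny hand-made sample.
airports = [
    {"city": "Chicago", "name": "O'Hare", "namesake": "WWII hero"},
    {"city": "Chicago", "name": "Midway", "namesake": "WWII battle"},
    {"city": "New York", "name": "JFK", "namesake": "WWII hero"},
    {"city": "New York", "name": "LaGuardia", "namesake": "politician"},
    {"city": "Toronto", "name": "Lester Pearson", "namesake": "prime minister"},
]

def candidate_cities(airports):
    """Cities whose airports include both a WWII hero and a WWII battle."""
    by_city = {}
    for a in airports:
        by_city.setdefault(a["city"], set()).add(a["namesake"])
    return [city for city, namesakes in by_city.items()
            if {"WWII hero", "WWII battle"} <= namesakes]

print(candidate_cities(airports))  # ['Chicago']
```

Note that Toronto never even survives the namesake filter here, which is what makes Watson's actual guess so puzzling.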
I find all this puzzling, is all. Hopefully we'll get more coverage and learn more about how Watson ticks in the near future.  For now I'll take our triumphs where we can get them, but more pertinently...where's my holodeck?

Wednesday, February 16, 2011

Thoughts on Watson: the Jeopardy robot Part II

Last night was sort of hard to watch, right? Watson mopped the floor with the humans and went out to a $20,000 lead. He got both Daily Doubles and eventually played in Final Jeopardy. The main question I want to ask is:

How does Watson decide what to wager?


Presumably it's an algorithm based on a simple (quick) examination of Watson's stores of info on any given category, crossed somehow with the game's current scores and score gaps, etc... Does Watson (actively) take into account the remaining board squares' monetary values when deciding what to wager? Does Watson learn about playing styles as the game goes on? In other words, does Watson know he's winning and is likely to continue to win, or only that he's currently ahead in points?

What jumps out at me most about last night was the Final Jeopardy clue. The category was US Cities, and the clue had to do with airport names. The paraphrase is "what US city's largest airport is named for a WWII hero and its second largest for a WWII battle?" Ken, Brad, and I came up with the correct answer of 'Chicago' (references to airports O'Hare and Midway) but Watson flubbed it pretty big with "Toronto".

For starters, Toronto's not in the United States...so it seems genuinely strange to me that Watson could try to pass that one off as correct, even as a shaky guess. Secondly, Watson wagered only $947 on the clue. One obvious explanation could be that Watson was way, way ahead in the score, and players in that position often won't jeopardize (ha!) the guaranteed win in a gambit to accumulate money...however, those coaching normal human contestants give them a series of scenarios (at least, this is my understanding as a viewer) to help them decide how much to wager.

Watson was almost $20,000 ahead, so he could have bet almost 20 times as much and still ensured his overall win in this round. Watson did no such thing, at least in part because the two-day game is cumulative, and so he must have it in his programming to generate as much money as possible...
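The arithmetic behind that "could have bet almost 20 times as much" point is the standard Final Jeopardy lockout logic: if the leader's score S is more than double the runner-up's score T, any wager below S - 2T guarantees the win even on a miss (assuming the runner-up bets everything and gets it right). The numbers here are illustrative, not the actual scores:

```python
# Sketch of the classic "safe wager" calculation, not Watson's actual
# wagering algorithm (which IBM hasn't detailed).
def max_safe_wager(leader_score, runner_up_score):
    """Largest bet the leader can lose and still finish ahead,
    assuming the runner-up doubles up."""
    return max(0, leader_score - 2 * runner_up_score)

print(max_safe_wager(25000, 5000))  # 15000
```

Whatever Watson's real algorithm is, $947 was far below this safe ceiling, which is what makes the tiny bet interesting.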

This is all a roundabout way to get back to what made the category "US Cities" appear daunting to Watson, and why he guessed incorrectly on a clue that at least three humans got easily. Why be intimidated by this category, and furthermore why get it so wrong? It seems a cursory cross-reference of major US airport names with WWII-associated people would have quickly yielded the Chicago answer. Also note that Toronto's largest airport (YYZ, the one Rush wrote the song about) is called Lester Pearson Airport, after a Canadian prime minister.

More questions than answers...looking forward to the conclusion.

"too clever is stupid, dude."
 - Icepick (from Skate or Die II)

Tuesday, February 15, 2011

Thoughts on Watson: the Jeopardy robot Part I



Last night was part 1 of 3 of the competition between IBM's Watson and the two Jeopardy wizards known only as Ken Jennings and Brad Rutter. I've been anticipating it since I first heard it was happening late last year, but I gotta admit I hadn't really thought that hard about it beyond a standard "holy shit" feeling.



I've long been a proponent of Star Trek-style technology, despite its inherent evilness and capability to trap us inside Agatha Christie trains or make us obsessed with gaming technology. I've publicly stated that I'll have the Matrix plug installed as soon as it becomes available (read: covered by insurance), because once that thing exists, who knows what's what in any real sense, so why not jump on the jelly wagon post haste and ride it to freedom (or slavery)? Watson appears to be an important step in this lineage.

Watson takes the questions and spits out answers in plain (question-oriented) English after ringing in on the buzzer. I had been under the impression it would be analyzing Alex's actual speech in order to receive the clue info, but apparently that's not the case. Trebek said Watson receives the clue as a text file simultaneously with its being revealed to the players. That's slightly less impressive (but still cool), and it brings up a couple of questions that weren't answered last night:

Does Watson get the text file entered at the moment the question gets "revealed" on TV (and presumably to the contestants as well) or immediately after Alex is finished reading it?

My guess is the former, which slants the game significantly in Watson's favor, since its reading speed has got to be worlds faster than any human's. In reality, the time it takes Alex to read through a clue is almost an eternity for a machine so gosh-darn sophisticated. But OK, still pretty cool.

Here's another relevant question though:

How does Watson receive the category information?


This one is vaguer, I guess. Watson presumably receives the categories as text files at the same time as the contestants, but this point was not specified. It's important, I think, because as soon as you hear the categories, your (human) brain is priming its internal filing cabinet for what might come up in the context of each category. So, assuming Watson gets them simultaneously with the humans, I wonder how the programming parses that information specifically, or whether Watson chooses how to move through the board based in part on a perceived wealth or paucity of information in any given category. Maybe tonight's broadcast will reveal more of this type of stuff.

A couple other observations I'm hoping to explore after seeing more of the game:

I noticed that in one instance, Ken Jennings answered something incorrectly (in the "what decade?" category) and Watson then rang in and gave the same (incorrect) answer directly afterward. Did Watson's developers miss a fairly obvious aspect of the game here, or was this instead an extremely difficult programming obstacle? Or perhaps they're scrambling as I type to fix this glitch.
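At least on paper, the fix doesn't look hard. Here's one hedged guess at what it might look like (entirely my speculation, not anything IBM has described): before answering, drop any candidate that another player has already given and been ruled wrong on for this clue.

```python
# Speculative sketch: filter out answers already judged incorrect on
# the current clue before picking the highest-confidence candidate.
def best_answer(candidates, already_wrong):
    """candidates: list of (answer, confidence) pairs;
    already_wrong: set of answers ruled incorrect earlier on this clue."""
    viable = [(a, c) for a, c in candidates if a not in already_wrong]
    return max(viable, key=lambda ac: ac[1])[0] if viable else None

print(best_answer([("the 1920s", 0.61), ("the 1930s", 0.55)], {"the 1920s"}))
# the 1930s
```

The hard part, presumably, is that Watson got the clue as a text file but (as far as we know) couldn't hear Ken's spoken answer, so it had no `already_wrong` set to filter against.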

The visualization of Watson's thought process was pretty intriguing. The concept of a confidence threshold was covered and then presented as a bar graph on the screen. Although Watson was typically quite confident in its answer (and correspondingly correct), there were a few instances where Watson wasn't particularly certain which of its three top answers to select. Here's one thing it made me wonder:

Does Watson's option weighting and/or overall confidence in any given answer correlate to what we humans might call "the overall difficulty" of the question?

Although Jeopardy works hard to match the difficulty of its clues to their monetary values, it's easy to agree that this is a flawed system. What makes a question difficult (besides not knowing the answer)?
Example: in the "Literary Villains APB" category, the mid-level clue indicated the villain from the Harry Potter series: Voldemort. Watson incorrectly surmised "Harry Potter", and the confidence bars indicated a degree of confusion (though the threshold was not met). Two pieces of information regarding that are worth thinking about: 1. the clue didn't mention "Harry Potter", and 2. "Voldemort" is (somewhat) rarely referred to by his actual name in this series of books.
Watson must be using a sort of "fill in the blank" algorithm which correctly identified other words in the clue (like Hogwarts) as being related to Harry Potter, while also correctly identifying that "Harry Potter" himself did not appear in the clue. This may be an interesting, somewhat non-intuitive piece of the "what makes a clue difficult" puzzle.
The second point, however, speaks more directly to what it means for a question to be difficult: there is simply less information connecting that particular answer to the clue, less repetition of the specific idea, fewer linguistic bridges.
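To make the "linguistic bridges" idea concrete, here's a toy sketch: score each candidate answer by how many of the clue's keywords it is associated with. The association table is invented for this example; a real system would mine those links from a huge corpus, which is presumably why "Voldemort" (rarely named in the books) gives Watson so little to grab onto.

```python
# Toy illustration of keyword-overlap scoring; the association sets
# are made up for the example.
associations = {
    "Voldemort": {"hogwarts", "wizard", "dark"},
    "Harry Potter": {"hogwarts", "wizard", "scar", "quidditch"},
}

def score(candidate, clue_keywords):
    """Fraction of clue keywords linked to the candidate: more shared
    terms means more 'bridges', and on this view an easier clue."""
    links = associations.get(candidate, set())
    return len(links & clue_keywords) / len(clue_keywords)

clue = {"hogwarts", "dark", "snake"}
for c in associations:
    print(c, round(score(c, clue), 2))  # Voldemort 0.67, Harry Potter 0.33
```

In this toy version the right answer wins, but it's easy to imagine how a clue built from oblique references would drag every candidate's score down toward the same muddled middle the confidence bars showed.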

I'd like to get more into this avenue of thought after I watch Double Jeopardy tonight. Right now, Watson is tied for the lead at $5000.  Humanity still has hope, and the Holodeck may still be a ways off I guess.

Note: I wanted to get my thoughts down first, without reading others' takes...but now that I have, here's another article, which mentions an interesting contextual aspect of correct answering that I didn't recount above. Very interesting stuff...referred to as "leg-gate" in the comments.