Babbage

Science and technology

More Babbage in "Jeopardy!"

Odd facts and figures

Nov 20th 2012, 17:20 by G.F. | SEATTLE

BABBAGE had $5,000 to his credit in the final moments of his third match of the game show "Jeopardy!" that aired a few weeks ago. The category was 19th-century female authors. Trailing the leader by $3,600 with a couple of minutes left in the game, he was faced with a "Daily Double" wager that allowed him to bet from $5 to the full $5,000 in his kitty. A fraction of a second later he bet the whole lot. Would IBM's Watson supercomputer, which in 2011 defeated the programme's two all-time best human contestants, have done the same?

Almost certainly, according to Gerald Tesauro, a researcher at IBM, and four colleagues. Earlier this year they published a paper in the firm's research journal detailing the approach to wagering that helped Watson beat Ken Jennings (the longest-playing contestant) and Brad Rutter (its highest-grossing winner) in a series of televised matches.

The emphasis at the time was on the extraordinary breakthroughs in natural-language processing that allowed real-time parsing of the clues and (mostly correct) responses, which the programme requires to be phrased in the form of questions. But the authors, part of a team of two dozen scientists who worked on the "Jeopardy!" project over four years, explain that tactical game-play choices based on probabilities of the answers being correct also played a big part. Watson, like any player, had to pick which hidden clue at a given dollar value to select, predict where the doubling clues might lie and calculate amounts to wager during Daily Doubles. Dr Tesauro says the computer had a big advantage over human contestants, even the best of whom focus on factual knowledge and largely ignore the game's probabilistic facets (which Babbage can confirm).

For a start, Dr Tesauro's group put a number on what viewers and players already sense: that Daily Doubles are more likely to hide behind certain squares on the board than others. In the second round, Double Jeopardy, the show's producers never place both in the same category. In both rounds they typically place them in one of the three highest dollar values, and in either round one often appears in the first category column. From these patterns the team built a prediction model that helped Watson hunt for Daily Doubles when finding them would be most advantageous.
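The paper does not publish the model itself, but a minimal sketch of the idea, with invented placement weights standing in for priors fitted to historical games, might look like this:

# Toy sketch of a Daily Double placement prior. The row and column weights
# below are illustrative assumptions, not the figures Watson actually used;
# in practice they would be estimated from archived games.

ROW_WEIGHT = {1: 0.02, 2: 0.08, 3: 0.25, 4: 0.35, 5: 0.30}          # deeper rows are likelier
COL_WEIGHT = {1: 0.22, 2: 0.16, 3: 0.16, 4: 0.16, 5: 0.15, 6: 0.15}  # first column slightly favoured

def placement_score(row, col, revealed):
    """Relative likelihood that the square at (row, col) hides a Daily Double,
    given the squares already revealed (which can no longer hide one)."""
    if (row, col) in revealed:
        return 0.0
    return ROW_WEIGHT[row] * COL_WEIGHT[col]

def best_hunting_square(revealed):
    """Return the unrevealed square with the highest placement score."""
    scored = [(placement_score(r, c, revealed), r, c)
              for r in ROW_WEIGHT for c in COL_WEIGHT]
    score, r, c = max(scored)
    return (r, c) if score > 0 else None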

Researchers also modelled typical "Jeopardy!" players to simulate games against Watson by mining the fan-built J-Archive repository of decades' worth of clues and responses. But rather than look at the content of the trivia, the researchers looked at the order of selection, wagers and accuracy. The team created models of average players, champions (semifinalists from an annual tournament of the year's best players) and grand champions (those who had won the most games in the current show's run).

Unsurprisingly, champions and grand champions ring in for regular clues much more often than average contestants, and are more accurate when they do. But the gap is even wider for Daily Doubles and the last round, Final Jeopardy. A player of Ken Jennings's calibre, for instance, answered a Daily Double correctly 82% of the time, while the average contestant managed it in only 64% of cases. "It gave me a new way of appreciating just how good Ken Jennings is," says Dr Tesauro.

Dr Tesauro says the IBM system wound up playing a very different game from its human counterparts. Something similar has been seen in more traditional games, such as chess, backgammon and bridge, once algorithms became robust enough to challenge master-class players. For instance, when Watson wrested control of category selection it ventured all across the board rather than working its way through a single category (which the show's producers encourage but cannot enforce).

While typical contestants wager round amounts on Daily Doubles, such as $1,000, and rarely more than $5,000, Watson bet sums like $1,246 or $6,435. Dr Tesauro explains that the software's clue-analysis component scored how confident Watson was of answering correctly, and that score shaped the size of the wager. The wagers also factored in the limited time left in a round. The paper further reveals that unearthing the Daily Double clues at the right moment can tip the scales. In a simulation of 200,000 games against grand champions, Watson won 61% of the time when it selected clues from the top of a category downwards, and 68% of the time when it used multiple methods to pinpoint the squares hiding the wagers.
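The paper does not spell out the wagering formula, but a rough sketch of the principle that confidence shapes the bet might run as follows (the threshold and interpolation below are invented for illustration, not Watson's actual policy):

# Illustrative sketch only: scale a Daily Double wager with estimated confidence.
# Jeopardy! rules let a player bet anything from $5 up to the greater of their
# current score and the round's top clue value.

def daily_double_wager(confidence, score, top_clue_value=2000, floor=5):
    """Map the system's confidence (0-1) of answering correctly to a dollar wager."""
    ceiling = max(score, top_clue_value)
    if confidence < 0.5:
        return floor                                   # low confidence: risk the minimum
    # Interpolate between the minimum bet and the full ceiling as confidence rises.
    wager = floor + (confidence - 0.5) / 0.5 * (ceiling - floor)
    return int(round(wager))

print(daily_double_wager(confidence=0.85, score=5000))   # 3502 -- an "odd-looking" sum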

This is all fun and games but, as IBM said at the time, to a point. Dr Tesauro says that the ability to model—and beat—human decision-making processes has applications in any field where different decisions lead to a multitude of possible outcomes. In a paper for an upcoming conference, Dr Tesauro and his colleagues extend their "Jeopardy!" thinking into health care, where varying incentives for doctor compensation and disparate treatment possibilities were weighed to find the optimum solution. Such software would potentially let policymakers, regulators and hospital managers model scenarios without having to test them out on real patients.

As for "Jeopardy!", Dr Tesauro expects (and hopes) that Watson's approach will eventually colour human players' preparation and gameplay. He suggests that alongside cramming knowledge, potential contestants should mug up on game theory. (He also acknowledges that "Prisoner of Trebekistan", a book written by Bob Harris about his multiple campaigns, dovetails neatly with their more exhaustive data analysis.)

So, how did Babbage do? Faced with recalling the pen name of Amantine Lucile Aurore Dupin, his addled brain came up with "George Saunders" (a contemporary author), obviously incorrect. The remnants of his cognitive function translated this to "George Sands". Sadly, the answer was "Sand". Your correspondent lost the lot (though he regained $2,000 before Final Jeopardy; the right answer there was not enough to win). Dr Tesauro ran the scenario through algorithms that simulate only human players, and says that, with the degree of confidence Babbage had in the category, risking it all was a good strategy. Some solace.

Readers' comments


guest-ljjoinj

Greetings. Sorry to be late to the party. I am Michael Holmes, Program Director of IBM's Watson Solutions group. Glad to respond to any questions. A few comments: the Jeopardy! quiz show was just a proof of concept demonstration around the real goal of putting Watson to work in areas important to society and business... which is being done today. We have pilots with leading healthcare, financial services, and medical research organizations. The first of these has gone into production and we expect to commercially release our first offering mid-2013. Personally I am most excited about our work with Memorial Sloan Kettering to teach Watson to help oncologists make more informed, evidence-based decisions when diagnosing and treating cancer. Looking closely at the article, you'll see that Dr Tesauro's paper on 'incentives for MD compensation' is an academic concept related to game theory and unrelated to Watson itself (past, present, or future). Watson is about giving professionals better access to information to make more informed decisions. It's not about making decisions itself. Sikko6 - Watson did not retrieve information. That would imply a traditional database query. Watson consumed and understood unstructured text (like what you're reading now) and generated and evaluated hypotheses based on this understanding. As for button pushing speed, focusing attention there is to miss the significance of this first step into a new era of computing.

KCTwtZSRk5

The Watson matches were rigged from the start because Jeopardy gave Watson ridiculous buzzer speed. That had nothing to do with why the machine was created in the first place, and had everything to do with IBM basically bribing the show to get its computer on the air. The sad thing is that if the computer had been given average buzzer skills for a top level player on the show, it would have made for fascinating viewing and a good fight. As it was, it was a joke. At least I beat it twice in beta testing, and then shot down every spurious argument the IBM nitwit scientists put up in its defence. Seriously, the Watson games were a joke and should never have happened.

sikko6 in reply to KCTwtZSRk5

When it comes to information retrieval, no human can compete against computers! But if the questions are asked in a way that requires at least a few steps of deduction, things will be completely different!

MosbyM in reply to G.F. - The Economist

But the buzzer speed is what makes the show competitive. I believe almost any "good" player would drub "greats" like Rutter and Jennings if they consistently and automatically got to answer all of the questions they wanted to answer. That's not a competition, nor is it the game (flawed as that game might be). I think it is/was a significant problem with the setup.

G.F. - The Economist in reply to MosbyM

The research shows that's not the case. Dr Tesauro et al found that average contestants not only buzz in successfully fewer times than champions and grand champions, but are less accurate. Gaining control of the board and having more wrong answers than better competitors will lose you the game just as much as never mastering buzzer control.

But the Watson situation was rightly criticized for allowing superhuman buzzer skills. After that point, though, one has to be impressed by the IBM system's ability to interpret and respond to clues appropriately.

MosbyM in reply to G.F. - The Economist

I take your point, and do agree that buzzer issue aside, Watson's ability to pull things together was a very fun moment to witness in the history of artificial intelligence.

On the buzzer issue however, I do think it really cheapens the "win". Because of that issue I don't consider the victory legitimate. I stand by my belief that if an average player were granted the superhuman (Watson) ability to automatically ring in first, and could do so near perfectly for all answers the player was "sure" of, that player would beat "great" players. Especially at the higher levels, it is likely all three players know the answer to any given question; the "game" comes down to who wins the buzzer and who gets control of the board. Watson rigged that.

G.F. - The Economist in reply to MosbyM

Not to prolong this, but any average player against Ken Jennings and Brad Rutter would have lost even with buzzer timing. If they rang in only when they had a high degree of confidence and obtained the buzzer lock each time, they would have (based on archived play) been wrong sufficiently frequently to have a lower-dollar balance.

Consider this. If Player A (average) is correct 65% of the time when he rings in and Player K (grand champion) is correct 80% of the time, then Player A will regularly lose the dollar amount in question while handing the chance to answer to Player K, who will then add that amount to his score. In that scenario, Player K would never need to ring in at all, but could simply take the leavings.
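A back-of-the-envelope sketch of that arithmetic, taking the quoted accuracy figures and an illustrative $1,000 clue:

# Rough worked example of the per-clue expectation described above.
# The $1,000 clue value is illustrative; the accuracy figures are those quoted.
clue = 1000
p_a = 0.65   # Player A (average) is right 65% of the time when ringing in first
p_k = 0.80   # Player K (grand champion) is right 80% of the time on the rebound

ev_a = p_a * clue - (1 - p_a) * clue     # A's expected net per attempt: about +$300
ev_k = (1 - p_a) * p_k * clue            # K's expected rebound gain: about +$280

print(round(ev_a - ev_k))   # ~20: winning every buzzer race nets A barely $20 a clue,
                            # before Daily Doubles and Final Jeopardy widen the gap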

Similarly, the spread for Daily Double answers is so high that average contestants are far more likely than a grand champion to answer incorrectly (see article), thus, on average, losing large amounts.

Buzzer control is important, but accuracy is a significant part of the gap between average and superior Jeopardy players.

Where we should be impressed is that Watson could play at grand champion level, but then, I agree: the buzzer response should have been weighted against human factors.

sikko6 in reply to MosbyM

Complaining about buzzer speed is ridiculous. If buzzer speed is a problem, you should never compete against any machine. By the same logic, you don't run the 100m against sprinters who have enormous startup speed!

G.F. - The Economist in reply to sikko6

Except (and, honestly, not to belabour the point), Jeopardy has a built-in corrective. In the game, if one is incorrect ("negs" in the parlance of aficionados) on any regular answer, either of the other players may then buzz in. The wrong answer subtracts the value of the clue, and another player being correct (often helped by a wrong answer or a misspoken one) can accumulate that value, too.

jouris

It will be fascinating to see if Dr Tesauro and his colleagues' health care model can successfully determine what incentives for doctors (and/or hospitals, etc.) work best.

And what, in the doctors' minds, the current incentives actually are. I have the suspicion that the incentives the bureaucrats think they have built into various health care programs are not quite what the doctors perceive -- and so the results are not what was desired either.
