Nature | News

When Google got flu wrong

US outbreak foxes a leading web-based method for tracking seasonal flu.

The latest US influenza season is more severe and has caused more deaths than usual. (Image: John Angelillo/UPI/Newscom)

When influenza hit early and hard in the United States this year, it quietly claimed an unacknowledged victim: one of the cutting-edge techniques being used to monitor the outbreak. A comparison with traditional surveillance data showed that Google Flu Trends, which estimates prevalence from flu-related Internet searches, had drastically overestimated peak flu levels. The glitch is no more than a temporary setback for a promising strategy, experts say, and Google is sure to refine its algorithms. But as flu-tracking techniques based on mining of web data and on social media proliferate, the episode is a reminder that they will complement, but not substitute for, traditional epidemiological surveillance networks.

“It is hard to think today that one can provide disease surveillance without existing systems,” says Alain-Jacques Valleron, an epidemiologist at the Pierre and Marie Curie University in Paris, and founder of France’s Sentinelles monitoring network. “The new systems depend too much on old existing ones to be able to live without them,” he adds.

This year’s US flu season started around November and seems to have peaked just after Christmas, making it the earliest flu season since 2003. It is also causing more serious illness and deaths than usual, particularly among the elderly, because, just as in 2003, the predominant strain this year is H3N2 — the most virulent of the three main seasonal flu strains.

Traditional flu monitoring depends in part on national networks of physicians who report cases of patients with influenza-like illness (ILI) — a diffuse set of symptoms, including high fever, that is used as a proxy for flu. That estimate is then refined by testing a subset of people with these symptoms to determine how many have flu and not some other infection.
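
As a back-of-the-envelope illustration of that refinement step (a minimal sketch with made-up numbers, not the CDC's actual methodology), the ILI visit rate and the laboratory positivity fraction combine roughly as follows:

    # Toy illustration (hypothetical numbers): combine an ILI visit rate with
    # the share of lab-tested ILI specimens that test positive for influenza.
    weekly_visits = 1_000_000   # patient visits reported by sentinel sites
    ili_visits = 42_000         # visits meeting the ILI case definition
    tested = 3_000              # ILI specimens sent for laboratory testing
    flu_positive = 900          # of those, confirmed influenza

    ili_rate = ili_visits / weekly_visits       # proxy: share of visits that are ILI
    positivity = flu_positive / tested          # fraction of ILI that is really flu
    estimated_flu_rate = ili_rate * positivity  # crude flu-attributable visit rate

    print(f"ILI rate: {ili_rate:.2%}")                            # 4.20%
    print(f"Lab positivity: {positivity:.1%}")                    # 30.0%
    print(f"Estimated flu visit rate: {estimated_flu_rate:.2%}")  # 1.26%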

With its creation of the Sentinelles network in 1984, France was the first country to computerize its surveillance. Many countries have since developed similar networks — the US system, overseen by the Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia, includes some 2,700 health-care centres that record about 30 million patient visits annually.

But the near-global coverage of the Internet and burgeoning social-media platforms such as Twitter have raised hopes that these technologies could open the way to easier, faster estimates of ILI, spanning larger populations.

The mother of these new systems is Google’s, launched in 2008. Based on research by Google and the CDC, it relies on data mining records of flu-related search terms entered in Google’s search engine, combined with computer modelling. Its estimates have almost exactly matched the CDC’s own surveillance data over time — and it delivers them several days faster than the CDC can. The system has since been rolled out to 29 countries worldwide, and has been extended to include surveillance for a second disease, dengue.
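
At its statistical core, the published description of the method fits a simple linear model between the log-odds of the flu-related query fraction and the log-odds of the ILI visit percentage. The sketch below shows that general shape using synthetic data; it is an illustration only, not Google's production system.

    # Minimal illustration (synthetic data, not Google's actual model): fit a
    # linear relationship between the log-odds of a flu-related query fraction
    # and the log-odds of the ILI visit percentage, then use it to "nowcast"
    # the current week before physician reports are aggregated.
    import numpy as np

    def logit(p):
        return np.log(p / (1.0 - p))

    def inv_logit(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)

    # Two years of hypothetical weekly training data.
    query_frac = rng.uniform(0.001, 0.02, size=104)     # share of searches that are flu-related
    ili_rate = inv_logit(1.2 * logit(query_frac) + 2.0   # assumed "true" relationship
                         + rng.normal(0, 0.05, size=104))

    # Fit logit(ILI) = b1 * logit(query) + b0 by ordinary least squares.
    b1, b0 = np.polyfit(logit(query_frac), logit(ili_rate), deg=1)

    # Nowcast this week's ILI rate from today's query fraction.
    this_week_query_frac = 0.015
    estimate = inv_logit(b1 * logit(this_week_query_frac) + b0)
    print(f"Estimated ILI visit rate this week: {estimate:.2%}")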

Chart: 'Fever peaks'. Sources: Google Flu Trends (www.google.org/flutrends); CDC; Flu Near You

Google Flu Trends has continued to perform remarkably well, and researchers in many countries have confirmed that its ILI estimates are accurate. But the latest US flu season seems to have confounded its algorithms. Its estimate for the Christmas national peak of flu is almost double the CDC’s (see ‘Fever peaks’), and some of its state data show even larger discrepancies.

It is not the first time that a flu season has tripped Google up. In 2009, Flu Trends had to tweak its algorithms after its models badly underestimated ILI in the United States at the start of the H1N1 (swine flu) pandemic — a glitch attributed to changes in people’s search behaviour as a result of the exceptional nature of the pandemic (S. Cook et al. PLoS ONE 6, e23610; 2011).

Google would not comment on this year’s difficulties. But several researchers suggest that the problems may be due to widespread media coverage of this year’s severe US flu season, including the declaration of a public-health emergency by New York state last month. The press reports may have triggered many flu-related searches by people who were not ill. Few doubt that Google Flu will bounce back after its models are refined, however.

“You need to be constantly adapting these models, they don’t work in a vacuum,” says John Brownstein, an epidemiologist at Harvard Medical School in Boston, Massachusetts. “You need to recalibrate them every year.”

Brownstein is one of many researchers trying to harness the power of the web to establish sentinel networks made up not of physicians, but of ordinary citizens who volunteer to report when they or someone in their family is experiencing symptoms of ILI. ‘Flu Near You’, a system run by the HealthMap initiative co-founded by Brownstein at Boston Children’s Hospital, was launched in 2011 and now has 46,000 participants, covering 70,000 people.

Slideshow: France's Sentinelles network of doctors reporting cases of influenza-like illness has produced a clear picture of how the 2012–13 flu season has evolved. (Images: Sentinelles, UMR-S 707 Inserm, UPMC)

Similar systems are springing up in Europe. For example, GrippeNet.fr, run by French researchers in collaboration with national health authorities, has attracted more than 5,500 participants since its creation a year ago, with 60–90 people joining each week.

Lyn Finelli, head of the CDC’s Influenza Surveillance and Outbreak Response Team, feels that such crowdsourcing techniques hold great promise, especially because the questionnaires are based on clinical definitions of ILI and so yield very clean data. And both Flu Near You and GrippeNet.fr have a representative age distribution of participants. The CDC has worked with Flu Near You on its development, and Finelli herself has signed up: “I submit my family’s data every week,” she says.

Other researchers are turning to what is probably the largest publicly accessible alternative trove of social-media data: Twitter. Several groups have published work suggesting that models of flu-related tweets can be closely fitted to past official ILI data, and various services, such as MappyHealth and Sickweather, are testing whether real-time analyses of tweets can reliably assess levels of flu.

But Finelli is sceptical. “The Twitter analyses have much less promise” than Google Flu or Flu Near You, she says, arguing that Twitter’s signal-to-noise ratio is very low, and that the most active Twitter users are young adults and so are not representative of the general public.

Michael Paul, a computer scientist at Johns Hopkins University in Baltimore, Maryland, disagrees. He is part of a team that is developing Twitter-based disease monitoring, and says that Google search-term data probably have just as much noise. And although Internet-based surveys may boast less noise, their smaller size means that they may be prone to sampling errors. “I suspect that passive monitoring of social media will always yield more data than systems that rely on people to actively respond to surveys, like Flu Near You,” Paul says.
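
Paul's point about sampling error can be made concrete with a quick calculation (a rough sketch; the 3% prevalence is hypothetical, and the sample sizes loosely echo the participant numbers quoted in this article): the statistical uncertainty on a survey-based prevalence estimate shrinks roughly with the square root of the number of respondents.

    # Back-of-the-envelope illustration of the sampling-error argument: the
    # uncertainty on an estimated ILI proportion shrinks roughly as 1/sqrt(n).
    import math

    def standard_error(p, n):
        """Approximate standard error of a proportion p estimated from n people."""
        return math.sqrt(p * (1 - p) / n)

    assumed_ili = 0.03  # suppose 3% of respondents currently have ILI
    for n in (5_500, 70_000, 1_000_000):  # survey-sized versus social-media-sized samples
        margin = 1.96 * standard_error(assumed_ili, n)
        print(f"n = {n:>9,}: 3% +/- {margin:.2%} (95% interval)")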

To reduce the noise, the Johns Hopkins team has recently analysed a subset of a few thousand flu-related tweets, looking for patterns indicating which tweets showed that the tweeter was actually ill rather than simply, say, pointing to news articles about flu. They then used this information to retrain their models to weed out irrelevant flu-related tweets. Paul says that a paper in press will show that this greatly improves their results.
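
A common way to build that kind of filter (a minimal sketch under those assumptions, not the Johns Hopkins team's actual models) is to train a text classifier on tweets hand-labelled as 'author reports being ill' versus 'general flu chatter', and to count only the tweets the classifier flags as first-person illness reports.

    # Minimal sketch of a tweet-relevance filter (illustrative, hand-made data;
    # not the Johns Hopkins models): decide whether a flu-related tweet reports
    # the author's own illness or merely discusses flu in general.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny hand-labelled training set (1 = author appears ill, 0 = general chatter).
    tweets = [
        "down with the flu, fever all night, staying home today",
        "been sick with flu symptoms since monday",
        "my head hurts and i can't stop coughing, think it's the flu",
        "flu has knocked me out, skipping class",
        "interesting article on how bad this flu season is",
        "CDC says flu activity is high in 47 states",
        "get your flu shot before it's too late, people",
        "news: governor declares flu emergency",
    ]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(tweets, labels)

    # Only tweets predicted as first-person illness reports would feed the flu signal.
    new_tweets = [
        "ugh i have the flu and a 102 fever",
        "retweet this flu map, crazy season",
    ]
    print(model.predict(new_tweets))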

Already, web data mining and crowdsourced tracking systems are becoming a part of the flu-surveillance landscape. “I’m in charge of flu surveillance in the United States and I look at Google Flu Trends and Flu Near You all the time, in addition to looking at US-supported surveillance systems,” says Finelli. “I want to see what’s happening and if there is something that we are missing, or whether there is a signal represented somewhat differently in one of these other systems that I could learn from.”

Journal name: Nature
Volume: 494
Pages: 155–156
DOI: 10.1038/494155a

Comments

  1. Carey Goldberg

    Seems fitting to add a bit of journalistic context to this interesting piece: Keith Winstein, a former Wall Street Journal reporter now a computer science grad student at MIT, deserves credit for first calling attention to the dramatic gap between Google Flu Trends and the gold-standard CDC numbers, and he did it when the hype about "worst flu season ever" was still in full swing. Please see his excellent graphs and interpretation here:
    http://commonhealth.wbur.org/2013/01/google-flu-trends-cdc
    http://commonhealth.wbur.org/2013/02/google-flu-tracker-wrong

  2. Natalia Mantilla-Beniers

    "Crowdsourced" methods for monitoring flu are at least ten years old: DeGroteGriepMeting opened in the Netherlands in 2003 as a project seeking to engage the general public in science. This gave way to similar websites in Portugal (2005), Italy (2008) and Mexico (2009). Nowadays the original initiative has led to the opening of a whole network of websites which form the consortium of Epiwork in Europe. Participation is very enthusiastic in many of these countries and the potential for fascinating developments large.

    If you happen to live in Mexico, please join

    http://reporta.c3.org.mx/

    and help us monitor respiratory infections here! Your participation is highly valued.

    Thanks in advance :-)

  3. Alexis Jones

    The Google Flu Trends method was originally published in Nature too. The article gives some more interesting information about how it works: http://www.nature.com/nature/journal/vaop/ncurrent/full/nature07634.html

    But wasn't this problem already known (and addressed by Google) weeks ago?

    http://www.nationaljournal.com/healthcare/why-google-flu-trends-will-not-replace-cdc-anytime-soon-20130125

    http://commonhealth.wbur.org/2013/01/google-flu-trends-cdc

    http://commonhealth.wbur.org/2013/02/google-flu-tracker-wrong

    http://muckrack.com/link/w9Gk/is-google-flu-trends-prescient-or-wrong

  4. Dan Kaminsky

    It is indeed quite the assumption you're making: that traditional monitoring represents ground truth, and any deviations are evidence of systematic error on the part of Google. Now, it's plausible that differentials may be coming from both severity (this flu was particularly bad) and a bystander effect (it's so bad, others are Googling how to make their family members better). But it's also plausible that, due to economic factors, fewer people can go to the doctor and thus have their ILIs witnessed, but more people can search for relief online.

    My preference is always going to be for multiple orthogonal sources of data. “I want to see what’s happening and if there is something that we are missing, or whether there is a signal represented somewhat differently in one of these other systems that I could learn from.” -- Finelli's got it right, and I applaud you ending on this note.

  5. Lin Wang

    A hot potato.
    But there are also articles suggesting that the prevalence reported by the CDC largely underestimates the real prevalence; see, for example, "http://www.cdc.gov/h1.1flu/estimates/April_March_13.htm".
