Understanding more about flu trends could reduce disease spread, but big data efforts have overestimated cases. Researchers at UC San Diego are working to combine traditional data sources with big data to improve the quality of prediction.
Although the flu season is generally starting to abate at this time of year, it still has an impact: flu affects around 20% of people worldwide, and in the US alone it leads to around 200,000 hospitalizations each year. Google Flu Trends shows in real time where people in the US are searching for information on flu and flu-like symptoms. Google uses this big data to approximate disease burden and spread, producing estimates around two weeks earlier than the surveillance data collected by the US Centers for Disease Control and Prevention (CDC). However, there have been reports that Google Flu Trends has overestimated flu levels three years running, and that it cannot always differentiate between laboratory-confirmed cases and syndromic influenza-like illness (ILI).
The UC San Diego researchers, in a paper published in Scientific Reports, combined CDC data and Google Flu Trends to create a model. The model weights Google Flu Trends predictions using a social network derived from CDC data on laboratory-tested cases of flu, which allows it to refine and improve those predictions. The new model was able to predict infections up to one week in the future, as well as predicting current infections with the same accuracy as Google.
“Our innovation,” says Michael Davidson at UC San Diego, “is to construct a network of ties between different US health regions based on information from the CDC. We asked: Which places in years past got the flu at about the same time? That told us which regions of the country have the strongest ties, or connections, and gave us the analytic power to improve Google’s predictions.”
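To make the idea in the quote concrete, here is a minimal sketch in Python. It is not the researchers' published model. It assumes, purely for illustration, that the "ties" between regions can be approximated by the correlation of their past weekly counts of laboratory-confirmed cases, and that a search-based estimate for a region can be refined by blending it with the tie-weighted estimates of the other regions. The function names, the blending weight alpha and the region labels are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def build_ties(history):
    """Build tie strengths between regions from past lab-confirmed counts.

    history maps a region name to its list of weekly counts.
    Assumption: regions whose past counts rose and fell together are
    strongly tied. Negative correlations are treated as no tie, and each
    region's ties are normalised to sum to 1 (or left at 0 if it has none).
    """
    ties = {}
    for region in history:
        raw = {
            other: max(0.0, pearson(history[region], history[other]))
            for other in history
            if other != region
        }
        total = sum(raw.values())
        ties[region] = {o: (w / total if total else 0.0) for o, w in raw.items()}
    return ties


def refine(search_estimates, ties, alpha=0.7):
    """Blend each region's search-based estimate with its tied regions' estimates.

    alpha is a hypothetical weight on the region's own estimate; a region
    with no ties keeps its own estimate unchanged.
    """
    refined = {}
    for region, own in search_estimates.items():
        weights = ties.get(region, {})
        if sum(weights.values()) == 0:
            refined[region] = own
            continue
        tied = sum(w * search_estimates[o] for o, w in weights.items())
        refined[region] = alpha * own + (1 - alpha) * tied
    return refined


if __name__ == "__main__":
    # Made-up weekly counts for three hypothetical regions.
    past_counts = {
        "A": [1, 2, 5, 3, 1],
        "B": [2, 4, 10, 6, 2],
        "C": [5, 3, 1, 2, 5],
    }
    region_ties = build_ties(past_counts)
    print(refine({"A": 10.0, "B": 20.0, "C": 5.0}, region_ties))
```

In this toy data, regions A and B peak in the same week and so are strongly tied, while region C peaks at a different time and keeps its own estimate. A real model would need to be fitted and validated against surveillance data rather than using a fixed blending weight.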
The team hopes that epidemiologists and data scientists will adopt the model to better target prevention and treatment efforts, especially during epidemics.