Wikipedia Page Views Could Predict Disease Outbreaks

By HospiMedica International staff writers
Posted on 25 Nov 2014
A new study suggests that Wikipedia access data could be an effective tool for forecasting disease outbreaks up to a month in advance.

Researchers at the Los Alamos National Laboratory (NM, USA) reviewed access logs for disease-related Wikipedia pages from 2010 to 2013. They mapped the languages the articles were written in, using language as an approximate measure of readers' locations. Using linear statistical models, the researchers then tested 14 location-disease combinations to demonstrate the feasibility of techniques built upon the data stream, and compared the results with disease outbreak information provided by national health surveillance teams.
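
As a rough illustration of the idea, a linear model of this kind can be sketched in a few lines of Python. This is not the study's code: it assumes the simplest possible case, a single-predictor least-squares fit in which page views lead official case counts by a fixed lag. The function names (fit_line, fit_lagged), the one-week lag, and all numbers are hypothetical and chosen only to show the mechanics.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_lagged(views, cases, lag):
    """Fit cases[t + lag] against views[t], so views lead cases by `lag` steps."""
    x = views[:len(views) - lag]
    y = cases[lag:]
    return fit_line(x, y)


# Made-up weekly numbers for illustration only, not data from the study.
weekly_views = [120, 150, 210, 340, 500, 610, 480, 300]
weekly_cases = [10, 12, 15, 21, 34, 50, 61, 48]

# Assumption: page views run one week ahead of reported cases.
slope, intercept = fit_lagged(weekly_views, weekly_cases, lag=1)

# Apply the fitted line to the latest view count to estimate next week's cases.
forecast = slope * weekly_views[-1] + intercept
print(f"slope={slope:.3f} intercept={intercept:.3f} forecast={forecast:.1f}")
```

In practice the fitted line would be checked against official surveillance figures that were held back from the fit, which mirrors the comparison with national health data described above.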

The researchers found three broad classes of results. In eight cases, there was a usefully close match between the model's estimate and the official data. The technique allowed them to predict emerging influenza outbreaks in the United States, Poland, Japan, and Thailand; dengue fever spikes in Brazil and Thailand; and a rise in tuberculosis cases in Thailand.

In three cases, the model failed, apparently because patterns in the official data were too subtle to capture, and in a further three, apparently because the signal-to-noise ratio (SNR) in the Wikipedia data was too low. The researchers suggested that disease incidence may also have been changing too slowly to be evident in the chosen analysis period. The results also suggest that these models can be used even in places with no official data upon which to build models. The study was published on November 13, 2014, in PLoS Computational Biology.

“A global disease-forecasting system will change the way we respond to epidemics. In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today's forecast,” said lead author Sara Del Valle. “The goal of this research is to build an operational disease monitoring and forecasting system with open data and open source code. This paper shows we can achieve that goal.”

The researchers added that it is important to recognize the demographic biases, such as age, gender, and education, inherent in Wikipedia and other social internet data sources. Most importantly, the data strongly over-represent people and places with good internet access and technology skills.

Related Links:

Los Alamos National Laboratory


