Wikipedia Page Views Could Predict Disease Outbreaks
By HospiMedica International staff writers Posted on 24 Nov 2014
A new study suggests that Wikipedia access data could be an effective tool for forecasting disease outbreaks up to a month in advance.
Researchers at the Los Alamos National Laboratory (NM, USA) reviewed access logs for disease-related Wikipedia pages between 2010 and 2013. They used the language each article was written in as an approximate proxy for readers' locations. Using linear statistical models, the researchers then tested 14 location-disease combinations to demonstrate the feasibility of techniques built upon this data stream, and compared the results with disease outbreak information provided by national health surveillance teams.
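As an illustration only, the general idea of relating page views to official case counts with a linear model can be sketched as follows. This is not the study's code: the function names, the single-predictor form, the weekly time step, and the one-step lag are all assumptions made for the example, and the numbers are synthetic.

```python
from statistics import mean


def fit_lagged_linear(views, cases, lag=0):
    """Ordinary least squares fit of cases[t + lag] = intercept + slope * views[t].

    `views` and `cases` are equal-length sequences of counts per time step
    (hypothetical weekly totals here). Returns (intercept, slope).
    """
    if len(views) != len(cases) or not 0 <= lag < len(views):
        raise ValueError("series must have equal length and 0 <= lag < length")
    x = views[:len(views) - lag]   # page views at time t
    y = cases[lag:]                # official cases at time t + lag
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("page views are constant; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(views, cases, intercept, slope, lag=0):
    """Share of variance in the lagged official counts explained by the fit."""
    x = views[:len(views) - lag]
    y = cases[lag:]
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic example: cases follow the previous step's page views.
views = [100, 120, 180, 260, 220, 150, 110, 105]
cases = [14, 15, 17, 23, 31, 27, 20, 16]

intercept, slope = fit_lagged_linear(views, cases, lag=1)
fit_quality = r_squared(views, cases, intercept, slope, lag=1)
next_step_estimate = intercept + slope * views[-1]
```

With a lag greater than zero, the most recent page views yield an estimate for a period that official surveillance has not yet reported, which is the sense in which such a model forecasts. A low `fit_quality` on held-out data would indicate that page views carry little usable signal for that location and disease.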
The researchers found three broad classes of results. In eight cases, there was a usefully close match between the model's estimate and the official data. This statistical technique allowed them to predict emerging influenza outbreaks in the United States, Poland, Japan, and Thailand, dengue fever spikes in Brazil and Thailand, and a rise in tuberculosis cases in Thailand.
In three cases, the model failed, apparently because patterns in the official data were too subtle to capture. In a further three, it failed apparently because the signal-to-noise ratio (SNR) in the Wikipedia data was too low. The researchers suggested that disease incidence may also be changing too slowly to be evident in the chosen analysis period. The results also suggest that these models can be used even in places with no official data upon which to build models. The study was published on November 13, 2014, in PLOS Computational Biology.
“A global disease-forecasting system will change the way we respond to epidemics. In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today's forecast,” said lead author Sara Del Valle. “The goal of this research is to build an operational disease monitoring and forecasting system with open data and open source code. This paper shows we can achieve that goal.”
The researchers added that it is important to recognize the demographic biases, such as age, gender, and education, that are inherent in Wikipedia and other social internet data sources. Most importantly, the data strongly over-represent people and places with good internet access and technology skills.
Related Links:
Los Alamos National Laboratory