Study Finds AI Falls Short When Analyzing Medical Data
By HospiMedica International staff writers | Posted on 20 Nov 2018
A study conducted at the Icahn School of Medicine at Mount Sinai (New York, NY, USA) has found that artificial intelligence (AI) tools trained to detect pneumonia on chest X-rays suffered significant decreases in performance when tested on data from outside health systems. These findings suggest that unless medical AI tools are carefully tested across a wide range of populations, deep learning models may not perform as accurately as expected.
Amid growing interest in using convolutional neural networks (CNNs), a type of deep learning model, to analyze medical images and provide computer-aided diagnosis, recent studies have found that AI image classifiers may not generalize to new data as well as commonly portrayed. The researchers at the Icahn School of Medicine at Mount Sinai assessed how AI models identified pneumonia in 158,000 chest X-rays from three medical institutions. They chose to study pneumonia on chest X-rays because it is common, clinically significant, and widely studied in the research community.
The researchers found that in three out of five comparisons, CNN performance in diagnosing disease on X-rays from hospitals outside the model's own network was significantly lower than on X-rays from the original health system. The CNNs were also able to identify, with a high degree of accuracy, the hospital system where an X-ray was acquired, and they "cheated" at their predictive task by exploiting the prevalence of pneumonia at the training institution. According to the researchers, a key problem with deep learning models in medicine is their massive number of parameters, which makes it difficult to identify the specific variables driving predictions, such as the types of scanners used at a hospital and the resolution of the images.
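The prevalence shortcut described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the study's code, data, or method: the sites "A", "B", and "C", their pneumonia prevalences, and the helper names (`sample_site`, `fit_shortcut`, `evaluate`) are all hypothetical. The "model" never looks at an image; it only recognizes which hospital a study came from and outputs that hospital's training prevalence. On a pooled internal test set from two hospitals with different prevalences, this alone scores better than chance, but on an unseen external hospital every study gets the same score and the AUC drops to exactly 0.5.

```python
import random


def auc(scores, labels):
    """ROC AUC: probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sample_site(site, prevalence, n, rng):
    """Draw n hypothetical studies from one site as (site, label) pairs; label 1 = pneumonia."""
    return [(site, int(rng.random() < prevalence)) for _ in range(n)]


def fit_shortcut(train):
    """'Train' a model that learns only each site's pneumonia prevalence (no image features)."""
    counts = {}
    for site, label in train:
        positives, total = counts.get(site, (0, 0))
        counts[site] = (positives + label, total + 1)
    return {site: positives / total for site, (positives, total) in counts.items()}


def evaluate(model, data):
    # A site the model never saw gets a constant score, so the score carries no information.
    scores = [model.get(site, 0.0) for site, _ in data]
    labels = [label for _, label in data]
    return auc(scores, labels)


rng = random.Random(0)
# Hypothetical prevalences chosen for illustration; not figures from the study.
train = sample_site("A", 0.30, 1000, rng) + sample_site("B", 0.05, 1000, rng)
internal_test = sample_site("A", 0.30, 1000, rng) + sample_site("B", 0.05, 1000, rng)
external_test = sample_site("C", 0.20, 1000, rng)

model = fit_shortcut(train)
print(f"internal AUC: {evaluate(model, internal_test):.2f}")
print(f"external AUC: {evaluate(model, external_test):.2f}")  # prints "external AUC: 0.50"
```

A real CNN would mix such site cues with genuine image features, so its external drop would usually be partial rather than total; the toy model isolates the shortcut to make the effect visible.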
“Our findings should give pause to those considering rapid deployment of AI platforms without rigorously assessing their performance in real-world clinical settings reflective of where they are being deployed,” said senior author Eric Oermann, MD, Instructor in Neurosurgery at the Icahn School of Medicine at Mount Sinai. “Deep learning models trained to perform medical diagnosis can generalize well, but this cannot be taken for granted since patient populations and imaging techniques differ significantly across institutions.”
“If CNN systems are to be used for medical diagnosis, they must be tailored to carefully consider clinical questions, tested for a variety of real-world scenarios, and carefully assessed to determine how they impact accurate diagnosis,” said first author John Zech, a medical student at the Icahn School of Medicine at Mount Sinai.
Related Links:
Icahn School of Medicine at Mount Sinai