Study Finds AI Falls Short When Analyzing Medical Data

By HospiMedica International staff writers
Posted on 20 Nov 2018

A study conducted at the Icahn School of Medicine at Mount Sinai (New York, NY, USA) has found that artificial intelligence (AI) tools trained to detect pneumonia on chest X-rays suffered significant decreases in performance when tested on data from outside health systems. These findings suggest that unless AI in the medical space is carefully tested for performance across a wide range of populations, the deep learning models may not perform as accurately as expected.

Amid growing interest in using convolutional neural networks (CNNs), a type of deep learning model, to analyze medical images and provide computer-aided diagnosis, recent studies have found that AI image classification may not generalize to new data as well as commonly portrayed. The researchers at the Icahn School of Medicine at Mount Sinai assessed how AI models identified pneumonia in 158,000 chest X-rays across three medical institutions. They chose to study the diagnosis of pneumonia on chest X-rays due to its common occurrence, clinical significance, and prevalence in the research community.

The researchers found that in three out of five comparisons, the performance of CNNs in diagnosing diseases on X-rays from hospitals outside their own network was significantly lower than on X-rays from the original health system. At the same time, the CNNs were able to detect the hospital system where an X-ray was acquired with a high degree of accuracy, and they cheated at their predictive task by relying on the prevalence of pneumonia at the training institution. The researchers found that the key problem in using deep learning models in medicine was their use of a massive number of parameters, making it challenging to identify specific variables driving predictions, such as the types of CT scanners used at a hospital and the resolution quality of imaging.
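
To illustrate how a model could appear accurate simply by recognizing the hospital, the following is a minimal Python sketch, not the researchers' method. The hospital names, case counts, and prevalences are hypothetical values chosen for illustration. The "shortcut" model ignores the image entirely and scores every X-ray with the pneumonia prevalence of the hospital it came from.

```python
def auc(scores, labels):
    """Chance that a random positive case outscores a random negative one.

    Ties count as half a win, so a model that cannot separate
    cases at all scores exactly 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical sites: (pneumonia cases, total X-rays). Not data from the study.
sites = {"hospital_a": (150, 500), "hospital_b": (10, 500)}

labels = {}
scores = {}
for name, (cases, total) in sites.items():
    labels[name] = [1] * cases + [0] * (total - cases)
    # Shortcut model: every X-ray gets its hospital's prevalence as the score.
    scores[name] = [cases / total] * total

# Evaluate on both hospitals pooled together, then on each hospital alone.
pooled_scores = scores["hospital_a"] + scores["hospital_b"]
pooled_labels = labels["hospital_a"] + labels["hospital_b"]
pooled_auc = auc(pooled_scores, pooled_labels)
within_auc = {name: auc(scores[name], labels[name]) for name in sites}

print(f"pooled AUC: {pooled_auc:.2f}")
for name, value in within_auc.items():
    print(f"{name} AUC: {value:.2f}")
```

Under these assumed numbers, the pooled score comes out well above chance (about 0.76) even though the model is no better than a coin flip (0.50) within either hospital. A score like this reflects where the X-ray was taken rather than what it shows, which is why such a model would be expected to lose performance at an outside institution.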

“Our findings should give pause to those considering rapid deployment of AI platforms without rigorously assessing their performance in real-world clinical settings reflective of where they are being deployed,” said senior author Eric Oermann, MD, Instructor in Neurosurgery at the Icahn School of Medicine at Mount Sinai. “Deep learning models trained to perform medical diagnosis can generalize well, but this cannot be taken for granted since patient populations and imaging techniques differ significantly across institutions.”

“If CNN systems are to be used for medical diagnosis, they must be tailored to carefully consider clinical questions, tested for a variety of real-world scenarios, and carefully assessed to determine how they impact accurate diagnosis,” said first author John Zech, a medical student at the Icahn School of Medicine at Mount Sinai.

Related Links:
Icahn School of Medicine at Mount Sinai