Traditional Diagnostic Decision Support Systems Outperform Generative AI for Diagnosing Disease
Posted on 02 Jun 2025
Despite the growing popularity of generative artificial intelligence (AI) tools like ChatGPT and Gemini in healthcare, traditional diagnostic decision support systems (DDSSs) continue to outperform them in diagnostic accuracy. Now, a new study comparing the two kinds of tools has found that they have the potential to augment one another to better inform treatment decisions.
Computer scientists at Massachusetts General Hospital (MGH, Somerville, MA, USA) developed DXplain, an expert DDSS that leverages a vast database of disease profiles and clinical findings to assist clinicians in generating and ranking potential diagnoses. In contrast, large language models (LLMs) like ChatGPT and Gemini generate responses based on patterns in language data and lack the structured medical knowledge base of systems like DXplain. In the study, researchers evaluated the diagnostic capabilities of DXplain, ChatGPT, and Gemini using 36 diverse patient cases, both with and without lab data.
The results, published in JAMA Network Open, showed that with lab data, DXplain correctly identified the diagnosis 72% of the time, compared with 64% for ChatGPT and 58% for Gemini. Without lab data, DXplain remained the most accurate at 56%, ahead of ChatGPT (42%) and Gemini (39%). Notably, each system identified certain diagnoses that the others missed, suggesting that a hybrid approach could enhance diagnostic accuracy. The researchers propose integrating the structured reasoning of DDSSs like DXplain with the language processing capabilities of LLMs to create more robust diagnostic tools.
“These systems can enhance and expand clinicians’ diagnoses, recalling information that physicians may forget in the heat of the moment and isn’t biased by common flaws in human reasoning,” said corresponding author Mitchell Feldman, MD. “And now, we think combining the powerful explanatory capabilities of existing diagnostic systems with the linguistic capabilities of large language models will enable better automated diagnostic decision support and patient outcomes.”