Researchers Publish Chest X-Ray Dataset to Train AI Models
By HospiMedica International staff writers Posted on 20 Feb 2019

Image: The CheXpert dataset of chest X-rays is designed for automated chest X-ray interpretation (Photo courtesy of Stanford University School of Medicine).
Researchers from the Stanford University School of Medicine (Stanford, CA, USA) have published CheXpert, a large dataset of chest X-rays and an accompanying competition for automated chest X-ray interpretation. The dataset features uncertainty labels and radiologist-labeled reference standard evaluation sets. Automated chest radiograph interpretation at the level of practicing radiologists could provide substantial benefit in many medical settings, from improved workflow prioritization and clinical decision support to large-scale screening and global population health initiatives.
CheXpert consists of 224,316 chest radiographs of 65,240 patients, taken from studies performed at Stanford Hospital between October 2002 and July 2017 in both inpatient and outpatient centers, along with their associated radiology reports. The dataset was co-released with MIMIC-CXR, a large dataset of 371,920 chest X-rays associated with 227,943 imaging studies sourced from the Beth Israel Deaconess Medical Center between 2011 and 2016.
One of the main obstacles in the development of chest radiograph interpretation models has been the lack of datasets with strong radiologist-annotated ground truth and expert scores against which researchers can compare their models. CheXpert is expected to address that gap, making it easy to track the progress of models over time on a clinically important task.
The researchers have also developed and open-sourced the CheXpert labeler, an automated rule-based labeler that extracts observations from free-text radiology reports for use as structured labels for the images. This is expected to help other institutions extract structured labels from their own reports and release other large repositories of data that will allow for cross-institutional testing of medical imaging models. The dataset is expected to help in the development and validation of chest radiograph interpretation models, with the aim of improving healthcare access and delivery worldwide.
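To make the idea of a rule-based labeler concrete, the following is a minimal Python sketch of how observations might be pulled from free-text reports and turned into structured labels with an explicit "uncertain" value. It is not the CheXpert labeler itself: the phrase lists, the cue words, the label values (1 positive, 0 negative, -1 uncertain, None not mentioned), and the precedence rule used to merge sentences are all assumptions chosen for illustration.

```python
import re

# Hypothetical phrase lists and cue words, chosen for illustration only.
OBSERVATIONS = {
    "Pneumonia": ["pneumonia"],
    "Pleural Effusion": ["pleural effusion", "effusion"],
    "Cardiomegaly": ["cardiomegaly", "enlarged heart"],
}
NEGATION = re.compile(r"\b(no|without|negative for)\b")
UNCERTAIN = re.compile(r"\b(possible|may represent|cannot exclude|suspicious for)\b")

# Assumed precedence when sentences disagree: positive > uncertain > negative.
PRIORITY = {1: 3, -1: 2, 0: 1}


def label_sentence(sentence, phrases):
    """Return 1, 0, -1, or None for one observation in one sentence."""
    text = sentence.lower()
    for phrase in phrases:
        idx = text.find(phrase)
        if idx == -1:
            continue
        prefix = text[:idx]  # only cues before the mention are considered
        if UNCERTAIN.search(prefix):
            return -1
        if NEGATION.search(prefix):
            return 0
        return 1
    return None  # observation not mentioned in this sentence


def label_report(report):
    """Label every observation in a report, merging sentence-level results."""
    labels = {name: None for name in OBSERVATIONS}
    for sentence in re.split(r"[.;\n]+", report):
        for name, phrases in OBSERVATIONS.items():
            result = label_sentence(sentence, phrases)
            if result is None:
                continue
            current = labels[name]
            if current is None or PRIORITY[result] > PRIORITY[current]:
                labels[name] = result
    return labels


if __name__ == "__main__":
    example = "Cardiomegaly is present. No pleural effusion. Cannot exclude pneumonia."
    print(label_report(example))
```

In this sketch, a report that never mentions an observation leaves its label as None, which keeps "not mentioned" distinct from "explicitly ruled out". A production labeler would need far richer phrase lists and proper handling of negation scope.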
Related Links:
Stanford University School of Medicine







