AI Tool Accurately Sorts Cancer Patients by Likely Outcomes
Posted on 15 May 2025
Pharmaceutical companies and healthcare providers often struggle to determine which patients will respond best to a given drug. A new artificial intelligence (AI)-based method addresses this problem by grouping cancer patients so that the members of each group share similar characteristics before treatment and similar outcomes after it. In a study published in Nature Communications, the method predicted treatment outcomes from health record data more accurately than the standard statistical and machine learning methods it was compared against. The approach could improve patient selection in clinical trials and enable more personalized treatment choices for individual patients.
Machine learning is a promising tool for detecting subtle yet meaningful patterns in large datasets, including those in the medical field. However, while machine learning systems can categorize patients into groups based on shared health data, these groupings do not always correlate closely with the patients’ subsequent responses to treatment. To address this, researchers at Weill Cornell Medicine (New York, NY, USA), in collaboration with Regeneron Pharmaceuticals (Tarrytown, NY, USA), sought to develop a platform that takes patients with the same disease who receive identical treatments and sorts them into groups sharing both baseline characteristics and treatment outcomes. The platform was trained on deidentified health records from 3,225 lung cancer patients in a commercial database, with each record containing 104 variables, including blood test results, medical history, prescriptions, and tumor stage.
The researchers tested this method using a real-world database of patients with advanced non-small-cell lung cancer who were treated with immune checkpoint inhibitors. In this initial test, the platform grouped patients into three distinct categories. The group with the longest mean overall survival time from the beginning of treatment consisted mostly of women (55.5%) and had relatively low rates of other conditions such as diabetes and heart failure. In contrast, the group with the shortest survival had less than half the mean survival time of the first group, was predominantly male (66.2%), and showed higher rates of tumor metastases, along with abnormal blood test results indicating liver, kidney, and inflammatory problems.
Using a metric known as the concordance index, the researchers demonstrated that this new method outperformed standard statistical and machine learning techniques in predicting patient survival times. When the machine learning system was applied to a new dataset of 1,441 patients with non-small-cell lung cancer, it produced similar groupings of patients in terms of baseline characteristics and survival outcomes. Moving forward, the team plans to continue developing and testing this method for patient stratification in clinical trials for new drugs, as well as for selecting the most appropriate treatments for individual patients. Their platform’s ability to reliably group patients according to outcomes also suggests it could provide valuable insights into disease biology.
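For readers unfamiliar with the concordance index mentioned above, the following is a minimal Python sketch of the common Harrell's form of the metric. It is not the study's code, and the article does not say which variant the researchers used. The function name, argument names, and the toy numbers in the usage note are all illustrative assumptions. The idea is to look at every pair of patients whose survival order is known and count how often the model assigned the higher risk score to the patient who died sooner.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's concordance index (illustrative sketch, not the study's code).

    times       -- observed follow-up time for each patient
    events      -- 1/True if death was observed, 0/False if censored
    risk_scores -- model output; higher means shorter predicted survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time ends in an observed death (simplification: tied times are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # model ranked the pair correctly
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

With made-up values such as times `[5, 10, 15, 20]`, events `[1, 1, 0, 1]`, and risk scores `[0.9, 0.6, 0.7, 0.1]`, five pairs are comparable and four are ranked correctly, giving an index of 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.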
“We’re hopeful that this approach ultimately will be useful for testing and targeting treatments across a wide range of diseases,” said senior author Dr. Fei Wang. “We’ll probably need more than electronic health record data for this, but we do want to understand the biological mechanisms that explain these distinct patient subgroups.”