AI Model Accurately Predicts Progression of Autoimmune Disease
Posted on 09 Jan 2025
Autoimmune diseases, in which the immune system mistakenly attacks the body’s healthy cells and tissues, often have a preclinical phase characterized by mild symptoms or the presence of certain antibodies in the blood before a formal diagnosis. For example, in individuals with rheumatoid arthritis, antibodies can be found in the blood up to five years before any symptoms appear. However, in some cases, preclinical symptoms resolve on their own without progressing to full-blown disease. Identifying who is likely to progress along the disease path is crucial for early diagnosis, intervention, improved treatment, and better disease management. The earlier a disease is detected and treated, the better the outcome, as damage caused by autoimmune diseases can be irreversible once they advance. One of the main challenges in predicting disease progression is sample size: the number of people with a specific autoimmune disease is often small, and the limited data make it harder to build an accurate prediction model.
A team of researchers from Penn State College of Medicine (Hershey, PA, USA) has now developed a novel approach to predict the progression of autoimmune diseases in those with preclinical symptoms. Using artificial intelligence (AI), the team analyzed data from electronic health records and large genetic studies of people with autoimmune diseases to create a risk prediction score. This new method proved to be 25% to 1,000% more accurate than existing models in determining which individuals would progress to advanced disease. The new approach, called Genetic Progression Score (GPS), can predict the transition from preclinical to disease stages. GPS uses the concept of transfer learning, a machine learning technique where a model is trained on one dataset and then adapted for a related but different dataset. This method helps researchers extract more information from smaller data samples. For instance, in medical imaging, AI models can initially be trained to distinguish between images of cats and dogs, which are easier to label, and later refined to identify malignant versus benign tumors.
In the tumor example, medical experts typically build the training dataset by labeling images one by one, a time-consuming process that is limited by the number of images available. Transfer learning instead starts from a larger, easier-to-label dataset, such as pictures of cats and dogs, which gives the model a much bigger collection to learn from. The model first learns to differentiate between the animals and is then adjusted to distinguish malignant from benign tumors.
GPS is trained on data from large case-control genome-wide association studies (GWAS), which are commonly used in human genetics research to find genetic differences between people with a specific autoimmune disease and those without. The method also integrates data from electronic health record-based biobanks, which provide valuable patient information such as genetic variants, lab results, and clinical diagnoses. Together, these data help identify individuals in the preclinical stage of disease and track the progression from preclinical to disease states. By merging the two data sources, the GPS model is refined to include the factors most relevant to actual disease development. Those with high GPS scores are at greater risk of progressing from preclinical symptoms to full-blown disease.
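To make the transfer learning idea concrete, here is a minimal sketch in Python. It is not the GPS implementation, whose details are not given in this article. It assumes one common form of transfer learning for risk scores: effect estimates learned from a large "source" dataset (standing in for case-control GWAS data) serve as a starting point, and a model fitted to a small "target" dataset (standing in for biobank progression data) is pulled toward them by a penalty. All data are simulated, and every name and parameter (`fit_logistic`, `simulate`, `lam`, the sample sizes) is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, prior=None, lam=0.0, n_iter=2000):
    """Logistic regression (no intercept) fitted by gradient descent.

    Minimizes mean log-loss + 0.5 * lam * ||w - prior||^2, so a larger
    lam keeps the fitted weights closer to the prior weights.
    """
    n, p = X.shape
    prior = np.zeros(p) if prior is None else np.asarray(prior, dtype=float)
    w = prior.copy()
    # Conservative step size: inverse of an upper bound on the curvature.
    step = 1.0 / (0.25 * np.mean(np.sum(X**2, axis=1)) + lam)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / n + lam * (w - prior)
        w -= step * grad
    return w


def simulate(rng, n, w_true, maf=0.3):
    """Simulated centered allele counts (0/1/2) and binary outcomes."""
    X = rng.binomial(2, maf, size=(n, w_true.size)) - 2 * maf
    y = rng.binomial(1, sigmoid(X @ w_true))
    return X, y


rng = np.random.default_rng(0)
n_variants = 20

# Assumption: progression effects are related to, but not identical to,
# the case-control effects.
w_source_true = rng.normal(0.0, 0.5, n_variants)
w_target_true = w_source_true + rng.normal(0.0, 0.1, n_variants)

X_src, y_src = simulate(rng, 20000, w_source_true)  # large source study
X_tgt, y_tgt = simulate(rng, 200, w_target_true)    # small target cohort

w_gwas = fit_logistic(X_src, y_src)                 # step 1: learn on source
w_target_only = fit_logistic(X_tgt, y_tgt)          # baseline: no transfer
w_transfer = fit_logistic(X_tgt, y_tgt, prior=w_gwas, lam=0.5)  # step 2: adapt

# Score new (simulated) individuals: higher value means higher predicted risk.
X_new, _ = simulate(rng, 5, w_target_true)
risk = sigmoid(X_new @ w_transfer)

print("target-only error:", np.linalg.norm(w_target_only - w_target_true))
print("transfer error:   ", np.linalg.norm(w_transfer - w_target_true))
```

The penalty strength `lam` controls how much the small cohort is allowed to move the weights away from the source estimates. Setting it to zero ignores the source study, while a very large value simply reuses the source weights unchanged.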
The team applied their model to real-world data from the Vanderbilt University biobank to predict the progression of rheumatoid arthritis and lupus, and validated the GPS risk scores with data from the All of Us biobank, an initiative of the National Institutes of Health. The results, published in Nature Communications, showed that GPS outperformed 20 other models that relied solely on biobank or case-control data, as well as those that combined both using other methods. Accurate prediction of disease progression with GPS could lead to early interventions, targeted monitoring, and personalized treatment decisions, ultimately improving patient outcomes. It could also improve the design of clinical trials and recruitment for them by identifying those who are most likely to benefit from new therapies. While this study focused on autoimmune diseases, the researchers believe that the approach could be applied to other types of diseases as well.
“By targeting a more relevant population — people with family history or who are experiencing early symptoms — we can use machine learning to identify patients with the highest risk for disease and then identify suitable therapeutics that may be able to slow down the progression of the disease. It’s a lot more meaningful and actionable information,” said Dajiang Liu, distinguished professor, vice chair for research and director of artificial intelligence and biomedical informatics at the Penn State College of Medicine and co-lead author of the study.