Machine Learning Approach Enhances Liver Cancer Risk Stratification

By HospiMedica International staff writers
Posted on 31 Mar 2026

Hepatocellular carcinoma, the most common form of primary liver cancer, is often detected late despite targeted surveillance programs. Current screening guidelines emphasize patients with known cirrhosis, which leaves many at-risk individuals unidentified. Earlier risk stratification could expand surveillance and enable timely diagnosis. A new study describes a machine learning model that estimates liver cancer risk using routine demographic, electronic health record, and blood test data.

Researchers at RWTH Aachen University and the Technical University of Dresden developed and validated a random forest–based risk model in large, population-level cohorts, with results published in Cancer Discovery on March 26, 2026. The approach analyzes commonly captured clinical variables to assign individualized risk for hepatocellular carcinoma (HCC). The aim is to broaden identification beyond the narrow, high-risk populations targeted by current guidelines.



The modeling framework trained separate random forest classifiers on five data types, ordered by clinical availability (demographics, electronic health records, blood tests, genomics, and metabolomics), and tested stepwise combinations in that order. Random forests aggregate the predictions of many decision trees, yielding robust and interpretable outputs. The best-performing configuration, termed Model C, combined demographics, electronic health records, and routine blood tests.
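To make the stepwise design concrete, here is a minimal Python sketch, assuming each combination is cumulative in the availability order above, so the third combination matches the data used by Model C. The names `DATA_TYPES`, `stepwise_combinations`, and `forest_probability` are hypothetical. The soft-vote averaging shown is one common way a random forest combines its trees (scikit-learn's `RandomForestClassifier` averages per-tree probabilities); the article does not say how the study's implementation aggregates them.

```python
from typing import List, Sequence

# Data types ordered by clinical availability, as listed in the article.
# The identifier spellings are hypothetical.
DATA_TYPES: List[str] = [
    "demographics",
    "electronic_health_records",
    "blood_tests",
    "genomics",
    "metabolomics",
]


def stepwise_combinations(ordered: Sequence[str]) -> List[List[str]]:
    """Cumulative feature sets: the first type alone, then the first two, etc."""
    return [list(ordered[: i + 1]) for i in range(len(ordered))]


def forest_probability(tree_probs: Sequence[float]) -> float:
    """Soft vote (assumed): average each tree's predicted probability of HCC."""
    if not tree_probs:
        raise ValueError("need at least one tree prediction")
    return sum(tree_probs) / len(tree_probs)


if __name__ == "__main__":
    for step, combo in enumerate(stepwise_combinations(DATA_TYPES), start=1):
        print(step, " + ".join(combo))
    # Hypothetical per-tree probabilities for one patient (not study data).
    print(forest_probability([0.10, 0.30, 0.20, 0.40]))
```

In this framing, each cumulative set would get its own classifier and evaluation, which is how a comparison such as "adding genomics did not help" can be read off the results.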

Investigators trained the model on 80% of the UK Biobank cohort and validated it on the remaining 20%. The UK Biobank cohort included more than 500,000 participants and 538 HCC cases, 69% of which occurred in people with no prior diagnosis of cirrhosis, viral hepatitis, or other chronic liver disease. External validation used the All of Us Research Program in the United States, with more than 400,000 participants and 445 HCC cases.
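Because HCC cases are rare in the training cohort (538 among more than 500,000 participants, roughly 0.1%), how the 80/20 split is drawn matters. The sketch below assumes a split stratified by case status, so both partitions keep the same case fraction; the article states only the 80/20 proportions, and `stratified_split` is a hypothetical helper run on toy data.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], train_frac: float = 0.8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split row indices into training and validation sets by class.

    labels: 1 for an HCC case, 0 for a non-case (hypothetical encoding).
    Stratification is an assumption; the article only reports an 80/20 split.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[int] = []
    valid: List[int] = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)


if __name__ == "__main__":
    # Toy cohort: 1,000 participants, 10 of them cases (not the study's data).
    toy_labels = [1] * 10 + [0] * 990
    tr, va = stratified_split(toy_labels)
    print(len(tr), len(va))  # 800 200
    print(sum(toy_labels[i] for i in tr), sum(toy_labels[i] for i in va))  # 8 2
```

Without stratification, a purely random split of a rare outcome can leave the validation set with noticeably more or fewer cases than expected, which makes performance estimates noisier.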

Model C achieved an area under the receiver operating characteristic curve (AUROC) of 0.88. Adding genomics or metabolomics did not meaningfully improve performance. Compared with existing scores (FIB‑4, APRI, NFS, and aMAP), the model identified more true HCC cases while generating fewer false positives. A simplified version using as few as 15 routinely collected features still outperformed these scores.
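AUROC summarizes how well a score ranks cases above non-cases across all possible thresholds: a value of 0.88 means that in about 88% of case/non-case pairs, the case received the higher risk score. The sketch below computes AUROC in that pairwise (Mann-Whitney) form and also counts true and false positives at one fixed cutoff, which is the kind of comparison behind "more true cases, fewer false positives." The function names and toy data are hypothetical, and the article does not state which threshold the study used.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random case outscores a random non-case.

    Ties count as one half. This pairwise loop is O(cases x non-cases), which is
    fine for illustration; large cohorts would use a rank-based method instead.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_at_threshold(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int]:
    """Return (true positives, false positives) when flagging scores >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp, fp


if __name__ == "__main__":
    # Toy labels (1 = HCC case) and risk scores; not data from the study.
    y = [0, 0, 1, 1]
    risk = [0.10, 0.40, 0.35, 0.80]
    print(auroc(y, risk))                         # 0.75
    print(confusion_at_threshold(y, risk, 0.30))  # (2, 1)
```

Two scores can have similar AUROC yet differ at the cutoff a clinic would actually use, which is why threshold-level counts are reported alongside the curve.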

Generalizability was supported by robust performance in the non‑white subgroup of the more diverse All of Us cohort. Limitations include the study's retrospective design and the low proportion of participants with viral hepatitis in the training and validation sets. The authors note that further validation in additional populations is needed.

"Our study highlights the potential of a simple, easily utilized machine learning model to improve risk stratification for HCC using only routinely collected clinical data," said Carolin Schneider, M.D., assistant professor at RWTH Aachen University in Germany. "If validated in additional populations, our model would enable primary care physicians to efficiently identify at-risk patients and refer them to liver cancer screening. This could enable earlier detection and improved outcomes for patients with this aggressive disease."
