Machine learning predicts hepatocellular carcinoma risk from routine clinical data: a large population-based multicentric study
Drawing on clinical data from more than 900,000 people and 983 cases of hepatocellular carcinoma, this study demonstrates how well a machine-learning algorithm predicts the risk of developing the disease.
Hepatocellular carcinoma (HCC) is a highly fatal tumor for which risk stratification is crucial yet remains challenging. Here, we develop an interpretable machine-learning framework for HCC risk stratification based on routinely collected clinical data. We utilize prospectively collected multimodal data from over 900,000 individuals and 983 cases of HCC across two population-scale cohorts: the “UK Biobank study” (development) and the “All of Us Research Program” (external testing). We assess individual and cumulative contributions of data modalities including demographics, lifestyle, health records, blood, genomics, and metabolomics. Our final, random-forest-based models significantly outperform all publicly available state-of-the-art risk scores on both internal and external test sets. We demonstrate robustness across ethnic subgroups, provide comprehensive interpretability and release all code, model weights and a web-calculator for external validation and agentic integration. Our study presents PRE-Screen-HCC, a robust and interpretable machine-learning framework for HCC risk stratification and early detection.
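The abstract does not say which metric underlies the comparison with existing risk scores. A common choice for discrimination on a binary outcome is the area under the ROC curve (AUC, equivalent to the concordance or C-statistic in that setting). The following is a minimal sketch under that assumption, not the authors' evaluation code. It computes AUC with the rank-based Mann-Whitney formula, which scales to cohorts of this size. The function name `roc_auc` and all scores below are hypothetical toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney rank formula (assumed metric, illustrative only).

    labels: 1 for a case (e.g. incident HCC), 0 for a non-case.
    scores: predicted risk; higher means higher risk.
    Tied scores receive their average rank, so ties count as half a win.
    """
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")

    # Sort indices by score and assign 1-based ranks, averaging over tie blocks.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for cases, normalised by the number of case/non-case pairs.
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy comparison: a model's risk estimates vs. a baseline score.
labels = [0, 0, 0, 1, 0, 1]
model_risk = [0.05, 0.10, 0.20, 0.70, 0.30, 0.60]
baseline_score = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6]
print(roc_auc(labels, model_risk))      # → 1.0
print(roc_auc(labels, baseline_score))  # → 0.875
```

Because AUC depends only on the ranking of scores, it can compare a probability output (such as a random forest's) with an integer-valued clinical score on the same footing. A claim of *significant* improvement would additionally need a paired test or bootstrap over the same individuals, which this sketch omits.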
Cancer Discovery, open-access article, 2026