Asier Rabasco Meneghetti, Marta Ligero Hernández, Jens-Peter Kühn, Steffen Löck, Zunamys Itzell Carrero, Raquel Perez-Lopez, Keno K Bressem, Titus J Brinker, Alexander T Pearson, Daniel Truhn, Sven Nebelung, Jakob Nikolas Kather
{"title":"End-to-end prediction of clinical outcomes in head and neck squamous cell carcinoma with foundation model-based multiple instance learning.","authors":"Asier Rabasco Meneghetti, Marta Ligero Hernández, Jens-Peter Kühn, Steffen Löck, Zunamys Itzell Carrero, Raquel Perez-Lopez, Keno K Bressem, Titus J Brinker, Alexander T Pearson, Daniel Truhn, Sven Nebelung, Jakob Nikolas Kather","doi":"10.1186/s44398-025-00003-8","DOIUrl":"10.1186/s44398-025-00003-8","url":null,"abstract":"<p><strong>Background: </strong>Foundation models have shown promise in medical AI by learning flexible features from large datasets, offering new opportunities for improving endpoint prediction. However, the use of foundation models for endpoint prediction from routine imaging in head and neck squamous cell carcinoma patients remains unexplored. In this study, we evaluated the potential of foundation model-based multiple instance learning (MIL) for prediction of 2-year overall survival, locoregional control and freedom from distant metastasis across three external head and neck squamous cell carcinoma patient cohorts using 2D, multiview and 3D approaches, and compared prediction and stratification performance with handcrafted radiomics and clinical baselines.</p><p><strong>Results: </strong>2D MIL models achieved test area under the receiver operating characteristic curve (AUROC) ranges of 0.75-0.84 for 2-year overall survival, 0.66-0.75 for 2-year locoregional control and 0.71-0.78 for 2-year freedom from distant metastasis across three different external cohorts, outperforming multiview and 3D MIL models (AUROC range: 0.50-0.77, p <math><mo>≥</mo></math> 0.15) and showing comparable or superior performance to handcrafted radiomics (AUROC range: 0.64-0.74, p <math><mo>≥</mo></math> 0.012). Significant stratification was observed with the 2D MIL models (hazard ratios: 2.14-4.77, p <math><mo>≤</mo></math> 0.039). 
2D MIL models were also shown to learn endpoint-specific correlation patterns, such as N-stage for 2-year freedom from distant metastasis prognosis. Multimodal enhancement of 2-year overall survival and freedom from distant metastasis prediction was observed (AUROC range: 0.82-0.87, p <math><mo>≤</mo></math> 0.018) for patients without human papillomavirus-positive tumors.</p><p><strong>Conclusions: </strong>Foundation model-based 2D MIL demonstrates promise in head and neck squamous cell carcinoma risk prediction as well as stratification of clinical outcomes. The models match or outperform radiomics baselines, learn clinically related patterns and enhance clinical baselines in patients without human papillomavirus-positive tumors.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1186/s44398-025-00003-8.</p>","PeriodicalId":520917,"journal":{"name":"BMC artificial intelligence","volume":"1 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12212421/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144556570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}