From classical machine learning to emerging foundation models: review on multimodal data integration for cancer research

IF 13.9; Zone 2, Computer Science; Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Amgad Muneer, Muhammad Waqas, Maliazurina B. Saad, Eman Showkatian, Rukhmini Bandyopadhyay, Hui Xu, Wentao Li, Joe Y. Chang, Zhongxing Liao, Cara Haymaker, Luisa Solis Soto, Carol C. Wu, Natalie I. Vokes, Xiuning Le, Lauren A. Byers, Don L. Gibbons, John V. Heymach, Jianjun Zhang, Jia Wu
Journal: Artificial Intelligence Review, 59 (4)
Published: 2026-02-21
DOI: 10.1007/s10462-026-11522-9
Article: https://link.springer.com/article/10.1007/s10462-026-11522-9
PDF: https://link.springer.com/content/pdf/10.1007/s10462-026-11522-9.pdf
Citations: 0

Abstract

Cancer research is increasingly driven by the integration of diverse data modalities, ranging from genomics and proteomics to imaging and clinical factors. However, extracting actionable insights from these vast and heterogeneous datasets remains a key challenge. The rise of foundation models (FMs), large deep-learning models pretrained on extensive amounts of data that serve as a backbone for a wide range of downstream tasks, offers new avenues for discovering biomarkers, improving diagnosis, and personalizing treatment. This paper presents a comprehensive review of widely adopted strategies for integrating multimodal data, with the aim of advancing computational approaches to data-driven discovery in oncology. We examine emerging trends in machine learning (ML) and deep learning (DL), including methodological frameworks, validation protocols, and open-source resources targeting cancer subtype classification, biomarker discovery, treatment guidance, and outcome prediction. This study also comprehensively covers the shift from traditional ML to FMs for multimodal integration. We present a holistic view of recent FM advancements and of the challenges faced when integrating multi-omics with advanced imaging data. We identify state-of-the-art FMs, publicly available multimodal repositories, and advanced tools and methods for data integration. We argue that current state-of-the-art integration methods provide the essential groundwork for developing the next generation of large-scale, pretrained models poised to further revolutionize oncology. To the best of our knowledge, this is the first review to systematically map the transition from conventional ML to advanced FMs for multimodal data integration in oncology, while also framing these developments as foundational for the forthcoming era of large-scale AI models in cancer research. The GitHub repository for this project is available at https://github.com/WuLabMDA/Medical-Foundation-Models.

Source journal
Artificial Intelligence Review (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months
About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.