Siqi Li, Xin Li, Kunyu Yu, Qiming Wu, Di Miao, Mingcheng Zhu, Mengying Yan, Yuhe Ke, Danny D'Agostino, Yilin Ning, Ziwen Wang, Yuqing Shang, Molei Liu, Chuan Hong, Nan Liu
Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Structured Data Analysis
Health Data Science, vol. 5, article 0321. Published 2025-09-03 (eCollection 2025). DOI: 10.34133/hds.0321 (https://doi.org/10.34133/hds.0321)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12408193/pdf/
Abstract
Background: Clinical and biomedical research in low-resource settings often faces substantial challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine learning technique, emerges as a powerful solution by utilizing knowledge from pretrained models to enhance the performance of new models, offering promise across various healthcare domains. Despite its conceptual origins in the 1990s, the application of TL in medical research has remained limited, especially beyond image analysis. This review aims to analyze TL applications, highlight overlooked techniques, and suggest improvements for future healthcare research.
Methods: Following the PRISMA-ScR guidelines, we conducted a search for published articles that employed TL with structured clinical or biomedical data by searching the SCOPUS, MEDLINE, Web of Science, Embase, and CINAHL databases.
Results: We screened 5,080 papers, with 86 meeting the inclusion criteria. Among these, only 2% (2 of 86) utilized external studies, and 5% (4 of 86) addressed scenarios involving multi-site collaborations with privacy constraints.
Conclusions: To achieve actionable TL with structured medical data while addressing regional disparities, inequality, and privacy constraints in healthcare research, we advocate for the careful identification of appropriate source data and models, the selection of suitable TL frameworks, and the validation of TL models with proper baselines.