将机器学习融入疾病风险预测建模的统计方法:系统综述。

Health data science Pub Date : 2024-07-23 eCollection Date: 2024-01-01 DOI:10.34133/hds.0165
Meng Zhang, Yongqi Zheng, Xiagela Maidaiti, Baosheng Liang, Yongyue Wei, Feng Sun
{"title":"将机器学习融入疾病风险预测建模的统计方法:系统综述。","authors":"Meng Zhang, Yongqi Zheng, Xiagela Maidaiti, Baosheng Liang, Yongyue Wei, Feng Sun","doi":"10.34133/hds.0165","DOIUrl":null,"url":null,"abstract":"<p><p><b>Background:</b> Disease prediction models often use statistical methods or machine learning, both with their own corresponding application scenarios, raising the risk of errors when used alone. Integrating machine learning into statistical methods may yield robust prediction models. This systematic review aims to comprehensively assess current development of global disease prediction integration models. <b>Methods:</b> PubMed, EMbase, Web of Science, CNKI, VIP, WanFang, and SinoMed databases were searched to collect studies on prediction models integrating machine learning into statistical methods from database inception to 2023 May 1. Information including basic characteristics of studies, integrating approaches, application scenarios, modeling details, and model performance was extracted. <b>Results:</b> A total of 20 eligible studies in English and 1 in Chinese were included. Five studies concentrated on diagnostic models, while 16 studies concentrated on predicting disease occurrence or prognosis. Integrating strategies of classification models included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). Regression models adopted strategies including simple statistics, weighted statistics, and stacking. AUROC of integration models surpassed 0.75 and performed better than statistical methods and machine learning in most studies. Stacking was used for situations with >100 predictors and needed relatively larger amount of training data. <b>Conclusion:</b> Research on integrating machine learning into statistical methods in prediction models remains limited, but some studies have exhibited great potential that integration models outperform single models. This study provides insights for the selection of integration methods for different scenarios. Future research could emphasize on the improvement and validation of integrating strategies.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0165"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11266123/pdf/","citationCount":"0","resultStr":"{\"title\":\"Integrating Machine Learning into Statistical Methods in Disease Risk Prediction Modeling: A Systematic Review.\",\"authors\":\"Meng Zhang, Yongqi Zheng, Xiagela Maidaiti, Baosheng Liang, Yongyue Wei, Feng Sun\",\"doi\":\"10.34133/hds.0165\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b>Background:</b> Disease prediction models often use statistical methods or machine learning, both with their own corresponding application scenarios, raising the risk of errors when used alone. Integrating machine learning into statistical methods may yield robust prediction models. This systematic review aims to comprehensively assess current development of global disease prediction integration models. <b>Methods:</b> PubMed, EMbase, Web of Science, CNKI, VIP, WanFang, and SinoMed databases were searched to collect studies on prediction models integrating machine learning into statistical methods from database inception to 2023 May 1. Information including basic characteristics of studies, integrating approaches, application scenarios, modeling details, and model performance was extracted. <b>Results:</b> A total of 20 eligible studies in English and 1 in Chinese were included. Five studies concentrated on diagnostic models, while 16 studies concentrated on predicting disease occurrence or prognosis. Integrating strategies of classification models included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). Regression models adopted strategies including simple statistics, weighted statistics, and stacking. AUROC of integration models surpassed 0.75 and performed better than statistical methods and machine learning in most studies. Stacking was used for situations with >100 predictors and needed relatively larger amount of training data. <b>Conclusion:</b> Research on integrating machine learning into statistical methods in prediction models remains limited, but some studies have exhibited great potential that integration models outperform single models. This study provides insights for the selection of integration methods for different scenarios. Future research could emphasize on the improvement and validation of integrating strategies.</p>\",\"PeriodicalId\":73207,\"journal\":{\"name\":\"Health data science\",\"volume\":\"4 \",\"pages\":\"0165\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11266123/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Health data science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.34133/hds.0165\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34133/hds.0165","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

背景:疾病预测模型通常使用统计方法或机器学习,这两种方法都有各自相应的应用场景,单独使用时会增加出错的风险。将机器学习融入统计方法可能会产生稳健的预测模型。本系统综述旨在全面评估当前全球疾病预测整合模型的发展情况。研究方法检索PubMed、EMbase、Web of Science、CNKI、VIP、万方和SinoMed数据库,收集从数据库建立到2023年5月1日有关将机器学习融入统计方法的预测模型的研究。提取的信息包括研究的基本特征、整合方法、应用场景、建模细节和模型性能。结果:共纳入了 20 项符合条件的英文研究和 1 项中文研究。其中 5 项研究侧重于诊断模型,16 项研究侧重于预测疾病的发生或预后。分类模型的整合策略包括多数投票、加权投票、堆叠和模型选择(当统计方法和机器学习出现分歧时)。回归模型采用的策略包括简单统计、加权统计和堆叠。在大多数研究中,整合模型的 AUROC 超过 0.75,表现优于统计方法和机器学习。堆叠用于预测因子大于 100 个的情况,需要相对较多的训练数据。结论在预测模型中将机器学习与统计方法相结合的研究仍然有限,但一些研究显示出整合模型优于单一模型的巨大潜力。本研究为在不同情况下选择集成方法提供了启示。未来的研究可以重点关注整合策略的改进和验证。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Integrating Machine Learning into Statistical Methods in Disease Risk Prediction Modeling: A Systematic Review.

Background: Disease prediction models often use statistical methods or machine learning, both with their own corresponding application scenarios, raising the risk of errors when used alone. Integrating machine learning into statistical methods may yield robust prediction models. This systematic review aims to comprehensively assess current development of global disease prediction integration models. Methods: PubMed, EMbase, Web of Science, CNKI, VIP, WanFang, and SinoMed databases were searched to collect studies on prediction models integrating machine learning into statistical methods from database inception to 2023 May 1. Information including basic characteristics of studies, integrating approaches, application scenarios, modeling details, and model performance was extracted. Results: A total of 20 eligible studies in English and 1 in Chinese were included. Five studies concentrated on diagnostic models, while 16 studies concentrated on predicting disease occurrence or prognosis. Integrating strategies of classification models included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). Regression models adopted strategies including simple statistics, weighted statistics, and stacking. AUROC of integration models surpassed 0.75 and performed better than statistical methods and machine learning in most studies. Stacking was used for situations with >100 predictors and needed relatively larger amount of training data. Conclusion: Research on integrating machine learning into statistical methods in prediction models remains limited, but some studies have exhibited great potential that integration models outperform single models. This study provides insights for the selection of integration methods for different scenarios. Future research could emphasize on the improvement and validation of integrating strategies.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
3.70
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信