Application of machine learning algorithms in an epidemiologic study of mortality

IF 3.3 · Zone 3 (Medicine) · Q1, Public, Environmental & Occupational Health
George O. Agogo, Henry Mwambi
Annals of Epidemiology, volume 102, pages 36-47. Published 2025-02-01. Journal article.
DOI: 10.1016/j.annepidem.2024.12.015
https://www.sciencedirect.com/science/article/pii/S1047279724002874
Citations: 0

Application of machine learning algorithms in an epidemiologic study of mortality

Purpose

Epidemiologic studies are important in assessing risk factors of mortality. Machine learning (ML) is efficient in analyzing multidimensional data to unravel dependencies between risk factors and health outcomes.

Methods

Using a representative sample from the National Health and Nutrition Examination Survey (NHANES), collected from 2009 to 2016 and linked to the National Death Index public-use mortality data through December 31, 2019, we applied logistic regression, random forests, k-nearest neighbors, multivariate adaptive regression splines, support vector machines, extreme gradient boosting, and super learner ML algorithms to study risk factors of all-cause mortality. We evaluated the algorithms using the area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and negative predictive value (NPV), among other metrics, and interpreted the results using SHapley Additive exPlanations (SHAP).
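The three evaluation metrics named above can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the 0.5 classification threshold, and the toy labels and scores are all assumptions made for illustration, and the sketch ignores the survey weights that an NHANES analysis would normally use.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = died and 0 = survived."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual deaths that the classifier flags: TP / (TP + FN)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def negative_predictive_value(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted survivors who actually survived: TN / (TN + FN)."""
    _, _, tn, fn = confusion_counts(y_true, y_pred)
    return tn / (tn + fn)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for this example; not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(sensitivity(y_true, y_pred))
print(negative_predictive_value(y_true, y_pred))
print(auc_roc(y_true, scores))
```

Sensitivity and NPV depend on the chosen threshold, whereas AUC-ROC is computed from the ranking of the scores alone, which is why the two kinds of metric are usually reported together.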

Results

The AUC-ROC ranged from 0.80 to 0.87. The super learner had the highest AUC-ROC, 0.87 (95% CI, 0.86 to 0.88), with a sensitivity of 0.86 (95% CI, 0.84 to 0.88) and an NPV of 0.98 (95% CI, 0.98 to 0.99). Key risk factors of mortality included advanced age, larger waist circumference, male sex, and systolic blood pressure. Being married, high annual household income, and high education level were linked with low risk of mortality.
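The abstract does not say how the 95% confidence intervals were obtained. One common approach, shown here purely as an assumption, is a percentile bootstrap over participants. The sketch below applies it to sensitivity; the function names, the number of resamples, and the toy data are hypothetical, and a real NHANES analysis would also need to account for the survey design.

```python
import random
from typing import List, Sequence, Tuple


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FN) for binary labels, where 1 = died."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for sensitivity, resampling participants."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        yt = [y_true[i] for i in idx]
        if 1 not in yt:
            continue  # sensitivity is undefined without any positives
        yp = [y_pred[i] for i in idx]
        stats.append(sensitivity(yt, yp))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Toy data invented for this example: 40 deaths (34 flagged), 160 survivors (10 flagged).
y_true = [1] * 40 + [0] * 160
y_pred = [1] * 34 + [0] * 6 + [0] * 150 + [1] * 10

point = sensitivity(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred)
print(f"sensitivity = {point:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

The same resampling loop can wrap any other metric, such as NPV or AUC-ROC, by swapping the function called inside it.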

Conclusions

Machine learning can be used to identify risk factors of mortality, which is critical for individualized targeted interventions in epidemiologic studies.
Source journal
Annals of Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 7.40
Self-citation rate: 1.80%
Articles published: 207
Review time: 59 days
About the journal: The journal emphasizes the application of epidemiologic methods to issues that affect the distribution and determinants of human illness in diverse contexts. Its primary focus is on chronic and acute conditions of diverse etiologies and of major importance to clinical medicine, public health, and health care delivery.