Non-exercise Machine Learning Models for Maximal Oxygen Uptake Prediction in National Population Surveys

Y. Liu, J. Herrin, C. Huang, R. Khera, L. Dhingra, W. Dong, B. Mortazavi, H. Krumholz, Y. Lu
Journal: Journal of the American Medical Informatics Association (JAMIA). Published: 2022-10-04. DOI: 10.1101/2022.09.30.22280471 (https://doi.org/10.1101/2022.09.30.22280471)
Citations: 0

Abstract

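Background: Maximal oxygen uptake (VO2 max), an indicator of cardiorespiratory fitness (CRF), requires exercise testing and is therefore rarely measured in large-scale population-based studies. Non-exercise algorithms are cost-effective methods for estimating VO2 max, but existing models are limited in generalizability and predictive power. This study aims to improve non-exercise algorithms using machine learning (ML) methods and data from U.S. national population surveys.

Methods: We used 1999-2004 data from the National Health and Nutrition Examination Survey (NHANES), in which a submaximal exercise test produced an estimate of VO2 max. We applied multiple supervised ML algorithms to build two models: a parsimonious model using variables readily available in clinical practice, and an extended model that additionally included more complex variables from Dual-Energy X-ray Absorptiometry (DEXA) and standard laboratory tests. We used Shapley additive explanations (SHAP) to interpret the new model and identify the key predictors. For comparison, existing non-exercise algorithms were applied unmodified to the testing set.

Results: Among the 5,668 NHANES participants in the final study population, the mean age was 32.5 years and 49.9% were women. Light Gradient Boosting Machine (LightGBM) performed best among the supervised ML algorithms evaluated. Compared with the best existing non-exercise algorithms that could be applied in NHANES, the parsimonious LightGBM model (RMSE: 8.51 ml/kg/min [95% CI: 7.73-9.33]) and the extended model (RMSE: 8.26 ml/kg/min [95% CI: 7.44-9.09]) significantly reduced the error, by 15% and 12% respectively (P<0.01 for both).

Conclusion: Our non-exercise ML model predicts VO2 max for NHANES participants more accurately than existing non-exercise algorithms.

Keywords: Machine learning, GBDTs, Cardiorespiratory fitness, VO2max, NHANES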
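The abstract reports each model's error as an RMSE with a 95% confidence interval and as a percentage reduction relative to existing algorithms, but it does not say how the intervals were obtained. The following is a minimal standard-library Python sketch of how such figures are commonly computed, assuming a percentile bootstrap that resamples test-set participants; this is an assumption, not necessarily the authors' procedure, and the function names (`rmse`, `bootstrap_rmse_ci`, `relative_reduction`) are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the target (e.g. ml/kg/min)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse_ci(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """95% percentile-bootstrap CI for RMSE (assumed method, see lead-in).

    Each replicate resamples test-set participants with replacement and
    recomputes RMSE; the 2.5th and 97.5th percentiles form the interval.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    cuts = statistics.quantiles(stats, n=40)  # 39 cut points in 2.5% steps
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles


def relative_reduction(rmse_baseline: float, rmse_model: float) -> float:
    """Fractional error reduction of a model versus a baseline algorithm."""
    return (rmse_baseline - rmse_model) / rmse_baseline


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT study data.
    observed = [35.0, 42.5, 28.0, 51.0, 39.5]
    predicted = [37.0, 40.0, 30.5, 47.0, 41.0]
    print(rmse(observed, predicted))
    print(bootstrap_rmse_ci(observed, predicted))
    print(relative_reduction(10.0, 8.5))
```

For example, `relative_reduction(10.0, 8.5)` gives 0.15, i.e. a 15% reduction. The abstract does not report the baseline algorithms' RMSE values, so the study's own percentages cannot be recomputed from it, and the P-values would require a paired test that this sketch does not cover.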