Non-exercise Machine Learning Models for Maximal Oxygen Uptake Prediction in National Population Surveys

Journal of the American Medical Informatics Association : JAMIA Pub Date : 2022-10-04 DOI:10.1101/2022.09.30.22280471

Y. Liu, J. Herrin, C. Huang, R. Khera, L. Dhingra, W. Dong, B. Mortazavi, H. Krumholz, Y. Lu


Citations: 0


Non-exercise Machine Learning Models for Maximal Oxygen Uptake Prediction in National Population Surveys

ABSTRACT

Background: Maximal oxygen uptake (VO2 max), an indicator of cardiorespiratory fitness (CRF), requires exercise testing and, as a result, is rarely ascertained in large-scale population-based studies. Non-exercise algorithms are cost-effective methods to estimate VO2 max, but the existing models have limitations in generalizability and predictive power. This study aims to improve the non-exercise algorithms using machine learning (ML) methods and data from U.S. national population surveys.

Methods: We used the 1999-2004 data from the National Health and Nutrition Examination Survey (NHANES), in which a submaximal exercise test produced an estimate of VO2 max. We applied multiple supervised ML algorithms to build two models: a parsimonious model that used variables readily available in clinical practice, and an extended model that additionally included more complex variables from Dual-Energy X-ray Absorptiometry (DEXA) and standard laboratory tests. We used Shapley additive explanations (SHAP) to interpret the new model and identify the key predictors. For comparison, existing non-exercise algorithms were applied unmodified to the testing set.

Results: Among the 5,668 NHANES participants included in the final study population, the mean age was 32.5 years and 49.9% were women. Light Gradient Boosting Machine (LightGBM) had the best performance among the supervised ML algorithms evaluated. Compared with the best existing non-exercise algorithms that could be applied in NHANES, the parsimonious LightGBM model (RMSE: 8.51 ml/kg/min [95% CI: 7.73-9.33]) and the extended model (RMSE: 8.26 ml/kg/min [95% CI: 7.44-9.09]) significantly reduced the error by 15% and 12%, respectively (P<0.01 for both).

Conclusion: Our non-exercise ML model provides a more accurate prediction of VO2 max for NHANES participants than existing non-exercise algorithms.

Keywords: Machine learning, GBDTs, Cardiorespiratory fitness, VO2max, NHANES
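The Results report each model's RMSE with a 95% confidence interval and a relative error reduction against a baseline algorithm. As a minimal sketch of how such figures can be computed, the Python below uses a percentile bootstrap for the interval. The abstract does not say how the authors derived their intervals, so the bootstrap is an assumption, and the function names (`rmse`, `bootstrap_rmse_ci`, `relative_reduction`) and the synthetic data are hypothetical illustrations rather than anything taken from the study or from NHANES.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE.

    Assumption: a percentile bootstrap over test-set rows; the abstract
    does not state which interval method the authors used.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample row indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[max(0, int((alpha / 2) * n_boot))]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)]
    return lower, upper


def relative_reduction(rmse_baseline, rmse_new):
    """Fractional drop in RMSE relative to a baseline algorithm."""
    return (rmse_baseline - rmse_new) / rmse_baseline


# Synthetic stand-in data (NOT NHANES): "measured" VO2 max in ml/kg/min,
# plus predictions from a noisier baseline and a less noisy model.
data_rng = random.Random(42)
y_true = [data_rng.gauss(40.0, 9.0) for _ in range(300)]
y_baseline = [t + data_rng.gauss(0.0, 10.0) for t in y_true]
y_model = [t + data_rng.gauss(0.0, 8.5) for t in y_true]

model_rmse = rmse(y_true, y_model)
ci_low, ci_high = bootstrap_rmse_ci(y_true, y_model)
reduction = relative_reduction(rmse(y_true, y_baseline), model_rmse)
print(f"RMSE {model_rmse:.2f} [95% CI: {ci_low:.2f}-{ci_high:.2f}], "
      f"reduction vs baseline {reduction:.1%}")
```

Resampling index positions rather than the two lists separately keeps each measured value paired with its own prediction, which is what makes the resampled RMSE meaningful.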

