Machine Learning Prediction of Progression in Forced Expiratory Volume in 1 Second in the COPDGene® Study.

A. Boueiz, Zhonghui Xu, Yale Chang, A. Masoomi, A. Gregory, S. Lutz, D. Qiao, J. Crapo, J. Dy, E. Silverman, P. Castaldi
Journal: Chronic Obstructive Pulmonary Diseases
Published: 2022-05-20
DOI: 10.15326/jcopdf.2021.0275 (https://doi.org/10.15326/jcopdf.2021.0275)
Citations: 2

Abstract

Background: The heterogeneous nature of COPD complicates the identification of predictors of disease progression. We aimed to improve the prediction of disease progression in COPD by using machine learning and incorporating a rich dataset of phenotypic features.

Methods: We included 4,496 smokers with available data from their enrollment and 5-year follow-up visits in the Genetic Epidemiology of COPD (COPDGene) study. We constructed linear regression (LR) and supervised random forest (RF) models to predict 5-year progression in forced expiratory volume in 1 second (FEV1) from 46 baseline features. Using cross-validation, we randomly partitioned participants into training and testing samples. We also validated the results in the COPDGene 10-year follow-up visit.

Results: Predicting the change in FEV1 over time is more challenging than simply predicting the future absolute FEV1 level. For RF, R-squared was 0.15 and the area under the ROC curve for predicting subjects in the top quartile of observed progression was 0.71 in the testing samples; the corresponding values in validation were 0.10 and 0.70. RF provided slightly better performance than LR. Accuracy was best for GOLD1-2 subjects, and accurate prediction was harder to achieve in advanced stages of the disease. Predictive variables differed in their relative importance, and their importance also varied by GOLD stage.

Conclusion: RF along with deep phenotyping predicts FEV1 progression with reasonable accuracy. There is significant room for improvement in future models. This prediction model facilitates the identification of smokers at increased risk for rapid disease progression. Such findings may be useful in the selection of patient populations for targeted clinical trials.
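The two metrics reported in the abstract, R-squared and the area under the ROC curve for identifying subjects in the top quartile of observed progression, can be illustrated with a small sketch. This is not the authors' code. The numbers are hypothetical and not study data, the function names are invented for illustration, and the sketch assumes progression is expressed as FEV1 decline in mL, with larger values meaning greater loss.

```python
from typing import List, Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def top_quartile_labels(observed: Sequence[float]) -> List[int]:
    """Label the quarter of subjects with the largest observed decline as 1."""
    n_top = max(1, len(observed) // 4)
    ranked = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)
    top = set(ranked[:n_top])
    return [1 if i in top else 0 for i in range(len(observed))]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical 5-year FEV1 declines in mL (positive = greater loss); not study data.
observed_decline = [300, 250, 200, 180, 150, 120, 100, 50]
predicted_decline = [220, 150, 210, 170, 160, 130, 140, 110]

labels = top_quartile_labels(observed_decline)
r2 = r_squared(observed_decline, predicted_decline)
auc_top = auc(labels, predicted_decline)

print("top-quartile labels:", labels)
print("R-squared:", round(r2, 3))
print("AUC (top quartile):", auc_top)
```

The two metrics answer different questions. R-squared measures how well the predicted magnitudes match the observed ones, while the AUC only measures how well the predictions rank rapid progressors above everyone else. That is one way a model can show a modest R-squared alongside a more useful AUC.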