Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models.

Impact factor: 2.5; JCR Q2 (Respiratory System)
Oh Beom Kwon, Solji Han, Hwa Young Lee, Hye Seon Kang, Sung Kyoung Kim, Ju Sang Kim, Chan Kwon Park, Sang Haak Lee, Seung Joon Kim, Jin Woo Kim, Chang Dong Yeo
Tuberculosis and Respiratory Diseases, vol. 86, no. 3, pp. 203-215. Published 2023-07-01. DOI: 10.4046/trd.2022.0048
Full text (PDF): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/85/a1/trd-2022-0048.PMC10323210.pdf
Citations: 0

Abstract

Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models.

Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets.
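
The evaluation workflow described in the Methods (a 70:30 split, five-fold cross-validation for hyperparameter tuning, then R² and MSE on held-out data) might look roughly like the sketch below. It is not the authors' code. The data are synthetic, a one-feature ridge regression stands in for the six models, and every function name (`train_test_split`, `k_folds`, `fit_ridge`, `tune_alpha`) is a hypothetical helper written for this illustration.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(rows, k=5):
    """Yield (fit, validation) pairs; each row is validated exactly once."""
    for i in range(k):
        validation = rows[i::k]
        fit = [row for j, row in enumerate(rows) if j % k != i]
        yield fit, validation


def fit_ridge(rows, alpha):
    """One-feature ridge regression (unpenalized intercept): (slope, intercept)."""
    x_bar = mean(x for x, _ in rows)
    y_bar = mean(y for _, y in rows)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    s_xx = sum((x - x_bar) ** 2 for x, _ in rows)
    slope = s_xy / (s_xx + alpha)
    return slope, y_bar - slope * x_bar


def mse(rows, model):
    """Mean squared error of the model's predictions on rows."""
    slope, intercept = model
    return mean((y - (slope * x + intercept)) ** 2 for x, y in rows)


def r2(rows, model):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y for _, y in rows)
    total = mean((y - y_bar) ** 2 for _, y in rows)
    return 1.0 - mse(rows, model) / total


def tune_alpha(train, alphas, k=5):
    """Pick the alpha with the lowest mean validation MSE across k folds."""
    def cv_mse(alpha):
        return mean(mse(val, fit_ridge(fit, alpha)) for fit, val in k_folds(train, k))
    return min(alphas, key=cv_mse)


# Synthetic stand-in data (assumption): (predictor, outcome) pairs with noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(40.0, 120.0)
    data.append((x, 0.8 * x + rng.gauss(0.0, 8.0)))

# 70:30 split; tuning only ever sees the training portion.
train, test = train_test_split(data, test_fraction=0.3)
best_alpha = tune_alpha(train, alphas=[0.0, 1.0, 10.0, 100.0, 1000.0])
model = fit_ridge(train, best_alpha)
test_r2 = r2(test, model)
test_mse = mse(test, model)
print(f"alpha={best_alpha} R2={test_r2:.2f} MSE={test_mse:.2f}")
```

The point of the sketch is the ordering: the test rows are set aside first, the hyperparameter is chosen from the training folds only, and the reported metrics come from rows the tuned model has not seen.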

Results: A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R² value was 0.27. In set II, LightGBM was the best model, with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was again the best model, with the highest R² value of 0.56 and the lowest MSE of 174.07.
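
For reference, the two metrics reported above have the following standard definitions. The symbols are notation introduced here and do not appear in the abstract: $y_i$ is the measured outcome for patient $i$, $\hat{y}_i$ the model's prediction, $\bar{y}$ the mean of the measured outcomes, and $n$ the number of evaluated patients.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Under these definitions, an R² of 0.56 means the model's summed squared error is 44% of the squared error obtained by always predicting the mean outcome on the same data.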

Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
