Data-driven prediction of prolonged air leak after video-assisted thoracoscopic surgery for lung cancer: Development and validation of machine-learning-based models using real-world data through the ePath system

Impact factor 2.6; JCR Q2 (Health Policy & Services)
Saori Tou, Koutarou Matsumoto, Asato Hashinokuchi, Fumihiko Kinoshita, Hideki Nakaguma, Yukio Kozuma, Rui Sugeta, Yasunobu Nohara, Takanori Yamashita, Yoshifumi Wakata, Tomoyoshi Takenaka, Kazunori Iwatani, Hidehisa Soejima, Tomoharu Yoshizumi, Naoki Nakashima, Masahiro Kamouchi
Learning Health Systems, volume 9, issue 2. Published 2024-10-11. DOI: 10.1002/lrh2.10469. https://onlinelibrary.wiley.com/doi/10.1002/lrh2.10469

Abstract

Introduction

The reliability of data-driven predictions in real-world scenarios remains uncertain. This study aimed to develop and validate a machine-learning-based model for predicting clinical outcomes using real-world data from an electronic clinical pathway (ePath) system.

Methods

All available data were collected from patients with lung cancer who underwent video-assisted thoracoscopic surgery at two independent hospitals utilizing the ePath system. The primary clinical outcome of interest was prolonged air leak (PAL), defined as drainage removal more than 2 days post-surgery. Data-driven prediction models were developed in a cohort of 314 patients from a university hospital applying sparse linear regression models (least absolute shrinkage and selection operator, ridge, and elastic net) and decision tree ensemble models (random forest and extreme gradient boosting). Model performance was then validated in a cohort of 154 patients from a tertiary hospital using the area under the receiver operating characteristic curve (AUROC) and calibration plots.
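As an illustration of the two validation measures named above, the following is a minimal sketch rather than the authors' implementation. It computes AUROC as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, and builds the table behind a calibration plot by comparing mean predicted risk with the observed event rate in equal-width bins. The function names, the bin count, and the toy data are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(
    y_true: Sequence[int], y_score: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted risk, observed event rate, count) per non-empty bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(s for s, _ in members) / len(members)
            obs_rate = sum(y for _, y in members) / len(members)
            table.append((mean_pred, obs_rate, len(members)))
    return table


# Toy labels and predicted risks, invented for illustration (not study data).
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90]
toy_auc = auroc(toy_labels, toy_scores)
toy_calibration = calibration_table(toy_labels, toy_scores, n_bins=2)
```

In a well-calibrated model the first two entries of each row are close to each other. The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients, although a rank-based formula would be preferable for larger data.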

Results

To mitigate bias, variables with missing data related to PAL or those with high rates of missing data were excluded from the dataset. Fivefold cross-validation indicated improved AUROCs when utilizing key variables, even post-imputation of missing data. Dichotomizing continuous variables enhanced performance, particularly when fewer variables were employed in the decision tree ensemble models. Consequently, regression models incorporating seven key variables in complete case analysis demonstrated superior discriminatory ability for both internal (AUROCs: 0.77–0.84) and external cohorts (AUROCs: 0.75–0.84). These models exhibited satisfactory calibration in both cohorts.
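Two of the preprocessing and evaluation steps mentioned above, dichotomizing a continuous variable and fivefold cross-validation, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the variable (age), the cutoff of 70, the seed, and the helper names are hypothetical, and the abstract does not say which cutoffs or splitting procedure the authors used.

```python
import random
from typing import List, Sequence, Tuple


def dichotomize(values: Sequence[float], cutoff: float) -> List[int]:
    """Map a continuous variable to 0/1: 1 when the value is at or above the cutoff."""
    return [1 if v >= cutoff else 0 for v in values]


def kfold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once, then build k (train, test) splits with disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


# Hypothetical ages and a hypothetical cutoff of 70 years.
toy_age = [54, 61, 70, 75, 68, 82]
toy_age_binary = dichotomize(toy_age, cutoff=70)

# 314 is the development cohort size reported in the abstract; a complete case
# analysis would use fewer rows, so this count is only illustrative.
splits = kfold_indices(314, k=5, seed=0)
fold_sizes = [len(test) for _, test in splits]
```

Each patient appears in exactly one test fold, so every prediction used for the cross-validated AUROC comes from a model that did not see that patient during fitting. With an imbalanced outcome, a stratified split that preserves the PAL rate in each fold may be a safer choice than the plain shuffle shown here.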

Conclusions

The data-driven prediction model built on data from the ePath system performed adequately in predicting PAL after video-assisted thoracoscopic surgery once variables were optimized and population characteristics were taken into account in a real-world setting.
