Development, deployment, and feature interpretability of a three-class prediction model for pulmonary diseases.

Impact factor 4.5 | Zone 2, Medicine | JCR Q1: Radiology, Nuclear Medicine & Medical Imaging
Zhenyu Cao, Gang Xu, Yuan Gao, Jianying Xu, Fengjuan Tian, Hengfeng Shi, Dengfa Yang, Zongyu Xie, Jian Wang
Insights into Imaging 16(1):133. Published 2025-06-26. Journal Article. DOI: 10.1186/s13244-025-02020-7 (https://doi.org/10.1186/s13244-025-02020-7). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12202249/pdf/

Abstract

Purpose: To develop a high-performance machine learning model for predicting and interpreting features of pulmonary diseases.

Patients and methods: This retrospective study analyzed clinical and imaging data from patients with non-small cell lung cancer (NSCLC), granulomatous inflammation, and benign tumors, collected across multiple centers from January 2015 to October 2023. Data from two hospitals in Anhui Province were split into a development set (n = 1696) and a test set (n = 424) in an 8:2 ratio, with an external validation set (n = 909) from Zhejiang Province. Features with p < 0.05 in univariate analyses were then screened with the Boruta algorithm, and the retained features were used as input to Random Forest (RF) and XGBoost models. Model efficacy was assessed using receiver operating characteristic (ROC) analysis.
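The 8:2 split described above can be illustrated with a minimal sketch. The abstract does not say whether the split was simple random or stratified by diagnosis, so this assumes a plain seeded shuffle; the function name `split_8_2`, the seed, and the integer patient IDs are hypothetical and are not taken from the study.

```python
import random


def split_8_2(patient_ids, seed=0):
    """Shuffle patient IDs and split them 8:2 into development and test sets.

    Assumption: a simple random split; the study may have stratified by class.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(ids) * 8 // 10
    return ids[:cut], ids[cut:]


# 2120 patients from the two Anhui hospitals (IDs here are placeholders).
dev_ids, test_ids = split_8_2(range(2120))
print(len(dev_ids), len(test_ids))  # 1696 424
```

With 2120 internal patients, the cut reproduces the reported set sizes of 1696 and 424. A stratified variant would apply the same cut within each diagnosis group before pooling.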

Results: A total of 3030 patients were included: 2269 with NSCLC, 529 with granulomatous inflammation, and 232 with benign tumors. The Obuchowski indices for RF and XGBoost in the test set were 0.7193 (95% CI: 0.6567-0.7812) and 0.8282 (95% CI: 0.7883-0.8650), respectively. In the external validation set, indices were 0.7932 (95% CI: 0.7572-0.8250) for RF and 0.8074 (95% CI: 0.7740-0.8387) for XGBoost. XGBoost achieved better accuracy than RF in both the test (0.81) and external validation (0.79) sets. Calibration curves and decision curve analysis (DCA) showed that XGBoost offered higher net clinical benefit.
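To give a feel for how a single discrimination index can summarize a three-class model, the sketch below computes a prevalence-weighted average of pairwise AUCs in the style of Hand and Till. It is a simplified stand-in, not the authors' computation, and is not claimed to match the exact Obuchowski estimator reported above. The function names, the weighting by class-pair sample counts, and the toy probabilities are all assumptions made for illustration.

```python
from itertools import combinations


def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of P(pos score > neg score); ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def pairwise_auc_index(y_true, proba, classes):
    """Weighted mean of pairwise AUCs over all class pairs.

    Assumption: each pair is weighted by n_i * n_j (a prevalence-style weight).
    """
    total, weight_sum = 0.0, 0.0
    for i, j in combinations(range(len(classes)), 2):
        rows_i = [p for y, p in zip(y_true, proba) if y == classes[i]]
        rows_j = [p for y, p in zip(y_true, proba) if y == classes[j]]
        # Average the AUC based on P(class i) with the one based on P(class j).
        a_ij = auc([p[i] for p in rows_i], [p[i] for p in rows_j])
        a_ji = auc([p[j] for p in rows_j], [p[j] for p in rows_i])
        w = len(rows_i) * len(rows_j)
        total += w * (a_ij + a_ji) / 2
        weight_sum += w
    return total / weight_sum


# Toy, perfectly separable example (made-up probabilities, not study data).
classes = ["NSCLC", "granuloma", "benign"]
y_true = ["NSCLC", "NSCLC", "granuloma", "benign"]
proba = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
index_perfect = pairwise_auc_index(y_true, proba, classes)
print(index_perfect)  # 1.0
```

A model that assigns identical probabilities to every patient would score 0.5 under this index, which is the chance level against which values such as 0.72 and 0.83 are usually read.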

Conclusion: The XGBoost model outperforms RF in the three-class classification of lung diseases.

Critical relevance statement: XGBoost surpasses Random Forest in accurately classifying NSCLC, granulomatous inflammation, and benign tumors, offering superior clinical utility via multicenter data.

Key points: Lung cancer classification model has broad clinical applicability. XGBoost outperforms random forests using CT imaging data. XGBoost model can be deployed on a website for clinicians.

Source journal: Insights into Imaging (Medicine: Radiology, Nuclear Medicine and Imaging)
CiteScore: 7.30
Self-citation rate: 4.30%
Articles published: 182
Review time: 13 weeks
About the journal: Insights into Imaging (I³) is a peer-reviewed open access journal published under the brand SpringerOpen. All content published in the journal is freely available online to anyone, anywhere. I³ continuously updates scientific knowledge and progress in best-practice standards in radiology through the publication of original articles and state-of-the-art reviews and opinions, along with recommendations and statements from the leading radiological societies in Europe. Founded by the European Society of Radiology (ESR), I³ creates a platform for educational material, guidelines and recommendations, and a forum for topics of controversy. A balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes I³ an indispensable source for current information in this field. I³ is owned by the ESR; however, authors retain copyright to their articles according to the Creative Commons Attribution License (see Copyright and License Agreement). All articles can be read, redistributed and reused for free, as long as the author of the original work is cited properly. The open access fees (article-processing charges) for this journal are kindly sponsored by ESR for all Members. The journal went open access in 2012, which means that all articles published since then are freely available online.