Interpretable noninvasive diagnosis of tuberculous pleural effusion using LGBM and SHAP: development and clinical application of a machine learning model.

Impact factor 2.3 · CAS Zone 3 (Biology) · JCR Q2, Multidisciplinary Sciences

PeerJ · Pub Date: 2025-05-20 · eCollection Date: 2025-01-01 · DOI: 10.7717/peerj.19411

Bihua Yao, Xingyu Yu, Liannv Qiu, Er-Min Gu, Siyu Mao, Lei Jiang, Jijun Tong, Jianguo Wu

PeerJ, volume 13, e19411. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101438/pdf/

Citations: 0

Abstract

Background: Tuberculous pleural effusion (TPE) is a common complication of tuberculosis and is difficult to diagnose. Timely and precise identification of TPE is vital for effective patient management and prognosis, yet existing diagnostic methods tend to be invasive and slow, and often lack sufficient accuracy. This study seeks to design and validate an interpretable machine learning model based on routine laboratory data to enable noninvasive and rapid TPE diagnosis.

Methods: A multicenter prospective study was conducted across China between January 2021 and September 2024, enrolling 963 patients. The derivation cohort of 763 patients was used for model training and internal validation, and the remaining 200 patients formed the external validation cohort. The model was built on 18 routine laboratory parameters, including pleural fluid and serum biomarkers, and multiple machine learning (ML) algorithms were evaluated. Light gradient boosting machine (LGBM) emerged as the top-performing model. Shapley Additive exPlanations (SHAP) analysis was used to assess feature importance and interpretability. Model performance was evaluated via area under the curve (AUC) and accuracy metrics.
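As a rough illustration of the AUC metric named in the Methods, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they are not the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = TPE, 0 = non-TPE) and predicted probabilities.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.4f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; a sort-based variant would usually be preferred for larger data.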

Results: Of the 10 ML models compared, LGBM performed best. Feature importance analysis identified 11 key variables, which were used to construct a highly interpretable LGBM model. The model achieved an AUC of 0.9454 in internal validation and 0.9262 in external validation, indicating robustness and generalizability. SHAP analysis enhanced interpretability by highlighting each feature's contribution to prediction outcomes. This model has since been integrated into clinical practice for noninvasive, rapid TPE diagnosis. During external validation, the model achieved a sensitivity of 0.8600, specificity of 0.9056, positive predictive value of 0.8698, and negative predictive value of 0.8686, supporting its accuracy across diverse patient cohorts.
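The four diagnostic metrics reported for external validation are all ratios taken from a 2x2 confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the abstract does not report the study's confusion matrix, and the function name `diagnostic_metrics` is an assumption for this example.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic ratios from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true TPE cases detected
        "specificity": tn / (tn + fp),  # share of non-TPE cases correctly ruled out
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for 100 patients (NOT taken from the study).
toy_metrics = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in toy_metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and specificity are properties of the test itself, whereas the predictive values also depend on how common TPE is in the cohort, so they may shift when the model is applied to a population with a different prevalence.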

Interpretation: This interpretable machine learning model offers a noninvasive, accurate solution for early TPE diagnosis, significantly reducing reliance on invasive procedures. The integration of SHAP ensures the model's clinical interpretability, mitigating concerns surrounding the "black-box" nature of many machine learning approaches.
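To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny toy scoring function by averaging each feature's marginal contribution over every feature ordering. This brute-force approach is an illustration of the underlying definition only; it is not the study's pipeline, and SHAP tooling for gradient-boosted trees typically relies on faster tree-specific algorithms. The feature names (`marker_a`, `marker_b`, `marker_c`), the weights, and the function names are all hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values via all feature orderings.

    Features that have not yet been 'revealed' in an ordering keep their
    baseline value; revealing one adds its marginal contribution.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical score with one interaction term (illustrative weights)."""
    return (0.2 * x["marker_a"] + 0.1 * x["marker_b"] + 0.3 * x["marker_c"]
            + 0.05 * x["marker_a"] * x["marker_b"])


toy_instance = {"marker_a": 2.0, "marker_b": 3.0, "marker_c": 1.0}
toy_baseline = {"marker_a": 0.0, "marker_b": 0.0, "marker_c": 0.0}
phi = shapley_values(toy_score, toy_instance, toy_baseline)
print(phi)
```

The contributions add up to the difference between the score for this instance and the score at the baseline, which is the additivity property that lets a clinician read each value as one feature's share of an individual prediction. Note that the interaction term is split evenly between `marker_a` and `marker_b`.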

Conclusions: This interpretable LGBM-based model provides a reliable, noninvasive tool for TPE diagnosis. It supports clinical decision-making with real-time risk assessment and promises broader applicability through future integration into clinical information systems.


Source journal: PeerJ (Multidisciplinary Sciences)

CiteScore: 4.70

Self-citation rate: 3.70%

Articles published: 1665

Review time: 10 weeks

About the journal: PeerJ is an open access, peer-reviewed scientific journal covering research in the biological and medical sciences. At PeerJ, authors take out a lifetime publication plan (for as little as $99) that allows them to publish articles in the journal for free, forever. PeerJ has five Nobel Prize winners on its board, has won several industry and media awards, and is widely recognized as one of the most interesting recent developments in academic publishing.