Interpretable noninvasive diagnosis of tuberculous pleural effusion using LGBM and SHAP: development and clinical application of a machine learning model.
Bihua Yao, Xingyu Yu, Liannv Qiu, Er-Min Gu, Siyu Mao, Lei Jiang, Jijun Tong, Jianguo Wu
PeerJ 13:e19411, published 20 May 2025. DOI: 10.7717/peerj.19411. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101438/pdf/
Abstract
Background: Tuberculous pleural effusion (TPE) is a common complication of tuberculosis, and its diagnosis remains challenging. Timely and accurate identification of TPE is vital for effective patient management and prognosis, yet existing diagnostic methods tend to be invasive and time-consuming and often lack sufficient accuracy. This study aimed to develop and validate an interpretable machine learning model based on routine laboratory data to enable noninvasive, rapid TPE diagnosis.
Methods: A multicenter prospective study was conducted in China between January 2021 and September 2024, enrolling 963 patients. The derivation cohort (763 patients) was used for model training and internal validation, and a further 200 patients formed the external validation cohort. Candidate models were built on 18 routine laboratory parameters, including pleural fluid and serum biomarkers, and multiple machine learning (ML) algorithms were compared; the light gradient boosting machine (LGBM) performed best. Shapley Additive exPlanations (SHAP) analysis was used to assess feature importance and interpretability. Model performance was evaluated using the area under the curve (AUC) and accuracy metrics.
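The abstract does not say how the SHAP values were computed (for tree ensembles such as LGBM, the `shap` library typically uses its TreeExplainer). As a minimal, library-free sketch of what a SHAP attribution means, the code below computes exact Shapley values by enumerating every coalition of features, replacing absent features with a fixed baseline, and then ranks features by mean absolute Shapley value. Ranking by mean |SHAP| is one common way to shortlist features; whether the study chose its 11 key variables this way is an assumption. The scoring function `toy_model`, the feature names, and the zero baseline are hypothetical stand-ins, not the study's model or variables.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def _masked(x: Sequence[float], baseline: Sequence[float], keep: set) -> List[float]:
    """Keep the features in `keep` from x; replace the rest with baseline values."""
    return [x[j] if j in keep else baseline[j] for j in range(len(x))]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Enumerates every coalition of the other features, so cost grows
    exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                gain = model(_masked(x, baseline, coalition | {i})) - model(
                    _masked(x, baseline, coalition)
                )
                phi[i] += weight * gain
    return phi


def rank_features(
    model: Model,
    X: Sequence[Sequence[float]],
    baseline: Sequence[float],
    names: Sequence[str],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank features by mean |Shapley value| over the samples X; keep the top k."""
    totals = [0.0] * len(names)
    for x in X:
        for j, value in enumerate(shapley_values(model, x, baseline)):
            totals[j] += abs(value)
    mean_abs = [t / len(X) for t in totals]
    order = sorted(range(len(names)), key=lambda j: mean_abs[j], reverse=True)
    return [(names[j], mean_abs[j]) for j in order[:k]]


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical risk score with one interaction term (not the study's model)."""
    return 0.8 * z[0] + 0.5 * z[1] - 0.3 * z[2] + 0.4 * z[0] * z[1]


if __name__ == "__main__":
    baseline = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, [1.0, 1.0, 1.0], baseline)
    print([round(v, 4) for v in phi])  # -> [1.0, 0.7, -0.3]
    # Efficiency property: contributions sum to model(x) - model(baseline)
    print(round(sum(phi), 4))  # -> 1.4
    X = [[1.0, 1.0, 1.0], [2.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    names = ["feature_a", "feature_b", "feature_c"]
    print(rank_features(toy_model, X, baseline, names, k=2))
```

The interaction term's contribution (0.4) is split equally between the two interacting features, and the per-feature values always add up to the difference between the prediction and the baseline prediction; this additivity is what lets a SHAP plot show how each laboratory value pushes an individual patient's score up or down.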
Results: Of the 10 ML models compared, LGBM performed best. Feature importance analysis identified 11 key variables, which were used to construct the final interpretable LGBM model. The model achieved an AUC of 0.9454 in internal validation and 0.9262 in external validation, indicating robustness and generalizability. In external validation, it achieved a sensitivity of 0.8600, specificity of 0.9056, positive predictive value of 0.8698, and negative predictive value of 0.8686, supporting its accuracy across diverse patient cohorts. SHAP analysis enhanced interpretability by showing each feature's contribution to the prediction. The model has since been integrated into clinical practice for noninvasive, rapid TPE diagnosis.
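The abstract reports one threshold-free metric (AUC) and four threshold-dependent ones, but not the decision threshold or how the metrics were aggregated. The sketch below shows the standard definitions on hypothetical labels and scores, assuming a binary outcome (1 = TPE) and a cutoff of 0.5 chosen purely for illustration. AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count half), which equals the area under the empirical ROC curve.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half; this equals the area under the empirical ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is empty."""
    return num / den if den else float("nan")


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Confusion-matrix metrics when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true-positive rate
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "ppv": _ratio(tp, tp + fp),  # positive predictive value
        "npv": _ratio(tn, tn + fn),  # negative predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical labels (1 = TPE) and model scores, not study data
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
    print(round(roc_auc(labels, scores), 4))  # -> 0.9583
    print(threshold_metrics(labels, scores, threshold=0.5))
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of TPE in the cohort being evaluated, so they can shift between derivation and external cohorts even when the model and threshold are unchanged.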
Interpretation: This interpretable machine learning model offers a noninvasive, accurate approach to early TPE diagnosis and can reduce reliance on invasive procedures. Incorporating SHAP provides clinical interpretability, mitigating concerns about the "black-box" nature of many machine learning approaches.
Conclusions: This interpretable LGBM-based model provides a reliable, noninvasive tool for TPE diagnosis. It supports clinical decision-making through real-time risk assessment, and future integration into clinical information systems could broaden its applicability.