Integrating WGCNA and machine learning to distinguish active pulmonary tuberculosis from latent tuberculosis infection based on neutrophil extracellular trap-related genes.
Tao Wang, Tao Lu, Weili Lu, Jiahuan He, Zhiyu Wu, Ying Lei
Diagnostic Microbiology and Infectious Disease, vol. 113, no. 4, article 117053. Journal Article. Published 2025-12-01 (Epub 2025-08-05). DOI: 10.1016/j.diagmicrobio.2025.117053 (https://doi.org/10.1016/j.diagmicrobio.2025.117053)
Abstract
Background: Pulmonary tuberculosis (PTB) remains a major global public health challenge, with diagnostic delays being a key factor contributing to its high morbidity and mortality. Growing evidence suggests that neutrophil extracellular traps (NETs) are closely associated with PTB pathogenesis. This study focuses on elucidating the role of NETs in PTB and identifying critical diagnostic methods and potential biomarkers.
Methods: Weighted gene co-expression network analysis (WGCNA) was employed to identify the three modules most strongly correlated with NETs. Differentially expressed genes (DEGs) from the GSE39939 dataset were intersected with module genes to obtain NET-related DEGs. Four machine learning algorithms (LASSO, random forest, RFE, and Boruta) were applied to select feature genes and develop a PTB diagnostic model. Model performance was evaluated using support vector machine (SVM)-based receiver operating characteristic (ROC) and precision-recall (PR) curves, with validation in the GSE39940 dataset. The optimal algorithm was selected to refine feature genes and construct a miRNA-gene regulatory network.
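As an illustration of the ROC and PR evaluation step, the following is a minimal sketch in plain Python. It is not the authors' code: the function names `roc_auc` and `average_precision` and the example labels and scores are hypothetical, and the scores stand in for SVM decision values (1 = active PTB, 0 = LTBI). It assumes no tied scores when computing average precision.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample
    (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common summary of the PR curve.

    Samples are ranked by descending score; the precision at the rank
    of each positive sample is averaged over all positives.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return precision_sum / hits


if __name__ == "__main__":
    # Hypothetical labels and classifier scores, for illustration only.
    y_true = [1, 1, 0, 1, 0, 0]
    y_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    print(f"ROC AUC: {roc_auc(y_true, y_score):.3f}")
    print(f"Average precision: {average_precision(y_true, y_score):.3f}")
```

Reporting both metrics is useful when the two classes are imbalanced, since the PR summary is more sensitive to false positives among the top-ranked samples than ROC AUC is.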
Results: ROC and PR curve analyses revealed that the RFE and Boruta algorithms exhibited superior diagnostic efficacy in distinguishing active PTB from latent TB infection (LTBI). Further analysis identified five high-ranking feature genes shared by the RFE and Boruta algorithms (GPR84, SIGLEC10, CCR2, TMEM167A, and GYG1). Five miRNAs (hsa-miR-1264, hsa-miR-664a-3p, hsa-miR-548e-5p, hsa-miR-4775, and hsa-miR-5056) were predicted to potentially target these genes.
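The overlap between two feature rankings can be sketched as a simple top-k intersection. This is an assumed illustration rather than the procedure reported in the paper: the function `top_k_overlap`, the cutoff `k`, and the placeholder gene names are all hypothetical, and the abstract does not state how "high-ranking" was defined.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Return genes that appear in the top k of both rankings.

    Each ranking is a list of gene names ordered from most to least
    important. The result keeps the order of ranking_a.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    top_b = set(ranking_b[:k])
    return [gene for gene in ranking_a[:k] if gene in top_b]


if __name__ == "__main__":
    # Placeholder names only; these are not results from the study.
    rfe_ranking = ["gene_A", "gene_B", "gene_C", "gene_D"]
    boruta_ranking = ["gene_C", "gene_A", "gene_E", "gene_B"]
    print(top_k_overlap(rfe_ranking, boruta_ranking, k=3))
```

The same set-intersection idea applies to the earlier step of intersecting DEGs with WGCNA module genes.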
Conclusion: The RFE algorithm achieves high diagnostic accuracy for PTB and identifies five potential biomarkers (GPR84, SIGLEC10, CCR2, TMEM167A, and GYG1). These findings may provide valuable tools for PTB diagnosis and treatment.
About the journal:
Diagnostic Microbiology and Infectious Disease keeps you informed of the latest developments in clinical microbiology and the diagnosis and treatment of infectious diseases. Packed with rigorously peer-reviewed articles and studies in bacteriology, immunology, immunoserology, infectious diseases, mycology, parasitology, and virology, the journal examines new procedures, unusual cases, controversial issues, and important new literature. The journal's distinguished independent editorial board, consisting of experts from many medical specialties, ensures extensive and authoritative coverage.