A machine learning-based predictive model for multilobar pulmonary consolidation induced by macrolide-resistant Mycoplasma pneumoniae pneumonia caused by the 23S rRNA A2063G mutation.
Authors: Yan Guo, Yonghan Luo
Journal: Microbiology Spectrum, article e0245825 (JCR Q2, Microbiology; impact factor 3.8)
Published: 2025-10-08
DOI: 10.1128/spectrum.02458-25 (https://doi.org/10.1128/spectrum.02458-25)
Abstract
This study aims to develop a machine learning (ML)-based predictive model for assessing the risk of multilobar pulmonary consolidation in children with macrolide-resistant Mycoplasma pneumoniae pneumonia (MRMP) caused by the 23S rRNA A2063G mutation, a subgroup underrepresented in prior studies. A total of 404 MRMP cases diagnosed between October 2024 and February 2025 were included. Key clinical characteristics, including laboratory test results, symptoms, and treatment outcomes, were extracted from electronic medical records. Least absolute shrinkage and selection operator (LASSO) regression was used to select relevant variables. Six ML models (Logistic Regression, Naive Bayes, K-Nearest Neighbors, Multilayer Perceptron, Random Forest, and XGBoost) were developed to predict multilobar pulmonary consolidation. Model performance was evaluated using receiver operating characteristic (ROC) curves and decision curve analysis (DCA), and SHapley Additive exPlanations (SHAP) was used for model interpretability. XGBoost demonstrated the highest predictive performance, with an area under the ROC curve of 0.976 in the training set and 0.904 in the validation set, a sensitivity of 0.97, a specificity of 0.81, an accuracy of 0.94, and an F1 score of 0.95. The 10 most important predictors of multilobar pulmonary consolidation were C-reactive protein, lactate dehydrogenase, fibrinogen, platelet count, albumin, hemoglobin, creatinine, aspartate aminotransferase, interleukin-6, and oxygen therapy. DCA indicated that the model also had strong clinical utility. The XGBoost predictive model offers a robust tool for identifying high-risk children with MRMP caused by the 23S rRNA A2063G mutation. By integrating clinical features, the model enhances early risk stratification and can support clinical decision-making, improving the accuracy and efficiency of treatment plans.

IMPORTANCE
Macrolide-resistant Mycoplasma pneumoniae pneumonia caused by the 23S rRNA A2063G mutation poses a significant threat to pediatric health, often leading to severe multilobar pulmonary consolidation. This study develops a high-performance machine learning model (XGBoost) that accurately predicts this complication using key clinical indicators such as C-reactive protein, lactate dehydrogenase, and interleukin-6. With an area under the ROC curve of 0.976 in the training set and 0.904 in the validation set, the model enables early risk stratification, guiding clinicians in optimizing treatment for high-risk children. By improving diagnostic precision and intervention timing, this tool can reduce disease severity, minimize hospital stays, and enhance patient outcomes. The interpretability of the model via SHapley Additive exPlanations analysis further supports its clinical applicability, making it a valuable advancement in managing antibiotic-resistant pediatric pneumonia.
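The abstract reports sensitivity, specificity, accuracy, F1 score, area under the ROC curve, and decision curve analysis, but gives no formulas. The following is a minimal sketch of how such quantities are conventionally computed from predicted probabilities; it is not the authors' code. The function names are hypothetical, the labels and probabilities are made-up toy values rather than study data, and the 0.5 cutoff is an assumption, since the abstract does not state the threshold used.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Threshold the probabilities and return (tp, fp, tn, fn)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and F1 at one probability cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy example (made-up values, NOT data from the study).
toy_true = [1, 1, 1, 1, 0, 0, 0, 1]
toy_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2, 0.1, 0.95]

if __name__ == "__main__":
    print(classification_metrics(toy_true, toy_prob, threshold=0.5))
    print("AUC:", roc_auc(toy_true, toy_prob))
    print("Net benefit at 0.5:", net_benefit(toy_true, toy_prob, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies. A model is usually said to have clinical utility where its curve lies above both.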
About the journal:
Microbiology Spectrum publishes commissioned review articles on topics in microbiology representing ten content areas: Archaea; Food Microbiology; Bacterial Genetics, Cell Biology, and Physiology; Clinical Microbiology; Environmental Microbiology and Ecology; Eukaryotic Microbes; Genomics, Computational, and Synthetic Microbiology; Immunology; Pathogenesis; and Virology. Reviews are interrelated, with each review linking to other related content. A large board of Microbiology Spectrum editors aids in the development of topics for potential reviews and in the identification of an editor, or editors, who shepherd each collection.