Development and validation of a machine learning model for predicting co-infection of Mycoplasma pneumonia in pediatric patients.

IF 1.7 · Zone 4 (Medicine) · Q2 PEDIATRICS

Translational Pediatrics. Pub Date: 2025-06-27. Epub Date: 2025-06-25. DOI: 10.21037/tp-2024-562

Xiaohan Liu, Wenbei Xu, Lingjian Meng, Juan Long, Xiaonan Sun, Qiang Li, Haiquan Kang, Yiping Mao, Chunfeng Hu, Kai Xu, Yankai Meng

Volume 14, Issue 6, pp. 1201-1212. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12268717/pdf/

Citations: 0

Abstract

Background: Mycoplasma pneumoniae pneumonia (MPP) is endemic in China, and co-infection of Mycoplasma with other pathogens (Co-MPP) is linked to severe outcomes. Despite the potential of radiomics and machine learning in pneumonia, distinguishing Co-MPP from MPP in children remains underexplored. This study aimed to bridge this gap by evaluating machine learning models, particularly those based on radiomics features derived from high-resolution computed tomography (HRCT), for differentiating MPP from Co-MPP, and by comparing their predictive performance with that of traditional clinical models.

Methods: We conducted a retrospective analysis of children hospitalized with pneumonia at the Affiliated Hospital of Xuzhou Medical University from June to December 2023. Chest computed tomography (CT) was performed on a multi-slice CT scanner with more than 64 detector rows. Fluorescent quantitative polymerase chain reaction (PCR) was used to detect 14 pathogens in bronchoalveolar lavage (BAL) fluid. The most recent laboratory results obtained before BAL were entered into a multivariable logistic regression (LR) analysis, and variables with P<0.05 were used to construct the clinical model. The largest cross-section of the lesion was selected and segmented in ITK-SNAP. Radiomics features were extracted with PyRadiomics and filtered using t-tests, Mann-Whitney U tests, and Spearman rank correlation coefficients. Least absolute shrinkage and selection operator (LASSO) regression with ten-fold cross-validation was then used to reduce the dimensionality of the feature set and to construct the radiomics model. Eight machine learning models [LR, support vector machine (SVM), K-nearest neighbor (KNN), RandomForest, ExtraTrees, eXtreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and multi-layer perceptron (MLP)] were trained on the selected features, and five-fold cross-validation was used to obtain the final radiomics model. The clinical and radiomics models were combined into a nomogram. Data analysis was performed using R software and SPSS 26.0.
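
The feature-selection and model-building steps described in the Methods can be sketched in Python with NumPy, SciPy, and scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code (they report using R and SPSS): the data are synthetic random features standing in for PyRadiomics output, only the Mann-Whitney U branch of the univariate filter is shown (the t-test branch is omitted), and the p<0.05 filter cutoff, the |rho|>0.9 redundancy threshold, the 70/30 train/validation split, the two clinical variables, and the helper names `univariate_filter`, `drop_redundant`, `fit_radscore`, and `radscore` are all hypothetical. LASSO is fitted here as a linear model on 0/1 labels with scikit-learn's `LassoCV`, so the Radscore is its linear predictor (intercept plus the sum of non-zero coefficients times standardized features); an L1-penalized logistic regression would be a reasonable alternative. The nomogram is represented only by the logistic regression that underlies it.

```python
import numpy as np
from scipy.stats import mannwhitneyu, spearmanr
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler


def univariate_filter(X, y, alpha=0.05):
    """Indices of columns whose values differ between classes (Mann-Whitney U, p < alpha)."""
    keep = []
    for j in range(X.shape[1]):
        _, p = mannwhitneyu(X[y == 0, j], X[y == 1, j], alternative="two-sided")
        if p < alpha:
            keep.append(j)
    return keep


def drop_redundant(X, cols, threshold=0.9):
    """Greedily keep a column only if |Spearman rho| <= threshold with every column kept so far."""
    kept = []
    for j in cols:
        redundant = False
        for k in kept:
            rho, _ = spearmanr(X[:, j], X[:, k])
            if abs(rho) > threshold:
                redundant = True
                break
        if not redundant:
            kept.append(j)
    return kept


def fit_radscore(X, y):
    """Standardize, then fit a LASSO whose penalty strength is chosen by 10-fold CV."""
    scaler = StandardScaler().fit(X)
    lasso = LassoCV(cv=10).fit(scaler.transform(X), y)
    return scaler, lasso


def radscore(scaler, lasso, X):
    """Radscore = intercept + sum(coef_j * z_j); features with zero coefficients drop out."""
    return lasso.predict(scaler.transform(X))


# --- Synthetic demo data (NOT the study's data) ---
rng = np.random.default_rng(42)
n, p = 124, 50
y = rng.integers(0, 2, size=n)                          # 1 = Co-MPP, 0 = MPP
X = rng.normal(size=(n, p))                             # stand-in for radiomics features
X[:, :5] += y[:, None]                                  # first 5 features are informative
clinical = rng.normal(size=(n, 2)) + 0.5 * y[:, None]   # two hypothetical lab variables

idx_tr, idx_va = train_test_split(np.arange(n), test_size=0.3, stratify=y, random_state=0)

# All selection steps are fitted on the training split only, to avoid leakage.
cols = drop_redundant(X[idx_tr], univariate_filter(X[idx_tr], y[idx_tr]))
scaler, lasso = fit_radscore(X[idx_tr][:, cols], y[idx_tr])
n_nonzero = int(np.count_nonzero(lasso.coef_))


def design(idx):
    """Combined-model inputs: Radscore column plus the clinical variables."""
    return np.column_stack([radscore(scaler, lasso, X[idx][:, cols]), clinical[idx]])


# Combined model: logistic regression on Radscore + clinical variables.
combined = LogisticRegression()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(combined, design(idx_tr), y[idx_tr], cv=cv, scoring="roc_auc").mean()
combined.fit(design(idx_tr), y[idx_tr])
val_auc = roc_auc_score(y[idx_va], combined.predict_proba(design(idx_va))[:, 1])
print(f"{len(cols)} filtered, {n_nonzero} non-zero LASSO coefs, "
      f"CV AUC {cv_auc:.3f}, validation AUC {val_auc:.3f}")
```

Fitting the filter and LASSO on the training split only keeps validation-cohort information out of feature selection. The five-fold CV AUC here is still somewhat optimistic, because the Radscore was fitted on the whole training split; nesting the selection steps inside each fold would remove that bias.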

Results: A total of 124 children with MPP or Co-MPP were included. The extracted radiomics features comprised first-order signal intensity features (n=360), morphological features (n=14), and texture features (n=1,460). LASSO regression with ten-fold cross-validation retained 23 features with non-zero coefficients, which were used to construct the Radscore. Of the machine learning models, the LR model performed best for predicting Co-MPP in the validation cohort, with an area under the curve (AUC) of 0.951, a sensitivity of 0.778, and a specificity of 0.875. The nomogram combining clinical and radiomics signatures significantly outperformed the clinical model (P=0.004). Calibration curve analysis showed that the nomogram agreed best with the observed outcomes, and both the radiomics and nomogram models provided greater net clinical benefit than the clinical model.
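
The validation metrics reported in the Results (sensitivity, specificity, AUC) and the net benefit used to compare models can be made concrete with a small, dependency-free Python sketch. The toy labels and probabilities are invented for illustration, the 0.5 classification cutoff is an assumption (the abstract does not state the operating threshold), and net benefit is assumed to follow the standard decision-curve definition NB(pt) = TP/n - FP/n * pt/(1 - pt), compared against a treat-all strategy.

```python
def confusion(y_true, prob, threshold):
    """Counts (tp, fp, tn, fn) when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, prob):
        pred = p >= threshold
        if pred and t == 1:
            tp += 1
        elif pred:
            fp += 1
        elif t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, prob, threshold=0.5):
    tp, fp, tn, fn = confusion(y_true, prob, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, prob):
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, prob) if t == 1]
    neg = [p for t, p in zip(y_true, prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, prob, pt):
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt/(1 - pt)."""
    tp, fp, _, _ = confusion(y_true, prob, pt)
    n = len(y_true)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of treating everyone: prevalence - (1 - prevalence) * pt/(1 - pt)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Toy validation cohort (invented numbers, NOT from the study)
y_val = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_val = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(y_val, p_val)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc(y_val, p_val):.3f}")
for pt in (0.1, 0.2, 0.3):
    print(f"pt={pt}: model NB={net_benefit(y_val, p_val, pt):.3f}, "
          f"treat-all NB={net_benefit_treat_all(y_val, pt):.3f}")
```

In this toy cohort the model's net benefit at pt=0.2 is 0.3 versus 0.25 for treat-all. A model is considered clinically useful over the range of threshold probabilities where its curve lies above both the treat-all curve and the treat-none line (NB = 0).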

Conclusions: The radiomics model trained using machine learning effectively predicts Co-MPP in children, while the combined clinical and radiomics nomogram model offers the best predictive performance.
