Machine learning-based risk prediction model for pertussis in children: a multicenter retrospective study.

Impact factor 3.4; Zone 3 (Medicine); JCR Q2, Infectious Diseases

BMC Infectious Diseases. Pub Date: 2025-03-27. DOI: 10.1186/s12879-025-10797-7

Juan Xie, Run-Wei Ma, Yu-Jing Feng, Yuan Qiao, Hong-Yan Zhu, Xing-Ping Tao, Wen-Juan Chen, Cong-Yun Liu, Tan Li, Kai Liu, Li-Ming Cheng

Citation: BMC Infectious Diseases 25(1), 428 (2025). Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951648/pdf/

Citations: 0

Abstract

Background: Pertussis is a highly contagious respiratory disease. Although vaccination has reduced its incidence, cases have resurged in certain regions because of immune escape and waning vaccine efficacy. Promptly identifying high-risk patients is imperative to mitigate transmission and avert complications. However, current diagnostic methods, including PCR and bacterial culture, are time-consuming and expensive. Some studies have attempted to develop risk prediction models based on multivariate data, but their performance leaves room for improvement. This study therefore aims to further optimize and extend such risk assessment tools to identify high-risk individuals more efficiently and to compensate for the shortcomings of existing diagnostic methods.

Objective: This study aimed to develop a pertussis risk prediction model that is efficient and generalizes well across different datasets. The model was constructed with machine learning techniques on multicenter data after screening for key features, and it was deployed on an online platform to evaluate its performance and generalization ability. The study also aims to provide a rapid and accurate auxiliary diagnostic tool for clinical practice that helps identify high-risk patients in a timely manner, optimize early intervention strategies, and reduce the risk of complications and transmission, thereby improving the efficiency of public health management.

Methods: Data from 1085 patients with suspected pertussis were collected from 7 centers. Lasso regression and the Boruta algorithm were used to identify ten key features: PDW-MPV-RATIO, SII, white blood cells, platelet distribution width, mean platelet volume, lymphocytes, cough duration, vaccination, fever, and lytic lymphocytes. Eight models were then trained and validated on these features to assess their performance, and external datasets were used to confirm their generalization ability. Finally, an online platform was constructed so that clinicians can use the models in real time.
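Two of the ten features are derived from routine blood counts rather than measured directly. The sketch below shows one way they might be computed. It assumes that SII denotes the systemic immune-inflammation index in its conventional form (platelets × neutrophils / lymphocytes) and that PDW-MPV-RATIO is PDW divided by MPV; the abstract states neither formula, so both are assumptions. The class and field names are hypothetical, and the example values are illustrative rather than study data.

```python
from dataclasses import dataclass


@dataclass
class BloodCount:
    """Routine blood count for one patient (hypothetical field names)."""
    platelets: float    # platelet count, 10^9/L
    neutrophils: float  # neutrophil count, 10^9/L
    lymphocytes: float  # lymphocyte count, 10^9/L
    pdw: float          # platelet distribution width
    mpv: float          # mean platelet volume, fL


def sii(b: BloodCount) -> float:
    """Systemic immune-inflammation index (assumed conventional definition):
    platelets * neutrophils / lymphocytes."""
    if b.lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return b.platelets * b.neutrophils / b.lymphocytes


def pdw_mpv_ratio(b: BloodCount) -> float:
    """PDW divided by MPV (assumed direction of the ratio)."""
    if b.mpv <= 0:
        raise ValueError("MPV must be positive")
    return b.pdw / b.mpv


# Illustrative values only, not taken from the study.
example = BloodCount(platelets=300.0, neutrophils=4.0,
                     lymphocytes=8.0, pdw=12.0, mpv=10.0)
print(sii(example))            # 300 * 4 / 8
print(pdw_mpv_ratio(example))  # 12 / 10
```

Computing the derived features in one place keeps them consistent between model training and any later online scoring.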

Results: The random forest model demonstrated excellent discrimination, with an AUC of 0.98 in the validation set and 0.97 in the external validation set. Calibration curve and decision curve analyses showed that the model was highly accurate for low-to-medium risk patients, which could help clinicians avoid unnecessary interventions, especially in resource-limited settings. Applying this model can help optimize the early identification and management of high-risk patients and improve clinical decision-making.
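The two evaluation quantities named above can be illustrated with a minimal, dependency-free sketch. It computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and the net benefit used in decision curve analysis at a single threshold probability. The function names and the toy labels and scores are hypothetical; this is not the authors' evaluation code, and the numbers are unrelated to the reported results.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.
    Tied scores count as half a correct ranking."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Net benefit at threshold probability pt:
    TP/N - FP/N * pt / (1 - pt), treating score >= pt as a positive call."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data for illustration only.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, probs))           # 8 of 9 pairs ranked correctly
print(net_benefit(labels, probs, 0.5))  # 2/6 - 1/6 * 1
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies.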

Conclusion: The pertussis prediction model devised in this study was validated using multicenter data, exhibited high predictive performance, and was successfully implemented online. Future research should broaden the data sources and incorporate dynamic data to enhance the model's accuracy and applicability.


Source journal

BMC Infectious Diseases (Medicine: Infectious Diseases)

CiteScore: 6.50

Self-citation rate: 0.00%

Articles published: 860

Review time: 3.3 months

Journal description: BMC Infectious Diseases is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of infectious and sexually transmitted diseases in humans, as well as related molecular genetics, pathophysiology, and epidemiology.