Improving meningitis surveillance and diagnosis with machine learning: Insights from São Paulo
Audêncio Victor, Diego Augusto Medeiros Santos, Eduardo Koerich Nery, Danilo Pereira Mori, Pamella Cristina de Carvalho Lucas, Denise Cammarota, Guillermo Leonardo Florez Montero, Fabiano Novaes Barcellos Filho, Ana Lúcia Frugis Yu, Telma Regina Marques Pinto Carvalhanas
PLOS Digital Health 4(7): e0000925. Published 2025-07-10. DOI: https://doi.org/10.1371/journal.pdig.0000925
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12244477/pdf/
Abstract
Introduction: Meningitis, an inflammatory condition of the membranes surrounding the brain and spinal cord, can be caused by various agents. Bacterial meningitis is particularly severe due to its high morbidity and mortality rates. This study aims to develop machine learning (ML) models to classify the aetiology of bacterial meningitis using data from the Notifiable Diseases Information System (SINAN) in São Paulo State, Brazil.
Methods: Data were collected from the SINAN database, including sociodemographic variables, clinical symptoms, and cerebrospinal fluid (CSF) analyses. Five ML models (Random Forest, LightGBM, XGBoost, CatBoost, and AdaBoost) were applied to classify meningitis cases into bacterial, fungal, viral, and other types. Models were evaluated using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, precision, recall, F1-score, and the Matthews correlation coefficient (MCC).
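For readers unfamiliar with the evaluation metrics named above, the following is a minimal, standard-library sketch of how they can be computed for the binary case. It is an illustration only, not the authors' code: the function names `binary_metrics` and `auc_roc`, the label coding (1 = bacterial, 0 = non-bacterial), the 0.5 threshold, and the toy data are all assumptions made for this example.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics from 0/1 labels (assumed coding: 1 = bacterial)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC uses all four confusion-matrix cells, so it stays informative
    # when the classes are imbalanced.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (not taken from SINAN).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
metrics["auc_roc"] = auc_roc(y_true, scores)
print(metrics)
```

On this toy data, accuracy, precision, recall, and F1 all equal 0.75, MCC is 0.5, and AUC-ROC is 0.9375. The pairwise AUC computation is quadratic in the number of cases and is meant only to make the definition concrete; a library routine would be the practical choice for a dataset of surveillance size.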
Results: The CatBoost model performed best, achieving an AUC-ROC of 0.95 for binary classification (bacterial vs. non-bacterial) and 0.85 for multiclass classification (Neisseria meningitidis, Streptococcus pneumoniae, and Haemophilus influenzae). XGBoost and LightGBM also showed promising results, with AUC-ROC scores of 0.94 and 0.92, respectively, for binary classification. The CatBoost model exhibited high sensitivity and reasonable specificity, highlighting its applicability in the rapid and accurate diagnosis of meningitis. SHAP (SHapley Additive exPlanations) analysis identified variables such as leukocyte count and the presence of petechiae as influential predictors in the models.
Conclusion: ML algorithms, particularly CatBoost, XGBoost, and LightGBM, proved highly effective in the differential diagnosis of meningitis, offering a valuable tool for the rapid identification of meningitis types and bacterial serogroups. These techniques can be integrated into public health protocols to improve meningitis outbreak responses and optimize patient treatment.