Automated identification of Salmonella serotype using MALDI-TOF mass spectrometry and machine learning techniques.

IF 6.1 · CAS Zone 2 (Medicine) · JCR Q1 (Microbiology)
Journal of Clinical Microbiology Pub Date : 2025-07-09 Epub Date: 2025-06-11 DOI:10.1128/jcm.00037-25
Jun Ren, Jintao Xia, Mengyu Zhang, Chunhong Liu, Yuanyuan Xu, Jianing Wu, Yingzhu Li, Mingming Zhou, Shengjie Li, Wenjun Cao
Citation count: 0

Abstract

Automated identification of Salmonella serotype using MALDI-TOF mass spectrometry and machine learning techniques.

Salmonella serotyping is essential for epidemiological studies and clinical treatment guidance. However, traditional serological agglutination methods are time-consuming, technically complex, and difficult to adopt at scale. Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) is a rapid and cost-effective microbial identification technique, but it cannot be used to differentiate Salmonella serotypes. This study aims to integrate MALDI-TOF MS with machine learning algorithms to develop and validate a model for Salmonella serotype identification, improving efficiency and simplifying workflows. A total of 692 Salmonella isolates from Children's Hospital, Zhejiang University School of Medicine (ZUCH) and Wanbei Coal-Electricity Group General Hospital (WCGH) were analyzed using MALDI-TOF MS, generating 2,048 spectra. The ZUCH data were randomly divided into training and internal validation sets. The WCGH data were used as an external validation set. Ten machine learning algorithms were evaluated for their ability to identify eight Salmonella serotypes (B, C1, C2/3, D, E, Not A-F, Salmonella Typhimurium, and Salmonella Enteritidis). From 192 initial features, 16 features were selected for the final model construction. XGBoost demonstrated the best discriminative ability (area under the receiver operating characteristic curve [AUC] = 0.9898, sensitivity = 0.88, and specificity = 0.98) for the training set. The streamlined XGBoost model achieved AUCs of 0.9662 and 0.9778 for the internal and external validation sets, respectively, accurately identifying Salmonella serotypes. To enhance usability, the model was deployed as a Streamlit-based application, facilitating interaction and broader application. MALDI-TOF MS combined with XGBoost provides a fast and accurate method for Salmonella serotype identification, offering an efficient solution for laboratory diagnostics and epidemiological studies.
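The abstract reports AUC, sensitivity, and specificity for an eight-class problem but does not say how the per-class values were combined. The sketch below shows one common convention, assumed here and not taken from the paper: treat each serotype as the positive class in turn (one-vs-rest) and macro-average the results. The function names and the toy labels and scores are hypothetical illustrations, not the study's data or code.

```python
# Serotype classes named in the abstract.
CLASSES = ["B", "C1", "C2/3", "D", "E", "Not A-F", "Typhimurium", "Enteritidis"]


def one_vs_rest_rates(y_true, y_pred, positive):
    """Return (sensitivity, specificity) with `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def macro_rates(y_true, y_pred, classes):
    """Macro-average sensitivity and specificity over classes present in y_true.

    Assumes at least two distinct classes occur in y_true, so that every
    one-vs-rest split has both positive and negative samples.
    """
    present = [c for c in classes if c in set(y_true)]
    rates = [one_vs_rest_rates(y_true, y_pred, c) for c in present]
    sens = sum(r[0] for r in rates) / len(rates)
    spec = sum(r[1] for r in rates) / len(rates)
    return sens, spec


def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_true = ["B", "B", "D", "D", "E", "E"]
toy_pred = ["B", "D", "D", "D", "E", "E"]
print(macro_rates(toy_true, toy_pred, CLASSES))
print(binary_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

For a multiclass AUC under the same assumption, `binary_auc` would be called once per serotype with that class's predicted probability as the score, and the eight values averaged. The pairwise loop is quadratic in the number of spectra, which is acceptable for a sketch but not for large datasets.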

Importance: Salmonella serotyping is vital for outbreak tracking and clinical guidance, but traditional methods are slow and laborious. This study combines matrix-assisted laser desorption ionization-time of flight mass spectrometry with machine learning (XGBoost) to enable rapid, accurate, and cost-effective serotyping. The streamlined model performed excellently in validation and was deployed as a user-friendly Streamlit app, enhancing usability. This innovation simplifies workflows, reduces diagnostic time, and supports scalable use in clinical and public health settings, improving outbreak response and epidemiological research.

Source journal
Journal of Clinical Microbiology (Medicine: Microbiology)
CiteScore: 17.10
Self-citation rate: 4.30%
Articles published: 347
Review time: 3 months
About the journal: The Journal of Clinical Microbiology® disseminates the latest research concerning the laboratory diagnosis of human and animal infections, along with the laboratory's role in epidemiology and the management of infectious diseases.