Genomic insights into nontyphoidal Salmonella: Prediction of antimicrobial resistance with whole genome-based machine learning
Pei Yee Woh, Fadjar Soengkono, Yehao Chen, Zati Hakim Azizul Hasan, Siti Nursheena Mohd Zain, Jose Quiroga, Kevin Wing Hin Kwok
International Journal of Antimicrobial Agents, 66(5), Article 107575. Published 2025-07-15.
DOI: 10.1016/j.ijantimicag.2025.107575
https://www.sciencedirect.com/science/article/pii/S092485792500130X
Abstract
Background
Nontyphoidal Salmonella is a leading foodborne pathogen worldwide, is associated with increasing rates of antimicrobial resistance (AMR), and remains endemic in Asia. Whole genome sequencing (WGS) could contribute substantially to AMR prediction, from bioinformatic phylogenomic analysis to machine learning (ML), moving towards automated AMR diagnostics.
Methods
We obtained Salmonella WGS data from the National Center for Biotechnology Information (NCBI) database and analysed their resistance profiles. We extracted, transformed, and labelled the resistance data using one-hot encoding, and used the encoded data to construct, train, and evaluate eXtreme Gradient Boosting (XGBoost) and convolutional neural network (CNN) models.
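A minimal sketch of the encoding step follows. It assumes, which the abstract does not state, that each isolate is represented by the presence or absence of detected resistance genes and that each phenotype is a binary resistant ("R") or susceptible ("S") call for a single antimicrobial. The function names, isolate IDs, and phenotypes are hypothetical, and the gene names serve only as column labels. The resulting 0/1 matrix is the kind of tabular input that a gradient-boosted tree model or a neural network can consume.

```python
from typing import Dict, List, Set, Tuple


def one_hot_encode(
    isolates: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (isolate_ids, gene_vocabulary, binary_matrix).

    Each row is one isolate; a column holds 1 if that gene was detected
    in the isolate's genome and 0 otherwise.
    """
    # A sorted vocabulary gives a stable, reproducible column order.
    vocabulary = sorted(set().union(*isolates.values()))
    column = {gene: j for j, gene in enumerate(vocabulary)}
    ids = sorted(isolates)
    matrix = []
    for isolate_id in ids:
        row = [0] * len(vocabulary)
        for gene in isolates[isolate_id]:
            row[column[gene]] = 1
        matrix.append(row)
    return ids, vocabulary, matrix


def encode_labels(phenotypes: Dict[str, str], ids: List[str]) -> List[int]:
    """Map 'R' (resistant) to 1 and 'S' (susceptible) to 0, in row order."""
    mapping = {"S": 0, "R": 1}
    return [mapping[phenotypes[isolate_id]] for isolate_id in ids]


if __name__ == "__main__":
    # Hypothetical isolates and phenotypes, for illustration only.
    isolates = {
        "iso1": {"tet(A)", "sul1"},
        "iso2": {"blaTEM-1"},
        "iso3": {"tet(A)", "blaTEM-1"},
    }
    tetracycline = {"iso1": "R", "iso2": "S", "iso3": "R"}
    ids, vocab, X = one_hot_encode(isolates)
    y = encode_labels(tetracycline, ids)
    print(vocab)
    for isolate_id, row, label in zip(ids, X, y):
        print(isolate_id, row, label)
```

Sorting the gene vocabulary keeps the column order fixed, which matters when a model trained on one matrix is later applied to newly sequenced isolates.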
Results
We selected 788 Salmonella isolates that had both resistance genotype and phenotype data. These isolates showed high resistance to aminoglycosides, beta-lactams, phenicols, quinolones, sulphonamides, tetracyclines, and trimethoprim. S. Weltevreden ST365 (n = 121) was the most common serovar, with the highest occurrence in food products. Both the XGBoost and CNN models predicted AMR with high accuracy (0.97625 and 0.9904, respectively). Moreover, interpreting the SHapley Additive exPlanations (SHAP) values revealed the most important genomic features and associated genes for each antimicrobial agent tested.
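SHAP values are Shapley values from cooperative game theory applied to individual model predictions. The sketch below computes exact Shapley values by brute-force enumeration of feature subsets for a hypothetical three-gene scoring function (`toy_model`, with placeholder genes `geneA`, `geneB`, `geneC` and made-up weights). It is neither the authors' model nor the `shap` library, whose tree explainer uses a much faster algorithm for models such as XGBoost. Features outside a subset are set to a baseline of "gene absent", which is only one of several possible baseline choices.

```python
"""Exact Shapley values by subset enumeration (illustrative only)."""
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def exact_shapley(
    features: List[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Shapley value of each feature for the set function `value`.

    phi_i = sum over subsets S not containing i of
            |S|! (n - |S| - 1)! / n! * (value(S | {i}) - value(S))
    The cost grows as O(n * 2^n), so this only suits a handful of features.
    """
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_model(genes_present: FrozenSet[str]) -> float:
    """Hypothetical resistance score; genes not in the set count as absent."""
    score = 0.0
    if "geneA" in genes_present:
        score += 0.6
    if "geneB" in genes_present:
        score += 0.3
    if {"geneA", "geneB"} <= genes_present:
        score -= 0.2  # hypothetical redundancy between the two genes
    return score


if __name__ == "__main__":
    # Explain the score of a hypothetical isolate carrying all three genes.
    genes = ["geneA", "geneB", "geneC"]
    phi = exact_shapley(genes, toy_model)
    for gene, contribution in phi.items():
        print(f"{gene}: {contribution:+.3f}")
    # Efficiency property: contributions sum to f(all genes) - f(no genes).
    print(sum(phi.values()), toy_model(frozenset(genes)) - toy_model(frozenset()))
```

In this toy case, geneA receives 0.5, geneB 0.2, and geneC 0: the interaction penalty of 0.2 is split equally between the two interacting genes, and the gene the model ignores gets no credit.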
Conclusions
Our study provides new evidence on the phylogeographical relatedness of AMR and shows that XGBoost and CNN models can predict AMR with competitive performance. WGS-based ML prediction, and its application in automated tools, could therefore be promoted as a promising approach for AMR work in food safety and public health settings.
About the journal:
The International Journal of Antimicrobial Agents is a peer-reviewed publication offering comprehensive and current reference information on the physical, pharmacological, in vitro, and clinical properties of individual antimicrobial agents, covering antiviral, antiparasitic, antibacterial, and antifungal agents. The journal not only communicates new trends and developments through authoritative review articles but also addresses the critical issue of antimicrobial resistance, both in hospital and community settings. Published content includes solicited reviews by leading experts and high-quality original research papers in the specified fields.