Using core genome and machine learning for serovar prediction in Salmonella enterica subspecies I strains
Xiang Li, Adelumola Oladeinde, Michael Rothrock, Tae Jung Chung, Walid Ghazi Al Hakeem
FEMS Microbiology Letters, vol. 372 (2025). Published 2025-01-10. DOI: 10.1093/femsle/fnaf040
Citations: 0
Abstract
This study presents a dual investigation of Salmonella enterica subspecies I, focusing on serovar prediction and core genome characteristics. We used two large genomic datasets (panX and NCBI Pathogen Detection) to test machine learning methods for predicting Salmonella serovars from genomic differences. Of the four algorithms tested, the Random Forest model performed best, achieving 90.3% accuracy on the panX dataset and 95.3% on the NCBI dataset, and was particularly effective when trained on more than 50% of the available data. When combined with hierarchical clustering validation, our approach achieved 100% prediction accuracy on the simulated data. A parallel analysis of panX core genome characteristics revealed that pathogenicity-related genes (including sseA, invA, mgtC, phoP, phoQ, and sitA) exhibited similar phylogenetic topologies, distinct from the core genome phylogenetic tree, suggesting shared evolutionary histories. Notably, all identified core antibiotic resistance genes and virulence factors showed evidence of negative selection, indicating their essential role in bacterial survival. This study presents a promising machine learning-based alternative for Salmonella serovar classification, which is particularly valuable when newly identified serovars are analyzed alongside known reference strains. It also provides insights into the evolutionary dynamics of core virulence-associated genes, contributing to our understanding of Salmonella genomic architecture and pathogenicity.
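The abstract gives no implementation details. The Python sketch below is a rough, hypothetical illustration of the general workflow it describes. It represents each genome by which genes it carries, holds out part of the data, predicts the serovars of the held-out genomes, and scores accuracy. The sketch uses only the standard library.

It deliberately swaps the study's Random Forest for a simpler 1-nearest-neighbour classifier on Hamming distance. The toy data, the serovar labels, and the function names (make_toy_dataset, predict_serovar, evaluate) are invented for illustration.

```python
import random


def hamming(a, b):
    """Count genes whose presence/absence differs between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_serovar(train, profile):
    """1-nearest-neighbour (stand-in for Random Forest): label of the closest training profile."""
    _, label = min(train, key=lambda item: hamming(item[0], profile))
    return label


def evaluate(dataset, train_fraction, seed=0):
    """Shuffle, split into train/test by train_fraction, and return accuracy on the test genomes."""
    rng = random.Random(seed)
    data = list(dataset)
    rng.shuffle(data)
    n_train = int(len(data) * train_fraction)
    train, test = data[:n_train], data[n_train:]
    if not train or not test:
        raise ValueError("train_fraction must leave at least one genome in each split")
    correct = sum(predict_serovar(train, p) == label for p, label in test)
    return correct / len(test)


def make_toy_dataset(n_per_serovar=20, n_genes=50, noise=0.05, seed=1):
    """Hypothetical data: each serovar gets a random signature presence/absence profile,
    and each genome copies its signature with every gene flipped with probability `noise`."""
    rng = random.Random(seed)
    labels = ["serovar_A", "serovar_B", "serovar_C"]
    signatures = {s: [rng.random() < 0.5 for _ in range(n_genes)] for s in labels}
    dataset = []
    for s in labels:
        for _ in range(n_per_serovar):
            profile = [(not g) if rng.random() < noise else g for g in signatures[s]]
            dataset.append((profile, s))
    return dataset


if __name__ == "__main__":
    data = make_toy_dataset()
    # Compare accuracy across training-set sizes on the toy data.
    for frac in (0.3, 0.5, 0.7):
        print(f"train fraction {frac:.0%}: accuracy {evaluate(data, frac):.3f}")
```

Varying `train_fraction` mirrors the kind of training-size comparison the abstract reports. The toy data are not expected to reproduce the study's accuracies. A closer re-implementation would replace `predict_serovar` with a Random Forest, for example scikit-learn's `RandomForestClassifier`, trained on the real genomic features.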
About the journal:
FEMS Microbiology Letters gives priority to concise papers that merit rapid publication by virtue of their originality, general interest and contribution to new developments in microbiology. All aspects of microbiology, including virology, are covered.
2019 Impact Factor: 1.987, Journal Citation Reports (Source: Clarivate, 2020)
Ranking: 98/135 (Microbiology)
The journal is divided into eight Sections:
Physiology and Biochemistry (including genetics, molecular biology and ‘omic’ studies)
Food Microbiology (from food production and biotechnology to spoilage and food borne pathogens)
Biotechnology and Synthetic Biology
Pathogens and Pathogenicity (including medical, veterinary, plant and insect pathogens – particularly those relating to food security – with the exception of viruses)
Environmental Microbiology (including ecophysiology, ecogenomics and meta-omic studies)
Virology (viruses infecting any organism, including Bacteria and Archaea)
Taxonomy and Systematics (for publication of novel taxa, taxonomic reclassifications and reviews of a taxonomic nature)
Professional Development (including education, training, CPD, research assessment frameworks, research and publication metrics, best-practice, careers and history of microbiology)
If you are unsure which Section is most appropriate for your manuscript, for example in the case of transdisciplinary studies, we recommend that you contact the Editor-in-Chief by email before submission. Our scope includes any type of microorganism: all members of the Bacteria and the Archaea, microbial members of the Eukarya (yeasts, filamentous fungi, microbial algae, protozoa, oomycetes, myxomycetes, etc.), and all viruses.