Genomic Markers Distinguishing Shiga Toxin-Producing Escherichia coli: Insights from Pangenome and Phylogenomic Analyses
Authors: Asmaa Elrefaey, Kingsley E Bentum, Emmanuel Kuufire, Tyric James, Rejoice Nyarku, Viona Osei, Yilkal Woube, Temesgen Samuel, Woubit Abebe
Journal: Pathogens, volume 14, issue 9 (published 2025-08-30)
DOI: 10.3390/pathogens14090862 (https://doi.org/10.3390/pathogens14090862)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12472240/pdf/
Citations: 0
Abstract
Shiga toxin-producing Escherichia coli (STEC) are genetically diverse foodborne pathogens of major global public health concern. Serogroup-level identification is critical for effective surveillance and outbreak control; however, it is often complicated by STEC's genome plasticity and frequent recombination. In this study, we employed a standardized pangenomic pipeline integrating the Roary ILP Bacterial Core Annotation Pipeline (RIBAP) and Panaroo to analyze 160 complete, high-quality STEC genomes representing eight major serogroups at a 95% sequence identity threshold. Candidate serogroup-specific markers were identified using gene presence/absence profiles from RIBAP and Panaroo. Our analysis revealed several high-confidence markers, including metabolic genes (dgcE, fcl_2, dmsA, hisC) and surface polysaccharide-related genes (capD, rfbX, wzzB). Comparative pangenomic evaluation showed that RIBAP predicted a larger pangenome size than Panaroo. Additionally, some genomes from the O104:H1, O145:H28, and O45:H2 serotypes clustered outside their expected clades, indicating sporadic serotype misplacements in phylogenetic reconstructions. Functional annotation suggested that most candidate markers are involved in critical processes such as glucose metabolism, lipopolysaccharide biosynthesis, and cell surface assembly. Notably, approximately 22.9% of the identified proteins were annotated as hypothetical. Overall, this study highlights the utility of pangenomic analysis for identifying candidate markers of clinically relevant STEC serogroups and for phylogenetic interpretation. We also note that pangenome analysis could guide the development of more accurate diagnostic and surveillance tools.
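The abstract does not spell out how markers were called from the presence/absence profiles, so the following is only a minimal sketch of one plausible approach, not the authors' pipeline. It assumes a gene counts as a candidate marker when it is present in at least `min_within` of a serogroup's genomes and in at most `max_outside` of all other genomes. The function name, both thresholds, and the toy genomes, genes, and serogroup labels (`SG1`, `SG2`) are hypothetical; in practice the input sets would be built from the presence/absence tables that tools such as RIBAP and Panaroo produce.

```python
from collections import defaultdict


def serogroup_specific_genes(presence, serogroup_of, min_within=1.0, max_outside=0.0):
    """Return {serogroup: sorted list of candidate marker genes}.

    presence      -- dict: genome ID -> set of gene names found in that genome
    serogroup_of  -- dict: genome ID -> serogroup label
    min_within    -- minimum fraction of in-group genomes carrying the gene (assumed rule)
    max_outside   -- maximum fraction of out-group genomes carrying the gene (assumed rule)
    """
    # Group genome IDs by serogroup.
    groups = defaultdict(list)
    for genome, serogroup in serogroup_of.items():
        groups[serogroup].append(genome)

    # The pangenome here is simply the union of all gene sets.
    all_genes = set().union(*presence.values())

    markers = {}
    for serogroup, members in groups.items():
        others = [g for g in presence if serogroup_of[g] != serogroup]
        hits = []
        for gene in sorted(all_genes):
            within = sum(gene in presence[g] for g in members) / len(members)
            outside = (
                sum(gene in presence[g] for g in others) / len(others) if others else 0.0
            )
            if within >= min_within and outside <= max_outside:
                hits.append(gene)
        markers[serogroup] = hits
    return markers


# Toy, invented input: four hypothetical genomes in two hypothetical serogroups.
presence = {
    "genome1": {"geneA", "geneB", "core1"},
    "genome2": {"geneA", "core1"},
    "genome3": {"geneC", "core1"},
    "genome4": {"geneC", "geneB", "core1"},
}
serogroup_of = {
    "genome1": "SG1",
    "genome2": "SG1",
    "genome3": "SG2",
    "genome4": "SG2",
}

markers = serogroup_specific_genes(presence, serogroup_of)
print(markers)
```

With the strict defaults, only genes found in every genome of one serogroup and in no other genome are reported. Relaxing the thresholds trades specificity for tolerance of annotation gaps, which may matter given the genome plasticity the abstract describes.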
About the journal:
Pathogens (ISSN 2076-0817) publishes reviews, regular research papers and short notes on all aspects of pathogens and pathogen-host interactions. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodical details must be provided for research articles.