L. Zaslavsky, V. Chetvernin, D. Dernovoy, B. Fedorov, W. Klimke, A. Souvorov, I. Tolstoy, T. Tatusova, D. Lipman
An approach to phylogenomic analysis of bacterial pathogens
2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), p. 981. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112529
Abstract
From the beginning of the microbial genome sequencing era, researchers have shown a commendable commitment to phylogenetic diversity. The completion of one genome from each prokaryotic division or phylum is still a frequently articulated community goal. However, largely because of the interest in human pathogens and advances in sequencing technologies, there are now also a number of very closely related genomes whose organization and gene content can be directly compared. Studying the genetic variability of pathogenic bacteria using whole-genome sequencing provides a way to understand the mechanisms of bacterial adaptation to rapid environmental changes and can be a source of useful information on virulence mechanisms. The bacterial genome datasets available in public archives represent a large collection of genomes at different levels of sequence quality and assembly. A fast and reliable method of phylogenetic classification based on genome sequences provides a necessary foundation for more detailed comparative analysis. NCBI has developed an approach for grouping bacterial organisms into phylogenetic clades using a genome dissimilarity measure based on the comparison of universally conserved markers. Special adjustments have been made to compensate for data inaccuracy and incompleteness. Tests performed on complete and draft genomes from the phylum Proteobacteria demonstrated that the proposed robust genomic distance allows stable and reliable species-level clustering and can be used for forming phylogenetic clades. Because the tradeoff for the increased robustness of the method is limited sensitivity at a very fine level, a phylogenomic refinement can be done within each constructed clade when fine-level phylogenetic resolution of close genomes is necessary.
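The abstract does not give the formula for the robust genomic distance or the clustering rule, so the following Python sketch is only an illustration of the general idea, not the NCBI method. It assumes that each genome is a dictionary of pre-aligned marker sequences, that per-marker dissimilarity is a simple mismatch fraction, that robustness comes from a trimmed mean over shared markers, and that clades are formed by single-linkage grouping under a distance threshold. The function names, the `trim`, `min_shared`, and `threshold` parameters, and the toy sequences are all hypothetical.

```python
from statistics import mean


def marker_distance(a: str, b: str) -> float:
    """Mismatch fraction between two pre-aligned marker sequences (gaps skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def robust_genome_distance(g1, g2, trim=0.2, min_shared=3):
    """Trimmed mean of per-marker distances over markers present in both genomes.

    Assumption: trimming the extreme markers stands in for the paper's
    'special adjustments' for inaccurate data, and min_shared stands in
    for its handling of incomplete (draft) genomes. Returns None when
    too few markers are shared to compare the genomes.
    """
    shared = sorted(set(g1) & set(g2))
    if len(shared) < min_shared:
        return None
    d = sorted(marker_distance(g1[m], g2[m]) for m in shared)
    k = int(len(d) * trim)  # number of markers dropped from each end
    kept = d[k:len(d) - k] if len(d) > 2 * k else d
    return mean(kept)


def cluster(genomes, threshold):
    """Single-linkage grouping: genomes within `threshold` end up in one group."""
    names = sorted(genomes)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = robust_genome_distance(genomes[a], genomes[b])
            if d is not None and d <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Toy data (hypothetical): B is close to A but has one corrupted marker (m5),
# C is distant, and D is a draft genome with too few markers to compare.
toy = {
    "A": {"m1": "ACGTACGT", "m2": "GGCCGGCC", "m3": "TTAATTAA",
          "m4": "CATGCATG", "m5": "AGAGAGAG"},
    "B": {"m1": "ACGTACGA", "m2": "GGCCGGCC", "m3": "TTAATTAA",
          "m4": "CATGCATG", "m5": "TCTCTCTC"},
    "C": {"m1": "TGCAACGT", "m2": "CCGGGGCC", "m3": "AATTTTAA",
          "m4": "GTACCATG", "m5": "TCTCAGAG"},
    "D": {"m1": "ACGTACGT", "m2": "GGCCGGCC"},
}

clades = cluster(toy, threshold=0.1)
print(clades)
```

In this toy setup, the corrupted marker in B is discarded by the trimming step, so A and B still fall under the threshold, while a plain mean over all five markers would separate them. This mirrors the tradeoff the abstract describes: a distance that tolerates bad markers is also less sensitive to small real differences, which is why a finer analysis within each clade may still be needed.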