Reconstruction of ancient operons from complete microbial genome sequences

Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003. Publication date: 2003-08-11. DOI: 10.1109/CSB.2003.1227383

Yuhong Wang, J. Rose, Bi-Cheng Wang, Dawei Lin

Abstract

Completed genomes provide not only DNA sequence information but also the relative locations of genes. In this paper, we propose a new method for reconstructing "ancient operons" that takes advantage of the evolutionary information carried both by orthologous genes and by their locations in a genome. The basic assumption is that the closer two genes were in an ancient genome, the more likely they are to remain close in current genomes. Assembling the nonrandom neighboring gene pairs found in current genomes should therefore reconstruct the gene groups that were together at some point during evolution. Because genes that are close neighbors are more likely to be functionally related, we call the gene groups produced by this assembly "ancient operons". The assembly is meaningful only when enough nonrandom pairs can be found, which has become possible with the more than 100 microbial genomes sequenced in recent years. As a proof of concept, we chose 63 nonredundant complete microbial genomes from the NCBI RefSeq database (May 2003 release). To normalize for protein sequence mutations and other evolutionary changes, we consider only the assembly of COGs (Clusters of Orthologous Groups) in these genomes; a total of 4901 COGs from the NCBI COG database were used. The assembly process resembles the assembly of DNA sequences into contigs, except that neighboring COG pairs serve as the basic assembly units. A target function is defined from the neighbor frequencies of pairwise links among all 4901 COGs, computed over all 63 genomes. We used the random cost algorithm, a global optimization algorithm, to minimize the target function and assemble the COGs into contigs. The significance of these contigs was then assessed by statistical methods. The results suggest that the assembled contigs are statistically and biologically significant. This method and the assembled ancient operons provide a new way to study microbial genomes and their evolution, and to annotate proteins of unknown function.
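The abstract does not say how a neighboring COG pair is judged nonrandom. The sketch below shows one minimal way to do it, under assumptions that are ours and not the authors': each genome is a hypothetical list of COG IDs in gene order on a single circular chromosome with at least three genes, each COG occurs at most once per genome, and the number of genomes in which two COGs are adjacent is compared with a Poisson approximation of what random gene order would give (adjacency probability 2/(n-1) in a circular genome of n genes). The function names and the cutoff `alpha` are illustrative.

```python
import math
from collections import Counter

# Hypothetical input: one circular chromosome per genome, given as COG IDs in
# gene order, with unassigned genes removed and at most one copy of each COG.
Genome = list[str]


def neighbor_pair_counts(genomes: list[Genome]) -> Counter:
    """Count, for each unordered COG pair, the genomes in which the two are adjacent."""
    counts: Counter = Counter()
    for genome in genomes:
        n = len(genome)
        if n < 3:  # too short for the 2 / (n - 1) random-order model below
            continue
        # (i + 1) % n wraps the last gene around to the first (circular chromosome).
        pairs = {frozenset((genome[i], genome[(i + 1) % n])) for i in range(n)}
        for pair in pairs:
            if len(pair) == 2:  # ignore a COG paired with itself
                counts[pair] += 1
    return counts


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly (adequate for small k)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def nonrandom_pairs(genomes: list[Genome], alpha: float = 1e-3) -> dict:
    """Return {(cog_a, cog_b): (observed, expected, p_value)} for pairs with p < alpha."""
    counts = neighbor_pair_counts(genomes)
    cog_sets = [set(g) for g in genomes]
    hits = {}
    for pair, observed in counts.items():
        a, b = sorted(pair)
        # Expected adjacencies under random order: 2 / (n - 1) per genome holding both.
        expected = sum(2.0 / (len(g) - 1)
                       for g, s in zip(genomes, cog_sets)
                       if a in s and b in s and len(g) >= 3)
        p_value = poisson_sf(observed, expected)
        if p_value < alpha:
            hits[(a, b)] = (observed, expected, p_value)
    return hits
```

The Poisson approximation and the one-copy-per-genome rule are deliberate simplifications. Relatedness between genomes also inflates co-occurrence (one motivation for working with nonredundant genomes), so the p-values from this sketch are better read as a ranking of candidate links than as calibrated probabilities.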
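The abstract minimizes its target function with the random cost algorithm but gives no details of it. As a stand-in, the sketch below uses plain simulated annealing, a different and better-known global optimizer. It illustrates the general idea: order the COGs so that strongly linked pairs sit next to each other, then cut the order into contigs wherever no link remains. The weight table (for example, a score derived from how often a pair is adjacent across genomes), the target function (minus the sum of link weights between neighbors), the cooling schedule and all names are assumptions made for illustration, not the paper's definitions.

```python
import math
import random

# Hypothetical link table: symmetric weights keyed by frozenset({cog_a, cog_b}),
# larger meaning a more strongly conserved neighbor relationship.
Weights = dict[frozenset, float]


def energy(order: list[str], w: Weights) -> float:
    """Illustrative target function: minus the total link weight between neighbors."""
    return -sum(w.get(frozenset((order[i], order[i + 1])), 0.0)
                for i in range(len(order) - 1))


def anneal(cogs: list[str], w: Weights, steps: int = 20000,
           t0: float = 5.0, t_end: float = 0.01, seed: int = 0) -> list[str]:
    """Simulated annealing over orderings of at least two COGs via segment reversals."""
    rng = random.Random(seed)
    order = list(cogs)
    e = energy(order, w)
    best, best_e = list(order), e
    for step in range(steps):
        t = t0 * (t_end / t0) ** (step / steps)  # geometric cooling from t0 toward t_end
        i, j = sorted(rng.sample(range(len(order)), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        ce = energy(cand, w)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if ce <= e or rng.random() < math.exp((e - ce) / t):
            order, e = cand, ce
            if e < best_e:
                best, best_e = list(order), e
    return best


def split_contigs(order: list[str], w: Weights,
                  min_weight: float = 0.0) -> list[list[str]]:
    """Cut a non-empty ordering wherever adjacent COGs lack a link above min_weight."""
    contigs = [[order[0]]]
    for a, b in zip(order, order[1:]):
        if w.get(frozenset((a, b)), 0.0) > min_weight:
            contigs[-1].append(b)
        else:
            contigs.append([b])
    return contigs
```

A typical call would be `split_contigs(anneal(cogs, w), w)`. Segment reversal is the usual move for path-ordering problems such as the traveling salesman problem; because the weights are symmetric, reversing a segment changes only the two links at its ends. The function returns the lowest-energy ordering it has seen, so it never does worse than the input order. For thousands of COGs, `steps`, `t0` and `t_end` would need tuning, and the full energy recomputation at each step (kept here for clarity) would be replaced by an incremental update of those two links.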
