Estimation of the number of authentic orphan genes in bacterial genomes.

S. Fukuchi, K. Nishikawa
Journal: DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes
Publication date: 2004-01-01
Publication type: Journal Article
DOI: 10.1093/DNARES/11.4.219
URL: https://doi.org/10.1093/DNARES/11.4.219
Citation count: 27

Abstract

Genome annotation produces a considerable number of putative proteins lacking sequence similarity to known proteins. These are referred to as "orphans." The proportion of orphan genes varies among genomes, and is independent of genome size. In the present study, we show that the proportion of orphan genes roughly correlates with the isolation index of organisms (IIO), an indicator introduced in the present study, which represents the degree of isolation of a given genome as measured by sequence similarity. However, there are outlier genomes with respect to the linear correlation, consisting of those genomes that may contain excess amounts of orphan genes. Comparisons of genome sequences among closely related strains revealed that some of the annotated genes are not conserved, suggesting that they are ORFs occurring by chance. Exclusion of these non-conserved ORFs within closely related genomes improved the correlation between the proportion of orphan genes and the IIO values. Assuming that the correlation holds in general, this relationship was used to estimate the number of "authentic" orphan genes in a genome. Using this definition of authentic orphan genes, the anomalies arising from over-assignments, e.g., the percentages of structural annotations, were corrected for 16 genomes, including those of five archaea.
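
The estimation step described in the abstract can be illustrated with a minimal sketch. It assumes that the relationship between the orphan proportion and the IIO is fitted by ordinary least squares on a set of reference genomes, and that the fitted line is then used to predict the expected orphan proportion of a target genome. The abstract does not give the IIO formula or the fitting procedure, so the function names, the capping rule, and all numbers below are hypothetical placeholders rather than the authors' method or data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_authentic_orphans(iio, total_genes, annotated_orphans,
                               slope, intercept):
    """Predict the orphan count expected from the fitted IIO relationship.

    Assumption (not from the abstract): the estimate is capped at the
    number of orphans actually annotated, since the correction is meant
    to remove over-assigned ORFs, not to add genes.
    """
    expected_fraction = slope * iio + intercept
    expected_fraction = min(max(expected_fraction, 0.0), 1.0)
    expected_count = round(expected_fraction * total_genes)
    return min(expected_count, annotated_orphans)


# Hypothetical reference genomes: (IIO, orphan proportion) pairs taken
# after excluding ORFs that are not conserved among closely related strains.
reference_iio = [0.1, 0.2, 0.3, 0.4]
reference_orphan_fraction = [0.05, 0.10, 0.15, 0.20]

slope, intercept = fit_line(reference_iio, reference_orphan_fraction)

# Hypothetical target genome with an apparent excess of annotated orphans.
authentic = estimate_authentic_orphans(
    iio=0.3, total_genes=4000, annotated_orphans=900,
    slope=slope, intercept=intercept,
)
print(f"estimated authentic orphan genes: {authentic}")
```

In this sketch, the difference between the annotated orphan count and the estimate would correspond to ORFs suspected of occurring by chance. Whether the original study caps or otherwise constrains the estimate is not stated in the abstract.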