Beyond Identity - When Classical Homology Searching Fails, Why, and What You Can Do About It

William C. Ray, H. Ozer, David W. Armbruster, C. Daniels
Published in: 2009 Ohio Collaborative Conference on Bioinformatics
Publication date: 2009-06-15
DOI: 10.1109/OCCBIO.2009.23 (https://doi.org/10.1109/OCCBIO.2009.23)
Citation count: 3

Abstract

Multiple sequence alignments (MSAs) of protein and nucleic-acid sequences are a ubiquitous method for modeling sequence families, used in every biological domain. Despite their utility, MSAs and methods derived from them fail to capture interpositional relationships that can be as critical to family membership as positional identities.

We have recently developed novel methods, MAVL and StickWRLD, to quantitate and visualize additional features of sequence family models, and have identified interpositional dependencies at the residue level that are critical indicators of family membership in many sequence families. Some of these dependencies cannot be modeled by any existing modeling method, including Hidden Markov Models. In certain cases, the dependencies are sufficiently strong that all common methods score sequences that are explicitly excluded from the family as better candidates than any actual members.

The tRNA intron-endonuclease targets in the Archaea are such a family. Originally characterized as excised introns from archaeal tRNAs, some of which function as guide RNAs to target O-methylation of the ribosomal RNAs, these sequences have a very short characteristic signature and allow significant divergence. There is insufficient information in the base conservation to create useful scoring models. Using our tools we have identified critical residue interdependencies within the endonuclease target that enable detection of introns in whole-genomic sequence. Many of these introns occur outside tRNAs, including some that are excised from protein mRNA. The dependencies we identify correspond to a Markov network of relationships over the positional identities. The contribution of each node’s Markov blanket is incorporated via blending with the positional conservation using a voting algorithm. In this paper we present the results of this analysis and the generalization of our modeling method, which allows development of models of similar power for arbitrary RNA families.
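
The failure mode described in the abstract can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy alignment, the decoy sequence, the pseudocount, the 0.5-bit threshold, and all function names are invented, and the pairwise log-odds correction is a simple stand-in, not the MAVL/StickWRLD method or the authors' voting algorithm. It shows a per-column profile ranking a non-member above every family member, mutual information exposing the dependency between two columns, and a dependency-aware score reversing the ranking.

```python
from collections import Counter
from itertools import combinations
from math import log2

ALPHABET = "ACGU"
PSEUDO = 0.5  # assumed pseudocount, chosen only for this illustration

# Invented toy alignment: columns 0 and 3 co-vary (G-C, G-U or U-A pairs),
# columns 1 and 2 are perfectly conserved.
FAMILY = ["GAAC", "GAAC", "GAAU", "GAAU", "UAAA", "UAAA", "UAAA"]
# Invented decoy: the most common base in every column, but G opposite A
# is a combination that never occurs in the family.
DECOY = "GAAA"


def column_freqs(seqs, i):
    """Pseudocounted base frequencies of alignment column i."""
    counts = Counter(s[i] for s in seqs)
    total = len(seqs) + PSEUDO * len(ALPHABET)
    return {b: (counts[b] + PSEUDO) / total for b in ALPHABET}


def pair_freqs(seqs, i, j):
    """Pseudocounted joint frequencies of the base pair in columns i and j."""
    counts = Counter((s[i], s[j]) for s in seqs)
    total = len(seqs) + PSEUDO * len(ALPHABET) ** 2
    return {(a, b): (counts[(a, b)] + PSEUDO) / total
            for a in ALPHABET for b in ALPHABET}


def positional_score(seq, seqs):
    """Profile-style log-odds score that treats every column independently."""
    return sum(log2(column_freqs(seqs, i)[b] / 0.25) for i, b in enumerate(seq))


def mutual_information(seqs, i, j):
    """Mutual information (bits) between columns i and j, from raw counts."""
    n = len(seqs)
    ci = Counter(s[i] for s in seqs)
    cj = Counter(s[j] for s in seqs)
    cij = Counter((s[i], s[j]) for s in seqs)
    return sum((c / n) * log2(c * n / (ci[a] * cj[b]))
               for (a, b), c in cij.items())


def dependent_pairs(seqs, threshold=0.5):
    """Column pairs whose mutual information exceeds an assumed threshold."""
    length = len(seqs[0])
    return [(i, j) for i, j in combinations(range(length), 2)
            if mutual_information(seqs, i, j) > threshold]


def dependency_aware_score(seq, seqs):
    """Positional score plus a log-odds term for each dependent column pair.

    Stand-in for blending positional conservation with interpositional
    dependencies; it is NOT the voting algorithm from the paper.
    """
    score = positional_score(seq, seqs)
    for i, j in dependent_pairs(seqs):
        joint = pair_freqs(seqs, i, j)[(seq[i], seq[j])]
        independent = column_freqs(seqs, i)[seq[i]] * column_freqs(seqs, j)[seq[j]]
        score += log2(joint / independent)
    return score


print("dependent column pairs:", dependent_pairs(FAMILY))
for name, score in (("positional", positional_score),
                    ("dependency-aware", dependency_aware_score)):
    best_member = max(score(s, FAMILY) for s in FAMILY)
    print(f"{name}: decoy={score(DECOY, FAMILY):.2f} best member={best_member:.2f}")
```

In this toy case the decoy outscores every member under the positional score because it takes the most frequent base in each column, while the pair term penalizes the unseen G/A combination heavily enough to drop it below all members. Real families would need far more sequences for the mutual-information estimate to be reliable.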