William C. Ray, H. Ozer, David W. Armbruster, C. Daniels
Beyond Identity: When Classical Homology Searching Fails, Why, and What You Can Do About It
2009 Ohio Collaborative Conference on Bioinformatics, published 2009-06-15
DOI: 10.1109/OCCBIO.2009.23 (https://doi.org/10.1109/OCCBIO.2009.23)
Citations: 3
Abstract
Multiple Sequence Alignments (MSAs) of both protein and nucleic-acid sequences are a ubiquitous method for modeling sequence families that pervades every biological domain. Despite their utility, MSAs and methods derived from them fail to capture interpositional relationships that can be as critical to family membership as are positional identities.

We have recently developed novel methods, MAVL and StickWRLD, to quantitate and visualize additional features of sequence family models, and have identified interpositional dependencies at the residue level that are critical indicators of family membership in many sequence families. Some of these dependencies cannot be modeled by any existing modeling method, including Hidden Markov Models. In certain cases, the dependencies are sufficiently strong that all common methods score sequences that are explicitly excluded from the family as better candidates than any actual members.

The tRNA intron-endonuclease targets in the Archaea are such a family. Originally characterized as excised introns from archaeal tRNAs, some of which function as guide RNAs to target O-methylation of the ribosomal RNAs, these sequences have a very short characteristic signature and allow significant divergence. There is insufficient information in the base conservation to create useful scoring models. Using our tools we have identified critical residue interdependencies within the endonuclease target that enable detection of introns in whole-genomic sequence. Many of these introns occur outside tRNAs, including some that are excised from protein mRNA. The dependencies we identify correspond to a Markov network of relationships over the positional identities. The contribution of each node’s Markov blanket is incorporated via blending with the positional conservation using a voting algorithm. In this paper we present the results of this analysis and the generalization of our modeling method, which allows development of models of similar power for arbitrary RNA families.
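
The abstract's central claim, that positional conservation alone can fail to separate members from non-members while interpositional dependencies succeed, can be illustrated with a small sketch. This is not the MAVL/StickWRLD method or the authors' voting algorithm; it is a toy example using plain column frequencies and mutual information. The alignment `family`, the candidate sequences, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def column_entropy(seqs, i):
    """Shannon entropy (bits) of alignment column i."""
    n = len(seqs)
    counts = Counter(s[i] for s in seqs)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(seqs, i, j):
    """Mutual information (bits) between alignment columns i and j."""
    n = len(seqs)
    ci = Counter(s[i] for s in seqs)
    cj = Counter(s[j] for s in seqs)
    cij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * math.log2((c / n) / ((ci[a] / n) * (cj[b] / n)))
        for (a, b), c in cij.items()
    )


def positional_score(seqs, candidate):
    """Product of per-column residue frequencies (a position-only model)."""
    n = len(seqs)
    score = 1.0
    for i, residue in enumerate(candidate):
        score *= sum(1 for s in seqs if s[i] == residue) / n
    return score


def pair_seen(seqs, candidate, i, j):
    """True if the candidate's residue pair at (i, j) occurs in the family."""
    return any(s[i] == candidate[i] and s[j] == candidate[j] for s in seqs)


# Hypothetical toy alignment: columns 0 and 3 always form a Watson-Crick
# pair, while neither column is conserved on its own.
family = ["AGCU", "UGCA", "GGCC", "CGCG"]

member = "AGCU"      # respects the 0-3 pairing
non_member = "AGCA"  # every residue is plausible per column, pairing broken

if __name__ == "__main__":
    print("entropy col 0:", column_entropy(family, 0))
    print("MI cols 0,3:  ", mutual_information(family, 0, 3))
    print("positional score, member:    ", positional_score(family, member))
    print("positional score, non-member:", positional_score(family, non_member))
    print("pair (0,3) seen, member:    ", pair_seen(family, member, 0, 3))
    print("pair (0,3) seen, non-member:", pair_seen(family, non_member, 0, 3))
```

In this toy case the position-only score is identical for the member and the non-member, because columns 0 and 3 carry no conservation individually; only the dependency between the two columns tells them apart. A real model would need pseudocounts, many more sequences, and some principled way of combining the two signals, none of which is attempted here.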