Anton I. Petrov, Jesse Stombaugh, Craig L. Zirbel, N. Leontis
Understanding Sequence Variability of RNA Motifs Using Geometric Search and IsoDiscrepancy Matrices
2009 Ohio Collaborative Conference on Bioinformatics, published 2009-06-15
DOI: 10.1109/OCCBIO.2009.15 (https://doi.org/10.1109/OCCBIO.2009.15)
Citations: 2
Abstract
Many of the nominally single-stranded hairpin, internal, and junction "loop" regions of RNA secondary structures in fact form uniquely folded 3D motifs. These elements are largely structured by non-Watson-Crick basepairs. Many 3D motifs are recurrent, meaning they occur in different RNAs. Recurrent motifs have the same 3D structure but not necessarily the same sequence. We describe a methodology for identifying the sequence variability of a given recurrent RNA internal loop that can be generalized to hairpin and junction loops. Because the database of RNA 3D structures now contains a significant number of biologically active, structured RNAs, including ribosomal RNAs, ribozymes, and riboswitches, we can directly observe some of the sequence variability of recurrent motifs in X-ray crystal structures. We use our search program, FR3D, to search the 3D structure database for geometrically similar motif instances that share the same spatial pattern of basepairs. We apply our analysis of RNA basepair isostericity and occurrence frequencies to suggest likely basepair substitutions. We use the IsoDiscrepancy Index (IDI), which we recently introduced to quantify basepair isostericities, to derive 4×4 IDI tables for each base combination in each basepair family. We illustrate how these tables can be applied to predict the most likely base substitutions in a 3D motif. By comparing observed motif instances, we also determine the most likely locations of inserted ("bulged") nucleotides. Finally, we compare the predictions from these considerations with the variability observed in multiple sequence alignments of the motif.
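
The use of an IDI table to suggest substitutions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rank_substitutions`, the `cutoff` parameter, and every numeric value below are hypothetical placeholders chosen only to show the mechanics. The sketch assumes one row of a 4×4 table, holding the IDI of each of the 16 base combinations relative to the basepair observed in the motif, with lower values meaning closer isostericity and `None` marking combinations for which no value is available.

```python
from typing import Dict, List, Optional, Tuple

BASES = "ACGU"


def rank_substitutions(
    idi_to_reference: Dict[str, Optional[float]],
    cutoff: float,
) -> List[Tuple[str, float]]:
    """Rank candidate basepair substitutions by IDI, most isosteric first.

    idi_to_reference maps a two-letter base combination (e.g. "GC") to its
    IDI relative to the observed basepair, or None if no value is available.
    cutoff is a hypothetical threshold; combinations above it are dropped.
    """
    ranked = [
        (pair, idi)
        for pair, idi in idi_to_reference.items()
        if idi is not None and idi <= cutoff
    ]
    return sorted(ranked, key=lambda item: item[1])


# Placeholder row: all 16 base combinations, most left undefined.
# These numbers are invented for illustration and are NOT taken from the paper.
example_row: Dict[str, Optional[float]] = {
    b1 + b2: None for b1 in BASES for b2 in BASES
}
example_row.update(
    {"GC": 0.0, "CG": 0.3, "AU": 0.4, "UA": 0.5, "GU": 2.5, "UG": 2.8}
)

likely = rank_substitutions(example_row, cutoff=2.0)
for pair, idi in likely:
    print(f"{pair}: IDI = {idi}")
```

In practice the ranking would be combined with occurrence frequencies and checked against multiple sequence alignments, as the abstract describes; the sketch covers only the table lookup and ordering step.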