Debra Knisley, Jeff Knisley, Chelsea Ross, Alissa Rockney
Classifying multigraph models of secondary RNA structure using graph-theoretic descriptors
ISRN Bioinformatics, vol. 2012, article 157135. Published 2012-11-11. DOI: 10.5402/2012/157135 (https://doi.org/10.5402/2012/157135)
Citation count: 9
Abstract
The prediction of secondary RNA folds from primary sequences continues to be an important area of research given the significance of RNA molecules in biological processes such as gene regulation. To facilitate this effort, graph models of secondary structure have been developed to quantify and thereby characterize the topological properties of the secondary folds. In this work we utilize a multigraph representation of a secondary RNA structure to examine the ability of the existing graph-theoretic descriptors to classify all possible topologies as either RNA-like or not RNA-like. We use more than one hundred descriptors and several different machine learning approaches, including nearest neighbor algorithms, one-class classifiers, and several clustering techniques. We predict that many more topologies will be identified as those representing RNA secondary structures than currently predicted in the RAG (RNA-As-Graphs) database. The results also suggest which descriptors and which algorithms are more informative in classifying and exploring secondary RNA structures.
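The descriptor-then-classify approach described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the four descriptors, the function names `descriptors` and `nearest_neighbor_label`, the toy multigraphs, and their labels are hypothetical. They are not the paper's descriptor set (which has more than one hundred entries), not its algorithms' actual configuration, and not data from the RAG database.

```python
from collections import Counter
from math import dist


def descriptors(n_vertices, edges):
    """Return a small descriptor vector for an undirected multigraph.

    `edges` is a list of (u, v) vertex-index pairs; a repeated pair is a
    parallel edge. The four descriptors are illustrative choices only.
    """
    degree = [0] * n_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # a self-loop (u == v) therefore counts twice
    # Count how many vertex pairs are joined by more than one edge.
    multiplicity = Counter(tuple(sorted(e)) for e in edges)
    parallel_pairs = sum(1 for m in multiplicity.values() if m > 1)
    mean_degree = sum(degree) / n_vertices
    degree_variance = sum((d - mean_degree) ** 2 for d in degree) / n_vertices
    return (float(len(edges)), float(max(degree)),
            float(parallel_pairs), degree_variance)


def nearest_neighbor_label(query, training):
    """Return the label of the training vector closest to `query`
    (1-nearest-neighbor, Euclidean distance)."""
    _, label = min(training, key=lambda item: dist(item[0], query))
    return label


# Toy multigraphs with arbitrary placeholder labels (NOT from RAG).
path_graph = (3, [(0, 1), (1, 2)])             # simple path on 3 vertices
triple_edge = (2, [(0, 1), (0, 1), (0, 1)])    # 2 vertices, 3 parallel edges
training = [
    (descriptors(*path_graph), "RNA-like"),
    (descriptors(*triple_edge), "not RNA-like"),
]

# Classify an unlabeled topology: 2 vertices joined by a double edge.
query = descriptors(2, [(0, 1), (0, 1)])
print(query, nearest_neighbor_label(query, training))
```

In practice the descriptor vectors would need scaling before distances are compared, since raw counts and variances live on different ranges. The sketch omits that step for brevity.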