{"title":"SparseRNAfolD:经过优化的稀疏 RNA 无假结折叠,并考虑了悬垂因素。","authors":"Mateo Gray, Sebastian Will, Hosna Jabbari","doi":"10.1186/s13015-024-00256-4","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity ( <math><mrow><mi>O</mi> <mo>(</mo> <msup><mi>n</mi> <mn>3</mn></msup> <mo>)</mo></mrow> </math> time and <math><mrow><mi>O</mi> <mo>(</mo> <msup><mi>n</mi> <mn>2</mn></msup> <mo>)</mo></mrow> </math> space for pseudoknot-free structures) for longer RNA sequences. Furthermore, accurate free energy calculations, including dangle contributions can be difficult and costly to implement, particularly when optimizing for time and space requirements.</p><p><strong>Results: </strong>Here we introduce a fast and efficient sparsified MFE pseudoknot-free structure prediction algorithm, SparseRNAFolD, that utilizes an accurate energy model that accounts for dangle contributions. While the sparsification technique was previously employed to improve the time and space complexity of a pseudoknot-free structure prediction method with a realistic energy model, SparseMFEFold, it was not extended to include dangle contributions due to the complexity of computation. This may come at the cost of prediction accuracy. In this work, we compare three different sparsified implementations for dangle contributions and provide pros and cons of each method. As well, we compare our algorithm to LinearFold, a linear time and space algorithm, where we find that in practice, SparseRNAFolD has lower memory consumption across all lengths of sequence and a faster time for lengths up to 1000 bases.</p><p><strong>Conclusion: </strong>Our SparseRNAFolD algorithm is an MFE-based algorithm that guarantees optimality of result and employs the most general energy model, including dangle contributions. We provide a basis for applying dangles to sparsified recursion in a pseudoknot-free model that has the potential to be extended to pseudoknots.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"19 1","pages":"9"},"PeriodicalIF":1.5000,"publicationDate":"2024-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11289965/pdf/","citationCount":"0","resultStr":"{\"title\":\"SparseRNAfolD: optimized sparse RNA pseudoknot-free folding with dangle consideration.\",\"authors\":\"Mateo Gray, Sebastian Will, Hosna Jabbari\",\"doi\":\"10.1186/s13015-024-00256-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity ( <math><mrow><mi>O</mi> <mo>(</mo> <msup><mi>n</mi> <mn>3</mn></msup> <mo>)</mo></mrow> </math> time and <math><mrow><mi>O</mi> <mo>(</mo> <msup><mi>n</mi> <mn>2</mn></msup> <mo>)</mo></mrow> </math> space for pseudoknot-free structures) for longer RNA sequences. Furthermore, accurate free energy calculations, including dangle contributions can be difficult and costly to implement, particularly when optimizing for time and space requirements.</p><p><strong>Results: </strong>Here we introduce a fast and efficient sparsified MFE pseudoknot-free structure prediction algorithm, SparseRNAFolD, that utilizes an accurate energy model that accounts for dangle contributions. While the sparsification technique was previously employed to improve the time and space complexity of a pseudoknot-free structure prediction method with a realistic energy model, SparseMFEFold, it was not extended to include dangle contributions due to the complexity of computation. This may come at the cost of prediction accuracy. In this work, we compare three different sparsified implementations for dangle contributions and provide pros and cons of each method. As well, we compare our algorithm to LinearFold, a linear time and space algorithm, where we find that in practice, SparseRNAFolD has lower memory consumption across all lengths of sequence and a faster time for lengths up to 1000 bases.</p><p><strong>Conclusion: </strong>Our SparseRNAFolD algorithm is an MFE-based algorithm that guarantees optimality of result and employs the most general energy model, including dangle contributions. We provide a basis for applying dangles to sparsified recursion in a pseudoknot-free model that has the potential to be extended to pseudoknots.</p>\",\"PeriodicalId\":50823,\"journal\":{\"name\":\"Algorithms for Molecular Biology\",\"volume\":\"19 1\",\"pages\":\"9\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2024-03-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11289965/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Algorithms for Molecular Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s13015-024-00256-4\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-024-00256-4","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
摘要
动机通过自由能最小化计算 RNA 二级结构预测是分析 RNA 结构及其相互作用不可或缺的方法。这些方法能在指数级的众多可能结构中找到自由能(MFE)最小的结构,而且对于较长的 RNA 序列来说,其时间和空间复杂度都很有限(对于无假结结构来说,时间为 O ( n 3 ) ,空间为 O ( n 2 ) )。此外,精确的自由能计算(包括悬垂贡献)可能难以实现且成本高昂,尤其是在优化时间和空间要求时:结果:在此,我们介绍了一种快速高效的稀疏化 MFE 无伪缺口结构预测算法 SparseRNAFolD,该算法采用了精确的能量模型,考虑了悬垂贡献。虽然稀疏化技术以前曾被用于提高采用现实能量模型的无伪结结构预测方法 SparseMFEFold 的时间和空间复杂性,但由于计算复杂,它没有扩展到包括纠缠贡献。这可能会以预测精度为代价。在这项工作中,我们比较了三种不同的悬垂贡献稀疏化实现方法,并提供了每种方法的优缺点。此外,我们还将我们的算法与线性时空算法 LinearFold 进行了比较,发现在实际应用中,SparseRNAFolD 在所有长度的序列中都具有更低的内存消耗,而在长度不超过 1000 个碱基的序列中耗时更短:我们的 SparseRNAFolD 算法是一种基于 MFE 的算法,它保证了结果的最优性,并采用了最通用的能量模型,包括纠缠贡献。我们为在无伪节点模型中将当差应用于稀疏递归提供了基础,该模型有可能扩展到伪节点。
SparseRNAfolD: optimized sparse RNA pseudoknot-free folding with dangle consideration.
Motivation: Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity ( time and space for pseudoknot-free structures) for longer RNA sequences. Furthermore, accurate free energy calculations, including dangle contributions can be difficult and costly to implement, particularly when optimizing for time and space requirements.
Results: Here we introduce a fast and efficient sparsified MFE pseudoknot-free structure prediction algorithm, SparseRNAFolD, that utilizes an accurate energy model that accounts for dangle contributions. While the sparsification technique was previously employed to improve the time and space complexity of a pseudoknot-free structure prediction method with a realistic energy model, SparseMFEFold, it was not extended to include dangle contributions due to the complexity of computation. This may come at the cost of prediction accuracy. In this work, we compare three different sparsified implementations for dangle contributions and provide pros and cons of each method. As well, we compare our algorithm to LinearFold, a linear time and space algorithm, where we find that in practice, SparseRNAFolD has lower memory consumption across all lengths of sequence and a faster time for lengths up to 1000 bases.
Conclusion: Our SparseRNAFolD algorithm is an MFE-based algorithm that guarantees optimality of result and employs the most general energy model, including dangle contributions. We provide a basis for applying dangles to sparsified recursion in a pseudoknot-free model that has the potential to be extended to pseudoknots.
期刊介绍:
Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning.
Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms.
Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.