An efficient algorithm to compute the minimum free energy of interacting nucleic acid strands

Ahmed Shalaby, Damien Woods

arXiv:2407.09676 (arXiv - QuanBio - Biomolecules), published 2024-07-12

Abstract

The information-encoding molecules RNA and DNA form a combinatorially large set of secondary structures through nucleic acid base pairing. Thermodynamic prediction algorithms predict favoured, or minimum free energy (MFE), secondary structures, and can assign an equilibrium probability to any structure via the partition function: a Boltzmann-weighted sum over the set of secondary structures. MFE is NP-hard in the presence of pseudoknots, base pairings that violate a restricted planarity condition. However, unpseudoknotted structures are amenable to dynamic programming: for a single DNA/RNA strand there are polynomial time algorithms for MFE and partition function. For multiple strands, the problem is more complicated due to entropic penalties. Dirks et al. [SIAM Review; 2007] showed that for O(1) strands with N bases there is a partition function algorithm that runs in time polynomial in N; however, their technique did not generalise to MFE, which they left open. We give the first polynomial time (O(N^4)) algorithm for unpseudoknotted multiple (O(1)) strand MFE, answering the open problem from Dirks et al. The challenge lies in accounting for the rotational symmetry of secondary structures, a feature not immediately amenable to dynamic programming algorithms. Our proof has two main technical contributions. First, we give a polynomial upper bound on the number of symmetric secondary structures that must be considered when computing rotational symmetry penalties. Second, a backtracking algorithm leverages that bound to find the MFE in an exponential space of contenders. Our MFE algorithm has the same asymptotic run time as Dirks et al.'s partition function algorithm, suggesting efficient handling of rotational symmetry, although it has higher space complexity. It also seems reasonably tight in the number of strands, since Condon, Hajiaghayi & Thachuk [DNA27, 2021] have shown that unpseudoknotted MFE is NP-hard for O(N) strands.
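
To make the single-strand case concrete, the following is a minimal Python sketch, not the algorithm of this paper. It assumes a toy energy model in which every base pair contributes a fixed hypothetical energy `PAIR_ENERGY` and hairpins enclose at least `MIN_LOOP` unpaired bases; real predictors use nearest-neighbour loop energies. The helper names `fold`, `mfe` and `partition_function` are illustrative. The sketch shows that, for one strand, the same unpseudoknotted recursion gives the MFE when evaluated with (min, +) and the partition function when evaluated with (+, ×). It handles neither multiple strands nor the rotational symmetry penalty discussed above.

```python
import math
from functools import lru_cache

# All constants below are illustrative assumptions, not values from the paper.
PAIR_ENERGY = -1.0            # hypothetical energy per base pair, kcal/mol
MIN_LOOP = 3                  # minimum unpaired bases enclosed by a hairpin
KT = 0.0019872 * 310.15       # gas constant (kcal/(mol K)) times 37 degrees C in kelvin
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold(seq, combine, extend, empty, pair_weight):
    """Evaluate one unpseudoknotted recursion over a chosen pair of operations."""

    @lru_cache(maxsize=None)
    def F(i, j):
        # Value for the subsequence seq[i..j], inclusive.
        if j - i <= MIN_LOOP:
            return empty                      # too short to contain any pair
        total = F(i, j - 1)                   # case 1: base j is unpaired
        for k in range(i, j - MIN_LOOP):      # case 2: base j pairs with base k
            if (seq[k], seq[j]) in PAIRS:
                paired = extend(extend(F(i, k - 1), pair_weight), F(k + 1, j - 1))
                total = combine(total, paired)
        return total

    return F(0, len(seq) - 1)


def mfe(seq):
    """Minimum free energy under the toy model: (min, +) evaluation."""
    return fold(seq, min, lambda a, b: a + b, 0.0, PAIR_ENERGY)


def partition_function(seq):
    """Boltzmann-weighted sum over all structures: (+, *) evaluation."""
    boltzmann_factor = math.exp(-PAIR_ENERGY / KT)
    return fold(seq, lambda a, b: a + b, lambda a, b: a * b, 1.0, boltzmann_factor)


if __name__ == "__main__":
    strand = "GGGAAAUCCC"
    energy = mfe(strand)
    z = partition_function(strand)
    print("toy MFE:", energy)
    # Combined equilibrium probability of one structure at the MFE.
    print("Boltzmann weight of an MFE structure / Z:", math.exp(-energy / KT) / z)
```

Each of the O(N^2) subsequences tries O(N) pairing partners, so this toy recursion runs in O(N^3) time. Because every structure is generated exactly once (base j is either unpaired or paired with a unique k), swapping the operations is enough to move from counting Boltzmann weights to minimising energy on a single strand. For multiple strands that swap no longer suffices, because the symmetry penalty depends on the whole structure.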