Algorithms for Molecular Biology: Latest Articles

Comparing the ability of embedding methods on metabolic hypergraphs for capturing taxonomy-based features.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2026-04-29 DOI: 10.1186/s13015-026-00298-w
Mattia Cervellini, Blerina Sinaimeri, Catherine Matias, Alessio Martino
Abstract:
Background: Metabolic networks are complex systems that describe the biochemical reactions within an organism through pairwise interactions between chemical compounds. While this representation is widely used to study biological function, it fails to capture the full structure of metabolic reactions, many of which involve more than two compounds. Hypergraphs offer a more natural representation, where nodes represent metabolites and hyperedges represent reactions involving multiple participants. Clustering such metabolic hypergraphs can reveal systematic differences among evolutionarily distinct organisms, providing insight into ecological constraints and evolutionary pressures.
Methods: In this study, we investigate how different graph and hypergraph embedding methods influence unsupervised clustering, with the goal of capturing taxonomy-based classes. We apply 14 distinct embedding strategies to a large-scale dataset of 8467 metabolic hypergraphs. Each embedding was followed by hierarchical clustering using a fixed linkage method. To assess performance, we compared the resulting clusters against known taxonomic groupings.
Results: Our findings show that the choice of hypergraph embedding has a significant effect on clustering outcomes. Among the tested methods, Bag of Hyperedges with Jaccard distance, Histogram Cosine Kernel, and a Hypergraph Auto-Encoder consistently performed best. We also advocate that the embedding method should be chosen based on the goal of the downstream task.
Citations: 0
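The abstract above names "Bag of Hyperedges with Jaccard distance" without defining it. The sketch below shows one plausible reading, and it is not the authors' implementation: each organism is treated as the set of its hyperedges (each hyperedge being a set of metabolite labels), and two organisms are compared by the Jaccard distance between those sets. The function names and the toy metabolite labels are hypothetical.

```python
from itertools import combinations


def bag_of_hyperedges(reactions):
    """Represent a hypergraph as the set of its hyperedges (frozensets of node labels)."""
    return frozenset(frozenset(r) for r in reactions)


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; two empty bags are defined to be at distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(bags):
    """Symmetric matrix of pairwise Jaccard distances between bags."""
    n = len(bags)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_distance(bags[i], bags[j])
    return d


# Toy data (hypothetical): three "organisms" with two reactions each.
org_a = bag_of_hyperedges([{"glc", "atp", "g6p", "adp"}, {"g6p", "f6p"}])
org_b = bag_of_hyperedges([{"glc", "atp", "g6p", "adp"}, {"pyr", "nadh", "lac", "nad"}])
org_c = bag_of_hyperedges([{"g6p", "f6p"}, {"glc", "atp", "g6p", "adp"}])
D = distance_matrix([org_a, org_b, org_c])
```

A matrix like `D` could then be passed to any hierarchical clustering routine that accepts precomputed distances, which matches the pipeline the abstract describes at a high level.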
Pattern matching with Elastic-Degenerate strings and Elastic-Founder graphs.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2026-04-28 DOI: 10.1186/s13015-025-00289-3
Rocco Ascone, Giulia Bernardini, Alessio Conte, Massimo Equi, Esteban Gabory, Roberto Grossi, Nadia Pisanti
Abstract:
Elastic Degenerate (ED) strings and Elastic Founder (EF) graphs, here collectively named variable strings, are two representations of acyclic components of pangenomes which extend the well-known notion of indeterminate string. Recent studies have focused extensively on algorithmic tasks involving these structures and other forms of variable strings that they generalize. Among such tasks, the basic operation of matching a pattern into a text, a fundamental toolkit for pangenomic data analysis, deserves special attention. In this paper, (1) we establish a clear taxonomy across ED strings and EF graphs, categorizing types of variable strings from the simplest linear (solid) string to the most complex general cases; (2) we consider the problem MATCH(X,Y) of matching a solid or variable pattern of type X into a variable text of type Y, and investigate its time complexity when X and Y are chosen from types of variable strings in the taxonomy of (1). For all possible X and Y, we either provide a non-trivial, often sub-quadratic, upper bound for MATCH(X,Y), or we prove a quadratic conditional lower bound, taking as a reference the existing quadratic conditional lower bounds for MATCH(SOLID,ED) and MATCH(SOLID,EF). A preliminary version of this work appeared in [Ascone et al., WABI 2024].
Citations: 0
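To make MATCH(SOLID,ED) concrete, here is a naive baseline sketch, not one of the algorithms from the paper. It assumes an ED string is given as a list of segments, each a list of alternative strings (the empty string is allowed), and it tracks which pattern prefixes are still "alive" at each segment boundary. The function name and the toy input are hypothetical.

```python
def match_solid_in_ed(pattern, ed_text):
    """Return True if `pattern` occurs in some string spelled by `ed_text`.

    ed_text is a list of segments; each segment is a list of alternative
    strings, and one alternative is chosen per segment.
    """
    m = len(pattern)
    if m == 0:
        return True
    # Prefix lengths k (0 < k < m) such that pattern[:k] ends exactly
    # at the boundary before the current segment.
    active = set()
    for segment in ed_text:
        next_active = set()
        for s in segment:
            # Case 1: the whole pattern lies inside one alternative.
            if pattern in s:
                return True
            # Case 2: extend partial matches that arrived at this boundary.
            for k in active:
                rest = pattern[k:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        return True
                elif rest.startswith(s):
                    # s is consumed entirely (also covers s == "").
                    next_active.add(k + len(s))
            # Case 3: start a new partial match at a suffix of s.
            for length in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:length]):
                    next_active.add(length)
        active = next_active
    return False


# Toy ED string spelling ACGTA, ACTTA and ACTA.
ed = [["AC"], ["G", "T", ""], ["TA"]]
found_cgt = match_solid_in_ed("CGT", ed)
found_acta = match_solid_in_ed("ACTA", ed)
found_gg = match_solid_in_ed("GG", ed)
```

This baseline does work proportional to the pattern length for every alternative, so it sits on the quadratic side of the bounds the abstract discusses; it is meant only to pin down the problem statement.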
An efficient algorithm for exploring RNA branching conformations under the nearest-neighbor thermodynamic model.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2026-03-28 DOI: 10.1186/s13015-025-00296-4
Svetlana Poznanović, Owen Cardwell, Christine Heitsch
Abstract:
Background: In the Nearest-Neighbor Thermodynamic Model, a standard approach for RNA secondary structure prediction, the energy of the multiloops is modeled using a linear entropic penalty governed by three branching parameters. Although these parameters are typically fixed, recent work has shown that reparametrizing the multiloop score and considering alternative branching conformations can lead to significantly better structure predictions. However, prior approaches for exploring the alternative branching structures were computationally inefficient for long sequences.
Results: We present a novel algorithm that partitions the parameter space, identifying all distinct branching structures (optimal under different branching parameters) for a given RNA sequence using the fewest possible minimum free energy computations. Our method efficiently computes the full parameter-space partition and the associated optimal structures, enabling a comprehensive evaluation of the structural landscape across parameter choices. We apply this algorithm to the Archive II benchmarking dataset, assessing the maximum attainable prediction accuracy for each sequence under the reparameterized multiloop model. We find that the potential for improvement over default predictions is substantial in many cases, and that the optimal prediction accuracy is highly sensitive to auxiliary modeling decisions, such as the treatment of lonely base pairs and dangling ends.
Conclusion: Our results support the hypothesis that the conventional choice of multiloop parameters may limit prediction accuracy and that exploring alternative parameterizations is both tractable and worthwhile. The efficient partitioning algorithm we introduce makes this exploration feasible for longer sequences and larger datasets. Furthermore, we identify several open challenges in identifying the optimal structure.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13151262/pdf/
Citations: 0
Inferring and summarizing tumor phylogenies from bulk DNA data.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2026-02-18 DOI: 10.1186/s13015-025-00295-5
Yuanyuan Qi, Henri Schmidt, Mohammed El-Kebir
Abstract:
Background: Cancer phylogenies are key to understanding tumor evolution. However, due to the uncertainty in phylogenetic estimation, one typically infers many equally plausible phylogenies from bulk DNA sequencing data of tumors, hindering downstream analysis that relies on correct phylogenies.
Results: To resolve this challenge, we introduce Sapling, a method to solve two variants of the BACKBONE TREE INFERENCE FROM READS problem, which seeks a small set of backbone trees on a subset of mutations that collectively summarize the space of plausible cancer phylogenies. We prove that the problems are NP-hard.
Conclusions: On simulated and real data, we demonstrate that Sapling is capable of inferring high-quality backbone trees that adequately summarize the space of plausible cancer phylogenies. In addition, we demonstrate that Sapling is able to infer full-size trees with higher likelihoods than state-of-the-art methods.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13020214/pdf/
Citations: 0
Orientability of undirected phylogenetic networks to a desired class: practical algorithms and application to tree-child orientation.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2026-02-05 DOI: 10.1186/s13015-025-00282-w
Tsuyoshi Urata, Manato Yokoyama, Haruki Miyaji, Momoko Hayamizu
Abstract:
The C-ORIENTATION problem asks whether it is possible to orient an undirected graph to a directed phylogenetic network of a desired network class C. This problem arises, for example, when visualising evolutionary data, as popular methods such as Neighbor-Net are distance-based and inevitably produce undirected graphs. The complexity of C-ORIENTATION remains open for many classes C, including binary tree-child networks, and practical methods are still lacking. In this paper, we propose (1) an exact FPT algorithm for C-ORIENTATION, applicable to any class C admitting a tractable membership test, and parameterised by the reticulation number and the maximum size of minimal basic cycles, and (2) a very fast heuristic for TREE-CHILD ORIENTATION. While the state-of-the-art for C-ORIENTATION is a simple exponential time algorithm whose computational bottleneck lies in searching for appropriate reticulation vertex placements, our methods significantly reduce this search space. Experiments show that, although our FPT algorithm is still exponential, it significantly outperforms the existing method. The heuristic runs even faster but with increasing false negatives as the reticulation number grows. Given this trade-off, we also discuss theoretical directions for improvement and biological applicability of the heuristic approach.
Volume 21(1), p. 2
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12874789/pdf/
Citations: 0
Computing double-pushout graph transformation rules and atom-to-atom maps from KEGG RCLASS data.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2026-01-29 DOI: 10.1186/s13015-025-00294-6
Nora Beier, Thomas Gatter, Jakob L Andersen, Peter F Stadler
Abstract:
Background: Atom-to-atom maps play an important role in many applications. However, they are often difficult to obtain. The KEGG reaction database does not provide atom-to-atom maps for its reactions and instead offers a description of local changes for pairs of reactant and product molecules in terms of so-called RCLASSes. Developed for classification purposes, RCLASS data are difficult to use for purposes such as the construction of atom-to-atom maps or reaction rules. DPO graph transformation rules, on the other hand, work as a convenient and efficient representation, particularly for these applications. The RCLASS data can be understood as collections of local graph patterns in the reactants and products of a reaction, together with partial correspondences of atoms. The problem of converting RCLASS data into DPO rules, therefore, is a special case of the graph reconstruction problem, which consists of inferring a graph from a collection of subgraphs.
Results: We developed laveau, a tool that computes explicit DPO rules from KEGG reactions and RCLASS data. The algorithm proceeds stepwise, starting with a translation of individual RDM codes, specifically developed by the KEGG database, into equivalent RDM pattern graphs. Multiple RDM pattern graphs for the same RCLASS are then combined based on their embeddings into the reactant and product molecules, observing certain consistency conditions. In the final step, these combined pairwise patterns are merged into a pair of subgraphs of reactants and products, respectively. If RCLASSes connecting all pairs of reactant and product molecules are available, the complete reaction center(s) is/are contained in the union of these subgraphs. The atom-to-atom map inherited from the RDM codes then defines a DPO transformation rule. Application of these rules to the reactants then yields complete atom-to-atom maps (AAMs). Starting from 3195 RCLASSes, laveau generates a total of 1232 DPO rules and 1594 AAMs.
Conclusions: The laveau software makes it possible to extract local atom-to-atom maps from the RCLASSes of the KEGG database, covering a large set of enzyme-catalyzed reactions. The results are made available in the form of DPO rules for use in atom-level models of metabolic networks, filling a crucial gap in the available data.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12949509/pdf/
Citations: 0
Improving spliced alignment by modeling splice sites with deep learning.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2026-01-02 DOI: 10.1186/s13015-025-00293-7
Siying Yang, Neng Huang, Heng Li
Abstract:
Motivation: Spliced alignment refers to the alignment of messenger RNA (mRNA) or protein sequences to eukaryotic genomes. It plays a critical role in gene annotation and the study of gene functions. Accurate spliced alignment demands sophisticated modeling of splice sites, but current aligners use simple models, which may affect their accuracy given dissimilar sequences.
Results: We implemented minisplice to learn splice signals with a one-dimensional convolutional neural network (1D-CNN) and trained a model with 7026 parameters for vertebrate and insect genomes. It captures conserved splice signals across phyla and reveals GC-rich introns specific to mammals and birds. We used this model to estimate the empirical splicing probability for every GT and AG in genomes, and modified minimap2 and miniprot to leverage pre-computed splicing probability during alignment. Evaluation on human long-read RNA-seq data and cross-species protein datasets showed our method greatly improves the junction accuracy, especially for noisy long RNA-seq reads and proteins of distant homology.
Availability and implementation: https://github.com/lh3/minisplice.
p. 1
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12766944/pdf/
Citations: 0
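The abstract above mentions scoring "every GT and AG" in a genome. As a rough illustration of that enumeration step only, the sketch below lists candidate donor (GT) and acceptor (AG) dinucleotides on the forward strand together with a flanking window that a model could score. It is not minisplice code: the function name, the `flank` parameter and its default value are assumptions made for illustration, and no probability model is included.

```python
def candidate_splice_sites(seq, flank=3):
    """List (position, kind, window) for every GT / AG dinucleotide in seq.

    kind is "donor" for GT and "acceptor" for AG. The window holds up to
    `flank` bases on each side of the dinucleotide (clipped at the ends).
    Only the forward strand is scanned in this sketch.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 1):
        dinuc = seq[i:i + 2]
        if dinuc == "GT":
            kind = "donor"
        elif dinuc == "AG":
            kind = "acceptor"
        else:
            continue
        window = seq[max(0, i - flank):i + 2 + flank]
        sites.append((i, kind, window))
    return sites


# Toy sequence containing two AG and two GT dinucleotides.
sites = candidate_splice_sites("CAGGTAAGT")
```

In a real pipeline each window would be fed to a trained scorer and the result stored per position, so the aligner can look it up during alignment; the abstract describes that pre-computation only at this level of detail.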
Engineering rank queries on bit vectors and strings.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-12-11 DOI: 10.1186/s13015-025-00291-9
Simon Gene Gottlieb, Knut Reinert
Abstract:
Adding rank support to strings over a fixed-sized alphabet has numerous applications. Prominent among those is the (bidirectional) FM-Index, which is commonly utilized to index and analyze genomic data. At its core lies the rank operation on the Burrows-Wheeler-Transform (BWT) which, given a position in the BWT and a character, answers how often the specified character appears from the start to that position. Implementing those rank queries is usually based on bit vectors with rank support. In this work, we discuss three implementation improvements. First, a novel approach named paired-blocks that reduces the space overhead of the support structure by half to a total of only 1.6%. Second, a method for masking bits for the population count (also known as popcount) which greatly improves the runtime of 512-bit wide blocks in conjunction with AVX512 SIMD extensions. Third, a revised method for EPR-dictionaries (Pockrandt et al. in International conference on research in computational molecular biology. Springer, New York, 2017) called flattened bit vectors (fBV) with less space consumption and faster rank operations on strings, which is competitive in size and, depending on the parameters, between 2× and 9× faster than Wavelet Trees (Gog et al. in 13th International Symposium on Experimental Algorithms. Springer, New York, 2014).
p. 21
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12703928/pdf/
Citations: 0
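For readers unfamiliar with rank support, the sketch below shows the textbook single-level scheme that the abstract builds on: store the number of ones before each block, then finish a query with a masked popcount inside one block. It is not the paired-blocks or fBV design from the paper; the class name, the 64-bit block size and the use of Python integers as words are assumptions chosen for illustration.

```python
class RankBitVector:
    """Bit vector with rank support using per-block prefix counts."""

    BLOCK = 64  # bits per block (illustrative choice)

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []
        self.block_ranks = [0]  # block_ranks[b] = number of ones before block b
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.BLOCK]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
            ones = bin(word).count("1")  # popcount of the block
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Number of 1 bits in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError("rank position out of range")
        block, offset = divmod(i, self.BLOCK)
        if offset == 0:
            return self.block_ranks[block]
        mask = (1 << offset) - 1  # keep only the first `offset` bits of the block
        return self.block_ranks[block] + bin(self.words[block] & mask).count("1")


# Toy bit vector of 200 bits with a repeating pattern.
bits = [1, 0, 1, 1, 0] * 40
bv = RankBitVector(bits)
ones_before_130 = bv.rank1(130)
```

The stored prefix counts are the "support structure" whose space overhead the abstract talks about reducing, and the masked popcount is the step it speeds up for 512-bit blocks.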
RNA triplet repeats: improved algorithms for structure prediction and interactions.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-12-10 DOI: 10.1186/s13015-025-00292-8
Kimon Boehmer, Sarah J Berkemer, Sebastian Will, Yann Ponty
Abstract:
RNAs composed of Triplet Repeats (TR) have recently attracted much attention in the field of synthetic biology. We study the minimum free energy (MFE) secondary structures of such RNAs and give improved algorithms to compute the MFE and the partition function. Furthermore, we study the interaction of multiple RNAs and design a new algorithm for computing MFE and partition function for RNA-RNA interactions, improving the previously known factorial running time to exponential. In the case of TR, we show computational hardness but still obtain a parameterized algorithm. Finally, we propose a polynomial-time algorithm for computing interactions from a base set of RNA strands and conduct experiments on the interaction of TR based on this algorithm. For instance, we study the probability that a base pair is formed between two strands with the same triplet pattern, allowing an assessment of a notion of orthogonality between TR.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13014723/pdf/
Citations: 0
RNA inverse folding can be solved in linear time for structures without isolated stacks or base pairs.
IF 1.7, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2025-10-24 DOI: 10.1186/s13015-025-00278-6
Théo Boury, Samuel Gardelle, Laurent Bulteau, Yann Ponty
Abstract:
Inverse folding is a classic instance of negative RNA design which consists of finding a sequence that uniquely folds into a target secondary structure with respect to energy minimization. A breakthrough result of Bonnet et al. shows that, even in simple base-pair-based (BP) models, the decision version of a mildly constrained version of inverse folding is NP-hard. In this work, we show that inverse folding can be solved in linear time for a large collection of targets, including every structure that contains no isolated BP and no isolated stack (or, equivalently, when all helices consist of 3+ base pairs). For structures featuring shorter helices, our linear algorithm is no longer guaranteed to produce a solution, but still does so for a large proportion of instances. Our approach introduces a notion of modulo m-separability, generalizing a property pioneered by Hales et al. Separability is a sufficient condition for the existence of a solution to the inverse folding problem. We show that, for any input secondary structure of length n, a modulo m-separated sequence can be produced in time O(n · m · 2^m) anytime such a sequence exists. Meanwhile, we show that any structure consisting of 3+ base pairs is either trivially non-designable, or always admits a modulo-2 separated solution. Solution sequences can thus be produced in linear time, and even be uniformly generated within the set of modulo-2 separable sequences.
Volume 20(1), p. 20
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12553252/pdf/
Citations: 0