Algorithms for Molecular Biology最新文献_第5页

Automated design of dynamic programming schemes for RNA folding with pseudoknots. RNA伪结折叠动态规划方案的自动设计。

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2023-12-01 DOI: 10.1186/s13015-023-00229-z

Bertrand Marchand, Sebastian Will, Sarah J Berkemer, Yann Ponty, Laurent Bulteau

{"title":"Automated design of dynamic programming schemes for RNA folding with pseudoknots.","authors":"Bertrand Marchand, Sebastian Will, Sarah J Berkemer, Yann Ponty, Laurent Bulteau","doi":"10.1186/s13015-023-00229-z","DOIUrl":"10.1186/s13015-023-00229-z","url":null,"abstract":"Although RNA secondary structure prediction is a textbook application of dynamic programming (DP) and routine task in RNA structure analysis, it remains challenging whenever pseudoknots come into play. Since the prediction of pseudoknotted structures by minimizing (realistically modelled) energy is NP-hard, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. To achieve good performance, these methods rely on specific and carefully hand-crafted DP schemes. In contrast, we generalize and fully automatize the design of DP pseudoknot prediction algorithms. For this purpose, we formalize the problem of designing DP algorithms for an (infinite) class of conformations, modeled by (a finite number of) fatgraphs, and automatically build DP schemes minimizing their algorithmic complexity. We propose an algorithm for the problem, based on the tree-decomposition of a well-chosen representative structure, which we simplify and reinterpret as a DP scheme. The algorithm is fixed-parameter tractable for the treewidth tw of the fatgraph, and its output represents a [Formula: see text] algorithm (and even possibly [Formula: see text] in simple energy models) for predicting the MFE folding of an RNA of length n. We demonstrate, for the most common pseudoknot classes, that our automatically generated algorithms achieve the same complexities as reported in the literature for hand-crafted schemes. Our framework supports general energy models, partition function computations, recursive substructures and partial folding, and could pave the way for algebraic dynamic programming beyond the context-free case.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"18 1","pages":"18"},"PeriodicalIF":1.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691146/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138471179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

New algorithms for structure informed genome rearrangement. 结构信息基因组重排的新算法。

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2023-12-01 DOI: 10.1186/s13015-023-00239-x

Eden Ozeri, Meirav Zehavi, Michal Ziv-Ukelson

{"title":"New algorithms for structure informed genome rearrangement.","authors":"Eden Ozeri, Meirav Zehavi, Michal Ziv-Ukelson","doi":"10.1186/s13015-023-00239-x","DOIUrl":"10.1186/s13015-023-00239-x","url":null,"abstract":"We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is utilized to guide the allowed operations and to compute their weights. In the first problem, [Formula: see text] ([Formula: see text]), we define the basic structure-informed rearrangement measure. Here, we assume that the gene order members of the gene cluster from which the PQ-tree is constructed are permutations. The PQ-tree representing the gene cluster is ordered such that the series of gene IDs spelled by its leaves is equivalent to that of the reference gene order. Then, a structure-informed genome rearrangement distance is computed between the ordered PQ-tree and the target gene order. The second problem, [Formula: see text] ([Formula: see text]), generalizes [Formula: see text], where the gene order members are not necessarily permutations and the structure informed rearrangement measure is extended to also consider up to [Formula: see text] and [Formula: see text] gene insertion and deletion operations, respectively, when modelling the PQ-tree informed divergence process from the reference gene order to the target gene order. The first algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string and the number of leaves in the tree, and [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively. If one of the penalties of [Formula: see text] is 0, then the algorithm runs in [Formula: see text] time and [Formula: see text] space. The second algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string, m is the number of leaves in the tree, [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively, and allowing up to [Formula: see text] deletions from the tree and up to [Formula: see text] deletions from the string. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of [Formula: see text] is 0) in [Formula: see text] time and [Formula: see text] space. The algorithm is implemented as a software tool, denoted MEM-Rearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1487 prokaryotic genomes.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"18 1","pages":"17"},"PeriodicalIF":1.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691145/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138464177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Relative timing information and orthology in evolutionary scenarios. 进化场景中的相对时序信息和正交性。

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2023-11-08 DOI: 10.1186/s13015-023-00240-4

David Schaller, Tom Hartmann, Manuel Lafond, Peter F Stadler, Nicolas Wieseke, Marc Hellmuth

{"title":"Relative timing information and orthology in evolutionary scenarios.","authors":"David Schaller, Tom Hartmann, Manuel Lafond, Peter F Stadler, Nicolas Wieseke, Marc Hellmuth","doi":"10.1186/s13015-023-00240-4","DOIUrl":"10.1186/s13015-023-00240-4","url":null,"abstract":"Background: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestors coincides with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph.Results: Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial-time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial-time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"18 1","pages":"16"},"PeriodicalIF":1.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634191/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71523304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

On a greedy approach for genome scaffolding. 贪婪的基因组支架方法。

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2022-10-29 DOI: 10.1186/s13015-022-00223-x

Tom Davot, Annie Chateau, Rohan Fossé, Rodolphe Giroudeau, Mathias Weller

引用次数: 0

Treewidth-based algorithms for the small parsimony problem on networks. 基于树宽的网络小简约问题算法。

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2022-08-20 DOI: 10.1186/s13015-022-00216-w

Celine Scornavacca, Mathias Weller

{"title":"Treewidth-based algorithms for the small parsimony problem on networks.","authors":"Celine Scornavacca, Mathias Weller","doi":"10.1186/s13015-022-00216-w","DOIUrl":"https://doi.org/10.1186/s13015-022-00216-w","url":null,"abstract":"Background: Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the SMALL PARSIMONY problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T such as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number c of possible character-states with the number of reticulate events (per biconnected component).Results: We consider the parameter treewidth t of the underlying undirected graph of the input network, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on size-n networks running in times [Formula: see text], [Formula: see text], and [Formula: see text], respectively. Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems.Conclusions: Our algorithms allow the computation of the three popular parsimony scores, modeling the evolutionary development of a (multistate) character on a given phylogenetic network of low treewidth. Our results subsume and improve previously known algorithm for all three variants. While our results rely on being given a \"good\" tree-decomposition of the input, encouraging theoretical results as well as practical implementations producing them are publicly available. We present a reformulation of tree decompositions in terms of \"agreeing trees\" on the same set of nodes. As this formulation may come more natural to researchers and engineers developing algorithms for phylogenetic networks, we hope to render exploiting the input network's treewidth as parameter more accessible to this audience.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":" ","pages":"15"},"PeriodicalIF":1.0,"publicationDate":"2022-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9392953/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40428950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Binning long reads in metagenomics datasets using composition and coverage information. 使用组合和覆盖信息对宏基因组数据集中的长读取进行分组。

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2022-07-11 DOI: 10.1186/s13015-022-00221-z

Anuradha Wickramarachchi, Yu Lin

{"title":"Binning long reads in metagenomics datasets using composition and coverage information.","authors":"Anuradha Wickramarachchi, Yu Lin","doi":"10.1186/s13015-022-00221-z","DOIUrl":"https://doi.org/10.1186/s13015-022-00221-z","url":null,"abstract":"Background: Advancements in metagenomics sequencing allow the study of microbial communities directly from their environments. Metagenomics binning is a key step in the species characterisation of microbial communities. Next-generation sequencing reads are usually assembled into contigs for metagenomics binning mainly due to the limited information within short reads. Third-generation sequencing provides much longer reads that have lengths similar to the contigs assembled from short reads. However, existing contig-binning tools cannot be directly applied on long reads due to the absence of coverage information and the presence of high error rates. The few existing long-read binning tools either use only composition or use composition and coverage information separately. This may ignore bins that correspond to low-abundance species or erroneously split bins that correspond to species with non-uniform coverages. Here we present a reference-free binning approach, LRBinner, that combines composition and coverage information of complete long-read datasets. LRBinner also uses a distance-histogram-based clustering algorithm to extract clusters with varying sizes.Results: The experimental results on both simulated and real datasets show that LRBinner achieves the best binning accuracy in most cases while handling the complete datasets without any sampling. Moreover, we show that binning reads using LRBinner prior to assembly reduces computational resources required for assembly while attaining satisfactory assembly qualities.Conclusion: LRBinner shows that deep-learning techniques can be used for effective feature aggregation to support the metagenomics binning of long reads. Furthermore, accurate binning of long reads supports improvements in metagenomics assembly, especially in complex datasets. Binning also helps to reduce the resources required for assembly. Source code for LRBinner is freely available at https://github.com/anuradhawick/LRBinner.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":" ","pages":"14"},"PeriodicalIF":1.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9277797/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40587433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

Embedding gene trees into phylogenetic networks by conflict resolution algorithms 通过冲突解决算法将基因树嵌入系统发育网络

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2022-05-19 DOI: 10.1186/s13015-022-00218-8

Marcin Wawerka, D. Dabkowski, Natalia Rutecka, Agnieszka Mykowiecka, P. Górecki

引用次数: 2

Bi-alignments with affine gaps costs 具有仿射间隙的双对齐代价

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2022-05-16 DOI: 10.1186/s13015-022-00219-7

Peter F. Stadler, S. Will

引用次数: 1

Adding hydrogen atoms to molecular models via fragment superimposition 通过片段叠加将氢原子添加到分子模型中

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2022-03-29 DOI: 10.1186/s13015-022-00215-x

Patrick Kunzmann, Jacob Marcel Anter, K. Hamacher

引用次数: 2

Perplexity: evaluating transcript abundance estimation in the absence of ground truth. 困惑:在缺乏基本事实的情况下评估转录物丰度估计。

IF 1 4区生物学

Algorithms for Molecular Biology Pub Date : 2022-03-25 DOI: 10.1186/s13015-022-00214-y

Jason Fan, Skylar Chan, Rob Patro

{"title":"Perplexity: evaluating transcript abundance estimation in the absence of ground truth.","authors":"Jason Fan, Skylar Chan, Rob Patro","doi":"10.1186/s13015-022-00214-y","DOIUrl":"https://doi.org/10.1186/s13015-022-00214-y","url":null,"abstract":"Background: There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpin gene expression based analysis routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best.Results: We derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. Furthermore, we demonstrate theoretically and experimentally that perplexity can be computed for arbitrary transcript abundance estimation models.Conclusions: Alongside the derivation and implementation of perplexity for transcript abundance estimation, our study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":" ","pages":"6"},"PeriodicalIF":1.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8951746/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40326298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0