Algorithms for Molecular Biology: Latest Articles

Treewidth-based algorithms for the small parsimony problem on networks.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-08-20 DOI: 10.1186/s13015-022-00216-w
Celine Scornavacca, Mathias Weller
Abstract
Background: Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the SMALL PARSIMONY problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T such as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number c of possible character-states with the number of reticulate events (per biconnected component).
Results: We consider the parameter treewidth t of the underlying undirected graph of the input network, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on size-n networks running in times [Formula: see text], [Formula: see text], and [Formula: see text], respectively. Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems.
Conclusions: Our algorithms allow the computation of the three popular parsimony scores, modeling the evolutionary development of a (multistate) character on a given phylogenetic network of low treewidth. Our results subsume and improve previously known algorithms for all three variants. While our results rely on being given a "good" tree-decomposition of the input, encouraging theoretical results as well as practical implementations producing them are publicly available. We present a reformulation of tree decompositions in terms of "agreeing trees" on the same set of nodes. As this formulation may come more natural to researchers and engineers developing algorithms for phylogenetic networks, we hope to render exploiting the input network's treewidth as parameter more accessible to this audience.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9392953/pdf/
Citations: 3
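The abstract above notes that SMALL PARSIMONY is polynomial-time solvable on trees. As a rough illustration of that tree case only (not the treewidth-based network algorithms of the paper), the following sketch uses a standard Sankoff-style dynamic program with unit cost per state change. The function name, the toy tree, and the leaf states are hypothetical and chosen purely for illustration.

```python
from math import inf


def small_parsimony(children, leaf_state, root, states):
    """Minimum parsimony score of a rooted tree for a single character.

    children   : dict mapping each internal node to a list of its children
    leaf_state : dict mapping each leaf to its observed state
    states     : iterable of all possible character states
    """
    def cost(v):
        # cost(v)[s] = fewest state changes inside the subtree of v,
        # given that v itself is assigned state s.
        if v in leaf_state:
            return {s: (0 if s == leaf_state[v] else inf) for s in states}
        table = {s: 0 for s in states}
        for c in children[v]:
            child = cost(c)
            for s in states:
                # Pay 1 if the edge (v, c) connects different states.
                table[s] += min(child[t] + (s != t) for t in states)
        return table

    return min(cost(root).values())


# Hypothetical toy tree ((a,b),(c,d)) with one two-state character.
children = {"r": ["x", "y"], "x": ["a", "b"], "y": ["c", "d"]}
leaf_state = {"a": "A", "b": "A", "c": "G", "d": "A"}
score = small_parsimony(children, leaf_state, "r", "AG")
print(score)
```

With c states and n nodes this sketch does O(n · c²) work, which is why the tree case is considered easy; the difficulty discussed in the abstract arises only once reticulate events turn the tree into a network.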
Binning long reads in metagenomics datasets using composition and coverage information.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-07-11 DOI: 10.1186/s13015-022-00221-z
Anuradha Wickramarachchi, Yu Lin
Abstract
Background: Advancements in metagenomics sequencing allow the study of microbial communities directly from their environments. Metagenomics binning is a key step in the species characterisation of microbial communities. Next-generation sequencing reads are usually assembled into contigs for metagenomics binning mainly due to the limited information within short reads. Third-generation sequencing provides much longer reads that have lengths similar to the contigs assembled from short reads. However, existing contig-binning tools cannot be directly applied on long reads due to the absence of coverage information and the presence of high error rates. The few existing long-read binning tools either use only composition or use composition and coverage information separately. This may ignore bins that correspond to low-abundance species or erroneously split bins that correspond to species with non-uniform coverages. Here we present a reference-free binning approach, LRBinner, that combines composition and coverage information of complete long-read datasets. LRBinner also uses a distance-histogram-based clustering algorithm to extract clusters with varying sizes.
Results: The experimental results on both simulated and real datasets show that LRBinner achieves the best binning accuracy in most cases while handling the complete datasets without any sampling. Moreover, we show that binning reads using LRBinner prior to assembly reduces computational resources required for assembly while attaining satisfactory assembly qualities.
Conclusion: LRBinner shows that deep-learning techniques can be used for effective feature aggregation to support the metagenomics binning of long reads. Furthermore, accurate binning of long reads supports improvements in metagenomics assembly, especially in complex datasets. Binning also helps to reduce the resources required for assembly. Source code for LRBinner is freely available at https://github.com/anuradhawick/LRBinner.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9277797/pdf/
Citations: 8
Embedding gene trees into phylogenetic networks by conflict resolution algorithms
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-05-19 DOI: 10.1186/s13015-022-00218-8
Marcin Wawerka, D. Dabkowski, Natalia Rutecka, Agnieszka Mykowiecka, P. Górecki
Citations: 2
Bi-alignments with affine gaps costs
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-05-16 DOI: 10.1186/s13015-022-00219-7
Peter F. Stadler, S. Will
Citations: 1
Adding hydrogen atoms to molecular models via fragment superimposition
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-03-29 DOI: 10.1186/s13015-022-00215-x
Patrick Kunzmann, Jacob Marcel Anter, K. Hamacher
Citations: 2
Perplexity: evaluating transcript abundance estimation in the absence of ground truth.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-03-25 DOI: 10.1186/s13015-022-00214-y
Jason Fan, Skylar Chan, Rob Patro
Abstract
Background: There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpins gene expression based analysis routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best.
Results: We derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. Furthermore, we demonstrate theoretically and experimentally that perplexity can be computed for arbitrary transcript abundance estimation models.
Conclusions: Alongside the derivation and implementation of perplexity for transcript abundance estimation, our study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8951746/pdf/
Citations: 0
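The abstract above says the metric is adapted from the perplexity used to evaluate language and topic models. For orientation, here is a minimal sketch of that generic language-model form, the exponential of the negative mean log-probability of held-out items. It is not the paper's RNA-seq derivation: the function name and the example probabilities are hypothetical, and the corner cases the abstract mentions are not handled here.

```python
import math


def perplexity(probs):
    """Generic perplexity of held-out items, given the probability the
    model assigned to each item.

    Note: an item with probability 0 raises ValueError in this sketch;
    how such cases are treated for RNA-seq is not covered here.
    """
    mean_log_prob = sum(math.log(p) for p in probs) / len(probs)
    return math.exp(-mean_log_prob)


# Hypothetical example: a model that assigns 0.25 to every held-out item
# is as uncertain as a uniform choice among four options.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values indicate that the model assigns higher probability to the held-out data, which is what makes the quantity usable for comparing models when no ground truth is available.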
Parsimonious Clone Tree Integration in cancer
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-03-14 DOI: 10.1186/s13015-022-00209-9
P. Sashittal, Simone Zaccaria, M. El-Kebir
Citations: 5
Efficiently sparse listing of classes of optimal cophylogeny reconciliations.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-02-15 DOI: 10.1186/s13015-022-00206-y
Yishu Wang, Arnaud Mary, Marie-France Sagot, Blerina Sinaimeri
Abstract
Background: Cophylogeny reconciliation is a powerful method for analyzing host-parasite (or host-symbiont) co-evolution. It models co-evolution as an optimization problem where the set of all optimal solutions may represent different biological scenarios which thus need to be analyzed separately. Despite the significant research done in the area, few approaches have addressed the problem of helping the biologist deal with the often huge space of optimal solutions.
Results: In this paper, we propose a new approach to tackle this problem. We introduce three different criteria under which two solutions may be considered biologically equivalent, and then we propose polynomial-delay algorithms that enumerate only one representative per equivalence class (without listing all the solutions).
Conclusions: Our results are of both theoretical and practical importance. Indeed, as shown by the experiments, we are able to significantly reduce the space of optimal solutions while still maintaining important biological information about the whole space.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8845303/pdf/
Citations: 0
A new 1.375-approximation algorithm for sorting by transpositions.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2022-01-15 DOI: 10.1186/s13015-022-00205-z
Luiz Augusto G Silva, Luis Antonio B Kowada, Noraí Romeu Rocco, Maria Emília M T Walter
Abstract
Background: SORTING BY TRANSPOSITIONS (SBT) is a classical problem in genome rearrangements. In 2012, SBT was proven to be [Formula: see text]-hard and the best approximation algorithm with a 1.375 ratio was proposed in 2006 by Elias and Hartman (EH algorithm). Their algorithm employs simplification, a technique used to transform an input permutation [Formula: see text] into a simple permutation [Formula: see text], presumably easier to handle. The permutation [Formula: see text] is obtained by inserting new symbols into [Formula: see text] in a way that the lower bound of the transposition distance of [Formula: see text] is kept on [Formula: see text]. The simplification is guaranteed to keep the lower bound, not the transposition distance. A sequence of operations sorting [Formula: see text] can be mimicked to sort [Formula: see text].
Results and conclusions: First, using an algebraic approach, we propose a new upper bound for the transposition distance, which holds for all [Formula: see text]. Next, motivated by a problem identified in the EH algorithm, which causes it, in scenarios involving how the input permutation is simplified, to require one extra transposition above the 1.375-approximation ratio, we propose a new approximation algorithm to solve SBT ensuring the 1.375-approximation ratio for all [Formula: see text]. We implemented our algorithm and EH's. Regarding the implementation of the EH algorithm, two other issues were identified and needed to be fixed. We tested both algorithms against all permutations of size n, [Formula: see text]. The results show that the EH algorithm exceeds the approximation ratio of 1.375 for permutations with a size greater than 7. The percentages of computed distances that are equal to the transposition distance, computed by the implemented algorithms, are also compared with others available in the literature. Finally, we investigate the performance of both implementations on longer permutations of maximum length 500. From the experiments, we conclude that the maximum and average distances computed by our algorithm are a little better than the ones computed by the EH algorithm and the running times of both algorithms are similar, despite the time complexity of our algorithm being higher.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8760837/pdf/
Citations: 3
An improved approximation algorithm for the reversal and transposition distance considering gene order and intergenic sizes.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology Pub Date : 2021-12-29 DOI: 10.1186/s13015-021-00203-7
Klairton L Brito, Andre R Oliveira, Alexsandro O Alexandrino, Ulisses Dias, Zanoni Dias
Abstract
Background: In the comparative genomics field, one of the goals is to estimate a sequence of genetic changes capable of transforming a genome into another. Genome rearrangement events are mutations that can alter the genetic content or the arrangement of elements from the genome. Reversal and transposition are two of the most studied genome rearrangement events. A reversal inverts a segment of a genome while a transposition swaps two consecutive segments. Initial studies in the area considered only the order of the genes. Recent works have incorporated other genetic information in the model. In particular, the information regarding the size of intergenic regions, which are structures between each pair of genes and in the extremities of a linear genome.
Results and conclusions: In this work, we investigate the SORTING BY INTERGENIC REVERSALS AND TRANSPOSITIONS problem on genomes sharing the same set of genes, considering the cases where the orientation of genes is known and unknown. Besides, we explored a variant of the problem, which generalizes the transposition event. As a result, we present an approximation algorithm that guarantees an approximation factor of 4 for both cases considering the reversal and transposition (classic definition) events, an improvement from the 4.5-approximation previously known for the scenario where the orientation of the genes is unknown. We also present a 3-approximation algorithm by incorporating the generalized transposition event, and we propose a greedy strategy to improve the performance of the algorithms. We performed practical tests adopting simulated data which indicated that the algorithms, in both cases, tend to perform better when compared with the best-known algorithms for the problem. Lastly, we conducted experiments using real genomes to demonstrate the applicability of the algorithms.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8717661/pdf/
Citations: 4
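The abstract above defines the two rearrangement events in one sentence: a reversal inverts a segment, and a transposition swaps two consecutive segments. The sketch below illustrates just those two operations on a gene order stored as a Python list. It assumes unsigned genes and ignores intergenic region sizes, so it is a simplification of the model studied in the paper; the function names and index conventions are hypothetical.

```python
def reversal(perm, i, j):
    """Invert the segment perm[i:j] (half-open, 0-based indices)."""
    return perm[:i] + perm[i:j][::-1] + perm[j:]


def transposition(perm, i, j, k):
    """Swap the adjacent segments perm[i:j] and perm[j:k]."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


genome = [1, 2, 3, 4, 5]
print(reversal(genome, 1, 4))          # inverts the segment 2 3 4
print(transposition(genome, 0, 2, 5))  # swaps the segments 1 2 and 3 4 5
```

Both functions return a new list and leave the input unchanged, which makes it easy to chain events when exploring sorting sequences by hand.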