Algorithms for Molecular Biology: Latest Articles

Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2019-02-06 | eCollection Date: 2019-01-01 | DOI: 10.1186/s13015-019-0136-9
Qiuyi Zhang, Satish Rao, Tandy Warnow
Background: Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of leaves in the tree (once the shortest and longest branch weights are fixed). While there is a large literature on AFC methods, the best in terms of empirical performance was $\mathrm{DCM}_{\mathrm{NJ}}$, published in SODA 2001. The main empirical advantage of $\mathrm{DCM}_{\mathrm{NJ}}$ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, $\mathrm{DCM}_{\mathrm{NJ}}$ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree method scales to large datasets with high accuracy.
Results: In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of $\mathrm{DCM}_{\mathrm{NJ}}$ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time and space. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve the scalability of phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).
Article 2.
Citations: 12
Automated partial atomic charge assignment for drug-like molecules: a fast knapsack approach.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2019-02-05 | eCollection Date: 2019-01-01 | DOI: 10.1186/s13015-019-0138-7
Martin S Engler, Bertrand Caron, Lourens Veen, Daan P Geerke, Alan E Mark, Gunnar W Klau
A key factor in computational drug design is the consistency and reliability with which intermolecular interactions between a wide variety of molecules can be described. Here we present a procedure to efficiently, reliably and automatically assign partial atomic charges to atoms based on known distributions. We formally introduce the molecular charge assignment problem, where the task is to select a charge from a set of candidate charges for every atom of a given query molecule. Charges are accompanied by a score that depends on their observed frequency in similar neighbourhoods (chemical environments) in a database of previously parameterised molecules. The aim is to assign the charges such that the total charge equals a known target charge within a margin of error while maximizing the sum of the charge scores. We show that the problem is a variant of the well-studied multiple-choice knapsack problem and thus weakly NP-complete. We propose solutions based on Integer Linear Programming and a pseudo-polynomial-time Dynamic Programming algorithm. We demonstrate that the results obtained for novel molecules not included in the database are comparable to those obtained by explicit charge calculations, while the time to determine partial charges for a molecule drops from hours or even days to below a second. Our software is openly available.
Article 1.
Citations: 8
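To make the multiple-choice knapsack formulation above concrete, here is a minimal Python sketch of a pseudo-polynomial dynamic program: pick exactly one candidate charge per atom, keep the total within `epsilon` of the target, and maximise the summed score. It assumes charges can be rounded to integer multiples of a `resolution` (0.001 e here, a hypothetical choice) and that each atom's candidates arrive as hypothetical `(charge, score)` pairs. It is an illustration of the technique, not the authors' released software or their exact algorithm.

```python
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[float, float]  # hypothetical (partial charge in e, score) pair


def assign_charges(
    atoms: List[List[Candidate]],
    target: float,
    epsilon: float,
    resolution: float = 0.001,  # assumed discretisation step for charges
) -> Optional[Tuple[float, List[float]]]:
    """Choose one candidate per atom so |total - target| <= epsilon, maximising total score."""
    # DP state: discretised running total charge -> (best score, charges chosen so far).
    best: Dict[int, Tuple[float, List[float]]] = {0: (0.0, [])}
    for candidates in atoms:
        nxt: Dict[int, Tuple[float, List[float]]] = {}
        for total, (score, chosen) in best.items():
            for charge, cand_score in candidates:
                key = total + round(charge / resolution)
                new_score = score + cand_score
                # Only the best score per total matters for later atoms.
                if key not in nxt or new_score > nxt[key][0]:
                    nxt[key] = (new_score, chosen + [charge])
        best = nxt
    # Keep totals inside the tolerance window (small slack for float rounding).
    feasible = [
        (score, chosen)
        for total, (score, chosen) in best.items()
        if abs(total * resolution - target) <= epsilon + 1e-12
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda item: item[0])
```

The table is indexed by the discretised total charge, so its size grows with the range of reachable totals rather than with the number of candidate combinations; that is what makes the approach pseudo-polynomial.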
Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2018-12-08 | eCollection Date: 2018-01-01 | DOI: 10.1186/s13015-018-0135-2
Morten Muhlig Nielsen, Paula Tataru, Tobias Madsen, Asger Hobolth, Jakob Skou Pedersen
Background: Motif analysis methods have long been central to studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools are limited in their ability to search large motif spaces, so more complex motifs may not be included. There is thus a need for motif analysis methods that are tailored to analyzing specific complex motifs motivated by biological questions and hypotheses, rather than acting as screen-based motif finding tools.
Methods: We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank-based statistics. A modular setup and fast analytic p-value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems.
Results: We demonstrate use cases of combined motifs on simulated data and on expression data from microRNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity.
Conclusions: Regmex is a useful and flexible tool to analyze motif hypotheses that relate to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).
Volume 13, article 17.
Citations: 4
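Regmex computes exact p-values with embedded Markov models, which the sketch below does not attempt. As a simplified illustration of the random-walk idea only, this Python marks each ranked sequence by whether a regular expression matches, forms a centred cumulative sum over the ranking (which, like a Brownian bridge, returns to zero at the end), and estimates significance by permutation. All function names are hypothetical, and the permutation test is a swapped-in substitute for the package's analytic p-values.

```python
import random
import re
from typing import List, Sequence


def motif_hits(ranked_seqs: Sequence[str], motif: str) -> List[int]:
    """1 if the regular expression occurs anywhere in the sequence, else 0."""
    return [1 if re.search(motif, s) else 0 for s in ranked_seqs]


def running_sum(hits: Sequence[int]) -> List[float]:
    """Centred cumulative sum over the ranking; it always ends at 0, like a bridge."""
    mean = sum(hits) / len(hits)
    walk, total = [], 0.0
    for h in hits:
        total += h - mean
        walk.append(total)
    return walk


def max_deviation(hits: Sequence[int]) -> float:
    """Largest absolute excursion of the walk: big values mean hits cluster at one end."""
    return max(abs(v) for v in running_sum(hits))


def permutation_pvalue(hits: Sequence[int], n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: how often a random ranking deviates at least as much."""
    rng = random.Random(seed)
    observed = max_deviation(hits)
    shuffled = list(hits)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_deviation(shuffled) >= observed - 1e-12:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

For example, with sequences ranked as `["ACGTTT", "TTACGA", "GGGGGG", "CCCCCC"]` and motif `"ACG"`, both hits sit at the top of the list, so the walk is `[0.5, 1.0, 0.5, 0.0]` and its maximum deviation is 1.0.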
Superbubbles revisited.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2018-12-01 | eCollection Date: 2018-01-01 | DOI: 10.1186/s13015-018-0134-3
Fabian Gärtner, Lydia Müller, Peter F Stadler
Background: Superbubbles are distinctive subgraphs in directed graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact that they are connected to their host graph by a single entrance and a single exit vertex, which allows them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore important for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs. The algorithm by Sung et al. (IEEE ACM Trans Comput Biol Bioinform 12:770-777, 2015) achieves this task in $O(m \log m)$ time. The extraction of superbubbles from the transformed components was later improved by Brankovic et al. (Theor Comput Sci 609:374-383, 2016), resulting in an overall $O(m + n)$-time algorithm.
Results: A re-analysis of the mathematical structure of superbubbles showed that the construction of auxiliary DAGs from the strongly connected components in the work of Sung et al. missed some details that can lead to the reporting of false positive superbubbles. We propose an alternative, even simpler auxiliary graph that solves the problem and retains the linear running time for general digraphs. Furthermore, we describe a simpler, space-efficient $O(m + n)$-time algorithm for detecting superbubbles in DAGs that uses only simple data structures.
Implementation: We present a reference implementation of the algorithm that accepts many commonly used formats for the input graph and provides convenient access to the improved algorithm: https://github.com/Fabianexe/Superbubble.
Volume 13, article 16.
Citations: 5
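For small graphs, the superbubble definition can be checked directly. The sketch below is a brute-force test of whether an ordered pair (s, t) is a superbubble, following the standard conditions: t is reachable from s; the vertices reachable from s without passing through t equal the vertices that reach t without passing through s (matching); the induced subgraph is acyclic; and no interior vertex u already forms such a pair (s, u) (minimality). It is deliberately naive and is neither the algorithm of Sung et al. nor the linear-time method described above. The adjacency-dict graph representation is an assumption made for illustration.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # vertex -> list of successors


def _reverse(graph: Graph) -> Graph:
    rev: Graph = {v: [] for v in graph}
    for u, succs in graph.items():
        for v in succs:
            rev.setdefault(v, []).append(u)
    return rev


def _reach(adj: Graph, start: str, stop: str) -> Set[str]:
    """Vertices reachable from start; stop is collected but never expanded."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == stop:
            continue
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def _is_acyclic(graph: Graph, nodes: Set[str]) -> bool:
    """Kahn's algorithm on the subgraph induced by nodes."""
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for v in graph.get(u, []):
            if v in nodes:
                indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in graph.get(u, []):
            if v in nodes:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return removed == len(nodes)


def _satisfies_conditions(graph: Graph, rev: Graph, s: str, t: str) -> bool:
    """Reachability, matching and acyclicity for the pair (s, t)."""
    forward = _reach(graph, s, t)  # reachable from s without passing through t
    if t not in forward:
        return False
    backward = _reach(rev, t, s)  # can reach t without passing through s
    return forward == backward and _is_acyclic(graph, forward)


def is_superbubble(graph: Graph, s: str, t: str) -> bool:
    if s == t:
        return False
    rev = _reverse(graph)
    if not _satisfies_conditions(graph, rev, s, t):
        return False
    interior = _reach(graph, s, t) - {s, t}
    # Minimality: no interior vertex u may already close a bubble (s, u).
    return not any(_satisfies_conditions(graph, rev, s, u) for u in interior)
```

On the graph `a -> b, a -> c, b -> d, c -> d, d -> e`, the pair `(a, d)` is a superbubble, while `(a, e)` is not, because `(a, d)` already closes a smaller bubble.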
Improved de novo peptide sequencing using LC retention time information.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2018-08-29 | eCollection Date: 2018-01-01 | DOI: 10.1186/s13015-018-0132-5
Yves Frank, Tomas Hruz, Thomas Tschager, Valentin Venzin
Background: Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. De novo peptide sequencing is the problem of reconstructing the amino acid sequence of a peptide from this measurement data. Past de novo sequencing algorithms consider only the mass spectrum of the fragments when reconstructing a sequence.
Results: We propose to additionally exploit the information obtained from liquid chromatography. We study the problem of computing a sequence that is consistent not only with the experimental mass spectrum but also with the chromatographic retention time. We consider three models for predicting the retention time and develop algorithms for de novo sequencing for each model.
Conclusions: Based on an evaluation of two prediction models on experimental data from synthesized peptides, we conclude that exploiting the chromatographic information improves identification rates. In our evaluation, we compare our algorithms that use the retention time information with algorithms that use the same scoring model but not the retention time.
Volume 13, article 14.
Citations: 1
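One simple way to picture how retention time can narrow a list of candidate sequences is an additive prediction model, in which each residue contributes a fixed amount to the predicted retention time. The sketch below assumes such a model with hypothetical toy coefficients; it is not claimed to match any of the three models the authors consider, and a real pipeline would combine the retention-time term with the spectrum score rather than apply it as a hard filter.

```python
from typing import Dict, List, Sequence


def predict_rt(peptide: str, coef: Dict[str, float], intercept: float = 0.0) -> float:
    """Additive model: intercept plus one fixed (hypothetical) contribution per residue."""
    return intercept + sum(coef[aa] for aa in peptide)


def filter_by_rt(
    candidates: Sequence[str],
    observed_rt: float,
    tolerance: float,
    coef: Dict[str, float],
    intercept: float = 0.0,
) -> List[str]:
    """Keep candidates whose predicted retention time is within tolerance of the observed one."""
    return [
        p for p in candidates
        if abs(predict_rt(p, coef, intercept) - observed_rt) <= tolerance
    ]
```

With toy coefficients `{"A": 1.0, "L": 4.0, "K": -1.0, "G": 0.5}`, an observed time of 4.2 and tolerance 0.5, only `"LAK"` (predicted 4.0) survives among `["LAK", "AGK", "LLA"]`. Note that a purely additive model gives identical predictions for sequences that are permutations of each other (`"LAK"` and `"KAL"`), so it cannot separate them; that would require a position-dependent model.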
Sorting signed circular permutations by super short operations.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2018-07-26 | eCollection Date: 2018-01-01 | DOI: 10.1186/s13015-018-0131-6
Andre R Oliveira, Guillaume Fertin, Ulisses Dias, Zanoni Dias
Background: One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form $\pi = (\pi_1\ \pi_2\ \ldots\ \pi_n)$, and in that case we can consider without loss of generality that one of them is the identity permutation $\iota_n = (1\ 2\ \ldots\ n)$ and that we just need to sort the other (i.e., transform it into $\iota_n$). The most studied genome rearrangement events are reversals, where a segment of the genome is reversed and reincorporated at the same location, and transpositions, where two consecutive segments are exchanged. Many variants, e.g., combining different types of (possibly constrained) rearrangements, have been proposed in the literature. One of them requires that the number of genes involved in a reversal or a transposition is never greater than two, which is known as the problem of sorting by super short operations (SSOs).
Results and conclusions: All problems considering SSOs in permutations have been shown to be in P, except for one, namely sorting signed circular permutations by super short reversals and super short transpositions. Here we fill this gap by introducing a new graph structure called the cyclic permutation graph and providing a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed circular permutations by super short reversals and super short transpositions.
Volume 13, article 13.
Citations: 3
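For very short permutations, the SSO distance can be checked by exhaustive breadth-first search over the three move types named above: a super short reversal of one element (flip its sign), a super short reversal of two adjacent elements (swap them and flip both signs), and a super short transposition (swap two adjacent elements, signs unchanged), with positions taken circularly. This exponential-time sketch is only a way to verify small cases and is not the paper's polynomial algorithm. It also assumes that any rotation of the identity counts as sorted, which may differ from the paper's exact convention for circular permutations.

```python
from collections import deque
from typing import Dict, Iterator, Tuple

Perm = Tuple[int, ...]


def sso_moves(p: Perm) -> Iterator[Perm]:
    """All permutations one super short operation away (positions taken circularly)."""
    n = len(p)
    for i in range(n):
        j = (i + 1) % n
        q = list(p)
        q[i] = -q[i]  # super short reversal of one element: flip its sign
        yield tuple(q)
        if n > 1:
            q = list(p)
            q[i], q[j] = -p[j], -p[i]  # super short reversal of two elements
            yield tuple(q)
            q = list(p)
            q[i], q[j] = p[j], p[i]  # super short transposition: swap, keep signs
            yield tuple(q)


def sso_distance(p: Perm) -> int:
    """BFS distance to any rotation of the identity (assumed notion of 'sorted')."""
    n = len(p)
    identity = tuple(range(1, n + 1))
    targets = {identity[k:] + identity[:k] for k in range(n)}
    dist: Dict[Perm, int] = {p: 0}
    queue = deque([p])
    while queue:
        cur = queue.popleft()
        if cur in targets:
            return dist[cur]
        for nxt in sso_moves(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise ValueError("no sorted permutation reachable")
```

Because the state space grows like $n! \cdot 2^n$, this is only practical for roughly $n \le 7$, which is enough to cross-check hand-worked examples.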
Split-inducing indels in phylogenomic analysis.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2018-07-16 | eCollection Date: 2018-01-01 | DOI: 10.1186/s13015-018-0130-7
Alexander Donath, Peter F Stadler
Background: Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.
Results: Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies, provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce the notion of split-inducing indels (splids), which define an approximate bipartition of the taxon set. We show, both on simulated data and in case studies on real-life data, that splids can be efficiently extracted from phylogenomic data sets.
Conclusions: Suitably processed gap patterns extracted from genome-wide alignments provide a surprisingly clear phylogenetic signal and allow the inference of accurate phylogenetic trees.
Volume 13, article 12.
Citations: 11
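As a simplified illustration of reading bipartitions from gaps, the sketch below treats the set of gapped taxa in each alignment column as a candidate split, merges consecutive columns with the same gap pattern into one event, ignores splits with fewer than `min_side` taxa on either side, and counts how often each split occurs. The `min_side` threshold and the run-merging rule are assumptions made here; the paper's splid definition and noise filtering are more careful than this.

```python
from collections import Counter
from typing import Dict, FrozenSet, Optional


def gap_splits(alignment: Dict[str, str], min_side: int = 2) -> Counter:
    """Count taxon bipartitions induced by gap patterns (assumes equal-length rows)."""
    taxa = sorted(alignment)
    length = len(alignment[taxa[0]])
    counts: Counter = Counter()
    prev: Optional[FrozenSet[str]] = None
    for col in range(length):
        gapped = frozenset(t for t in taxa if alignment[t][col] == "-")
        # A run of columns with the same gap pattern is treated as one indel event.
        if gapped != prev and min_side <= len(gapped) <= len(taxa) - min_side:
            # Canonical form: report the side that does not contain the first taxon.
            side = gapped if taxa[0] not in gapped else frozenset(taxa) - gapped
            counts[side] += 1
        prev = gapped
    return counts
```

In the toy alignment `A: AC--GT`, `B: AC--GT`, `C: ACTTGT`, `D: ACTTG-`, the two-column gap shared by A and B yields the single split {A, B} | {C, D}, reported as `{C, D}`, while D's lone terminal gap is ignored as a trivial split.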
Locus-aware decomposition of gene trees with respect to polytomous species trees.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2018-06-04 | eCollection Date: 2018-01-01 | DOI: 10.1186/s13015-018-0128-1
Michał Aleksander Ciach, Anna Muszewska, Paweł Górecki
Background: Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them rely either on approximate, non-phylogenetic methods or on tree reconciliation, which is computationally intensive and sensitive to parameter values.
Results: We investigate the locus tree inference problem as a possible alternative that combines the advantages of both approaches. We present several algorithms to solve the problem in the parsimony framework. We introduce a novel tree mapping, which allows us to obtain a heuristic solution to the problems of locus tree inference and duplication classification.
Conclusions: Our approach allows for faster comparisons of gene and species trees and improves known algorithms for duplication inference in the presence of polytomies in the species trees. We have implemented our algorithms in a software tool available at https://github.com/mciach/LocusTreeInference.
Volume 13, article 11.
Citations: 1
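For background, the classical LCA mapping used in duplication inference can be sketched in a few lines: each internal gene-tree node maps to the lowest common ancestor, in the species tree, of its children's images, and it is labelled a duplication when it maps to the same species node as one of its children. This sketch assumes binary trees written as nested tuples with unique leaf names and a hypothetical `species_of` dictionary from gene leaves to species leaves. It does not implement the paper's locus-aware mapping or its handling of polytomies.

```python
from typing import Dict, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name or (left, right)


def _index(species_tree: Tree) -> Tuple[Dict[Tree, Optional[Tree]], Dict[Tree, int]]:
    """Parent pointers and depths for every node of the species tree."""
    parent: Dict[Tree, Optional[Tree]] = {}
    depth: Dict[Tree, int] = {}
    stack = [(species_tree, None, 0)]
    while stack:
        node, par, d = stack.pop()
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                stack.append((child, node, d + 1))
    return parent, depth


def count_duplications(gene_tree: Tree, species_tree: Tree, species_of: Dict[str, str]) -> int:
    parent, depth = _index(species_tree)

    def lca(a: Tree, b: Tree) -> Tree:
        # Lift the deeper node, then climb both until they meet.
        while depth[a] > depth[b]:
            a = parent[a]
        while depth[b] > depth[a]:
            b = parent[b]
        while a != b:
            a, b = parent[a], parent[b]
        return a

    duplications = 0

    def mapping(node: Tree) -> Tree:
        nonlocal duplications
        if isinstance(node, str):
            return species_of[node]
        left, right = mapping(node[0]), mapping(node[1])
        m = lca(left, right)
        if m == left or m == right:  # same image as a child: duplication
            duplications += 1
        return m

    mapping(gene_tree)
    return duplications
```

With species tree `(("A", "B"), "C")`, the gene tree `(("a1", "b1"), ("a2", "b2"))` has one duplication at its root, whereas `(("a1", "b1"), "c1")` has none.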
Finding local genome rearrangements.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2018-05-04 | eCollection Date: 2018-01-01 | DOI: 10.1186/s13015-018-0127-2
Pijus Simonaitis, Krister M Swenson
Background: The double cut and join (DCJ) model of genome rearrangement is well studied due to its mathematical simplicity and its power to account for the many events that transform gene order. These studies have mostly been devoted to understanding minimum-length scenarios transforming one genome into another. In this paper we instead search for rearrangement scenarios that minimize the number of rearrangements whose breakpoints are unlikely according to some biological criterion. One such criterion has recently become accessible due to the advent of the Hi-C experiment, which facilitates the study of the 3D spatial distance between breakpoint regions.
Results: We establish a link between the minimum number of unlikely rearrangements required by a scenario and the problem of finding a maximum edge-disjoint cycle packing on a certain transformed version of the adjacency graph. This link leads to a 3/2-approximation as well as an exact integer linear programming formulation for our problem, which we prove to be NP-complete. We also present experimental results on fruit flies, showing that Hi-C data is informative when used as a criterion for rearrangements.
Conclusions: We address a new variant of the weighted DCJ distance problem that ignores scenario length in its objective function. A solution to this problem provides a lower bound on the number of unlikely moves necessary when transforming one gene order into another. This lower bound aids in the study of rearrangement scenarios with respect to chromatin structure, and could eventually be used in the design of a fixed-parameter algorithm with a more general objective function.
Volume 13, article 9.
Citations: 6
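For context on the adjacency graph mentioned above, the classical unweighted DCJ distance between two genomes made only of circular chromosomes over the same $N$ genes is $d = N - C$, where $C$ is the number of cycles in the adjacency graph. The sketch below computes that quantity for single circular chromosomes given as lists of signed integers (a representation chosen here for illustration). It ignores the Hi-C-based notion of unlikely rearrangements and the cycle-packing objective studied in the paper.

```python
from typing import Dict, List, Set, Tuple

Extremity = Tuple[int, str]  # (gene, "t") for the tail, (gene, "h") for the head


def adjacencies(genome: List[int]) -> Dict[Extremity, Extremity]:
    """Adjacencies of one circular chromosome given as signed genes, e.g. [1, -2, 3]."""
    ends = []
    for g in genome:
        gene = abs(g)
        # A positive gene is read tail then head, a negative one head then tail.
        ends.append(((gene, "t"), (gene, "h")) if g > 0 else ((gene, "h"), (gene, "t")))
    adj: Dict[Extremity, Extremity] = {}
    for i in range(len(ends)):
        right = ends[i][1]
        left_of_next = ends[(i + 1) % len(ends)][0]  # wrap around: circular chromosome
        adj[right] = left_of_next
        adj[left_of_next] = right
    return adj


def dcj_distance(genome_a: List[int], genome_b: List[int]) -> int:
    """N - C for two circular genomes over the same genes (each gene exactly once)."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    seen: Set[Extremity] = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x, use_a = start, True
        while x not in seen:  # walk alternately along A- and B-adjacencies
            seen.add(x)
            x = adj_a[x] if use_a else adj_b[x]
            use_a = not use_a
    return len(genome_a) - cycles
```

Identical genomes give $N$ two-vertex cycles and distance 0; inverting a single segment, as in `[1, 2, 3, 4]` versus `[1, -3, -2, 4]`, merges two of those cycles into one and gives distance 1.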
FSH: fast spaced seed hashing exploiting adjacent hashes.
IF 1 | Zone 4 (Biology)
Algorithms for Molecular Biology | Pub Date: 2018-03-22 | eCollection Date: 2018-01-01 | DOI: 10.1186/s13015-018-0125-4
Samuele Girotto, Matteo Comin, Cinzia Pizzi
Background: Patterns with wildcards in specified positions, namely spaced seeds, are increasingly used instead of k-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require computing the hash of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashes of k-mers can be computed rapidly by exploiting the large overlap between consecutive k-mers, spaced-seed hashes are usually computed from scratch for each position in the input sequence, which results in slower processing.
Results: The method proposed in this paper, fast spaced-seed hashing (FSH), exploits the similarity of the hash values of spaced seeds computed at adjacent positions in the input sequence. In our experiments we compute the hash for each position of metagenomic reads from several datasets, with respect to different spaced seeds. We also propose a generalized version of the algorithm for the simultaneous computation of multiple spaced-seed hashes. In the experiments, our algorithm computes the hash values of spaced seeds with a speedup over the traditional approach of between 1.6× and 5.3×, depending on the structure of the spaced seed.
Conclusions: Spaced-seed hashing is a routine task in several bioinformatics applications. FSH performs this task efficiently and raises the question of whether other hashes can be exploited to further improve the speedup. This has the potential for major impact in the field, making spaced-seed applications not only accurate but also faster and more efficient.
Availability: The software FSH is freely available for academic use at https://bitbucket.org/samu661/fsh/overview.
Volume 13, article 8.
Citations: 7
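The core observation behind reusing adjacent hashes can be illustrated with 2-bit nucleotide codes. If a seed has care positions p and p + 1, the symbol under p at window i is the symbol under p + 1 at window i − 1, so its code can be copied out of the previous hash with a mask and a right shift instead of being read and encoded again. The sketch below applies this idea for a shift of one position only, groups reusable 2-bit fields by shift amount, and compares the result against a from-scratch baseline. It is a simplified reconstruction under these assumptions, not the published FSH code, which may also draw on hashes from other preceding positions and supports multiple seeds.

```python
from typing import Dict, List

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit nucleotide encoding


def _full_hash(seq: str, i: int, ones: List[int]) -> int:
    """Pack codes of symbols under the seed's '1' positions, first one in the lowest bits."""
    h = 0
    for m, p in enumerate(ones):
        h |= ENC[seq[i + p]] << (2 * m)
    return h


def naive_hashes(seq: str, seed: str) -> List[int]:
    """Baseline: recompute every window's hash from scratch."""
    ones = [k for k, c in enumerate(seed) if c == "1"]
    return [_full_hash(seq, i, ones) for i in range(len(seq) - len(seed) + 1)]


def fsh_like_hashes(seq: str, seed: str) -> List[int]:
    """Reuse fields of the previous window's hash wherever the seed allows it."""
    ones = [k for k, c in enumerate(seed) if c == "1"]
    index_of = {p: m for m, p in enumerate(ones)}
    masks: Dict[int, int] = {}  # right shift d -> mask of reusable fields in previous hash
    fresh: List[int] = []  # fields that must still be read from the sequence
    for m, p in enumerate(ones):
        m_prev = index_of.get(p + 1)
        if m_prev is None:
            fresh.append(m)
        else:
            d = m_prev - m  # field m_prev of h(i-1) becomes field m of h(i)
            masks[d] = masks.get(d, 0) | (0b11 << (2 * m_prev))
    out: List[int] = []
    for i in range(len(seq) - len(seed) + 1):
        if not out:
            h = _full_hash(seq, i, ones)
        else:
            prev = out[-1]
            h = 0
            for d, mask in masks.items():
                h |= (prev & mask) >> (2 * d)
            for m in fresh:
                h |= ENC[seq[i + ones[m]]] << (2 * m)
        out.append(h)
    return out
```

For a contiguous seed such as `"111"` this reduces to the familiar rolling k-mer hash (one fresh symbol per window), while a seed like `"10101"` has no adjacent care positions, so a shift of one position saves nothing; that is why considering other shifts matters.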