Workshop on Algorithms in Bioinformatics最新文献_第2页

Compressing and Indexing Aligned Readsets 压缩和索引对齐的Readsets

Workshop on Algorithms in Bioinformatics Pub Date : 2018-09-19 DOI: 10.4230/LIPIcs.WABI.2021.13

T. Gagie, Garance Gourdel, G. Manzini

引用次数: 4

l1-Penalised Ordinal Polytomous Regression Estimators with Application to Gene Expression Studies 惩罚序多分回归估计在基因表达研究中的应用

Workshop on Algorithms in Bioinformatics Pub Date : 2018-08-20 DOI: 10.4230/LIPIcs.WABI.2018.17

S. Chrétien, C. Guyeux, S. Moulin

{"title":"l1-Penalised Ordinal Polytomous Regression Estimators with Application to Gene Expression Studies","authors":"S. Chrétien, C. Guyeux, S. Moulin","doi":"10.4230/LIPIcs.WABI.2018.17","DOIUrl":"https://doi.org/10.4230/LIPIcs.WABI.2018.17","url":null,"abstract":"Qualitative but ordered random variables, such as severity of a pathology, are of paramount importance in biostatistics and medicine. Understanding the conditional distribution of such qualitative variables as a function of other explanatory variables can be performed using a specific regression model known as ordinal polytomous regression. Variable selection in the ordinal polytomous regression model is a computationally difficult combinatorial optimisation problem which is however crucial when practitioners need to understand which covariates are physically related to the output and which covariates are not. One easy way to circumvent the computational hardness of variable selection is to introduce a penalised maximum likelihood estimator based on some well chosen non-smooth penalisation function such as, e.g., the l_1-norm. In the case of the Gaussian linear model, the l_1-penalised least-squares estimator, also known as LASSO estimator, has attracted a lot of attention in the last decade, both from the theoretical and algorithmic viewpoints. However, even in the Gaussian linear model, accurate calibration of the relaxation parameter, i.e., the relative weight of the penalisation term in the estimation cost function is still considered a difficult problem that has to be addressed with caution. In the present paper, we apply l_1-penalisation to the ordinal polytomous regression model and compare several hyper-parameter calibration strategies. Our main contributions are: (a) a useful and simple l_1 penalised estimator for ordinal polytomous regression and a thorough description of how to apply Nesterov's accelerated gradient and the online Frank-Wolfe methods to the problem of computing this estimator, (b) a new hyper-parameter calibration method for the proposed model, based on the QUT idea of Giacobino et al. and (c) a code which can be freely used that implements the proposed estimation procedure.","PeriodicalId":329847,"journal":{"name":"Workshop on Algorithms in Bioinformatics","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126737995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Multiple-Choice Knapsack for Assigning Partial Atomic Charges in Drug-Like Molecules 在类药物分子中分配部分原子电荷的多项选择背包

Workshop on Algorithms in Bioinformatics Pub Date : 2018-08-09 DOI: 10.4230/LIPIcs.WABI.2018.16

Martin S. Engler, Bertrand Caron, L. Veen, D. Geerke, A. Mark, G. Klau

{"title":"Multiple-Choice Knapsack for Assigning Partial Atomic Charges in Drug-Like Molecules","authors":"Martin S. Engler, Bertrand Caron, L. Veen, D. Geerke, A. Mark, G. Klau","doi":"10.4230/LIPIcs.WABI.2018.16","DOIUrl":"https://doi.org/10.4230/LIPIcs.WABI.2018.16","url":null,"abstract":"A key factor in computational drug design is the consistency and reliability with which intermolecular interactions between a wide variety of molecules can be described. Here we present a procedure to efficiently, reliably and automatically assign partial atomic charges to atoms based on known distributions. We formally introduce the molecular charge assignment problem, where the task is to select a charge from a set of candidate charges for every atom of a given query molecule. Charges are accompanied by a score that depends on their observed frequency in similar neighbourhoods (chemical environments) in a database of previously parameterised molecules. The aim is to assign the charges such that the total charge equals a known target charge within a margin of error while maximizing the sum of the charge scores. We show that the problem is a variant of the well-studied multiple-choice knapsack problem and thus weakly NP-complete. We propose solutions based on Integer Linear Programming and a pseudo-polynomial time Dynamic Programming algorithm. We show that the results obtained for novel molecules not included in the database are comparable to the ones obtained performing explicit charge calculations while decreasing the time to determine partial charges for a molecule by several orders of magnitude, that is, from hours or even days to below a second. Our software is openly available at https://github.com/enitram/charge-assign.","PeriodicalId":329847,"journal":{"name":"Workshop on Algorithms in Bioinformatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130561378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Degenerate String Comparison and Applications 简并字符串比较及其应用

Workshop on Algorithms in Bioinformatics Pub Date : 2018-08-01 DOI: 10.4230/LIPIcs.WABI.2018.21

Mai Alzamel, Lorraine A. K. Ayad, G. Bernardini, R. Grossi, C. Iliopoulos, N. Pisanti, S. Pissis, Giovanna Rosone

{"title":"Degenerate String Comparison and Applications","authors":"Mai Alzamel, Lorraine A. K. Ayad, G. Bernardini, R. Grossi, C. Iliopoulos, N. Pisanti, S. Pissis, Giovanna Rosone","doi":"10.4230/LIPIcs.WABI.2018.21","DOIUrl":"https://doi.org/10.4230/LIPIcs.WABI.2018.21","url":null,"abstract":"A generalised degenerate string (GD string) S^ is a sequence of n sets of strings of total size N, where the ith set contains strings of the same length k_i but this length can vary between different sets. We denote the sum of these lengths k_0, k_1,...,k_{n-1} by W. This type of uncertain sequence can represent, for example, a gapless multiple sequence alignment of width W in a compact form. Our first result in this paper is an O(N+M)-time algorithm for deciding whether the intersection of two GD strings of total sizes N and M, respectively, over an integer alphabet, is non-empty. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in only linear space. A similar result can be obtained by employing an automata-based approach but its cost is alphabet-dependent. We then apply our string comparison algorithm to compute palindromes in GD strings. We present an O(min{W,n^2}N)-time algorithm for computing all palindromes in S^. Furthermore, we show a similar conditional lower bound for computing maximal palindromes in S^. Finally, proof-of-concept experimental results are presented using real protein datasets.","PeriodicalId":329847,"journal":{"name":"Workshop on Algorithms in Bioinformatics","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130236249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13

Parsimonious Migration History Problem: Complexity and Algorithms 精简迁移历史问题:复杂性和算法

Workshop on Algorithms in Bioinformatics Pub Date : 2018-08-01 DOI: 10.4230/LIPIcs.WABI.2018.24

M. El-Kebir

引用次数: 2

Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time 线性时间泛基因组方正重构的最小分割方法

Workshop on Algorithms in Bioinformatics Pub Date : 2018-05-01 DOI: 10.4230/LIPIcs.WABI.2018.15

T. Norri, Bastien Cazaux, D. Kosolobov, V. Mäkinen

引用次数: 6

Detecting Mutations by eBWT 利用eBWT检测突变

Workshop on Algorithms in Bioinformatics Pub Date : 2018-05-01 DOI: 10.4230/LIPIcs.WABI.2018.3

N. Prezza, N. Pisanti, M. Sciortino, Giovanna Rosone

引用次数: 11

Heuristic algorithms for the Maximum Colorful Subtree problem 最大彩色子树问题的启发式算法

Workshop on Algorithms in Bioinformatics Pub Date : 2018-01-23 DOI: 10.4230/LIPIcs.WABI.2018.23

Kai Dührkop, Marie Lataretu, W. White, Sebastian Böcker

{"title":"Heuristic algorithms for the Maximum Colorful Subtree problem","authors":"Kai Dührkop, Marie Lataretu, W. White, Sebastian Böcker","doi":"10.4230/LIPIcs.WABI.2018.23","DOIUrl":"https://doi.org/10.4230/LIPIcs.WABI.2018.23","url":null,"abstract":"In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this resulted in the computational Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem; and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured. \u0000Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem against an exact algorithm. We put particular emphasis on whether a heuristic is able to rank candidates such that the correct solution is ranked highly. We propose this \"intermediate\" evaluation because evaluating the approximating quality of heuristics is misleading: Even a slightly suboptimal solution can be structurally very different from the true solution. On the other hand, we cannot structurally evaluate against the ground truth, as this is unknown. We find that particularly one of the heuristics consistently ranks the correct solution in a favorable position. Integrating the heuristic into the analysis pipeline results in a speedup of 10-fold or more, without sacrificing accuracy.","PeriodicalId":329847,"journal":{"name":"Workshop on Algorithms in Bioinformatics","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133184078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

A Biclique Approach to Reference Anchored Gene Blocks and Its Applications to Pathogenicity Islands 参考锚定基因块的Biclique方法及其在致病性岛屿中的应用

Workshop on Algorithms in Bioinformatics Pub Date : 2017-10-13 DOI: 10.1007/978-3-319-43681-4_2

Arnon Benshahar, V. Chalifa-Caspi, D. Hermelin, Michal Ziv-Ukelson

引用次数: 0

Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time 基于距离的平均情况线性时间系统发育推断

Workshop on Algorithms in Bioinformatics Pub Date : 2017-08-03 DOI: 10.4230/LIPIcs.WABI.2017.9

M. Crochemore, Alexandre P. Francisco, S. Pissis, Cátia Vaz

引用次数: 4