PMS6: a fast algorithm for motif discovery.

Q4 Health Professions

International Journal of Bioinformatics Research and Applications. Pub Date: 2014-01-01. DOI: 10.1504/IJBRA.2014.062990

Shibdas Bandyopadhyay, Sartaj Sahni, Sanguthevar Rajasekaran

Citations: 24

Abstract

We propose a new algorithm, PMS6, for the (l,d)-motif discovery problem: given a set of strings, find all strings of length l that occur in every string of the set with at most d mismatches. The run-time ratio PMS5/PMS6, where PMS5 is the fastest previously known algorithm for motif discovery in large instances, ranges from a high of 2.20 for the (21,8) challenge instances to a low of 1.69 for the (17,6) challenge instances. Both PMS5 and PMS6 require some pre-processing. For (23,9) instances, pre-processing for PMS6 is 34 times faster than for PMS5. When pre-processing time is factored in, the run-time ratio PMS5/PMS6 is as high as 2.75 for (13,4) instances and as low as 1.95 for (17,6) instances.
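To make the problem statement concrete, the following is a minimal brute-force sketch of (l,d)-motif search. It is not PMS6 or PMS5, whose internals the abstract does not describe; it only illustrates the definition. The DNA alphabet and the names `hamming`, `neighbors`, `occurs`, and `brute_force_motifs` are assumptions made for illustration.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: DNA alphabet; the abstract does not fix one


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d, alphabet=ALPHABET):
    """All strings within Hamming distance d of lmer."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(alphabet, repeat=k):
                candidate = list(lmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def occurs(motif, s, d):
    """True if some length-l window of s is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def brute_force_motifs(strings, l, d):
    """Exhaustive (l,d)-motif search; exponential in d, for tiny inputs only."""
    first, rest = strings[0], strings[1:]
    # Any motif must lie within distance d of some l-mer of the first string.
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(m for m in candidates if all(occurs(m, s, d) for s in rest))


if __name__ == "__main__":
    print(brute_force_motifs(["ACGT", "ACGA", "TCGT"], 4, 1))
```

The candidate set grows quickly with l and d, which is why challenge instances such as (21,8) or (23,9) call for specialised algorithms rather than this exhaustive approach.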


Source journal

International Journal of Bioinformatics Research and Applications (Health Professions: Health Information Management)

CiteScore

0.60

Self-citation rate

0.00%

Articles published

Journal description: Bioinformatics is an interdisciplinary research field that combines biology, computer science, mathematics and statistics into a broad-based field that will have profound impacts on all fields of biology. The emphasis of IJBRA is on basic bioinformatics research methods, tool development, performance evaluation and their applications in biology. IJBRA addresses the most innovative developments, research issues and solutions in bioinformatics and computational biology and their applications. Topics covered include: Databases, bio-grid, system biology; Biomedical image processing, modelling and simulation; Bio-ontology and data mining, DNA assembly, clustering, mapping; Computational genomics/proteomics; Silico technology: computational intelligence, high performance computing; E-health, telemedicine; Gene expression, microarrays, identification, annotation; Genetic algorithms, fuzzy logic, neural networks, data visualisation; Hidden Markov models, machine learning, support vector machines; Molecular evolution, phylogeny, modelling, simulation, sequence analysis; Parallel algorithms/architectures, computational structural biology; Phylogeny reconstruction algorithms, physiome, protein structure prediction; Sequence assembly, search, alignment; Signalling/computational biomedical data engineering; Simulated annealing, statistical analysis, stochastic grammars.