Title: A fast bit-parallel multi-patterns string matching algorithm for biological sequences
Authors: R. Prasad, S. Agarwal, I. Yadav, Bharat Singh
DOI: 10.1145/1722024.1722077 (https://doi.org/10.1145/1722024.1722077)
Journal: In Silico Biology
Published: 2010-02-15
Publication type: Journal Article
Citations: 4
Abstract
The problem of finding all occurrences of a pattern P[0...m-1] in a text T[0...n-1], with m ≤ n, where the symbols of P and T are drawn from an alphabet Σ of size σ, is called the exact string matching problem. Pattern matching is now a powerful tool for locating nucleotide or amino acid sequence patterns in biological sequence databases. The problem of searching for a set of patterns P0, P1, P2, ..., Pr-1, r ≥ 1, in a given text T is called the multi-pattern string matching problem. It has previously been solved by efficient bit-parallel string matching algorithms, namely shift-or and BNDM. Many other types of algorithms exist for the same purpose, but bit-parallelism has been shown to be more efficient than the others. In this paper, we extend the BNDM algorithm with q-grams (B. Durian et al., 2008) to multiple patterns, where the patterns are arbitrary DNA sequences. We assume that all patterns have the same length m and that their total length is at most the word length (w) of the computer used. Since the BNDM algorithm has been shown to be faster than any other bit-parallel string matching algorithm (G. Navarro, 2000), we compare the performance of the multi-pattern q-gram BNDM algorithm with the existing BNDM algorithm for different values of q and numbers of patterns (r).
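To make the packing idea concrete, the following is a minimal sketch of plain multi-pattern BNDM, not the authors' implementation. It assumes r patterns of equal length m whose bit-vectors are concatenated into one word of r·m ≤ w bits, as the abstract describes. The q-gram step is deliberately left out, and the function name `multi_bndm`, the default `w=64`, and the output format are illustrative assumptions.

```python
def multi_bndm(patterns, text, w=64):
    """Return (position, pattern_index) for each occurrence of any pattern.

    Sketch of multi-pattern BNDM without the q-gram step. Pattern k occupies
    bits k*m .. k*m + m - 1 of a single bit-vector (hypothetical layout).
    """
    if not patterns:
        return []
    m = len(patterns[0])
    r = len(patterns)
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    if r * m > w:
        raise ValueError("total pattern length exceeds the word size w")

    # B[c]: bit k*m + (m-1-i) is set when patterns[k][i] == c.
    B = {}
    for k, p in enumerate(patterns):
        for i, c in enumerate(p):
            B[c] = B.get(c, 0) | (1 << (k * m + m - 1 - i))

    full = (1 << (r * m)) - 1
    top = sum(1 << (k * m + m - 1) for k in range(r))  # highest bit of each block
    low = sum(1 << (k * m) for k in range(r))          # lowest bit of each block
    keep = full & ~low  # after a shift, drop bits that crossed a block boundary

    matches = []
    n = len(text)
    pos = 0
    while pos <= n - m:
        j = m        # number of window characters not yet read
        last = m     # safe shift if no pattern prefix is seen
        D = full     # every factor of every pattern is still alive
        while D:
            # Read the window right to left.
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & top:
                if j > 0:
                    # The window suffix read so far is a prefix of some pattern.
                    last = j
                else:
                    # Whole window read: report every pattern that matched.
                    for k in range(r):
                        if D & (1 << (k * m + m - 1)):
                            matches.append((pos, k))
            D = (D << 1) & keep
        pos += last
    return matches
```

For example, `multi_bndm(["ACG", "CGT", "TTA"], "ACGTTACGT")` scans the text once for all three patterns. Python integers are unbounded, so the `w` check only mimics the machine-word limit that a C implementation would face.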
In Silico Biology
Computer Science - Computational Theory and Mathematics
CiteScore
2.20
Self-citation rate
0.00%
Articles published
1
About the journal:
The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. Although far from complete, the overwhelming quantity of small pieces of information gathered for all kinds of biological systems at the molecular and cellular level requires computational tools to be adequately stored and interpreted. Interpreting data means abstracting them as far as possible to provide a systematic, integrative view of biology. Most of the presently available scientific journals focus either on accumulating more data from elaborate experimental approaches or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious.