On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching
J. Fischer, D. Köppl, Florian Kurpicz
Annual Symposium on Combinatorial Pattern Matching, 2016-06-01
DOI: https://doi.org/10.4230/LIPIcs.CPM.2016.26
We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with $p$ processors. Given a static text of length $n$, we first show how to compute the suffix array interval of a given pattern of length $m$ in $O(\frac{m}{p}+ \lg p + \lg\lg p\cdot\lg\lg n)$ time for $p \le m$. For approximate pattern matching with $k$ differences or mismatches, we show how to compute all occurrences of a given pattern in $O(\frac{m^k\sigma^k}{p}\max\left(k,\lg\lg n\right)\!+\!(1+\frac{m}{p}) \lg p\cdot \lg\lg n + \text{occ})$ time, where $\sigma$ is the size of the alphabet and $p \le \sigma^k m^k$. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns $P$ and $P'$, we present a data structure for computing the interval of $PP'$ in $O(\lg\lg n)$ sequential time, or in $O(1+\lg_p\lg n)$ parallel time. All our data structures are of size $O(n)$ bits (in addition to the suffix array).
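
To make the merging step concrete, here is a minimal Python sketch. It is not the data structure from the paper: it swaps in a plain binary search over the inverse suffix array, which takes $O(\lg n)$ time per merge rather than the $O(\lg\lg n)$ stated above. It relies on one fact: inside the interval of $P$, suffixes are sorted by what follows $P$, so the rank of the suffix starting $|P|$ positions later increases across the interval. All function names (`suffix_array`, `pattern_interval`, `merge_intervals`, `interval_by_chunks`) are hypothetical, intervals are half-open, and patterns are assumed non-empty. The chunked routine only illustrates how merges compose in a balanced tree; it runs sequentially and is not claimed to be the paper's processor schedule.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    # Naive construction by sorting suffixes; fine for a small demo only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse(sa):
    # isa[pos] = rank of the suffix starting at pos.
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


def pattern_interval(text, sa, pattern):
    # Reference answer: half-open interval [lo, hi) of suffixes that start
    # with pattern. Truncated sorted suffixes stay in non-decreasing order.
    m = len(pattern)
    prefixes = [text[i:i + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def lower_bound(lo, hi, too_small):
    # First index in [lo, hi) for which too_small(index) is false.
    while lo < hi:
        mid = (lo + hi) // 2
        if too_small(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo


def merge_intervals(sa, isa, first, first_len, second):
    # Interval of PP' from the intervals of P (length first_len) and P'.
    # Assumption: binary search baseline, not the paper's O(lg lg n) structure.
    lo, hi = first
    lo2, hi2 = second
    n = len(sa)

    def follow_rank(i):
        # Rank of the suffix that follows P; the empty suffix sorts first.
        pos = sa[i] + first_len
        return isa[pos] if pos < n else -1

    start = lower_bound(lo, hi, lambda i: follow_rank(i) < lo2)
    end = lower_bound(start, hi, lambda i: follow_rank(i) < hi2)
    return start, end


def interval_by_chunks(text, sa, isa, pattern, chunks):
    # Split the pattern, look up each piece, then merge neighbours level by
    # level. Each level's merges are independent of one another.
    size = max(1, -(-len(pattern) // chunks))
    parts = [pattern[i:i + size] for i in range(0, len(pattern), size)]
    items = [(pattern_interval(text, sa, part), len(part)) for part in parts]
    while len(items) > 1:
        merged = []
        for j in range(0, len(items) - 1, 2):
            (iv1, len1), (iv2, len2) = items[j], items[j + 1]
            merged.append((merge_intervals(sa, isa, iv1, len1, iv2), len1 + len2))
        if len(items) % 2:
            merged.append(items[-1])
        items = merged
    return items[0][0]


text = "banana"
sa = suffix_array(text)
isa = inverse(sa)
a_interval = pattern_interval(text, sa, "a")
na_interval = pattern_interval(text, sa, "na")
print(merge_intervals(sa, isa, a_interval, 1, na_interval))  # (1, 3)
```

The printed interval is the same one a direct lookup of "ana" returns. The merge itself never reads the text, only the suffix array and its inverse, which is why the intervals of the pieces can be computed independently before being combined.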