Approximate string searching with fast Fourier transforms and simplexes
Daniel Liu
PeerJ preprints, e27615, published 2019-03-27
DOI: 10.7287/peerj.preprints.27615v1 (https://doi.org/10.7287/peerj.preprints.27615v1)
Citations: 1
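Abstract
Previous algorithms for approximate string matching under Hamming distance with wildcard ("don't care") characters have been shown to take \(O(|\Sigma| N \log M)\) time, where \(N\) is the length of the text, \(M\) is the length of the pattern, and \(|\Sigma|\) is the size of the alphabet. They use the Fast Fourier Transform to compute convolutions efficiently. We describe a novel approach to the problem, which uses special encoding schemes that depend on \((|\Sigma| - 1)\)-simplexes in \((|\Sigma| - 1)\)-dimensional space.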
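To make the convolution-based baseline concrete, here is a minimal sketch of the classic per-symbol indicator technique: for each symbol, an FFT-based correlation counts the positions where text and pattern both carry that symbol, and one extra correlation counts positions where neither is a wildcard. The names `fft`, `correlate`, `hamming_with_wildcards` and the wildcard marker `"?"` are illustrative assumptions, not taken from the paper. For brevity the sketch runs one FFT over the whole text, which gives \(O(|\Sigma| N \log N)\); reaching \(O(|\Sigma| N \log M)\) would additionally require splitting the text into overlapping blocks of length about \(2M\).

```python
import cmath

WILDCARD = "?"  # assumed wildcard marker, chosen for this sketch only


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def correlate(x, y):
    """Return c[i] = sum_j x[i + j] * y[j] for every full alignment i."""
    n, m = len(x), len(y)
    size = 1
    while size < n + m:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - n))
    # Reversing y turns the correlation into an ordinary convolution.
    fy = fft([complex(v) for v in reversed(y)] + [0j] * (size - m))
    prod = fft([p * q for p, q in zip(fx, fy)], invert=True)
    # The inverse is unnormalized, so divide by size; index m - 1 + i
    # of the convolution holds the value for alignment i.
    return [round(prod[m - 1 + i].real / size) for i in range(n - m + 1)]


def hamming_with_wildcards(text, pattern, alphabet):
    """Hamming distance of pattern against every alignment of text.

    A wildcard on either side matches any character.
    """
    n, m = len(text), len(pattern)
    if m > n:
        return []
    # Positions where neither character is a wildcard.
    both = correlate([int(c != WILDCARD) for c in text],
                     [int(c != WILDCARD) for c in pattern])
    matches = [0] * (n - m + 1)
    for s in alphabet:  # one correlation per symbol of the alphabet
        hits = correlate([int(c == s) for c in text],
                         [int(c == s) for c in pattern])
        matches = [a + b for a, b in zip(matches, hits)]
    # Mismatches = comparable positions minus matching positions.
    return [b - a for a, b in zip(matches, both)]
```

For example, `hamming_with_wildcards("abracadabra", "a?a", "abcdr")` returns `[1, 1, 2, 0, 2, 0, 2, 1, 1]`: the pattern matches exactly at offsets 3 and 5. The loop over `alphabet` is where the \(|\Sigma|\) factor in the running time comes from.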
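As an illustration of how a simplex-based encoding could enter such a computation, consider the following sketch. It rests on our own assumptions and is not necessarily the construction used in the paper. Write \(k = |\Sigma|\) with \(k \ge 2\), let \(T\) be the text and \(P\) the pattern, and map each symbol \(a\) to a unit-length vertex \(v_a\) of a regular simplex centred at the origin in \((k - 1)\)-dimensional space, with wildcards mapped to the zero vector. The symbols \(T\), \(P\), \(v_a\), \(S_i\), \(C_i\) and \(D_i\) are notation introduced here for illustration.

\[
\langle v_a, v_b \rangle =
\begin{cases}
1 & \text{if } a = b, \\
-\dfrac{1}{k - 1} & \text{if } a \neq b.
\end{cases}
\]

Let \(C_i\) be the number of positions in alignment \(i\) where neither character is a wildcard, and \(D_i\) the number of those positions that mismatch. Summing the inner products over the alignment gives

\[
S_i = \sum_{j=0}^{M-1} \langle v_{T_{i+j}}, v_{P_j} \rangle
    = (C_i - D_i) - \frac{D_i}{k - 1}
\quad\Longrightarrow\quad
D_i = \frac{k - 1}{k}\,(C_i - S_i).
\]

Under these assumptions, each of the \(k - 1\) coordinates of \(S_i\), as well as \(C_i\), is a correlation between a text sequence and a pattern sequence, so each can be computed for all alignments at once with the Fast Fourier Transform.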