Garance Gourdel, T. Kociumaka, J. Radoszewski, Tatiana Starikovskaya
Annual Symposium on Combinatorial Pattern Matching, published 2020-04-01. DOI: 10.4230/LIPIcs.CPM.2020.16 (https://doi.org/10.4230/LIPIcs.CPM.2020.16)
Citations: 1
Approximating longest common substring with $k$ mismatches: Theory and practice
In the problem of the longest common substring with $k$ mismatches, we are given two strings $X$ and $Y$ and must find the maximal length $\ell$ such that there is a length-$\ell$ substring of $X$ and a length-$\ell$ substring of $Y$ that differ in at most $k$ positions. The length $\ell$ can be used as a robust measure of similarity between $X$ and $Y$. In this work, we develop new approximation algorithms for computing $\ell$ that are significantly more efficient than previously known solutions from the theoretical point of view. Our approach is simple and practical, which we confirm via an experimental evaluation, and is probably close to optimal, as we demonstrate via a conditional lower bound.
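
To make the problem definition concrete, here is a minimal sketch of an exact quadratic-time baseline that computes $\ell$ directly from the definition. It is not the approximation algorithm developed in the paper; it only illustrates the quantity being approximated. The function name `lcs_k_mismatches` and its parameters are assumptions chosen for illustration. The idea is to fix an alignment (diagonal) between $X$ and $Y$ and slide a window along it that never contains more than $k$ mismatches.

```python
def lcs_k_mismatches(x: str, y: str, k: int) -> int:
    """Exact baseline (illustrative, not the paper's algorithm).

    Returns the maximal length l such that some length-l substring of x
    and some length-l substring of y differ in at most k positions.
    Runs in O(len(x) * len(y)) time.
    """
    n, m = len(x), len(y)
    best = 0
    # A fixed shift pairs x[i] with y[i + shift]; each shift is one diagonal.
    for shift in range(-(n - 1), m):
        start = max(0, -shift)      # first index i with a valid partner in y
        end = min(n, m - shift)     # one past the last valid index i
        left = start
        mismatches = 0
        for right in range(start, end):
            if x[right] != y[right + shift]:
                mismatches += 1
            # Shrink the window from the left until it has at most k mismatches.
            while mismatches > k:
                if x[left] != y[left + shift]:
                    mismatches -= 1
                left += 1
            best = max(best, right - left + 1)
    return best


if __name__ == "__main__":
    print(lcs_k_mismatches("abcde", "abxde", 0))
    print(lcs_k_mismatches("abcde", "abxde", 1))
```

With $k = 0$ this reduces to the classical longest common substring. Allowing a single mismatch in the example above lets the two full strings align, which is why $\ell$ is described as a robust similarity measure.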