Authors: Chunchun Zhao, S. Sahni
Venue: IEEE International Conference on Computational Advances in Bio and Medical Sciences
Publication date: 2017-10-01
DOI: https://doi.org/10.1109/ICCABS.2017.8114295
Citations: 1
Efficient computation of the Damerau-Levenshtein distance between biological sequences
Abstract
We have developed linear-space algorithms to compute the Damerau-Levenshtein (DL) distance [1], [2] between two strings and to find an optimal trace, that is, a sequence of edit operations whose length equals the DL distance. Our algorithms require O(s min{m, n} + m + n) space, where s is the size of the alphabet and m and n are the lengths of the two strings. Previously known algorithms require O(mn) space. We have also developed cache-efficient and multi-core linear-space algorithms, and analyzed their cache-miss behavior using a simple cache model.
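
To make the space bound concrete, the following Python sketch computes the DL distance (with unrestricted transpositions) while keeping only the current row, the previous row, and one saved row per alphabet symbol. It is an illustration built on the classic dynamic-programming recurrence, not a reproduction of the authors' algorithm: the function name `dl_distance` and the row-saving scheme are assumptions made here, and the sketch returns only the distance. It does not recover an optimal trace and makes no attempt at cache efficiency or multi-core execution.

```python
def dl_distance(a, b):
    """Damerau-Levenshtein distance between a and b (distance only).

    Hypothetical sketch: stores O(s * min(m, n)) table entries, where s is
    the number of distinct symbols seen, instead of the full m x n table.
    """
    # The distance is symmetric, so make b the shorter string; every row
    # we keep then has min(m, n) + 1 entries.
    if len(b) > len(a):
        a, b = b, a
    m, n = len(a), len(b)

    prev = list(range(n + 1))  # row H[i-1]; initially H[0][j] = j
    last_row = {}              # symbol c -> latest row k < i with a[k] == c (1-based)
    saved = {}                 # symbol c -> row H[k-1] for k = last_row[c]

    for i in range(1, m + 1):
        cur = [i] + [0] * n    # H[i][0] = i deletions
        last_col = 0           # latest column l < j in this row with b[l] == a[i]
        for j in range(1, n + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            best = min(prev[j - 1] + cost,  # match or substitution
                       cur[j - 1] + 1,      # insertion
                       prev[j] + 1)         # deletion
            if k > 0 and l > 0:
                # Transpose a[k] and a[i] to line up with b[l] and b[j]:
                # delete the i-k-1 symbols between them in a, swap once,
                # and insert the j-l-1 symbols between them in b.
                trans = saved[b[j - 1]][l - 1] + (i - k - 1) + 1 + (j - l - 1)
                best = min(best, trans)
            cur[j] = best
        # Row i is done: a[i] was last seen in row i, so remember H[i-1] for it.
        saved[a[i - 1]] = prev
        last_row[a[i - 1]] = i
        prev = cur

    return prev[n]


if __name__ == "__main__":
    print(dl_distance("CA", "ABC"))      # 2: CA -> AC (transpose) -> ABC (insert B)
    print(dl_distance("ACGT", "AGCT"))   # 1: one adjacent transposition
```

For a DNA alphabet (s = 4) the saved rows add at most four arrays of length min{m, n} + 1, which is where the s min{m, n} term in the stated bound comes from in this sketch.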