A better edit distance measure allowing for block swaps

Nhauo Davuth, Sung-Ryul Kim
Research in Adaptive and Convergent Systems, published 2013-10-01. DOI: 10.1145/2513228.2513282 (https://doi.org/10.1145/2513228.2513282)
Citations: 2

Abstract

Edit Distance, also known as the Levenshtein distance or evolutionary distance, is a concept from information retrieval that describes the number of edit operations needed to change one string into another. It is one of the most common measures of the dissimilarity between two strings. Ordinarily, Edit Distance is based on three character operations: insertion, deletion, and substitution. With these three operations, Edit Distance addresses the problem of computing the similarity between two sequences, which arises in many areas. However, standard Edit Distance can miss the true relationship between two similar strings when their common substrings appear in a different order. For example, the Edit Distance between "classbook" and "bookclass" is eight because the words "book" and "class" are reversed, yet intuitively the two strings seem much closer. To solve this problem, we propose an extended Edit Distance that permits a block swap operation. The main contribution of this paper is a method to compute cut points over a single string and then allow block swaps, which move substrings from one position to another within a string so that common substrings end up in the same order. Our experiments show that Block Swap Edit Distance gives a better measure of the distance between such strings.
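
The "classbook" / "bookclass" example can be checked with a short Python sketch. The `levenshtein` function is the standard dynamic-programming edit distance. The `single_swap_distance` function is a hypothetical simplification written only for illustration: it tries every single cut point in the first string, swaps the two resulting blocks at an assumed cost of 1, and then applies standard Edit Distance. It is not the cut-point method proposed in the paper, whose details are not given in the abstract.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def single_swap_distance(a: str, b: str, swap_cost: int = 1) -> int:
    """Illustrative only (not the paper's algorithm): allow at most one swap
    of the two blocks created by a single cut point in `a`, charged at the
    assumed `swap_cost`, followed by ordinary character edits."""
    best = levenshtein(a, b)  # baseline: no block swap
    for cut in range(1, len(a)):
        swapped = a[cut:] + a[:cut]  # exchange the blocks around the cut
        best = min(best, swap_cost + levenshtein(swapped, b))
    return best


print(levenshtein("classbook", "bookclass"))           # 8
print(single_swap_distance("classbook", "bookclass"))  # 1
```

Under these assumptions the standard measure reports 8, while a single block swap at the cut between "class" and "book" brings the distance down to 1, which matches the intuition that the two strings are close.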