An Improved N-gram Distance for Names Matching

Salah Al-Hagree, Maher Al-Sanabani, Mohammed Hadwan, M. Al-Hagery
Published in: 2019 First International Conference of Intelligent Computing and Engineering (ICOICE)
Publication date: 2019-12-01
DOI: 10.1109/ICOICE48418.2019.9035154 (https://doi.org/10.1109/ICOICE48418.2019.9035154)
Citations: 5

Abstract

The N-gram distance (N-DIST) was recently developed by Kondrak to measure the distance between two strings. This distance can be computed by an efficient dynamic programming procedure. N-DIST has played an important role in a wide array of applications because of its representational and computational efficiency. To obtain a more sensible distance measure, the normalized edit distance was proposed, and many algorithms and studies have followed this line in recent years with impressive performance. There is, however, a fundamental problem with the original definition of N-DIST that has not been addressed: its context-free nature. In determining the possible actions (deletion, insertion, transposition, and substitution), little consideration was given to the local behavior of the string in question, which carries a great deal of useful information about its content. In the proposed framework, two operations are developed. The original N-DIST algorithm does not consider transposition operations, and it fixes the cost of insertion and deletion operations. In addition, in the proposed E-N-DIST algorithm the costs of substitution and transposition operations depend on 2^(n+1) - 1 states, whereas the original N-DIST algorithm depends on only 2^n states. The experiments reported in this paper show that the E-N-DIST algorithm gives more accurate results than the other algorithms under discussion.
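To make the dynamic programming procedure mentioned in the abstract concrete, here is a minimal Python sketch of a baseline positional n-gram distance in the spirit of the original N-DIST. It is not the authors' E-N-DIST: it has no transposition operation and uses a fixed cost of 1 for insertion and deletion. The function names `ngram_cost` and `n_dist`, the NUL padding character, and the prefix-only padding are assumptions made for illustration, not details taken from the abstract.

```python
def ngram_cost(a: str, b: str) -> float:
    """Positional cost between two n-grams of equal length:
    the fraction of positions at which they differ (0.0 to 1.0)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def n_dist(s: str, t: str, n: int = 2, pad: str = "\0") -> float:
    """Unnormalized n-gram distance computed by dynamic programming.

    Assumption: each string is prefixed with n - 1 padding characters,
    so a string of length k yields exactly k n-grams.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    sp = pad * (n - 1) + s
    tp = pad * (n - 1) + t
    rows, cols = len(s), len(t)

    # prev[j] holds the distance between the first i-1 n-grams of s
    # and the first j n-grams of t; row 0 is pure insertion cost.
    prev = [float(j) for j in range(cols + 1)]
    for i in range(1, rows + 1):
        curr = [float(i)] + [0.0] * cols
        gram_s = sp[i - 1:i - 1 + n]
        for j in range(1, cols + 1):
            gram_t = tp[j - 1:j - 1 + n]
            curr[j] = min(
                prev[j] + 1.0,       # deletion, fixed cost
                curr[j - 1] + 1.0,   # insertion, fixed cost
                prev[j - 1] + ngram_cost(gram_s, gram_t),  # substitution
            )
        prev = curr
    return prev[cols]
```

With `n=1` this sketch reduces to the ordinary Levenshtein distance. One common way to normalize the result to the range 0 to 1 is to divide it by `max(len(s), len(t))` when both strings are non-empty. One way to read the state counts in the abstract, offered here only as an interpretation, is that comparing two n-grams position by position yields one of 2^n match/mismatch patterns.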