An Improved N-gram Distance for Names Matching
Salah Al-Hagree, Maher Al-Sanabani, Mohammed Hadwan, M. Al-Hagery
2019 First International Conference of Intelligent Computing and Engineering (ICOICE), published 2019-12-01
DOI: https://doi.org/10.1109/ICOICE48418.2019.9035154

The N-gram distance (N-DIST) was recently proposed by Kondrak to measure the distance between two strings, and it can be computed by an efficient dynamic programming procedure. N-DIST has played an important role in a wide range of applications because of its representational and computational efficiency. To obtain a more sensible distance measure, the normalized edit distance was proposed, and many algorithms and studies along this line have reported impressive performance in recent years. There is, however, a fundamental problem with the original definition of N-DIST that has remained unaddressed: its context-free nature. In determining the possible actions, i.e., deletion, insertion, transposition, and substitution, no consideration was given to the local behavior of the strings in question, which carries a great amount of useful information about their content. In the proposed framework, two operations are developed. The original N-DIST algorithm does not consider transposition operations and fixes the cost of the insertion and deletion operations. In addition, in the proposed E-N-DIST algorithm the costs of the substitution and transposition operations depend on 2^(n+1) - 1 states, whereas the original N-DIST algorithm depends on only 2^n states. The experiments carried out in this paper show that the E-N-DIST algorithm gives more accurate results than the other algorithms under discussion.
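
For orientation, the baseline that the abstract builds on can be sketched as a short dynamic program. This is only a minimal illustration of a positional n-gram distance in the spirit of Kondrak's N-DIST, not the E-N-DIST algorithm itself, whose cost model the abstract does not spell out. The function names, the padding symbol, and the choice to pad only the front of each string are assumptions made for this sketch. Insertion and deletion carry a fixed cost of 1, and no transposition operation is modelled, which matches the limitations the abstract attributes to the original algorithm.

```python
def ngram_distance(x: str, y: str, n: int = 2, pad: str = "\0") -> float:
    """Unnormalized positional n-gram distance (illustrative sketch).

    Assumption: both strings are prefixed with n-1 copies of `pad`, so the
    i-th n-gram is the one that ends at the i-th original character.
    """
    k, l = len(x), len(y)
    if k == 0 or l == 0:
        return float(max(k, l))

    xp = pad * (n - 1) + x
    yp = pad * (n - 1) + y

    def sub_cost(i: int, j: int) -> float:
        # Fraction of positions at which the i-th n-gram of x and the
        # j-th n-gram of y differ (i and j are 1-based).
        a = xp[i - 1 : i - 1 + n]
        b = yp[j - 1 : j - 1 + n]
        return sum(ca != cb for ca, cb in zip(a, b)) / n

    # d[i][j] = distance between the first i n-grams of x and first j of y.
    d = [[0.0] * (l + 1) for _ in range(k + 1)]
    for i in range(1, k + 1):
        d[i][0] = float(i)
    for j in range(1, l + 1):
        d[0][j] = float(j)

    for i in range(1, k + 1):
        for j in range(1, l + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                 # deletion, fixed cost
                d[i][j - 1] + 1.0,                 # insertion, fixed cost
                d[i - 1][j - 1] + sub_cost(i, j),  # n-gram substitution
            )
    return d[k][l]


def normalized_ngram_distance(x: str, y: str, n: int = 2) -> float:
    """Distance divided by the longer string's length, giving a value in [0, 1]."""
    longest = max(len(x), len(y))
    return 0.0 if longest == 0 else ngram_distance(x, y, n) / longest
```

With `n = 1` the sketch reduces to the ordinary Levenshtein distance. Because swapping two adjacent characters is not a single operation here, a transposed pair is charged as several substitutions or as an insertion plus a deletion, which is the gap the proposed transposition operation is meant to close.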