Designing an Accurate and Efficient Algorithm for Matching Arabic Names

2019 First International Conference of Intelligent Computing and Engineering (ICOICE) Pub Date : 2019-12-01 DOI:10.1109/ICOICE48418.2019.9035184

Salah Al-Hagree, Maher Al-Sanabani, Khaled M. Alalayah, Mohammed Hadwan

Citations: 3

Abstract

A great deal of research has been done to find an accurate algorithm for name matching, which plays a major role in the application process. Researchers have developed several algorithms to measure the similarity of strings, but most of them are designed mainly for Latin-based languages. Dealing with the Arabic context is a challenging task, owing to the nature and unique features of the Arabic language, which may explain why name matching algorithms for Arabic are rare. This paper therefore aims at designing an accurate and efficient algorithm for matching Arabic names. A framework for matching Arabic names has been designed to provide a platform for current and future investigations involving Arabic name matching. The framework deals with specific characteristics of the Arabic language and the various levels of similarity between Arabic letters, mainly keyboard similarities, letter forms, and phonetic similarities. Moreover, the proposed algorithm accounts for the transposition operation and for enhanced states of the substitution, deletion, and insertion operations. As a result, the proposed algorithm reduces the storage space required, saves processing time, and reduces the time complexity from O(N^3) to O(N^2). The experiments show that the proposed algorithm is more efficient and more accurate than the other algorithms.

Keywords: Matching Arabic names, String matching, Character N-gram, Levenshtein distance.
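The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: a Levenshtein-style edit distance that also allows transposition of adjacent letters and charges less for substituting letters considered similar. It uses the well-known optimal string alignment recurrence, not the authors' method. The letter groups in `SIMILAR_GROUPS`, the `similar_cost` weight of 0.5, and all function names are assumptions made for illustration; the paper's actual keyboard, letter-form, and phonetic tables are not available here.

```python
# Hypothetical similarity groups (assumed for illustration, not from the paper).
SIMILAR_GROUPS = [
    {"\u0627", "\u0623", "\u0625", "\u0622"},  # alef and its hamza/madda variants
    {"\u0647", "\u0629"},                      # heh and teh marbuta
    {"\u064a", "\u0649"},                      # yeh and alef maksura
]


def substitution_cost(a, b, similar_cost=0.5):
    """Return 0 for identical letters, a reduced cost for similar ones, else 1."""
    if a == b:
        return 0.0
    for group in SIMILAR_GROUPS:
        if a in group and b in group:
            return similar_cost
    return 1.0


def weighted_osa_distance(s, t):
    """Optimal string alignment distance with weighted substitution.

    Runs in O(len(s) * len(t)) time using a full dynamic-programming table.
    """
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # delete all of s[:i]
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
            # Transposition of two adjacent letters counts as one operation.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)
    return d[n][m]


def name_similarity(s, t):
    """Normalize the distance to a similarity score in [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_osa_distance(s, t) / longest
```

Under these assumed weights, two spellings of the same name that differ only in the form of the initial alef are half an edit apart rather than a full edit, so they score as closer than two names differing by an unrelated letter.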
