UTILIZING ROOTS AND PATTERNS TO IDENTIFY ARABIC NAMED ENTITIES

Asian Journal of Mathematics and Computer Research. Published: 2022-11-07. DOI: 10.56557/ajomcor/2022/v29i27922

Abdulmonem Ahmed, Aybaba Hançrlioğullari, Ali Riza Tosun

Abstract

Named Entity Recognition (NER) is a subtask of information extraction that seeks to locate named entities in text and classify them into predefined categories, such as names of people, organizations, and geographic locations. The task has recently attracted considerable attention because it can improve the performance of a variety of NLP applications: many problems in Question Answering, Summarization, Information Retrieval and Extraction, Machine Translation, Video Annotation, Semantic Web Search, and Bioinformatics require named entity recognition. Arabic is an inflectional language with non-concatenative morphology, in which words are formed by applying patterns to a root. The purpose of this study is to extract and recognize entity names in Arabic articles. We propose an algorithm that identifies names from their roots using patterns. We implemented it in Python and used the PyQt5 GUI toolkit to display results immediately and to let users modify and add patterns easily. To evaluate the algorithm, we used a random sample of 400 names and 45 different patterns. The algorithm correctly identified 370 names quickly, a success rate of about 93% (370/400 = 92.5%). Any other name that follows a recognized root and pattern is identified in the same way, with no changes to the code or design. The names the algorithm failed to recognize have no root in the list of known Arabic roots. Our results show that the approach recognizes root-based names quickly and accurately, but it cannot identify names that are not of Arabic origin. We therefore recommend a hybrid method that combines multiple approaches.
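The root-and-pattern idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `match_pattern` and `find_root`, the five patterns, and the three-root list are hypothetical. The sketch assumes undiacritized words and triliteral roots, and it writes patterns with the conventional placeholders ف, ع and ل standing for the first, second and third root letters.

```python
# Hypothetical sketch of root-and-pattern matching for Arabic names
# (not the authors' code). Assumes undiacritized words, triliteral roots,
# and patterns that use ف/ع/ل as slots for the 1st/2nd/3rd root letters.
from typing import Optional

PLACEHOLDERS = ("ف", "ع", "ل")  # slots for the 1st, 2nd and 3rd root letters

# Toy data for illustration only; the paper used 45 patterns and a list of known roots.
PATTERNS = ["فاعل", "مفعول", "فعيل", "أفعل", "مفعل"]
ROOTS = {"حمد", "سعد", "كتب"}


def match_pattern(word: str, pattern: str) -> Optional[str]:
    """Return the root letters of `word` if it fits `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    slots: dict[str, str] = {}
    for w_ch, p_ch in zip(word, pattern):
        if p_ch in PLACEHOLDERS:
            # A slot that occurs twice in a pattern must hold the same letter.
            if slots.setdefault(p_ch, w_ch) != w_ch:
                return None
        elif w_ch != p_ch:
            # Every non-slot letter of the pattern must appear literally.
            return None
    if len(slots) != len(PLACEHOLDERS):
        return None
    return "".join(slots[p] for p in PLACEHOLDERS)


def find_root(word: str, patterns: list[str],
              known_roots: set[str]) -> Optional[tuple[str, str]]:
    """Return the first (root, pattern) pair whose root is a known root."""
    for pattern in patterns:
        root = match_pattern(word, pattern)
        if root is not None and root in known_roots:
            return root, pattern
    return None


for name in ["محمد", "أحمد", "حامد", "محمود", "سعيد", "جورج"]:
    print(name, find_root(name, PATTERNS, ROOTS))
```

With these toy lists, محمود yields `("حمد", "مفعول")`, while the borrowed name جورج (George) yields `None` because no pattern extracts a known root, which mirrors the limitation the abstract reports for names of non-Arabic origin. A practical system would also need to strip clitics such as the article ال and handle weak and doubled roots, which this sketch ignores.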
