A Critical Survey on Arabic Named Entity Recognition and Diacritization Systems
Muhammad Nabil Rateb, S. Alansary
2022 20th International Conference on Language Engineering (ESOLEC), 2022-10-12
DOI: https://doi.org/10.1109/ESOLEC54569.2022.10009095
Citation count: 1
Abstract
Language technologies are a subfield of artificial intelligence (AI) concerned with how software toolkits are built to process and simulate human natural language. In recent decades, Natural Language Processing (NLP) has advanced markedly, particularly for the Arabic language. Arabic is the language spoken by almost two billion Muslims worldwide and is one of the six official languages of the United Nations. This paper surveys three cutting-edge toolkits used to process and analyze Arabic: Cameltools, Farasa, and Madamira. It first presents background on the challenges that have confronted Arabic Natural Language Processing (ANLP), chiefly those concerning diacritization and Named Entity Recognition (NER) systems. Next, it describes the main components of Cameltools, Farasa, and Madamira. It then presents the evaluation procedures of the three toolkits and compares their results. Finally, it offers observations based on that comparison. The survey finds that Cameltools is the best of the three, since its design was inspired by those of the best toolkits in the field, and that Farasa outperforms Madamira in all comparisons of Arabic NER (ANER) and Arabic diacritization.