Exploring the performance of Farasa and CAMeL taggers for Arabic dialect tweets

Aseel Alfaidi, H. Alwadei, Areej Alshutayri, Shahd Alahdal
Journal: Int. Arab J. Inf. Technol. Publication date: 2023-01-01. Publication type: Journal Article.
DOI: 10.34028/iajit/20/3/7 (https://doi.org/10.34028/iajit/20/3/7)
Citations: 0

Abstract

In Natural Language Processing (NLP), Part-of-Speech (POS) tagging is an important step; it is a fundamental requirement for many applications, such as information extraction, machine translation, and grammar checking. Successful POS taggers have been developed for many languages, including Arabic. The spread of social media has increased the diversity of dialects in written text, as people use them in their online communications. As a result, it has become more difficult for researchers to classify some words that humans understand but computers do not. In addition, most Arabic POS research focuses on Modern Standard Arabic (MSA), while Dialectal Arabic (DA) receives less attention. This paper evaluates the performance of two Arabic taggers on dialectal Arabic tweets and determines which tagger is more appropriate, which in turn should help improve existing taggers for dialectal Arabic tweets. We used the Farasa and CAMeL taggers, which are commonly used to analyze Arabic texts and are considered the best taggers for Arabic. The results indicate that the CAMeL tagger performed better than the Farasa tagger, with accuracies of 92% and 83% respectively. In other words, a hybrid POS tagger trained on both MSA and DA returns better results than one trained on MSA only.
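The comparison described in the abstract rests on an accuracy figure per tagger. The abstract does not say how accuracy was computed, so the following is only a minimal sketch, assuming token-level accuracy against a manually annotated gold standard. The function name `tagging_accuracy`, the tag labels, and the toy data are all hypothetical and are not taken from the paper; no Farasa or CAMeL API is called.

```python
from typing import Sequence


def tagging_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag matches the gold tag.

    Assumes both taggers' outputs have already been aligned to the same
    tokenization and mapped to a common tag set (an assumption of this sketch).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of sentences")

    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted tags must align token by token")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)

    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Hypothetical toy data: two short "tweets" with made-up tags.
gold = [["NOUN", "VERB", "ADJ"], ["PRON", "VERB"]]
tagger_a = [["NOUN", "VERB", "ADJ"], ["PRON", "NOUN"]]  # one error in five tokens
tagger_b = [["NOUN", "NOUN", "ADJ"], ["NOUN", "NOUN"]]  # three errors in five tokens

acc_a = tagging_accuracy(gold, tagger_a)
acc_b = tagging_accuracy(gold, tagger_b)
print(f"tagger A: {acc_a:.2f}, tagger B: {acc_b:.2f}")
```

In practice, the harder part of such a comparison is usually the step this sketch assumes away: two taggers may segment words differently and use different tag sets, so their outputs must be aligned and mapped to a shared scheme before the scores are comparable.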