Investigation on sentiment analysis for Arabic reviews

Ashraf Elnagar
Published in: 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA)
Publication date: 2016-11-01
DOI: 10.1109/AICCSA.2016.7945623 (https://doi.org/10.1109/AICCSA.2016.7945623)
Citations: 19

Abstract

The Arabic language has a very rich vocabulary and takes different forms: the formal Modern Standard Arabic (MSA) and the informal colloquial varieties, or dialects. Dialectal Arabic has become important with the proliferation of social networks, which has produced vast amounts of unstructured dialectal text on the web. The unique properties of MSA and the dialects pose major challenges for building sentiment analysis systems by adopting models designed for English. In this paper, we present a supervised Arabic sentiment analysis system that uses bag-of-words features. We further examine the use of a set of keywords (a lexicon) for better polarity classification. The system is tested on the freely available Arabic book reviews dataset (LABR), which includes reviews in both MSA and Egyptian dialect. We used both balanced and unbalanced versions of the dataset. The balanced set is small, and hence a large-scale balanced dataset is needed to train the classifier. Further, we compared the predicted sentiments against the actual reviews for a specific book. Annotators found cases where a review's text and its rating disagreed, and in those cases the predicted sentiment gave the more reasonable result. Moreover, working with dialects and sarcasm is an exciting problem. Experimental results with the adopted logistic classifier on LABR are encouraging. However, developing robust and efficient Arabic sentiment analyzers requires rich and well-represented datasets.
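The pipeline named in the abstract, bag-of-words features fed to a logistic classifier, can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the paper's implementation: the abstract gives no preprocessing or training details, so the whitespace tokenizer, the function names, the learning rate, the epoch count, and the four toy sentences (which are not taken from LABR) are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace splitting. Real Arabic text would
    # normally need normalization (diacritics, letter variants) first.
    return text.split()


def build_vocab(docs):
    # Map each distinct token to a feature index, in order of first appearance.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(doc, vocab):
    # Sparse bag-of-words vector: {feature_index: count}.
    # Tokens outside the vocabulary are ignored.
    counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
    return {vocab[tok]: c for tok, c in counts.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    # Probability that the sparse vector x belongs to the positive class.
    z = bias + sum(weights[i] * v for i, v in x.items())
    return sigmoid(z)


def train(docs, labels, epochs=200, lr=0.5):
    # Logistic regression fitted by stochastic gradient descent.
    # epochs and lr are illustrative values, not taken from the paper.
    vocab = build_vocab(docs)
    vectors = [vectorize(d, vocab) for d in docs]
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            err = predict_proba(x, weights, bias) - y
            for i, v in x.items():
                weights[i] -= lr * err * v
            bias -= lr * err
    return vocab, weights, bias


def classify(text, model):
    # Return 1 for positive polarity, 0 for negative.
    vocab, weights, bias = model
    x = vectorize(text, vocab)
    return 1 if predict_proba(x, weights, bias) >= 0.5 else 0


# Hypothetical toy reviews (1 = positive, 0 = negative); not LABR data.
train_docs = [
    "كتاب رائع وممتع",
    "رواية جميلة جدا",
    "كتاب ممل وسيء",
    "رواية سيئة جدا",
]
train_labels = [1, 1, 0, 0]
model = train(train_docs, train_labels)

print(classify("كتاب رائع", model))
print(classify("كتاب ممل", model))
```

A lexicon feature of the kind the abstract mentions could be added as extra entries in the sparse vector, for example counts of tokens found in a positive or negative keyword list. How the paper combines the lexicon with bag-of-words is not stated in the abstract.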