A combined AraBERT and Voting Ensemble classifier model for Arabic sentiment analysis

Dhaou Ghoul, Jérémy Patrix, Gaël Lejeune, Jérôme Verny
Natural Language Processing Journal, volume 8, Article 100100. Published 2024-09-01. DOI: 10.1016/j.nlp.2024.100100
https://www.sciencedirect.com/science/article/pii/S2949719124000487

Abstract

For sentiment analysis of short texts (e.g. movie reviews or tweets), one approach is to build machine learning models that determine their tone (positive, negative, or neutral). However, such natural language processing (NLP) studies are scarce for languages such as Arabic, for which high-quality, large-scale training data is lacking. In this paper, we present three machine learning models, developed for a Kaggle competition, that classify the sentiment of Arabic tweets. The first is a Voting Ensemble classifier that takes advantage of both character-level and word-level features. The second is an AraBERT (Arabic Bidirectional Encoder Representations from Transformers) model with preprocessing by the Farasa Segmenter. The third combines the first two approaches: a Voting Ensemble classifier using AraBERT embeddings. All three models improve on the results of previous efforts. The third model performs strongly, with an F-score of 73.98%. The work presented here could be useful for future studies and for new online services or competitions for Arabic sentiment analysis.
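
The first approach named in the abstract, a voting ensemble over character-level and word-level features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state which base classifiers, n-gram sizes, or voting rule the authors used, so the sketch uses small multinomial Naive Bayes models, character 2-grams and 3-grams, and hard (majority) voting on a four-sentence toy dataset. It does not cover the AraBERT or Farasa components.

```python
import math
from collections import Counter, defaultdict


def word_features(text):
    """Word-level features: whitespace-separated tokens."""
    return text.split()


def char_features(text, n=3):
    """Character-level features: overlapping character n-grams."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative base model)."""

    def __init__(self, extract):
        self.extract = extract  # function mapping a text to a list of features

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(self.extract(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        features = self.extract(text)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)
            score += sum(math.log((counts[f] + 1) / denom) for f in features)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class HardVotingEnsemble:
    """Majority vote over base models; a tie goes to the earliest model's vote."""

    def __init__(self, models):
        self.models = models

    def fit(self, texts, labels):
        for model in self.models:
            model.fit(texts, labels)
        return self

    def predict(self, text):
        votes = Counter(model.predict(text) for model in self.models)
        return votes.most_common(1)[0][0]


# Hypothetical toy data, not taken from the paper or the Kaggle competition.
texts = ["فيلم رائع", "خدمة ممتازة", "فيلم سيء", "خدمة سيئة"]
labels = ["positive", "positive", "negative", "negative"]

ensemble = HardVotingEnsemble([
    NaiveBayes(word_features),                    # word-level view
    NaiveBayes(lambda t: char_features(t, n=2)),  # character 2-grams
    NaiveBayes(lambda t: char_features(t, n=3)),  # character 3-grams
]).fit(texts, labels)

print(ensemble.predict("رائع"))
print(ensemble.predict("سيء"))
```

Character n-grams let the ensemble match inflected or misspelled forms that a word-level model treats as unseen, which is one plausible reason to combine the two views. In practice a library implementation such as scikit-learn's `VotingClassifier` would likely replace these hand-written classes.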
