Improvement of Moroccan Dialect Sentiment Analysis Using Arabic BERT-Based Models

Journal of Computer Science, Pub Date: 2024-02-01, DOI: 10.3844/jcssp.2024.157.167

Ghizlane Bourahouat, Manar Abourezq, N. Daoudi



Abstract

This study addresses sentiment analysis, a central task in natural language processing, with a particular focus on Arabic and especially dialectal Arabic, which has been relatively understudied due to its inherent challenges. Our approach centers on sentiment analysis in Moroccan Arabic, leveraging BERT models pre-trained on Arabic, namely AraBERT, QARIB, ALBERT, AraELECTRA, and CAMeLBERT. These models are combined with deep learning and machine learning algorithms, including CNN and SVM, with additional fine-tuning of the pre-trained model. We also examine the impact of data imbalance by evaluating the models on three distinct datasets: an unbalanced set, a balanced set obtained through under-sampling, and a balanced set created by combining the initial dataset with another unbalanced one. Our proposed approach reaches an accuracy of 96% with the QARIB model, even on imbalanced data. The novelty of this research lies in applying pre-trained Arabic BERT models to Moroccan sentiment analysis and in exploring their combined use with CNN and SVM algorithms. Our findings show that BERT-based models used on their own yield better results than the same models used in conjunction with CNN or SVM, marking a significant advance in sentiment analysis for Moroccan Arabic. A comparative analysis with state-of-the-art approaches highlights the effectiveness of our method and provides insights that contribute to the advancement of sentiment analysis in Arabic dialects.
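The balanced set obtained through under-sampling can be illustrated with a minimal sketch. The abstract does not state which under-sampling procedure was used, so the random per-class variant below is an assumption, and the function name `undersample` and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop examples from the larger classes until every class
    has as many examples as the smallest one (assumed procedure)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for text, label in zip(samples, labels):
        by_label[label].append(text)

    # Every class is cut down to the size of the smallest class.
    target = min(len(items) for items in by_label.values())

    balanced = []
    for label, items in by_label.items():
        for text in rng.sample(items, target):
            balanced.append((text, label))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Toy, made-up data: 6 positive comments against 2 negative ones.
texts = [f"comment {i}" for i in range(8)]
labels = ["pos"] * 6 + ["neg"] * 2

balanced = undersample(texts, labels)
counts = Counter(label for _, label in balanced)
print(counts["pos"], counts["neg"])
```

After the call, both classes hold as many examples as the original minority class. The price is that most majority-class examples are discarded, which is one reason to also evaluate on the unbalanced set and on a set balanced by adding data, as the study does.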


Source journal

Journal of Computer Science (subject area: Computer Science, Computer Networks and Communications)

CiteScore: 1.70

Self-citation rate: 0.00%

Journal description: The Journal of Computer Science (JCS) publishes research articles on the theoretical foundations of information and computation and on practical techniques for their implementation and application in computer systems. JCS is a peer-reviewed journal, updated twelve times a year, that covers the latest and most compelling research of the time.