Improvement of Moroccan Dialect Sentiment Analysis Using Arabic BERT-Based Models
Authors: Ghizlane Bourahouat, Manar Abourezq, N. Daoudi
Journal: Journal of Computer Science
Published: 2024-02-01
DOI: 10.3844/jcssp.2024.157.167 (https://doi.org/10.3844/jcssp.2024.157.167)
Abstract: This study addresses sentiment analysis in natural language processing, with a particular focus on Arabic, and especially dialectal Arabic, which has been relatively understudied because of its inherent challenges. Our approach centers on sentiment analysis in Moroccan Arabic, leveraging BERT models pre-trained on Arabic, namely AraBERT, QARIB, ALBERT, AraELECTRA, and CAMeLBERT. These models are combined with machine learning and deep learning algorithms, namely SVM and CNN, and the pre-trained models are additionally fine-tuned. We also examine the impact of data imbalance by evaluating the models on three distinct datasets: an unbalanced set, a balanced set obtained through under-sampling, and a balanced set created by combining the initial dataset with another unbalanced one. Our proposed approach reaches 96% accuracy with the QARIB model, even on imbalanced data. The novelty of this research lies in the use of pre-trained Arabic BERT models for Moroccan sentiment analysis and in the exploration of their combination with CNN and SVM algorithms. Our findings show that the BERT-based models on their own yield better results than the same models used in conjunction with CNN or SVM, a significant advance in sentiment analysis for Moroccan Arabic. The method's effectiveness is assessed through a comparative analysis with state-of-the-art approaches, providing insights that contribute to the advancement of sentiment analysis in Arabic dialects.
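The abstract mentions a balanced set obtained through under-sampling but does not say how the sampling was done. The sketch below assumes plain random under-sampling, in which every class is cut down to the size of the rarest class. The function name `undersample`, the fixed seed, and the toy reviews are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def undersample(texts, labels, seed=0):
    """Randomly drop examples so each class keeps as many items as the rarest class.

    Assumption: simple random under-sampling; the paper may use another scheme.
    """
    # Group the texts by their sentiment label.
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    # Every class is reduced to the size of the smallest class.
    target = min(len(items) for items in by_label.values())

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = []
    for label, items in by_label.items():
        for text in rng.sample(items, target):
            balanced.append((text, label))

    # Shuffle so the classes are interleaved rather than grouped.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 7 positive and 3 negative reviews.
texts = [f"review {i}" for i in range(10)]
labels = ["pos"] * 7 + ["neg"] * 3

balanced = undersample(texts, labels)
```

With the toy data, the result keeps three examples per class. Under-sampling discards majority-class examples, which is one reason to compare it against the unbalanced and merged datasets, as the study does.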
About the journal:
Journal of Computer Science aims to publish research articles on the theoretical foundations of information and computation, and on practical techniques for their implementation and application in computer systems. JCS is a peer-reviewed journal, published twelve times a year, that covers the latest research in the field.