Sentiment analysis through word embedding using AraBERT: Moroccan dialect use case
Yassir Matrane, F. Benabbou, N. Sael
2021 International Conference on Digital Age & Technological Advances for Sustainable Development (ICDATA), 2021-06-01
DOI: 10.1109/ICDATA52997.2021.00024
Citations: 3
Abstract
Sentiment Analysis (SA) accounts for a large share of Natural Language Processing (NLP) problems. It assigns feelings and polarity to portions of text, which comes in handy in many areas of social activity, such as product reviews in business and gauging the political opinions of the public, among other uses. Nevertheless, sentiment analysis can be tricky when dealing with unstructured languages, because they lack conventional syntactic and morphological structures. In this paper, we discuss several attempts in the literature to address sentiment analysis of regional dialects, and we propose an approach based on AraBERT word embeddings for Moroccan dialect (MD) sentiment analysis. The method runs through a pipeline of steps: preprocessing, lexicon-based translation and feature extraction. We then conduct a comparative study, in 2-way classification, of machine learning algorithms (SVM, DT, LR, RF, NB) and state-of-the-art deep learning algorithms (LSTM, BiLSTM and LSTM-CNN). We also trained our model with four different outputs for 4-way classification. BiLSTM proved to be the best in both settings, scoring 83% accuracy in 2-way classification and between 62% and 92% accuracy for each of the four classes in 4-way classification.
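The first two pipeline steps named in the abstract (preprocessing and lexicon-based translation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cleaning rules, the function names `preprocess` and `translate`, and the three-entry lexicon `MD_TO_MSA` are all hypothetical and are not taken from the paper. The feature-extraction step with AraBERT is left out because it requires downloading a pretrained model.

```python
import re

# Hypothetical toy lexicon mapping Moroccan dialect tokens to Modern Standard
# Arabic. The entries are illustrative only and do not come from the paper.
MD_TO_MSA = {
    "بزاف": "كثيرا",
    "زوين": "جميل",
    "مزيان": "جيد",
}


def preprocess(text):
    """Clean a raw comment and split it into tokens (assumed cleaning rules)."""
    text = re.sub(r"https?://\S+", " ", text)          # drop URLs
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)  # drop diacritics and tatweel
    text = re.sub(r"[^\u0621-\u064A\s]", " ", text)    # keep Arabic letters only
    text = re.sub(r"(.)\1{2,}", r"\1", text)           # collapse runs of 3+ repeats
    return text.split()


def translate(tokens):
    """Replace each dialect token found in the lexicon; keep other tokens as-is."""
    return [MD_TO_MSA.get(token, token) for token in tokens]


if __name__ == "__main__":
    raw = "الفيلم زوييييين بزاف!!! http://example.com"
    print(translate(preprocess(raw)))
```

In a full pipeline, the translated tokens would be joined back into a sentence and passed to AraBERT to obtain embeddings, which then feed a classifier such as the BiLSTM mentioned above.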