Sentiment analysis through word embedding using AraBERT: Moroccan dialect use case
Authors: Yassir Matrane, F. Benabbou, N. Sael
Venue: 2021 International Conference on Digital Age & Technological Advances for Sustainable Development (ICDATA)
Publication date: 2021-06-01
DOI: https://doi.org/10.1109/ICDATA52997.2021.00024
Sentiment Analysis (SA) accounts for a large share of Natural Language Processing (NLP) problems. It makes it possible to assign feelings and polarity to portions of text, which comes in handy in many areas of social conduct, such as product reviewing in business and determining the political opinions of the masses, among other uses. Nevertheless, sentiment analysis can be tricky when dealing with unstructured languages because they lack conventional syntactic and morphological structures. In this paper, we discuss several attempts in the literature at solving the challenge of sentiment analysis of regional dialects, and we propose an approach based on AraBERT word embeddings for Moroccan dialect (MD) sentiment analysis. The method follows a pipeline of steps, starting with preprocessing, followed by lexicon-based translation and feature extraction. We then conduct a comparative study, in 2-way classification, of machine learning algorithms such as SVM, DT, LR, RF, and NB and state-of-the-art deep learning algorithms such as LSTM, BiLSTM, and LSTM-CNN. We also trained our model with four different outputs in 4-way classification. As a result, BiLSTM proved to be the best in both settings, scoring 83% accuracy in 2-way classification and per-class accuracies ranging between 62% and 92% across the 4 classes in 4-way classification.
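The first two pipeline stages named in the abstract (preprocessing and lexicon-based translation) can be illustrated with a minimal sketch. The abstract does not specify the normalization rules or the lexicon, so everything below is an assumption made for illustration: the function names `preprocess`, `translate`, and `prepare`, the regular expressions, and the three toy lexicon entries are hypothetical and are not taken from the paper. The AraBERT feature extraction and the BiLSTM classifier are deliberately left out.

```python
import re

# Assumed normalization rules (not taken from the paper):
# Arabic short-vowel diacritics (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Three or more repeats of one character, e.g. emphatic elongation.
_ELONGATION = re.compile(r"(.)\1{2,}")
# Anything that is neither an Arabic letter nor whitespace.
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")

# Toy dialect -> Modern Standard Arabic lexicon.
# Entries are illustrative only; the paper's lexicon is not reproduced here.
TOY_LEXICON = {
    "بزاف": "كثيرا",
    "مزيان": "جيد",
    "خايب": "سيء",
}


def preprocess(text):
    """Normalize raw text and split it into whitespace-separated tokens."""
    text = _DIACRITICS.sub("", text)      # drop diacritics and tatweel
    text = _ELONGATION.sub(r"\1", text)   # collapse elongated characters
    text = _NON_ARABIC.sub(" ", text)     # replace punctuation, digits, Latin
    return text.split()


def translate(tokens, lexicon):
    """Replace each token found in the lexicon; keep unknown tokens as-is."""
    return [lexicon.get(tok, tok) for tok in tokens]


def prepare(text, lexicon=TOY_LEXICON):
    """Run both stages and return a string ready for an embedding model."""
    return " ".join(translate(preprocess(text), lexicon))
```

In a full pipeline, the string returned by `prepare` would be passed to an embedding model for feature extraction. Tokens missing from the lexicon are kept unchanged here, which is one possible design choice rather than the authors' stated behavior.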