Sentiment Analysis for Sudanese Arabic Dialect Using comparative Supervised Learning approach
Shahad Abuuznien, Zena Abdelmohsin, Ehsan Abdu, Izzeldein Amin
2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), published 2021-02-26
DOI: 10.1109/ICCCEEE49695.2021.9429560
Citations: 2
Abstract
Sentiment analysis comprises the methods, techniques, and tools used to determine the polarity of a text (positive, negative, or neutral). The most popular approaches to this problem are the machine learning approach, the lexicon-based approach, and the hybrid approach. This project focuses on extracting and analyzing Sudanese social media feeds about ridesharing services. It aims to tackle Sudanese Arabic dialect analysis through a comparative study that measures the performance of machine learning algorithms on a Sudanese dialect corpus under different preprocessing approaches. For this study, a stop word list that combines a Modern Standard Arabic list and a Sudanese stop word list was built and used as one of the preprocessing steps. Four classifiers were trained and evaluated on a dataset consisting of 2116 tweets: Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression, and K-Nearest Neighbor (KNN). Across the preprocessing variants applied to the dataset, SVM with stemming only gave the highest F1-score (0.71) and the best accuracy (0.95).
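The stop word step and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop words are illustrative placeholders rather than entries from the paper's list, the function names are hypothetical, and macro-averaging of the F1-score is an assumption, since the abstract does not say how F1 was averaged.

```python
# Illustrative placeholders only; the paper's actual stop word lists are not given here.
MSA_STOP_WORDS = {"في", "من", "على"}
SUDANESE_STOP_WORDS = {"شنو", "دا", "ديل"}

# The combined list is the union of the two source lists.
COMBINED_STOP_WORDS = MSA_STOP_WORDS | SUDANESE_STOP_WORDS


def remove_stop_words(text, stop_words=COMBINED_STOP_WORDS):
    """Split on whitespace and drop tokens found in the stop word list."""
    return [tok for tok in text.split() if tok not in stop_words]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(remove_stop_words("الخدمة دا ممتازة في الخرطوم"))
    # Toy labels, not data from the paper.
    y_true = ["neg"] * 8 + ["pos", "neu"]
    y_pred = ["neg"] * 10
    print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

On the toy labels, a classifier that always predicts the majority class scores well on accuracy but poorly on macro F1, which is one general reason the two metrics can diverge. Whether that explains the gap reported above cannot be determined from the abstract.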