Shahad Abuuznien, Zena Abdelmohsin, Ehsan Abdu, Izzeldein Amin
Published in: 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), 2021-02-26. DOI: 10.1109/ICCCEEE49695.2021.9429560
Sentiment Analysis for Sudanese Arabic Dialect Using comparative Supervised Learning approach
Sentiment analysis comprises the methods, techniques, and tools used to determine the polarity of a text (positive, negative, or neutral). The most popular approaches to this problem are the machine learning approach, the lexicon-based approach, and the hybrid approach. This project focuses on extracting and analyzing Sudanese social media feeds about ridesharing services. It aims to address the analysis of the Sudanese Arabic dialect through a comparative study that measures the performance of machine learning algorithms on a Sudanese dialect corpus under different preprocessing approaches. For this study, a stop word list combining a Modern Standard Arabic list and a Sudanese stop word list was built and applied as one of the preprocessing steps. Four classifiers, Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression, and K-Nearest Neighbor (KNN), were trained and evaluated on a dataset of 2,116 tweets. Across the various preprocessing configurations, SVM with stemming only achieved the highest F1-score (0.71) and the best accuracy (0.95).
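The pipeline described in the abstract (combined stop word removal, optional stemming, a supervised classifier, and F1/accuracy evaluation) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The stop word entries are a small illustrative sample rather than the paper's lists. `light_stem` is a hypothetical crude prefix/suffix stripper standing in for whatever stemmer the paper used. Naïve Bayes is the only classifier shown; SVM, Logistic Regression, and KNN would slot into the same place (scikit-learn provides all four). Macro-averaged F1 is also an assumption, since the abstract does not state which averaging was used.

```python
import math
import re
from collections import Counter

# Illustrative samples only, NOT the paper's actual stop word lists.
MSA_STOPWORDS = {"في", "من", "على", "إلى", "عن"}
SUDANESE_STOPWORDS = {"ده", "دي", "شنو"}  # hypothetical dialect sample
# The combined list is the union of the MSA list and the Sudanese list.
STOPWORDS = MSA_STOPWORDS | SUDANESE_STOPWORDS

# Hypothetical crude light stemmer: strips at most one prefix and one suffix.
PREFIXES = ("وال", "بال", "ال")
SUFFIXES = ("ات", "ون", "ين")


def light_stem(token):
    for p in PREFIXES:
        # Keep at least two characters so short words are not destroyed.
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[: -len(s)]
            break
    return token


def preprocess(text, remove_stopwords=True, stem=True):
    """Tokenize, then optionally drop stop words and stem (flags allow comparing setups)."""
    tokens = re.findall(r"\w+", text)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [light_stem(t) for t in tokens]
    return tokens


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.classes = sorted(self.priors)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, tokens):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum of log P(token | c); tokens unseen in training are ignored.
            score = math.log(self.priors[c] / self.n_docs)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((self.word_counts[c][t] + self.alpha)
                                      / (self.totals[c] + self.alpha * v))
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Toy, made-up ridesharing tweets; not taken from the paper's dataset.
    train = [("الخدمة ممتازة", "positive"), ("السواق ممتاز جدا", "positive"),
             ("الخدمة سيئة جدا", "negative"), ("التطبيق سيء", "negative")]
    test = [("السواق ده ممتاز", "positive"), ("التطبيق سيء جدا", "negative")]
    model = MultinomialNB().fit([preprocess(t) for t, _ in train], [y for _, y in train])
    preds = model.predict([preprocess(t) for t, _ in test])
    gold = [y for _, y in test]
    print(preds, accuracy(gold, preds), macro_f1(gold, preds))
```

Calling `preprocess(text, remove_stopwords=False, stem=True)` corresponds to the "stemming only" configuration mentioned in the abstract. Looping over the flag combinations and the four classifiers reproduces the shape of the comparison, though not its reported numbers.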