Rua Ismail, Mawada Omer, Mawada Tabir, Noor Mahadi, Izzeldein Amin
2018 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), pp. 1-6. Published 2018-08-01. DOI: 10.1109/ICCCEEE.2018.8515862 (https://doi.org/10.1109/ICCCEEE.2018.8515862)
Sentiment Analysis for Arabic Dialect Using Supervised Learning
Sentiment analysis is a set of procedures for extracting subjective opinions from text. There are two general approaches: machine learning methods and lexicon-based methods. This work extracts and analyzes Twitter data written in the Sudanese Arabic dialect to observe patterns of opinion about the quality of telecommunication services operating in Sudan. One of the significant limitations in the field of text classification is its exclusive focus on the English language. This gap needs to be bridged by developing efficient methods and tools for sentiment analysis in Arabic, and reliable corpora and lexicons are needed as well. For this study, four classifiers, namely Naïve Bayes, SVM, Multinomial Logistic Regression, and K-Nearest Neighbor (KNN), were trained on a dataset of 4712 tweets in order to compare their performance. When run against the tweet dataset, SVM gave the highest F1-score (72.0), while KNN (k=2) achieved the best accuracy (92.0).
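The abstract reports that one classifier leads on F1-score while a different one leads on accuracy. The sketch below illustrates, on invented toy labels, how two sets of predictions can be ranked differently by the two metrics. Everything in it is an assumption made for illustration: the label names, the ten toy examples, the two hypothetical prediction lists, and the choice of macro-averaged F1 (the abstract does not say which averaging was used, nor whether the dataset is imbalanced). It is not the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1-score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging)."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Invented, imbalanced toy labels: 8 negative tweets, 2 positive tweets.
y_true = ["neg"] * 8 + ["pos"] * 2

# Hypothetical classifier A: always predicts the majority class.
pred_a = ["neg"] * 10

# Hypothetical classifier B: finds both positives but mislabels 3 negatives.
pred_b = ["neg"] * 5 + ["pos"] * 3 + ["pos"] * 2

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, "accuracy:", round(accuracy(y_true, pred), 3),
          "macro F1:", round(macro_f1(y_true, pred), 3))
```

In this toy setup, A has the higher accuracy and B the higher macro F1, which is why comparative studies commonly report both metrics rather than accuracy alone.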