Sentiment Analysis for Arabic Dialect Using Supervised Learning

2018 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE). Pub Date: 2018-08-01. DOI: 10.1109/ICCCEEE.2018.8515862

Rua Ismail, Mawada Omer, Mawada Tabir, Noor Mahadi, Izzeldein Amin


Citations: 17

Abstract



Sentiment analysis is a set of procedures for extracting subjective opinions from text. There are generally two approaches to sentiment analysis: machine learning methods and lexicon-based methods. This work extracts and analyzes Twitter data written in the Sudanese Arabic dialect to identify patterns of opinion about the quality of telecommunication services operating in Sudan. One of the significant limitations in the field of text classification is its exclusive focus on the English language. There is a need to bridge this gap by developing efficient methods and tools for sentiment analysis in Arabic; reliable corpora and lexicons are also needed. For this study, four classifiers (Naïve Bayes, SVM, Multinomial Logistic Regression, and K-Nearest Neighbor) were trained on a dataset of 4,712 tweets, and their performance was compared. On this dataset, SVM gave the highest F1-score (72.0), while KNN (k=2) achieved the best accuracy (92.0).
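The abstract reports the best F1-score and the best accuracy for two different classifiers. The sketch below shows how both metrics follow from the same confusion-matrix counts and why they can rank classifiers differently. The counts are hypothetical, chosen only for illustration and not taken from the paper, and the `Counts` type and function names are assumptions. It also assumes a binary F1 on a single positive class; the abstract does not say how the F1-score was averaged.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    """Confusion-matrix counts for one classifier (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: Counts) -> float:
    """Fraction of all predictions that are correct."""
    return (c.tp + c.tn) / sum(c)


def f1_score(c: Counts) -> float:
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Made-up counts on 1000 tweets, 100 of which belong to the positive class.
classifier_a = Counts(tp=50, fp=10, fn=50, tn=890)  # cautious: misses many positives
classifier_b = Counts(tp=80, fp=60, fn=20, tn=840)  # finds more positives, more false alarms

for name, c in [("A", classifier_a), ("B", classifier_b)]:
    print(f"classifier {name}: accuracy={accuracy(c):.3f}  F1={f1_score(c):.3f}")
```

With these illustrative counts, classifier A has the higher accuracy while classifier B has the higher F1-score, because accuracy counts the many correctly rejected negatives and F1 ignores them. Whether this is what happened in the study cannot be determined from the abstract alone.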
