Sentiment Analysis for Arabic Dialect Using Supervised Learning

Rua Ismail, Mawada Omer, Mawada Tabir, Noor Mahadi, Izzeldein Amin
2018 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), pp. 1-6, published 2018-08-01
DOI: 10.1109/ICCCEEE.2018.8515862 (https://doi.org/10.1109/ICCCEEE.2018.8515862)
Citations: 17

Abstract

Sentiment analysis is a set of procedures for extracting subjective opinions from text. There are two general approaches: machine learning methods and lexicon-based methods. This work extracts and analyzes Twitter data written in the Sudanese Arabic dialect to observe patterns of opinion about the quality of telecommunication services operating in Sudan. One significant limitation in the field of text classification is its exclusive focus on the English language. There is a need to bridge this gap by developing efficient methods and tools for sentiment analysis in Arabic, along with reliable corpora and lexicons. For this study, four classifiers (Naïve Bayes, SVM, Multinomial Logistic Regression, and K-Nearest Neighbor) were trained on a dataset of 4,712 tweets, and their performance was compared. On this dataset, SVM gave the highest F1-score (72.0), while KNN (k=2) achieved the best accuracy (92.0).
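The abstract reports that one classifier has the best accuracy while a different one has the best F1-score. The sketch below shows how that can happen in general. It is an illustration only: the labels, the two prediction lists, and the use of macro-averaged F1 are assumptions made for the example, since the abstract does not state the class distribution or the averaging method used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct hits for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical, imbalanced toy data: 9 negative tweets, 1 positive tweet.
y_true = ["neg"] * 9 + ["pos"]

# Model A always predicts the majority class.
pred_a = ["neg"] * 10
# Model B finds the positive tweet but mislabels two negatives.
pred_b = ["neg"] * 7 + ["pos"] * 3

print(f"A: accuracy={accuracy(y_true, pred_a):.2f} macro-F1={macro_f1(y_true, pred_a):.2f}")
# A: accuracy=0.90 macro-F1=0.47
print(f"B: accuracy={accuracy(y_true, pred_b):.2f} macro-F1={macro_f1(y_true, pred_b):.2f}")
# B: accuracy=0.80 macro-F1=0.69
```

In this toy case model A wins on accuracy and model B wins on macro-F1, so the two metrics rank the models differently. Whether this explains the gap reported in the abstract cannot be determined from the abstract alone.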