Sentence-level sentiment analysis in Persian

Mohammad Ehsan Basiri, Arman Kabiri
Published in: 2017 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA)
Publication date: 2017-04-01
DOI: 10.1109/PRIA.2017.7983023 (https://doi.org/10.1109/PRIA.2017.7983023)
Citations: 35

Abstract

Sentiment analysis (SA) is a subfield of natural language processing and data mining concerned with extracting useful information from users' comments on the Web. Although researchers have studied various SA problems for more than a decade, most studies concentrate on English, and languages such as Persian have not received the attention they deserve. The scarcity of resources for evaluating sentiment analysis studies is the main limiting factor in Persian. This paper addresses that scarcity by introducing two new resources: SPerSent, a sentence-level dataset for sentiment analysis in Persian, and CNRC, a new Persian lexicon. SPerSent contains 150,000 sentences, each associated with two labels: a binary label indicating the polarity of the sentence and a five-star rating. These labels are obtained automatically using a lexicon-based method. Specifically, three lexicons are used independently to label each sentence; majority voting then aggregates the polarity labels, and averaging aggregates the five-star labels. Finally, a well-known machine learning method, Naïve Bayes, is used to evaluate SPerSent.
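
The aggregation step described in the abstract (three lexicons label each sentence independently, polarity is decided by majority vote, and the five-star label is an average) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy English lexicons, the scoring of a sentence as the mean of matched word scores, the sign rule for polarity, and the linear mapping from score to stars are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Three toy lexicons mapping a token to a score in [-1, 1].
# Hypothetical placeholders: NOT the lexicons used in the paper.
LEXICONS = [
    {"good": 0.8, "bad": -0.7, "great": 0.9},
    {"good": 0.6, "bad": -0.9, "slow": -0.4},
    {"good": 0.5, "great": 1.0, "slow": -0.2},
]


def sentence_score(tokens, lexicon):
    """Mean score of the tokens found in the lexicon; 0.0 if none match."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return mean(hits) if hits else 0.0


def polarity(score):
    """Assumed rule: positive (1) if the score is non-negative, else negative (0)."""
    return 1 if score >= 0 else 0


def stars(score):
    """Assumed linear mapping from [-1, 1] to the 1..5 star range."""
    return 1 + 2 * (score + 1)  # -1 maps to 1, 0 maps to 3, +1 maps to 5


def label_sentence(sentence):
    """Label one sentence with (binary polarity, five-star rating)."""
    tokens = sentence.lower().split()
    # Each lexicon scores the sentence independently.
    scores = [sentence_score(tokens, lex) for lex in LEXICONS]
    # Polarity: majority vote over the per-lexicon binary labels.
    votes = [polarity(s) for s in scores]
    binary = 1 if 2 * sum(votes) > len(votes) else 0
    # Five-star label: average of the per-lexicon star ratings.
    rating = mean(stars(s) for s in scores)
    return binary, rating


if __name__ == "__main__":
    print(label_sentence("great but slow"))
```

With an odd number of lexicons the binary vote can never tie, which is one practical reason to use three labelers. The averaged rating in this sketch is left as a fraction; whether the dataset rounds it to whole stars is not stated in the abstract.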