A hybrid composite features based sentence level sentiment analyzer

Q2 Decision Sciences
Mohammed Maree, Mujahed Eleyat, Shatha Rabayah, M. Belkhatir
Journal: IAES International Journal of Artificial Intelligence
Publication date: 2023-03-01
DOI: https://doi.org/10.11591/ijai.v12.i1.pp284-294

Abstract

Current lexicon-based and machine learning-based sentiment analysis approaches still suffer from a two-fold limitation. First, manual lexicon construction and machine training are time-consuming and error-prone. Second, accurate prediction requires that the sentences to be classified and their corresponding training text fall under the same domain. In this article, we experimentally evaluate four sentiment classifiers, namely Support Vector Machines, Naive Bayes, Logistic Regression and Random Forest. We quantify the quality of each of these models using three real-world datasets that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic movie reviews. Specifically, we study the impact of a variety of natural language processing (NLP) pipelines on the quality of the predicted sentiment orientations. Additionally, we measure the impact of using lexical semantic knowledge captured by WordNet to expand the original words in sentences. Findings demonstrate that utilizing different NLP pipelines and semantic relationships impacts the quality of the sentiment analyzers. In particular, results indicate that coupling lemmatization with knowledge-based n-gram features produces higher accuracy. With this coupling, the accuracy of the support vector machine (SVM) classifier improved to 90.43%, while it was 86.83%, 90.11%, and 86.20%, respectively, using the three other classifiers.
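The coupling of lemmatization, n-gram features, and knowledge-based word expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `LEMMAS` and `SYNONYMS` tables are hypothetical hand-written stand-ins for a real lemmatizer and for WordNet lookups, and the function names and the way expansions are appended to the feature list are assumptions made for illustration only.

```python
from typing import Dict, List

# Toy stand-ins (assumptions): a real pipeline would use a proper lemmatizer
# and query WordNet for related words instead of these hand-written tables.
LEMMAS: Dict[str, str] = {"movies": "movie", "films": "film", "were": "be", "loved": "love"}
SYNONYMS: Dict[str, List[str]] = {"movie": ["film"], "great": ["excellent"]}


def lemmatize(tokens: List[str]) -> List[str]:
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return contiguous n-grams, with the words joined by underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def composite_features(sentence: str, max_n: int = 2) -> List[str]:
    """Lemmatize, build 1..max_n-grams, then append synonym expansions."""
    tokens = lemmatize(sentence.lower().split())
    features: List[str] = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    # Knowledge-based expansion: add related words for each lemma.
    for token in tokens:
        features.extend(SYNONYMS.get(token, []))
    return features


print(composite_features("The movies were great"))
# ['the', 'movie', 'be', 'great', 'the_movie', 'movie_be', 'be_great', 'film', 'excellent']
```

Feature lists of this kind could then be turned into bag-of-words vectors and passed to any of the four classifiers evaluated in the article; the sketch stops before that step because the abstract does not specify the vectorization or training settings.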