An Experimental Study on Hybrid Feature Selection Techniques for Sentiment Classification

N. Dina, Sri Devi Ravana, N. Idris
Published in: 2022 14th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), 2022-12-02
DOI: https://doi.org/10.1109/SKIMA57145.2022.10029452

Abstract

Text sentiment classification extracts useful information from unstructured text and classifies its sentiment as positive or negative. Irrelevant features and a high-dimensional feature space are common problems in sentiment classification because they degrade classification performance. To address them, this study applies hybrid feature selection that combines Term Frequency-Inverse Document Frequency (TF-IDF) with Support Vector Machine-Recursive Feature Elimination (SVM-RFE) on three text datasets: IMDB, Yelp, and Amazon. TF-IDF first selects sentiment features, which SVM-RFE then refines. Finally, an SVM classifies each text as positive or negative. The approach outperforms existing techniques on two datasets, reaching 88% accuracy on IMDB and 84.5% on Yelp. On the Amazon dataset, however, its 81.5% accuracy is lower than that of existing studies. These results indicate that the technique is inconsistent across datasets and open the way for further research on other hybrid feature selection techniques that could improve accuracy on all datasets. The results also show that the technique improved classification performance and reduced the feature space by 63%.
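
The two-stage selection described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The abstract does not give the TF-IDF variant, the ranking rule, the feature counts, or the SVM settings, so the following are all assumptions made for illustration: the smoothed IDF formula, ranking terms by summed TF-IDF weight, the Pegasos-style SVM trainer, the toy reviews, and the names `tfidf_matrix`, `train_linear_svm`, `svm_rfe`, `STAGE1_KEEP`, and `FINAL_KEEP`.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Assumed variant: smoothed IDF = log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def train_linear_svm(X, y, epochs=200, lam=0.01):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss (no bias)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularisation)
            if margin < 1:  # margin violated: step towards this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_rfe(X, y, names, n_keep):
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_active, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return [names[j] for j in active]


# Toy data (hypothetical): +1 = positive review, -1 = negative review.
docs = [
    "great movie loved it",
    "terrible movie hated it",
    "loved the great acting",
    "hated the terrible plot",
]
labels = [1, -1, 1, -1]
STAGE1_KEEP = 6  # terms kept after the TF-IDF stage (assumed value)
FINAL_KEEP = 4   # terms kept after SVM-RFE (assumed value)

vocab, X = tfidf_matrix(docs)

# Stage 1: rank terms by summed TF-IDF weight and keep the top STAGE1_KEEP.
scores = [sum(row[j] for row in X) for j in range(len(vocab))]
top = sorted(range(len(vocab)), key=lambda j: scores[j], reverse=True)[:STAGE1_KEEP]
stage1 = [vocab[j] for j in top]
X_stage1 = [[row[j] for j in top] for row in X]

# Stage 2: refine the stage-1 terms with SVM-RFE.
selected = svm_rfe(X_stage1, labels, stage1, FINAL_KEEP)

# Final step: train the classifier on the selected terms only.
keep = [stage1.index(name) for name in selected]
X_final = [[row[j] for j in keep] for row in X_stage1]
w = train_linear_svm(X_final, labels)
predictions = [1 if sum(wj * xj for wj, xj in zip(w, row)) >= 0 else -1 for row in X_final]

reduction = 100.0 * (1 - len(selected) / len(vocab))
print("stage 1 terms:", stage1)
print("after SVM-RFE:", selected)
print("predictions:", predictions)
print(f"feature space reduced by {reduction:.1f}%")
```

The reduction printed at the end is computed as 1 minus the ratio of selected to original features, which is one plausible reading of the 63% figure reported in the abstract. On real datasets a tested library implementation of TF-IDF, recursive feature elimination, and linear SVMs would be preferable to these hand-written loops.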