A Comparative Analysis of Feature Selection Algorithms in Cross Domain Sentiment Classification

Lipika Goel, Sonam Gupta, Avdhesh Gupta, Neha Nandal, Siddhi Nath Ranjan, Pradeep Gupta
Recent Advances in Computer Science and Communications, vol. 18, no. 3. Published 2024-02-02.
DOI: 10.2174/0126662558276889240125062857
Citations: 0

Abstract

Cross-domain sentiment classification (CDSC) is a well-researched field in sentiment analysis. The biggest challenge in CDSC arises from differences between domains and their features, which degrade model performance when features learned from the source domain are used to predict sentiment in the target domain. To address this challenge, feature selection methods can be employed to identify the most relevant features for training and testing in CDSC.

The primary objective of this study is to perform a comparative analysis of different feature selection methods across various CDSC tasks. Statistical test-based feature selection methods have been implemented for the CDSC task and evaluated with 18 classifiers. Their impact has been compared on Amazon product reviews from the DVD, Electronics, Kitchen, and TV domains. For each feature selection method, a total of 12 × 18 experiments were conducted: the 12 ordered source-target pairs formed from the four domains, each evaluated with the 18 classifiers. Performance is measured by accuracy and F-score.

The experiments indicate that good CDSC performance depends on several factors, from the choice of domains to the choice of feature selection method. We conclude that Electronics is the best training dataset, as it gives more precise results when testing on any of the other domains selected for our study.

Cross-domain sentiment analysis is a dynamic and interdisciplinary field that offers valuable insights into how sentiment varies across different domains.
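To make the experimental protocol concrete, the following is a minimal sketch, not the authors' code. The abstract does not name the specific statistical tests, so this sketch assumes the chi-square test of term presence against the sentiment label as one representative statistical-test-based selector. Features are scored on the source domain only and then applied unchanged to the target domain. The names `chi2_scores`, `select_top_k`, and `train_sign_classifier` are hypothetical, the sign-weight classifier is a toy stand-in rather than one of the study's 18 classifiers, and the review snippets are invented. The sketch also shows where the 12 × 18 count comes from: four domains give 4 × 3 = 12 ordered (source, target) pairs.

```python
from collections import Counter
from itertools import permutations

# The four Amazon review domains named in the abstract.
DOMAINS = ["DVD", "Electronics", "Kitchen", "TV"]
N_CLASSIFIERS = 18  # number of classifiers reported in the abstract


def domain_pairs(domains):
    """All ordered (source, target) pairs with source != target: 4 * 3 = 12."""
    return list(permutations(domains, 2))


def chi2_scores(docs, labels):
    """Chi-square score of each term's presence against a binary label (1 = positive).

    Assumed selector: builds the 2x2 table (term present/absent x positive/negative)
    from document frequencies. Compute it on the SOURCE domain only.
    """
    n, n_pos = len(docs), sum(labels)
    df_all, df_pos = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for term in set(doc):
            df_all[term] += 1
            if y == 1:
                df_pos[term] += 1
    scores = {}
    for term in df_all:
        a = df_pos[term]          # present, positive
        b = df_all[term] - a      # present, negative
        c = n_pos - a             # absent, positive
        d = (n - n_pos) - b       # absent, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def train_sign_classifier(docs, labels, features):
    """Hypothetical toy classifier: weight +1/-1/0 per selected term by majority class."""
    weights = {}
    for f in features:
        pos = sum(1 for doc, y in zip(docs, labels) if y == 1 and f in doc)
        neg = sum(1 for doc, y in zip(docs, labels) if y == 0 and f in doc)
        weights[f] = (pos > neg) - (neg > pos)
    return weights


def predict(weights, docs):
    """Predict 1 when the summed weights of a document's selected terms are positive."""
    return [1 if sum(weights.get(t, 0) for t in set(doc)) > 0 else 0 for doc in docs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_score(y_true, y_pred, positive=1):
    """F1 score for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pairs = domain_pairs(DOMAINS)
    print(len(pairs), "pairs x", N_CLASSIFIERS, "classifiers =", len(pairs) * N_CLASSIFIERS, "runs")
    # Invented toy reviews: a source domain (e.g. Electronics) and a target (e.g. Kitchen).
    src_docs = [["great", "sound"], ["great", "price"], ["poor", "sound"], ["poor", "price"]]
    src_y = [1, 1, 0, 0]
    tgt_docs = [["great", "blade"], ["poor", "blade"], ["sharp", "great"], ["sharp"]]
    tgt_y = [1, 0, 1, 1]
    feats = select_top_k(chi2_scores(src_docs, src_y), k=2)
    pred = predict(train_sign_classifier(src_docs, src_y, feats), tgt_docs)
    print(feats, accuracy(tgt_y, pred), round(f_score(tgt_y, pred), 3))
```

Running the script prints `12 pairs x 18 classifiers = 216 runs` and then `['great', 'poor'] 0.75 0.8`. The last target review, which contains only the target-specific term "sharp", is misclassified because that term was never scored on the source domain, a small illustration of the domain-shift problem the abstract describes.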