A Comparative Analysis of Feature Selection Algorithms in Cross Domain Sentiment Classification

Recent Advances in Computer Science and Communications. Pub Date: 2024-02-02. DOI: 10.2174/0126662558276889240125062857

Lipika Goel, Sonam Gupta, Avdhesh Gupta, Neha Nandal, Siddhi Nath Ranjan, Pradeep Gupta


Citations: 0

Abstract



Cross-domain sentiment classification (CDSC) is a well-researched field in sentiment analysis. The biggest challenge in CDSC arises from differences in domains and features, which degrade model performance when source-domain features are used to predict sentiment in the target domain. To address this challenge, feature selection methods can be employed to identify the most relevant features for training and testing in CDSC.

The primary objective of this study is to perform a comparative analysis of different feature selection methods on various CDSC tasks. Statistical test-based feature selection methods were implemented with 18 classifiers for the CDSC task, and their impact was compared on Amazon product reviews in the DVD, Electronics, Kitchen, and TV domains. For each feature selection method, a total of 12 × 18 experiments were conducted by varying the source and target domain pairs from the Amazon product reviews dataset across the 18 classifiers. The performance evaluation measures are accuracy and F-score.

The experiments indicate that good performance on the CDSC task depends on several factors, from the right domain selection to the right feature selection method. We conclude that Electronics is the best training dataset, as it gives more precise results when testing on any of the domains selected for our study.

Cross-domain sentiment analysis is a dynamic and interdisciplinary field that offers valuable insights into how sentiment varies across different domains.
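The experimental grid described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the statistical tests or the 18 classifiers, so the chi-square score is used here only as one common example of a statistical test-based selector, the F-score is assumed to be the binary F1 of the positive class, and all function names and the toy token sets are hypothetical. The 12 pairs are assumed to be the ordered source/target pairs of the four domains (4 × 3 = 12).

```python
from itertools import permutations

# Domains and classifier count as stated in the abstract.
DOMAINS = ["DVD", "Electronics", "Kitchen", "TV"]
N_CLASSIFIERS = 18

# Assumption: the 12 pairs are ordered (source, target) pairs with source != target.
PAIRS = list(permutations(DOMAINS, 2))
RUNS_PER_METHOD = len(PAIRS) * N_CLASSIFIERS  # 12 x 18 runs per selection method


def chi2_score(docs, labels, term):
    """Chi-square statistic of the 2x2 table (term present/absent vs. label 1/0).

    docs is a list of token sets; labels is a parallel list of 0/1 labels.
    """
    n11 = n10 = n01 = n00 = 0
    for doc, label in zip(docs, labels):
        present = term in doc
        if present and label == 1:
            n11 += 1
        elif present:
            n10 += 1
        elif label == 1:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    # A term that occurs in every document or in none carries no signal.
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def select_top_k(docs, labels, k):
    """Return the k terms with the highest chi-square score on the source domain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: chi2_score(docs, labels, t), reverse=True)
    return ranked[:k]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and binary F1 (positive class = 1) on target-domain predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Hypothetical toy source-domain reviews (token sets) with sentiment labels.
toy_docs = [{"great", "picture"}, {"great", "sound"},
            {"poor", "battery"}, {"poor", "screen"}]
toy_labels = [1, 1, 0, 0]
toy_features = select_top_k(toy_docs, toy_labels, k=2)
```

In a full run, each of the pairs in `PAIRS` would have features selected on the source domain, a classifier trained on those features, and `accuracy_and_f1` computed on the target domain. On the toy data, the polarity words rank above the product-specific nouns, which is the behaviour one would hope for when transferring across domains.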


Source journal

Recent Advances in Computer Science and Communications

Self-citation rate

0.00%

Publication count