Empirical Analysis of Supervised and Unsupervised Machine Learning Algorithms with Aspect-Based Sentiment Analysis

IF 0.5 Q4 COMPUTER SCIENCE, THEORY & METHODS
Satwinder Singh, Harpreet Kaur, Rubal Kanozia, Gurpreet Kaur
Journal: Applied Computer Systems, volume 139 1, pages 125-136
Published: 2023-06-01
Publication type: Journal Article
DOI: 10.2478/acss-2023-0012 (https://doi.org/10.2478/acss-2023-0012)
Citations: 0

Empirical Analysis of Supervised and Unsupervised Machine Learning Algorithms with Aspect-Based Sentiment Analysis
Abstract
Machine learning-based sentiment analysis is an interdisciplinary approach to opinion mining, particularly in media and communication research. Researchers from different backgrounds have collaborated to train, test and retest machine learning approaches in order to collect and analyse large datasets and extract meaningful insights from them. This research classifies microblog texts (tweets) as positive or negative responses to a particular phenomenon. The study also demonstrates the process of compiling a corpus for sentiment review, cleaning the text so that it is meaningful, identifying people’s emotions about the topic, and interpreting the findings. To date, public sentiment following the abrogation of Article 370 has not been studied, which makes this the first study of online public opinion on the subject. The dataset, collected from Twitter, comprises 66.7 % positive tweets and 34.3 % negative tweets about the abrogation of Article 370. The authors collected more than 2 lakh (200,000) tweets; after cleaning, 29,732 tweets were selected for analysis. The study compares unsupervised lexicon-based models (TextBlob, AFINN, Vader Sentiment) with supervised machine learning models (KNN, SVM, Random Forest and Naïve Bayes) for sentiment analysis. Among the supervised models, Random Forest performs best, with an accuracy of 99 %; among the unsupervised models, TextBlob achieves the highest accuracy, at 88 %. Experimental testing indicates that the proposed methodology is more effective than the previously proposed methodology, and the performance of the proposed supervised models also surpasses the results of a recent 2023 study on sentiment analysis.
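The pipeline the abstract outlines (clean each tweet, score it with a lexicon, compare the predicted polarity with a labelled one) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the tiny `TOY_LEXICON`, the cleaning rules, the tie-breaking choice and the sample tweets are all hypothetical, and the study itself used TextBlob, AFINN and Vader Sentiment rather than a hand-written word list.

```python
import re

# Hypothetical toy lexicon in the style of AFINN (word -> integer valence).
# The entries are illustrative only and are not taken from the paper.
TOY_LEXICON = {
    "good": 3, "great": 3, "support": 2, "welcome": 2,
    "bad": -3, "wrong": -2, "sad": -2, "against": -1,
}


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, mentions, '#' signs and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop the sign
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits and punctuation
    return " ".join(text.split())              # collapse repeated whitespace


def lexicon_label(text):
    """Sum word valences; a score of zero is treated as positive (an assumption
    made here so that the output stays binary, as in the study)."""
    score = sum(TOY_LEXICON.get(tok, 0) for tok in clean_tweet(text).split())
    return "positive" if score >= 0 else "negative"


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Made-up examples, not tweets from the study's dataset.
    samples = [
        ("Great move! @user https://t.co/abc #Article370", "positive"),
        ("This is a bad and wrong decision", "negative"),
        ("We welcome and support this", "positive"),
    ]
    predictions = [lexicon_label(text) for text, _ in samples]
    print(accuracy(predictions, [label for _, label in samples]))
```

A supervised model such as Random Forest would replace `lexicon_label` with a classifier trained on labelled tweets, while the cleaning and accuracy steps would stay the same.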
Source journal: Applied Computer Systems (Computer Science, Theory & Methods). Self-citation rate: 10.00%. Articles published: 9. Review time: 30 weeks.