Empirical Analysis of Supervised and Unsupervised Machine Learning Algorithms with Aspect-Based Sentiment Analysis

IF 0.8 Q4 COMPUTER SCIENCE, THEORY & METHODS

Applied Computer Systems Pub Date : 2023-06-01 DOI:10.2478/acss-2023-0012

Satwinder Singh, Harpreet Kaur, Rubal Kanozia, Gurpreet Kaur

Applied Computer Systems, pp. 125 - 136. Journal Article. https://doi.org/10.2478/acss-2023-0012

Citations: 0

Abstract

Machine learning based sentiment analysis is an interdisciplinary approach to opinion mining, particularly in media and communication research. Researchers from different backgrounds have collaborated to train, test and retest machine learning approaches that collect and analyse large datasets and draw meaningful insights from them. This research classifies micro-blog texts (tweets) into positive and negative responses about a particular phenomenon. The study also demonstrates how to compile a corpus for sentiment review, clean the body of text so that it is meaningful, identify people’s emotions about the topic, and interpret the findings. To date, public sentiment after the abrogation of Article 370 has not been studied, which adds novelty to this study. The dataset, collected from Twitter, comprises 66.7 % positive tweets and 34.3 % negative tweets about the abrogation of Article 370. Experimental testing reveals that the proposed methodology is much more effective than the previously proposed methodology. The study compares unsupervised lexicon-based models (TextBlob, AFINN, Vader Sentiment) and supervised machine learning models (KNN, SVM, Random Forest and Naïve Bayes) for sentiment analysis. This is the first study of online public opinion on the abrogation of Article 370. The authors collected more than 2 lakh (200,000) tweets; after cleaning, 29732 tweets were selected for analysis. Among the supervised models, Random Forest performs best with an accuracy of 99 %, whereas among the unsupervised models TextBlob achieves the highest accuracy, 88 %. The performance of the proposed supervised machine learning models also surpasses the results of a recent 2023 study on sentiment analysis.
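The cleaning and lexicon-based scoring steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the word list `TOY_LEXICON`, the function names, the tie rule and the sample tweets are all hypothetical, and the toy lexicon stands in for real resources such as AFINN, which the study evaluates but whose scores are not reproduced here.

```python
import re

# Hypothetical toy lexicon for illustration only
# (NOT the AFINN, VADER, or TextBlob word lists).
TOY_LEXICON = {"good": 2, "great": 3, "support": 1, "happy": 2,
               "bad": -2, "terrible": -3, "against": -1, "sad": -2}


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, '#' signs and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits and punctuation
    return " ".join(text.split())              # collapse whitespace


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Sum per-word scores; a total of zero or more counts as positive
    (an arbitrary tie rule chosen for this sketch)."""
    score = sum(lexicon.get(word, 0) for word in clean_tweet(text).split())
    return "positive" if score >= 0 else "negative"


def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up example tweets with hand-assigned labels (not from the study's data).
samples = [
    ("Great decision, I support it! https://example.com", "positive"),
    ("@user this is a terrible and sad day", "negative"),
    ("#Happy with the news", "positive"),
    ("Bad move, I am against it", "negative"),
]
predictions = [lexicon_label(text) for text, _ in samples]
print(accuracy(predictions, [label for _, label in samples]))
```

A lexicon model like this needs no training data, which is why the abstract groups such models as unsupervised; the labelled tweets are used only to measure accuracy, whereas the supervised models (KNN, SVM, Random Forest, Naïve Bayes) also learn from them.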


Source journal

Applied Computer Systems (COMPUTER SCIENCE, THEORY & METHODS)

Self-citation rate

10.00%

Publication count

Review time

30 weeks