An Empirical Evaluation of Adapting Hybrid Parameters for CNN-based Sentiment Analysis

Impact factor 0.6, JCR Q3 (Multidisciplinary Sciences)

Pertanika Journal of Science and Technology, published 2024-04-01, DOI: 10.47836/pjst.32.3.05

Mohammed Maree, Mujahed Eleyat, Shatha Rabayah


Citations: 0

Abstract

Sentiment analysis aims to understand human emotions and perceptions through various machine-learning pipelines. However, feature engineering and the inherent semantic gap often hinder conventional machine-learning techniques and limit their accuracy. To address these challenges, newer neural network models have been proposed that automate feature learning and enrich the learned features with contextual word embeddings to identify their semantic orientations. This article analyzes how different factors influence the accuracy of sentiment classification predictions made by feedforward and convolutional neural networks (CNNs). To assess the performance of these models, we use four diverse real-world datasets: 50,000 movie reviews from IMDB, 10,662 sentences from LightSide Movie_Reviews, 300 public movie reviews, and 1,600,000 tweets extracted from Sentiment140. We experimentally investigate the impact of using GloVe word embeddings to enrich the feature vectors extracted from sentiment sentences. Findings indicate that higher-dimensional GloVe word embeddings increase sentiment classification accuracy. In particular, the CNN with a larger feature map, a smaller filter window, and the ReLU activation function in the convolutional layer achieved 90.56% accuracy on the IMDB dataset, compared with 80.73% on Sentiment140 and 77.64% on the 300-sentence dataset. However, with large-size sentiment sentences (LightSide's Movie Reviews) and the same parameters, accuracy reached only 64.44%.
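The abstract tunes three CNN hyperparameters on top of word embeddings: the feature map size, the filter window, and the activation function. As a rough illustration only, the following is a minimal pure-Python sketch of the forward pass of a common single-layer sentence CNN (convolution over word vectors, ReLU, max-over-time pooling, logistic output). It assumes that "larger feature map" means more filters, which is our reading rather than something the abstract states. This is not the authors' implementation: the function names, the toy weights, and the 2-dimensional vectors are hypothetical stand-ins for learned parameters and real GloVe vectors.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def relu(x: float) -> float:
    """Rectified linear unit, applied to each convolution output."""
    return x if x > 0.0 else 0.0


def feature_map(embeddings: Matrix, filt: Matrix, bias: float) -> Vector:
    """Slide one filter (a window of h = len(filt) words) over the sentence.

    embeddings: n x d matrix, one d-dimensional word vector per token
                (a GloVe vector in the paper's setting; toy values here).
    filt:       h x d weight matrix of a single convolutional filter.
    Returns c of length n - h + 1, where
    c_i = ReLU(sum(filt * embeddings[i:i+h]) + bias).
    """
    h, n = len(filt), len(embeddings)
    if n < h:
        raise ValueError("sentence is shorter than the filter window")
    out: Vector = []
    for i in range(n - h + 1):
        s = bias
        for j in range(h):
            for w, x in zip(filt[j], embeddings[i + j]):
                s += w * x
        out.append(relu(s))
    return out


def pooled_features(embeddings: Matrix, filters: Sequence[Matrix],
                    biases: Sequence[float]) -> Vector:
    """Max-over-time pooling: keep each filter's strongest response,
    so the sentence vector has one entry per filter."""
    return [max(feature_map(embeddings, f, b)) for f, b in zip(filters, biases)]


def positive_probability(features: Vector, weights: Vector, bias: float) -> float:
    """Logistic output unit for binary (positive/negative) sentiment."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical 2-dimensional "embeddings" for a 3-token sentence.
    sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    filters = [
        [[1.0, 1.0], [1.0, -1.0]],  # filter window of 2 words
        [[0.5, 0.5]],               # filter window of 1 word
    ]
    biases = [0.0, -0.5]
    feats = pooled_features(sentence, filters, biases)
    print(feats)  # [1.0, 0.5]
    print(positive_probability(feats, [1.0, -2.0], 0.0))  # 0.5
```

Under this reading, adding filters lengthens the pooled sentence vector, while a smaller window makes each filter respond to shorter n-grams. Swapping the toy vectors for pre-trained GloVe vectors (released in sizes such as 50, 100, 200, and 300 dimensions) only changes d, the length of each row.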




Source journal

Pertanika Journal of Science and Technology (Multidisciplinary Sciences)

CiteScore: 1.50
Self-citation rate: 16.70%
Articles published: 178

Journal description: Pertanika Journal of Science and Technology aims to provide a forum for high-quality research in science and engineering. Areas within the journal's scope include bioinformatics, bioscience, biotechnology and bio-molecular sciences, chemistry, computer science, ecology, engineering, engineering design, environmental control and management, mathematics and statistics, medicine and health sciences, nanotechnology, physics, safety and emergency management, and related fields of study.