Positional-attention based bidirectional deep stacked AutoEncoder for aspect based sentimental analysis

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Research Pub Date : 2024-12-16 DOI:10.1016/j.bdr.2024.100505

S. Anjali Devi , M. Sitha Ram , Pulugu Dileep , Sasibhushana Rao Pappu , T. Subha Mastan Rao , Mula Malyadri

{"title":"Positional-attention based bidirectional deep stacked AutoEncoder for aspect based sentimental analysis","authors":"S. Anjali Devi , M. Sitha Ram , Pulugu Dileep , Sasibhushana Rao Pappu , T. Subha Mastan Rao , Mula Malyadri","doi":"10.1016/j.bdr.2024.100505","DOIUrl":null,"url":null,"abstract":"<div><div>With the rapid growth of Internet technology and social networks, the generation of text-based information on the web is increased. To ease the Natural Language Processing (NLP) tasks, analyzing the sentiments behind the provided input text is highly important. To effectively analyze the polarities of sentiments (positive, negative and neutral), categorizing the aspects in the text is an essential task. Several existing studies have attempted to accurately classify aspects based on sentiments in text inputs. However, the existing methods attained limited performance because of reduced aspect coverage, inefficiency in handling ambiguous language, inappropriate feature extraction, lack of contextual understanding and overfitting issues. Thus, the proposed study intends to develop an effective word embedding scheme with a novel hybrid deep learning technique for performing aspect-based sentimental analysis in a social media text. Initially, the collected raw input text data are pre-processed to reduce the undesirable data by initiating tokenization, stemming, lemmatization, duplicate removal, stop words removal, empty sets removal and empty rows removal. The required information from the pre-processed text is extracted using three varied word-level embedding methods: Scored-Lexicon based Word2Vec, Glove modelling and Extended Bidirectional Encoder Representation from Transformers (E-BERT). After extracting sufficient features, the aspects are analyzed, and the exact sentimental polarities are classified through a novel Positional-Attention-based Bidirectional Deep Stacked AutoEncoder (PA_BiDSAE) model. In this proposed classification, the BiLSTM network is hybridized with a deep stacked autoencoder (DSAE) model to categorize sentiment. The experimental analysis is done by using Python software, and the proposed model is simulated with three publicly available datasets: SemEval Challenge 2014 (Restaurant), SemEval Challenge 2014 (Laptop) and SemEval Challenge 2015 (Restaurant). The performance analysis proves that the proposed hybrid deep learning model obtains improved classification performance in accuracy, precision, recall, specificity, F1 score and kappa measure.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"39 ","pages":"Article 100505"},"PeriodicalIF":4.2000,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data Research","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214579624000807","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

With the rapid growth of Internet technology and social networks, the generation of text-based information on the web is increased. To ease the Natural Language Processing (NLP) tasks, analyzing the sentiments behind the provided input text is highly important. To effectively analyze the polarities of sentiments (positive, negative and neutral), categorizing the aspects in the text is an essential task. Several existing studies have attempted to accurately classify aspects based on sentiments in text inputs. However, the existing methods attained limited performance because of reduced aspect coverage, inefficiency in handling ambiguous language, inappropriate feature extraction, lack of contextual understanding and overfitting issues. Thus, the proposed study intends to develop an effective word embedding scheme with a novel hybrid deep learning technique for performing aspect-based sentimental analysis in a social media text. Initially, the collected raw input text data are pre-processed to reduce the undesirable data by initiating tokenization, stemming, lemmatization, duplicate removal, stop words removal, empty sets removal and empty rows removal. The required information from the pre-processed text is extracted using three varied word-level embedding methods: Scored-Lexicon based Word2Vec, Glove modelling and Extended Bidirectional Encoder Representation from Transformers (E-BERT). After extracting sufficient features, the aspects are analyzed, and the exact sentimental polarities are classified through a novel Positional-Attention-based Bidirectional Deep Stacked AutoEncoder (PA_BiDSAE) model. In this proposed classification, the BiLSTM network is hybridized with a deep stacked autoencoder (DSAE) model to categorize sentiment. The experimental analysis is done by using Python software, and the proposed model is simulated with three publicly available datasets: SemEval Challenge 2014 (Restaurant), SemEval Challenge 2014 (Laptop) and SemEval Challenge 2015 (Restaurant). The performance analysis proves that the proposed hybrid deep learning model obtains improved classification performance in accuracy, precision, recall, specificity, F1 score and kappa measure.

查看原文本刊更多论文

基于位置注意力的双向深度堆叠自编码器，用于面向情感分析

随着互联网技术和社交网络的快速发展，网络上基于文本的信息的产生越来越多。为了简化自然语言处理（NLP）任务，分析提供的输入文本背后的情感是非常重要的。为了有效地分析情感的极性（积极、消极和中性），对文本中的极性进行分类是一项必不可少的工作。现有的一些研究试图根据文本输入中的情感对方面进行准确分类。然而，现有的方法由于方面覆盖率低、处理歧义语言效率低、特征提取不当、缺乏上下文理解和过拟合等问题而性能有限。因此，本研究旨在开发一种有效的词嵌入方案，采用一种新的混合深度学习技术，在社交媒体文本中进行基于方面的情感分析。首先，对收集到的原始输入文本数据进行预处理，通过启动标记化、词干提取、词序化、重复删除、停止词删除、空集删除和空行删除来减少不需要的数据。从预处理文本中提取所需信息使用三种不同的词级嵌入方法：基于评分词典的Word2Vec，手套建模和来自变形金刚的扩展双向编码器表示（E-BERT）。在提取足够的特征后，对这些方面进行分析，并通过一种新的基于位置-注意力的双向深度堆叠自动编码器（PA_BiDSAE）模型分类出准确的情感极性。在该分类中，BiLSTM网络与深度堆叠自编码器（DSAE）模型相结合，对情感进行分类。实验分析使用Python软件完成，并使用三个公开可用的数据集模拟所提出的模型：SemEval Challenge 2014 (Restaurant), SemEval Challenge 2014 （Laptop）和SemEval Challenge 2015 （Restaurant）。性能分析表明，所提出的混合深度学习模型在准确率、精密度、召回率、特异性、F1评分和kappa测度等方面都取得了较好的分类性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Big Data Research Computer Science-Computer Science Applications

CiteScore

8.40

自引率

3.00%

发文量

期刊介绍： The journal aims to promote and communicate advances in big data research by providing a fast and high quality forum for researchers, practitioners and policy makers from the very many different communities working on, and with, this topic. The journal will accept papers on foundational aspects in dealing with big data, as well as papers on specific Platforms and Technologies used to deal with big data. To promote Data Science and interdisciplinary collaboration between fields, and to showcase the benefits of data driven research, papers demonstrating applications of big data in domains as diverse as Geoscience, Social Web, Finance, e-Commerce, Health Care, Environment and Climate, Physics and Astronomy, Chemistry, life sciences and drug discovery, digital libraries and scientific publications, security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.