S. Anjali Devi, M. Sitha Ram, Pulugu Dileep, Sasibhushana Rao Pappu, T. Subha Mastan Rao, Mula Malyadri
Positional-attention based bidirectional deep stacked AutoEncoder for aspect based sentimental analysis
Big Data Research, Volume 39, Article 100505. Published 2024-12-16. DOI: 10.1016/j.bdr.2024.100505
https://www.sciencedirect.com/science/article/pii/S2214579624000807
Citation count: 0
Abstract
With the rapid growth of Internet technology and social networks, the volume of text-based information on the web has increased. Analyzing the sentiment behind input text is therefore highly important for Natural Language Processing (NLP) tasks. To analyze sentiment polarities (positive, negative and neutral) effectively, the aspects mentioned in the text must first be categorized. Several existing studies have attempted to classify aspect-level sentiment in text accurately. However, these methods achieve limited performance because of reduced aspect coverage, inefficient handling of ambiguous language, inappropriate feature extraction, lack of contextual understanding and overfitting. The proposed study therefore develops an effective word embedding scheme combined with a novel hybrid deep learning technique for aspect-based sentiment analysis of social media text. First, the collected raw text is pre-processed to remove undesirable data through tokenization, stemming, lemmatization, duplicate removal, stop-word removal, empty-set removal and empty-row removal. Features are then extracted from the pre-processed text using three word-level embedding methods: Scored-Lexicon based Word2Vec, GloVe modelling and Extended Bidirectional Encoder Representation from Transformers (E-BERT). Using these features, the aspects are analyzed and their sentiment polarities are classified by a novel Positional-Attention-based Bidirectional Deep Stacked AutoEncoder (PA_BiDSAE) model, in which a BiLSTM network is hybridized with a deep stacked autoencoder (DSAE).
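The pre-processing steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the order of the steps nor the stop-word list, stemmer or lemmatizer used, so `STOP_WORDS`, `LEMMAS`, `tokenize`, `lemmatize`, `stem` and `preprocess` are hypothetical names. The toy stemmer and lemma table stand in for real tools such as NLTK's `PorterStemmer` and `WordNetLemmatizer`, and "empty-set removal" is interpreted here as dropping rows whose token list becomes empty after filtering.

```python
"""Illustrative text pre-processing pipeline (toy stand-ins, not the paper's code)."""
import re

# Hypothetical, abbreviated stop-word list; a full list (e.g. NLTK's) would be used in practice.
STOP_WORDS = {"a", "an", "the", "is", "was", "were", "and", "but", "of", "to", "it", "this"}

# Hypothetical lemma table standing in for a dictionary-based lemmatizer.
LEMMAS = {"better": "good", "mice": "mouse", "went": "go"}


def tokenize(text):
    """Tokenization: lowercase the text and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(token):
    """Lemmatization via table lookup; unknown tokens are returned unchanged."""
    return LEMMAS.get(token, token)


def stem(token):
    """Stemming: toy suffix stripper (a stand-in for, e.g., the Porter stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words such as "us" survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(rows):
    """Return one token list per kept row; the step order is an assumption."""
    seen = set()
    cleaned = []
    for row in rows:
        if row is None or not row.strip():  # empty-row removal
            continue
        key = row.strip().lower()
        if key in seen:  # duplicate removal (case-insensitive)
            continue
        seen.add(key)
        tokens = [t for t in tokenize(row) if t not in STOP_WORDS]  # stop-word removal
        tokens = [stem(lemmatize(t)) for t in tokens]  # lemmatize, then stem
        if tokens:  # empty-set removal: drop rows with no tokens left
            cleaned.append(tokens)
    return cleaned


if __name__ == "__main__":
    reviews = ["The pasta was great!", "the pasta was great!", "", "   ",
               "Waiters ignored us", "It was the"]
    print(preprocess(reviews))  # [['pasta', 'great'], ['waiter', 'ignor', 'us']]
```

Stemming after lemmatization produces truncated forms such as `ignor`, which is one reason many pipelines apply only one of the two; the abstract lists both, so the sketch keeps both.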
The experiments are implemented in Python, and the proposed model is evaluated on three publicly available datasets: SemEval Challenge 2014 (Restaurant), SemEval Challenge 2014 (Laptop) and SemEval Challenge 2015 (Restaurant). The performance analysis shows that the proposed hybrid deep learning model achieves improved classification performance in terms of accuracy, precision, recall, specificity, F1 score and kappa measure.
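The six reported measures can all be derived from a multi-class confusion matrix. The abstract does not state how per-class scores are averaged or which kappa statistic is used, so this sketch assumes macro-averaging over the three polarity classes and Cohen's kappa. `LABELS`, `build_confusion` and `evaluate` are hypothetical names, and the code is not taken from the paper.

```python
"""Macro-averaged metrics for three-class polarity labels (illustrative sketch)."""

LABELS = ("positive", "negative", "neutral")


def build_confusion(y_true, y_pred, labels=LABELS):
    """Confusion matrix with true labels as rows and predicted labels as columns."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def _div(num, den):
    """Division that returns 0.0 instead of raising on a zero denominator."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred, labels=LABELS):
    """Accuracy, macro precision/recall/specificity/F1 and Cohen's kappa (assumed)."""
    m = build_confusion(y_true, y_pred, labels)
    k = len(labels)
    n = sum(sum(row) for row in m)
    row_tot = [sum(m[c]) for c in range(k)]                       # true count per class
    col_tot = [sum(m[r][c] for r in range(k)) for c in range(k)]  # predicted count per class

    precision, recall, specificity, f1 = [], [], [], []
    for c in range(k):  # one-vs-rest counts for class c
        tp = m[c][c]
        fp = col_tot[c] - tp
        fn = row_tot[c] - tp
        tn = n - tp - fp - fn
        p, r = _div(tp, tp + fp), _div(tp, tp + fn)
        precision.append(p)
        recall.append(r)
        specificity.append(_div(tn, tn + fp))
        f1.append(_div(2 * p * r, p + r))

    accuracy = _div(sum(m[c][c] for c in range(k)), n)  # observed agreement p_o
    chance = _div(sum(row_tot[c] * col_tot[c] for c in range(k)), n * n)  # p_e
    kappa = _div(accuracy - chance, 1 - chance)

    return {
        "accuracy": accuracy,
        "precision": sum(precision) / k,
        "recall": sum(recall) / k,
        "specificity": sum(specificity) / k,
        "f1": sum(f1) / k,
        "kappa": kappa,
    }
```

For the toy labels `y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]` and `y_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]`, accuracy is 5/6 and Cohen's kappa is 17/23 (about 0.739). Kappa is lower than accuracy because it discounts the agreement expected by chance from the class frequencies.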
Journal description:
The journal aims to promote and communicate advances in big data research by providing a fast, high-quality forum for researchers, practitioners and policymakers from the many different communities working on, and with, this topic.
The journal accepts papers on foundational aspects of dealing with big data, as well as papers on the specific platforms and technologies used to handle it. To promote data science and interdisciplinary collaboration, and to showcase the benefits of data-driven research, papers demonstrating applications of big data in domains as diverse as geoscience, the social web, finance, e-commerce, health care, environment and climate, physics and astronomy, chemistry, life sciences and drug discovery, digital libraries and scientific publications, and security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.