A Scheme of Pairwise Feature Combinations to Improve Sentiment Classification Using Book Review Dataset
Authors: S. Huspi, Haisal Dauda Abubakar, M. Umar
Journal: International Journal of Innovative Computing Information and Control (JCR Q4, Computer Science, Artificial Intelligence; impact factor 1.3)
Volume: 16 1
Published: 2021-11-16
DOI: 10.11113/ijic.v12n1.344 (https://doi.org/10.11113/ijic.v12n1.344)
Citations: 0
Abstract
Sentiment Analysis is a Natural Language Processing (NLP) domain concerned with identifying or extracting user sentiments or opinions from written language. Although approaches to this goal vary, Machine Learning (ML) methods are gradually becoming the preferred choice because of their ability to automatically draw useful insight from data regardless of its complexity. However, an important prerequisite for most ML algorithms to learn from text data is to encode the text into numerical vectors. Popular approaches include the word-level representation TF-IDF, distributed word representations (word2vec), and distributed document representations (doc2vec). Each of these methods has demonstrated remarkable success in representing encoded text; however, we found that no single method excels in all tasks. Motivated by this challenge, an improved scheme of pairwise fusion is proposed for sentiment classification of book reviews. In the experiments, Artificial Neural Network (ANN) and Logistic Regression (LR) classifiers showed that the proposed scheme improved performance compared to single-method vectorization. TF-IDF-word2vec performed best among the methods tested, with a mean accuracy of 91.0% (ANN) and 92.5% (LR), an improvement of 0.7% and 0.2% respectively over TF-IDF, which is the best single-vector method. Thus, the proposed method can be used as a compact alternative to the popular bag-of-n-gram models, as it captures contextual information of the encoded document with less sparse data.
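The pairwise fusion idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two representations are combined, so concatenation is an assumption here, as are the toy reviews, the hand-made two-dimensional word vectors (stand-ins for trained word2vec embeddings), and the helper names `tfidf_vectors`, `mean_embedding`, and `fuse`. It is not the authors' implementation.

```python
import math
from collections import Counter

# Toy corpus standing in for book reviews (hypothetical data, not from the paper).
docs = [
    "great story and great characters",
    "boring plot and weak characters",
]

# Hypothetical 2-dimensional word vectors standing in for trained word2vec embeddings.
toy_vectors = {
    "great": [0.9, 0.1],
    "story": [0.5, 0.5],
    "and": [0.0, 0.0],
    "characters": [0.4, 0.6],
    "boring": [-0.8, 0.2],
    "plot": [0.3, 0.7],
    "weak": [-0.7, 0.1],
}

def tfidf_vectors(corpus):
    """Return the sorted vocabulary and one TF-IDF vector (a list) per document."""
    tokenized = [doc.split() for doc in corpus]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, so terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Term frequency (relative count) times inverse document frequency.
        vectors.append([counts[t] / len(toks) * idf[t] for t in vocab])
    return vocab, vectors

def mean_embedding(doc, table, dim=2):
    """Average the word vectors of the tokens found in the lookup table."""
    hits = [table[tok] for tok in doc.split() if tok in table]
    if not hits:
        return [0.0] * dim
    return [sum(col) / len(hits) for col in zip(*hits)]

def fuse(sparse_vec, dense_vec):
    """Pairwise fusion by concatenation (an assumption of this sketch)."""
    return list(sparse_vec) + list(dense_vec)

vocab, tfidf = tfidf_vectors(docs)
fused = [fuse(t, mean_embedding(d, toy_vectors)) for t, d in zip(tfidf, docs)]
```

Each fused vector holds the TF-IDF weights followed by the averaged embedding, so its length is the vocabulary size plus the embedding dimension. In a fuller pipeline these vectors would be the input to a classifier such as the LR or ANN models mentioned in the abstract.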
About the journal:
The primary aim of the International Journal of Innovative Computing, Information and Control (IJICIC) is to publish high-quality papers on new developments and trends, novel techniques and approaches, and innovative methodologies and technologies in the theory and applications of intelligent systems, information and control. The IJICIC is a peer-reviewed English-language journal and is published bimonthly.