Muhammad Shahid Iqbal Malik, Aftab Nawaz, M. Jamjoom, Dmitry I. Ignatov
{"title":"Effectiveness of ELMo embeddings, and semantic models in predicting review helpfulness","authors":"Muhammad Shahid Iqbal Malik, Aftab Nawaz, M. Jamjoom, Dmitry I. Ignatov","doi":"10.3233/ida-230349","DOIUrl":null,"url":null,"abstract":"Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the helpfulness of OPRs. In addition, the wrapper-based feature selection technique is employed to select effective subsets from each type of features. Five feature generation techniques including word2vec, FastText, Global Vectors for Word Representation (GloVe), Latent Dirichlet Allocation (LDA), and Embeddings from Language Models (ELMo), were employed. The proposed framework is evaluated on two Amazon datasets (Video games and Health & personal care). The results showed that the ELMo model outperformed the six standard baselines, including the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. In addition, ELMo achieved Mean Square Error (MSE) of 0.0887 and 0.0786 respectively on two datasets and MSE of 0.0791 and 0.0708 with the wrapper method. This results in the reduction of 1.43% and 1.63% in MSE as compared to the fine-tuned BERT model on respective datasets. However, the LDA model has a comparable performance with the fine-tuned BERT model but outperforms the other five baselines. The proposed framework demonstrated good generalization abilities by uncovering important factors of product reviews and can be evaluated on other voting platforms.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"9 2","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2023-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Data Analysis","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/ida-230349","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the helpfulness of OPRs. In addition, the wrapper-based feature selection technique is employed to select effective subsets from each type of features. Five feature generation techniques including word2vec, FastText, Global Vectors for Word Representation (GloVe), Latent Dirichlet Allocation (LDA), and Embeddings from Language Models (ELMo), were employed. The proposed framework is evaluated on two Amazon datasets (Video games and Health & personal care). The results showed that the ELMo model outperformed the six standard baselines, including the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. In addition, ELMo achieved Mean Square Error (MSE) of 0.0887 and 0.0786 respectively on two datasets and MSE of 0.0791 and 0.0708 with the wrapper method. This results in the reduction of 1.43% and 1.63% in MSE as compared to the fine-tuned BERT model on respective datasets. However, the LDA model has a comparable performance with the fine-tuned BERT model but outperforms the other five baselines. The proposed framework demonstrated good generalization abilities by uncovering important factors of product reviews and can be evaluated on other voting platforms.
期刊介绍:
Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers are preferred that discuss development of new AI related data analysis architectures, methodologies, and techniques and their applications to various domains.