Muhammad Shahid Iqbal Malik, Aftab Nawaz, M. Jamjoom, Dmitry I. Ignatov
{"title":"ELMo 嵌入和语义模型在预测评论有用性方面的效果","authors":"Muhammad Shahid Iqbal Malik, Aftab Nawaz, M. Jamjoom, Dmitry I. Ignatov","doi":"10.3233/ida-230349","DOIUrl":null,"url":null,"abstract":"Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the helpfulness of OPRs. In addition, the wrapper-based feature selection technique is employed to select effective subsets from each type of features. Five feature generation techniques including word2vec, FastText, Global Vectors for Word Representation (GloVe), Latent Dirichlet Allocation (LDA), and Embeddings from Language Models (ELMo), were employed. The proposed framework is evaluated on two Amazon datasets (Video games and Health & personal care). The results showed that the ELMo model outperformed the six standard baselines, including the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. In addition, ELMo achieved Mean Square Error (MSE) of 0.0887 and 0.0786 respectively on two datasets and MSE of 0.0791 and 0.0708 with the wrapper method. This results in the reduction of 1.43% and 1.63% in MSE as compared to the fine-tuned BERT model on respective datasets. However, the LDA model has a comparable performance with the fine-tuned BERT model but outperforms the other five baselines. The proposed framework demonstrated good generalization abilities by uncovering important factors of product reviews and can be evaluated on other voting platforms.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"9 2","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2023-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Effectiveness of ELMo embeddings, and semantic models in predicting review helpfulness\",\"authors\":\"Muhammad Shahid Iqbal Malik, Aftab Nawaz, M. Jamjoom, Dmitry I. Ignatov\",\"doi\":\"10.3233/ida-230349\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the helpfulness of OPRs. In addition, the wrapper-based feature selection technique is employed to select effective subsets from each type of features. Five feature generation techniques including word2vec, FastText, Global Vectors for Word Representation (GloVe), Latent Dirichlet Allocation (LDA), and Embeddings from Language Models (ELMo), were employed. The proposed framework is evaluated on two Amazon datasets (Video games and Health & personal care). The results showed that the ELMo model outperformed the six standard baselines, including the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. In addition, ELMo achieved Mean Square Error (MSE) of 0.0887 and 0.0786 respectively on two datasets and MSE of 0.0791 and 0.0708 with the wrapper method. This results in the reduction of 1.43% and 1.63% in MSE as compared to the fine-tuned BERT model on respective datasets. However, the LDA model has a comparable performance with the fine-tuned BERT model but outperforms the other five baselines. The proposed framework demonstrated good generalization abilities by uncovering important factors of product reviews and can be evaluated on other voting platforms.\",\"PeriodicalId\":50355,\"journal\":{\"name\":\"Intelligent Data Analysis\",\"volume\":\"9 2\",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2023-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Intelligent Data Analysis\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.3233/ida-230349\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Data Analysis","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/ida-230349","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
摘要
在线产品评论(OPR)是消费者在网上购物时交流产品体验的常用媒介。以往的研究利用基于频率、语言、元数据、可读性和评论者属性来调查在线产品评论的有用性。在本研究中,我们探讨了稳健的上下文词嵌入、主题和语言模型对预测 OPR 有用性的影响。此外,我们还采用了基于包装器的特征选择技术,从各类特征中选择有效的子集。该框架采用了五种特征生成技术,包括 word2vec、FastText、Global Vectors for Word Representation (GloVe)、Latent Dirichlet Allocation (LDA) 和 Embeddings from Language Models (ELMo)。建议的框架在两个亚马逊数据集(视频游戏和健康与个人护理)上进行了评估。结果表明,ELMo 模型的性能优于六个标准基线模型,包括经过微调的双向变换器编码器表示(BERT)模型。此外,ELMo 在两个数据集上的平均平方误差(MSE)分别为 0.0887 和 0.0786,而使用包装方法时的平均平方误差(MSE)分别为 0.0791 和 0.0708。与微调后的 BERT 模型相比,这两个数据集的 MSE 分别降低了 1.43% 和 1.63%。不过,LDA 模型的性能与微调 BERT 模型相当,但优于其他五个基线模型。通过发现产品评论的重要因素,所提出的框架展示了良好的泛化能力,可以在其他投票平台上进行评估。
Effectiveness of ELMo embeddings, and semantic models in predicting review helpfulness
Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the helpfulness of OPRs. In addition, the wrapper-based feature selection technique is employed to select effective subsets from each type of features. Five feature generation techniques including word2vec, FastText, Global Vectors for Word Representation (GloVe), Latent Dirichlet Allocation (LDA), and Embeddings from Language Models (ELMo), were employed. The proposed framework is evaluated on two Amazon datasets (Video games and Health & personal care). The results showed that the ELMo model outperformed the six standard baselines, including the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. In addition, ELMo achieved Mean Square Error (MSE) of 0.0887 and 0.0786 respectively on two datasets and MSE of 0.0791 and 0.0708 with the wrapper method. This results in the reduction of 1.43% and 1.63% in MSE as compared to the fine-tuned BERT model on respective datasets. However, the LDA model has a comparable performance with the fine-tuned BERT model but outperforms the other five baselines. The proposed framework demonstrated good generalization abilities by uncovering important factors of product reviews and can be evaluated on other voting platforms.
期刊介绍:
Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers are preferred that discuss development of new AI related data analysis architectures, methodologies, and techniques and their applications to various domains.