Impact of Effective Word Vectors on Deep Learning Based Subjective Classification of Online Reviews

Journal of Machine and Computing. Published 2024-07-05. DOI: 10.53759/7669/jmc202404069

Priya Kamath B, G. M., D. U, Ritika Nandi, S. Urolagin


Abstract


Extracting subjective statements from online reviews considerably simplifies sentiment analysis and reduces the overhead of the downstream classifiers. A review dataset contains both subjective and objective sentences: subjective sentences express the author's opinions, whereas objective sentences present factual information. Assessing the subjectivity of review statements therefore means classifying each one as objective or subjective. The quality of the word vectors is crucial in this process, because they must capture the semantics and contextual cues of subjective language. This study investigates the benefit of using more sophisticated word vector representations to improve the detection of subjective reviews. Several methods for generating word vectors are examined. These include conventional approaches, such as Word2Vec and Global Vectors for Word Representation (GloVe), and recent innovations, such as Bidirectional Encoder Representations from Transformers (BERT), ALBERT, and Embeddings from Language Models (ELMo). These neural word embeddings were applied using Keras and scikit-learn. The analysis focuses on the Cornell subjectivity review data within the restaurant domain. Performance is assessed on this dataset of subjective reviews using accuracy, F1-score, recall, and precision. A wide range of conventional vector models and deep learning-based word embeddings are used for subjective review classification, often in combination with deep learning architectures such as Long Short-Term Memory (LSTM) networks. Notably, pre-trained BERT-base word embeddings achieved the highest accuracy, 96.4%, outperforming all other models considered in this study. However, BERT-base is computationally expensive because of its larger architecture.
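The classifiers above are evaluated with accuracy, precision, recall and F1-score. The plain-Python function below is a minimal sketch of how these four numbers follow from a classifier's predictions. It assumes "subjective" is the positive class (label 1) and "objective" is label 0. The function name `binary_metrics`, this label encoding, and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary subjective/objective task.

    Hypothetical helper: by assumption, label 1 = subjective (positive class)
    and label 0 = objective.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive (subjective) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Return 0.0 instead of dividing by zero when the positive class
    # is never predicted (precision) or never present (recall).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 1, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

The toy labels give 3 true positives, 2 false positives, 1 false negative and 2 true negatives. That yields accuracy 0.625, precision 0.6, recall 0.75 and F1 of about 0.667. In practice, `sklearn.metrics` already provides `accuracy_score`, `precision_score`, `recall_score` and `f1_score`, which give the same values for this binary input with the default `pos_label=1`. The hand-written version only makes the arithmetic explicit.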
