Comparing Sentiment Analysis and Document Representation Methods of Amazon Reviews

Katic Tamara, Nemanja Milićević
Published in: 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), pp. 000283-000286, 2018-09-01
DOI: 10.1109/SISY.2018.8524814 (https://doi.org/10.1109/SISY.2018.8524814)
Citations: 12

Abstract

Sentiment analysis has made considerable progress in the last few years. It is used in a range of applications to identify people's opinions about products, brands, services, and so on, which can, for example, help a company improve its business. Some of these applications claim to use document representation models that are more effective than plain Information Retrieval approaches such as the bag-of-words representation. Interest in document representation models has grown because they address some of the limitations of bag-of-words. This paper compares several sentiment analysis and document representation methods on Amazon reviews. We use traditional models, namely bag-of-words, bag-of-ngrams, and their TF-IDF variants combined with linear classifiers such as Logistic Regression and SVM, and deep learning models, namely word-based convolutional neural networks (ConvNets) and a simple long short-term memory (LSTM) recurrent neural network. We also test document representation techniques such as Paragraph Vector and pre-trained Word2Vec and GloVe word embeddings, where a vector is looked up for each word in the document and the word vectors are aggregated using the element-wise mean. Deep learning models perform better than traditional models on our large dataset, and the LSTM achieves the best accuracy, 95.55%. In general, deep learning models work better than traditional models as the training set size increases. Our best-performing model can be used for automatic sentiment classification of future product reviews in retail stores.
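The element-wise mean aggregation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional `TOY_EMBEDDINGS` table, the function name `mean_document_vector`, and the choice to skip out-of-vocabulary tokens are all assumptions made for illustration. A real setup would look the vectors up in pre-trained Word2Vec or GloVe embeddings.

```python
from typing import Dict, List

# Hypothetical toy embeddings (assumption): real Word2Vec/GloVe vectors
# typically have 50 to 300 dimensions and come from a pre-trained file.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.0],
    "battery": [0.1, 0.8, 0.3],
    "life": [0.2, 0.6, 0.7],
}


def mean_document_vector(
    tokens: List[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Represent a document as the element-wise mean of its word vectors.

    Tokens missing from the embedding table are skipped (an assumption;
    the paper does not say how unknown words are handled).
    """
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    # zip(*vectors) groups the i-th component of every word vector together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# "unknownword" is not in the table, so only three vectors are averaged.
doc_vector = mean_document_vector(
    ["great", "battery", "life", "unknownword"], TOY_EMBEDDINGS, dim=3
)
```

The resulting fixed-length vector can then be passed to a linear classifier such as Logistic Regression, in the same way as a bag-of-words or TF-IDF vector. Averaging discards word order, which is one reason sequence models such as the LSTM can do better on large datasets.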