Empirical evaluation of Amazon fine food reviews using Text Mining

K. Harsha, S. Yuva Nitya, Sravani Kota, K. Satyanarayana, Jaya Lakshmi
2023 IEEE 8th International Conference for Convergence in Technology (I2CT), published 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126349
Citations: 0

Abstract

Approximately 1.6 million individuals use the e-commerce website Amazon to buy items from a variety of categories, including food. Reviews written by consumers who have already purchased a product are useful to those considering buying it; however, reviews can be either positive or negative, and a buyer finds it difficult to read through so many of them before making a purchase. Machine learning models can make this task manageable. To address this problem, our objective is to classify the reviews based on the attributes present in the dataset. The raw data contains redundant (duplicate) entries, so we remove them; we also remove reviews with a score of 3, since these are regarded as neutral. We then use an NLP toolkit to preprocess the review text (a column in the dataset): we remove stop words (such as "in", "as", "is", and "on") and punctuation, and convert every letter to lowercase. Because classification techniques cannot operate directly on reviews written in human language, the proposed approach then converts the text into a machine-readable numerical form using word embedding techniques. After preprocessing, we split the data into training and test sets, train classifiers such as logistic regression and XGBoost on the training set, and evaluate their accuracy on the test set. Finally, we use the trained model to predict whether new reviews are positive or negative based on previous reviews: the model is fed existing reviews, applied to incoming reviews, and used to forecast whether the product is good or not. The dataset used in this work was obtained from Kaggle.
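The cleaning step described in the abstract (dropping neutral score-3 reviews and redundant entries) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the record keys `UserId`, `Text`, and `Score` are assumed to match the Kaggle CSV columns, the duplicate criterion (same user, same text) and the label mapping (scores 4-5 positive, 1-2 negative) are assumptions, and the sample records are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: each review is a dict with at least the keys
# "UserId", "Text", and "Score" (assumed to mirror the Kaggle CSV columns).
Review = Dict[str, Any]


def clean_reviews(reviews: List[Review]) -> List[Tuple[str, int]]:
    """Drop neutral (score 3) and duplicate reviews; return (text, label) pairs.

    Assumed label convention: scores 4-5 -> 1 (positive), 1-2 -> 0 (negative).
    Assumed duplicate rule: the same user posting the same text again.
    """
    seen = set()
    cleaned: List[Tuple[str, int]] = []
    for r in reviews:
        score = int(r["Score"])
        if score == 3:  # neutral reviews are removed
            continue
        key = (r["UserId"], r["Text"])
        if key in seen:  # redundant copy of a review already kept
            continue
        seen.add(key)
        cleaned.append((str(r["Text"]), 1 if score > 3 else 0))
    return cleaned


if __name__ == "__main__":
    sample = [
        {"UserId": "u1", "Text": "Great coffee!", "Score": 5},
        {"UserId": "u1", "Text": "Great coffee!", "Score": 5},  # duplicate
        {"UserId": "u2", "Text": "It was okay.", "Score": 3},   # neutral
        {"UserId": "u3", "Text": "Stale and bland.", "Score": 1},
    ]
    print(clean_reviews(sample))
    # → [('Great coffee!', 1), ('Stale and bland.', 0)]
```

Filtering and deduplicating in a single pass keeps the first occurrence of each review, so the output order follows the input order.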
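The text preprocessing described in the abstract (lowercasing, punctuation removal, and stop-word removal) might look like this in plain Python. The stop-word set below is a small hypothetical list for illustration; the authors' exact list and toolkit calls are not given in the abstract.

```python
import re
from typing import List

# Small illustrative stop-word set (an assumption, not the authors' list).
STOP_WORDS = {
    "a", "an", "and", "as", "in", "is", "it", "of",
    "on", "the", "this", "to", "was",
}


def preprocess(text: str) -> List[str]:
    """Lowercase the text, strip punctuation, and drop stop words."""
    text = text.lower()
    # Replace every character that is not a letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("This coffee is GREAT, and the price is fair!"))
    # → ['coffee', 'great', 'price', 'fair']
```

The set deliberately omits negations such as "not", which carry sentiment; whether the authors' stop-word list kept them is not stated.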
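The split, train, and evaluate loop can be illustrated with a small self-contained sketch. Two substitutions are made for brevity, and both are assumptions rather than the authors' setup: bag-of-words counts replace the word embeddings, and a hand-written logistic regression trained by stochastic gradient descent stands in for a library implementation; the XGBoost classifier is not reproduced. The toy reviews, the hyperparameters (`epochs`, `lr`, `test_frac`, `seed`), and all function names are hypothetical.

```python
import math
import random
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Features = Dict[str, float]

# Hypothetical toy reviews (text, label); 1 = positive, 0 = negative.
TOY_REVIEWS: List[Tuple[str, int]] = [
    ("great taste, love it", 1),
    ("delicious and fresh", 1),
    ("love this snack", 1),
    ("great value, fresh", 1),
    ("stale and bland", 0),
    ("awful taste, never again", 0),
    ("bland and overpriced", 0),
    ("stale, awful", 0),
]


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters/digits (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def featurize(tokens: List[str]) -> Features:
    """Bag-of-words counts, used here in place of word embeddings."""
    return {tok: float(n) for tok, n in Counter(tokens).items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: Features, bias: float, feats: Features) -> float:
    """Linear score w.x + b over the sparse feature dict."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_logreg(data: Sequence[Tuple[Features, int]], epochs: int = 200,
                 lr: float = 0.1) -> Tuple[Features, float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights: Features = {}
    bias = 0.0
    for _ in range(epochs):
        for feats, label in data:
            # Gradient of the log loss with respect to the linear score.
            err = sigmoid(score(weights, bias, feats)) - label
            bias -= lr * err
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) - lr * err * v
    return weights, bias


def predict(weights: Features, bias: float, feats: Features) -> int:
    """Positive (1) when sigmoid(score) >= 0.5, i.e. when score >= 0."""
    return 1 if score(weights, bias, feats) >= 0 else 0


def accuracy(weights: Features, bias: float,
             data: Sequence[Tuple[Features, int]]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(weights, bias, x) == y for x, y in data)
    return correct / len(data)


def train_test_split(items: Sequence[T], test_frac: float = 0.25,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed and hold out a fraction as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    data = [(featurize(tokenize(t)), y) for t, y in TOY_REVIEWS]
    train, test = train_test_split(data, test_frac=0.25, seed=42)
    w, b = train_logreg(train)
    print(f"test accuracy: {accuracy(w, b, test):.2f}")
```

With only eight toy reviews the printed test accuracy says little; on the real dataset a library implementation (for example scikit-learn's `LogisticRegression`) and a much larger held-out set would be the more usual choice.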