Rate Insight: A Comparative Study on Different Machine Learning and Deep Learning Approaches for Product Review Rating Prediction in Bengali Language

2022 25th International Conference on Computer and Information Technology (ICCIT). Pub Date: 2022-12-17. DOI: 10.1109/ICCIT57492.2022.10055515

R. Chowdhury, Farhad Uz Zaman, Arman Sharker, Mashfiq Rahman, F. Shah



Abstract

In the contemporary era of digital marketing, e-commerce has emerged as one of the most preferred methods for day-to-day shopping. Since the COVID-19 pandemic, online shopping behavior has shifted permanently toward little or no human-to-human interaction. As a result, it is becoming more difficult for e-commerce enterprises to observe and evaluate market trends, particularly through consumer behavior analysis. Extensive analysis of product reviews, aimed at identifying behavioral patterns and discrepancies between customer reviews and ratings, is therefore a substantial research field. Owing to the lack of benchmark corpora and language processing techniques, predicting review ratings in Bengali remains difficult. This paper analyzes approaches to product review rating prediction for Bengali text reviews using our own dataset, collected from the e-commerce website DarazBD. Each review is labeled with its rating, one of five sentiment classes from "1" to "5". We built a well-balanced dataset using our automated scraping system and spent significant time and effort maintaining quality standards through human annotation. Among the machine learning models explored (logistic regression, random forest, multinomial naïve Bayes, and support vector machine), SVM achieves the best classification accuracy of 78.63%. Subsequently, we combined Word2Vec, FastText, and GloVe embeddings with three deep neural network (DNN) architectures: CNN, Bi-LSTM, and a combination of CNN and Bi-LSTM. CNN+Bi-LSTM gave the highest accuracy among the DNN architectures, 75.25%.
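The rating-prediction task described above is a five-class text classification problem scored by accuracy. As a rough illustration only, here is a minimal pure-Python sketch of one of the listed model families, multinomial naïve Bayes with Laplace smoothing. The abstract does not state the features, tokenization, or hyperparameters the authors used, so whitespace tokenization, the smoothing constant `alpha`, and the function names `train_mnb`, `predict_mnb`, and `accuracy` are all assumptions made for this example. The toy reviews are invented placeholders, not data from the paper.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on whitespace-tokenized documents.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # label -> token -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "log_prior": {c: math.log(n / len(labels)) for c, n in class_counts.items()},
        "token_counts": token_counts,
        "total_tokens": {c: sum(token_counts[c].values()) for c in class_counts},
    }


def predict_mnb(model, text):
    """Return the rating label with the highest posterior log-probability."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        # Laplace-smoothed denominator shared by every token of this class.
        denom = model["total_tokens"][label] + alpha * len(vocab)
        score = log_prior
        for tok in text.split():
            if tok in vocab:  # tokens never seen in training are ignored
                count = model["token_counts"][label][tok]
                score += math.log((count + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true rating labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented placeholder reviews (English stand-ins for Bengali text).
toy_docs = [
    "bad broken product",
    "very bad quality",
    "okay average product",
    "great quality love it",
    "great product",
]
toy_ratings = [1, 1, 3, 5, 5]
toy_model = train_mnb(toy_docs, toy_ratings)
```

Calling `predict_mnb(toy_model, "great product")` returns a rating label, and `accuracy` compares a list of such predictions with held-out labels. This is the same metric behind the 78.63% and 75.25% figures, although the paper's actual evaluation protocol is not given in the abstract.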
