Rate Insight: A Comparative Study on Different Machine Learning and Deep Learning Approaches for Product Review Rating Prediction in Bengali Language
R. Chowdhury, Farhad Uz Zaman, Arman Sharker, Mashfiq Rahman, F. Shah
2022 25th International Conference on Computer and Information Technology (ICCIT), published 2022-12-17
DOI: 10.1109/ICCIT57492.2022.10055515 (https://doi.org/10.1109/ICCIT57492.2022.10055515)
Citations: 0
Abstract
In the current era of digital marketing, e-commerce has emerged as one of the most preferred methods for day-to-day shopping. Since the COVID-19 pandemic, online shopping behavior has shifted toward little or no human-to-human interaction. As a result, it is becoming more difficult for e-commerce enterprises to observe and evaluate market trends, particularly through consumer behavior analysis. Extensive analysis of product reviews, aimed at identifying behavioral patterns and discrepancies between customer reviews and ratings, is a substantial research field. Owing to the lack of benchmark corpora and language processing techniques, predicting review ratings in Bengali has become increasingly problematic. This paper thoroughly analyzes approaches to product review rating prediction for Bengali text reviews using our own dataset, collected from the e-commerce website DarazBD. We acquired product reviews labeled with ratings in five sentiment classes, from "1" to "5". We established a well-balanced dataset using our automated scraping system, and spent a significant amount of time and effort maintaining quality standards through human annotation. Among the machine learning models explored (logistic regression, random forest, multinomial naïve Bayes, and support vector machine), SVM achieved the best classification accuracy of 78.63%. Subsequently, using Word2Vec, FastText, and GloVe embeddings with three deep neural network (DNN) architectures (CNN, Bi-LSTM, and a combination of CNN and Bi-LSTM), CNN+Bi-LSTM gave the highest accuracy of 75.25% among the DNN architectures.
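As a rough illustration of one of the classical baselines named in the abstract, multinomial naïve Bayes, the following Python sketch trains a bag-of-words classifier with Laplace smoothing on five hand-written toy reviews. It is not the authors' implementation: the whitespace tokenization, the smoothing value `alpha=1.0`, the function names `train_multinomial_nb` and `predict`, and the toy reviews themselves are all assumptions made for illustration and are not taken from the paper or its dataset.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on whitespace-tokenized text.

    Returns (log_priors, log_likelihoods, vocabulary).
    alpha is the Laplace (add-alpha) smoothing constant; 1.0 is an assumption.
    """
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # label -> token -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        token_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = len(docs)
    log_priors = {c: math.log(n / total_docs) for c, n in class_counts.items()}

    log_likelihoods = {}
    for c in class_counts:
        # Smoothed denominator: tokens seen in class c plus alpha per vocab entry.
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {
            t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict(model, text):
    """Return the rating class with the highest posterior log-probability."""
    log_priors, log_likelihoods, vocab = model
    tokens = [t for t in text.split() if t in vocab]  # ignore unseen tokens
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][t] for t in tokens)
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Toy, hand-written reviews (NOT from the paper's dataset), one per rating 1 to 5.
train_docs = [
    "খুব খারাপ পণ্য",          # very bad product
    "খারাপ মান",               # bad quality
    "মোটামুটি পণ্য",            # so-so product
    "ভালো পণ্য",               # good product
    "খুব ভালো অসাধারণ পণ্য",    # very good, excellent product
]
train_labels = [1, 2, 3, 4, 5]

model = train_multinomial_nb(train_docs, train_labels)
prediction = predict(model, "অসাধারণ ভালো")  # "excellent, good"
print("Predicted rating:", prediction)
```

The sketch only shows how per-class token counts become a rating prediction. A realistic experiment would use the full dataset, a held-out test split, and a library implementation of the classifier.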