Sentiment Analysis of DANA Application Reviews on Google Play Store Using Naïve Bayes Classifier Algorithm Based on Information Gain

UNP Journal of Statistics and Data Science. Pub Date: 2024-02-25. DOI: 10.24036/ujsds/vol2-iss1/147

Cindy Caterine, Syafriandi Yolanda, Yenni Kurniawati, Dina Fitria



Abstract

DANA is a digital payment platform whose features let users make payments, transfer money, and top up their balance online. Users of the DANA application leave a wide range of reviews, both constructive and critical, which can be valuable input for its developers. This research evaluates the sentiment classification of DANA user reviews on the Google Play Store using the Naïve Bayes Classifier (NBC) with Information Gain (IG) feature selection, and assesses the effect of IG feature selection on model performance. Reviews are divided into two categories, positive and negative, by lexicon-based labeling. Data weighting, feature selection, and an 80%/20% train/test split are carried out before model building. Two models are built: one without feature selection (the NBC model) and one with feature selection (the NBC-IG model). The NBC model, with 1106 features, performed well, reaching 82.91% accuracy, 83.96% precision, and 90.23% recall. The NBC-IG model, with 536 features, performed better, reaching 85.09% accuracy, 85.79% precision, and 92.09% recall. Applying IG feature selection with a threshold of IG > 0.01 thus removed 570 features and improved accuracy by 2.18 percentage points, precision by 1.83 percentage points, and recall by 1.86 percentage points.
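The selection-then-classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that IG is computed on the binary feature "term occurs in the review", as IG(t) = H(C) - H(C | t), and that the classifier is a multinomial Naïve Bayes with Laplace smoothing. The toy reviews and the function names (`information_gain`, `select_features`, `train_nb`, `predict_nb`) are hypothetical. Only the 0.01 threshold comes from the abstract.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' (assumed formulation)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (present, absent) if p)
    return entropy(labels) - conditional

def select_features(docs, labels, threshold=0.01):
    """Keep only the terms whose IG exceeds the threshold."""
    vocab = {t for d in docs for t in d}
    return {t for t in vocab if information_gain(docs, labels, t) > threshold}

def train_nb(docs, labels, vocab):
    """Collect class priors and per-class term counts, restricted to vocab."""
    priors = Counter(labels)
    counts = {c: Counter() for c in priors}
    for d, y in zip(docs, labels):
        counts[y].update(t for t in d if t in vocab)
    return priors, counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest Laplace-smoothed log posterior."""
    priors, counts, vocab = model
    n = sum(priors.values())
    scores = {}
    for c in priors:
        total = sum(counts[c].values())
        score = math.log(priors[c] / n)
        for t in doc:
            if t in vocab:
                score += math.log((counts[c][t] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical, already tokenized toy reviews (not data from the study).
docs = [
    ["app", "fast", "easy"],
    ["app", "easy", "helpful"],
    ["app", "fast", "helpful"],
    ["app", "slow", "error"],
    ["app", "error", "failed"],
    ["app", "slow", "failed"],
]
labels = ["positive"] * 3 + ["negative"] * 3

selected = select_features(docs, labels, threshold=0.01)
model = train_nb(docs, labels, selected)
prediction = predict_nb(model, ["app", "easy", "fast"])
```

In this toy set the term "app" occurs in every review, so it carries no information about the class and falls below the threshold, while the sentiment-bearing terms are kept. The same mechanism, applied to real review data, is what would shrink a feature set such as the 1106 terms reported above.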


