Sentiment Analysis of DANA Application Reviews on Google Play Store Using Naïve Bayes Classifier Algorithm Based on Information Gain

Cindy Caterine, Syafriandi Yolanda, Yenni Kurniawati, Dina Fitria
UNP Journal of Statistics and Data Science. Published 2024-02-25. DOI: https://doi.org/10.24036/ujsds/vol2-iss1/147

Abstract

DANA is a digital payment platform whose features let users make payments, transfer money, and top up their balance online. Its users leave reviews containing both constructive and critical opinions, which are valuable input for the application's developers. This research evaluates sentiment classification of DANA user reviews from the Google Play Store using the Naïve Bayes Classifier (NBC) with Information Gain (IG) feature selection, and assesses how IG feature selection affects model performance. Reviews are labeled positive or negative using a lexicon-based approach. Data weighting, feature selection, and an 80%/20% train/test split are carried out before model building. Two models are compared: one without feature selection (the NBC model) and one with feature selection (the NBC-IG model). The NBC model, with 1106 features, achieved 82.91% accuracy, 83.96% precision, and 90.23% recall. The NBC-IG model, with 536 features, performed better, with 85.09% accuracy, 85.79% precision, and 92.09% recall. Keeping only features with an IG value > 0.01 removed 570 features and raised accuracy by 2.18, precision by 1.83, and recall by 1.86 percentage points.
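
The abstract does not spell out how the IG score is computed, so the following is only a minimal sketch of threshold-based Information Gain selection. It assumes each term is treated as a binary present/absent feature and that entropy is measured in bits (log base 2); the toy reviews, the function names, and the helper `select_features` are hypothetical. Only the cutoff IG > 0.01 comes from the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of a term's presence/absence with respect to the class labels.

    Assumption: docs are sets of tokens, so a term is a binary feature.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = ((len(with_term) / total) * entropy(with_term)
                   + (len(without_term) / total) * entropy(without_term))
    return entropy(labels) - conditional


def select_features(docs, labels, threshold=0.01):
    """Keep only the terms whose IG exceeds the threshold."""
    vocabulary = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocabulary}
    return {t: s for t, s in scores.items() if s > threshold}


# Hypothetical toy reviews, already tokenized and labeled.
docs = [
    {"app", "fast", "good"},
    {"app", "easy", "good"},
    {"app", "slow", "error"},
    {"app", "error", "failed"},
]
labels = ["positive", "positive", "negative", "negative"]

selected = select_features(docs, labels)
for term, score in sorted(selected.items()):
    print(f"{term}: {score:.4f}")
```

In this toy data the word "app" occurs in every review, so its presence says nothing about the class, its IG is zero, and it is dropped. Terms such as "good" and "error" separate the classes perfectly and are kept. The surviving vocabulary would then be used to build the feature matrix for the Naïve Bayes model.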