Sentiment Analysis on a Large Indonesian Product Review Dataset

Journal of Information Systems Engineering and Business Intelligence. Published: 2024-02-28. DOI: 10.20473/jisebi.10.1.167-178

A. Romadhony, Said Al Faraby, Rita Rismala, U. N. Wisesty, Anditya Arifianto



Abstract

Background: Publicly available large datasets play an important role in the development of natural language processing and computational linguistics research. However, only a few large Indonesian-language datasets are currently accessible for research purposes. This scarcity includes sentiment analysis datasets, even though sentiment analysis is considered the most popular task.

Objective: This work presents sentiment analysis on a large Indonesian product review dataset using various features and methods. Two tasks were implemented: classifying reviews into three classes (positive, negative, neutral) and predicting ratings.

Methods: Sentiment analysis was conducted on the FDReview dataset, which comprises over 700,000 reviews. Sentiment was treated as a classification problem, using the following methods: Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), LSTM, and BiLSTM.

Result: Among the conventional methods, MNB outperformed SVM in rating prediction, whereas SVM performed better in the review classification task. BiLSTM outperformed all other methods in both tasks. The study also includes experiments on small balanced and unbalanced sample datasets.

Conclusion: The experimental results show that the deep learning-based methods performed better only in the large-dataset setting. On the small balanced dataset, conventional machine learning methods were competitive with deep learning approaches.

Keywords: Indonesian review dataset, Large dataset, Rating prediction, Sentiment analysis
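The Methods paragraph treats sentiment as a classification problem, with Multinomial Naïve Bayes as one of the conventional baselines. The following is a minimal from-scratch sketch of that classifier in Python (standard library only), meant to illustrate the technique rather than reproduce the authors' implementation. Several details are assumptions not taken from the paper: the hypothetical `rating_to_sentiment` helper and its thresholds (ratings 1 to 2 negative, 3 neutral, 4 to 5 positive), the naive `tokenize` function, the Laplace smoothing value `alpha=1.0`, and the toy reviews, which are invented and are not drawn from FDReview.

```python
import math
import re
from collections import Counter, defaultdict


def rating_to_sentiment(rating: int) -> str:
    """Hypothetical 1-5 star to 3-class mapping (ASSUMED thresholds, not from the paper)."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase and keep runs of ASCII letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes on bag-of-words counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class, for the prior
        self.word_counts = defaultdict(Counter)   # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log P(c) + sum of log P(w | c); log space avoids underflow on long reviews.
            score = math.log(n_c / n_docs)
            denom = self.total_words[c] + self.alpha * v
            for w in tokenize(doc):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Invented toy reviews (text, star rating); NOT taken from FDReview.
reviews = [
    ("barang bagus sekali, pengiriman cepat", 5),
    ("kualitas bagus, sesuai deskripsi", 4),
    ("biasa saja, lumayan", 3),
    ("cukup, standar saja", 3),
    ("barang rusak, kecewa sekali", 1),
    ("pengiriman lambat dan barang jelek", 2),
]
docs = [text for text, _ in reviews]
labels = [rating_to_sentiment(r) for _, r in reviews]
model = MultinomialNB().fit(docs, labels)

print(model.predict("barang bagus, pengiriman cepat"))  # positive
print(model.predict("barang rusak, kecewa"))            # negative
print(model.predict("standar saja"))                    # neutral
```

For the rating prediction task, the same classifier could be trained with the raw star rating as the label instead of the mapped three-class label. At the scale of FDReview, a library implementation such as scikit-learn's `MultinomialNB` combined with a count or TF-IDF vectorizer would be the more practical choice; the from-scratch version only makes the prior and smoothed likelihood terms explicit.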

