Aspect Based Sentiment Analysis: Restaurant Online Review Platform in Indonesia with Unsupervised Scraped Corpus in Indonesian Language

Samuel Mahatmaputra Tedjojuwono, Clement Neonardi
Published in: 2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)
Publication date: 2021-10-28
DOI: https://doi.org/10.1109/iccsai53272.2021.9609794

Abstract

This paper presents a dynamic dashboard that summarizes restaurants in Indonesia on four distinct metrics: Food, Service, Ambience, and Covid Safety. Each metric has its own rating, which gives a detailed score for that aspect of the restaurant. The data behind the dashboard were produced with a semi-supervised, aspect-based sentiment analysis approach. The idea is to analyze past reviews and comments for each restaurant on an existing online review platform and to extract both the aspect and the sentiment of each review. The restaurant lists and reviews were collected by web scraping from Tripadvisor, one of the most widely used online review platforms in Indonesia. The scraped data were cleaned through several pre-processing steps using the Sastrawi and NLTK libraries for the Indonesian language. The machine learning models that extract the aspect and sentiment of each review were built on the Monkeylearn machine learning platform and accessed through its APIs. The cleaned datasets were imported into the platform and annotated for model training, so that the models learn which sets of words belong to each aspect category and what sentiment values they carry. The analysis concludes, however, that accuracy may not be ideal because too few negative-sentiment reviews were gathered, which affected the model during training. In conclusion, the feature was successfully built, implemented, and deployed on a web server supported by Ngrok services, but there is still room to improve the model's analysis.
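The abstract does not say how per-review predictions are turned into the four dashboard ratings. The following is a minimal sketch of one way it could be done, not the authors' method. The function name `aspect_ratings`, the label names, the sentiment-to-number mapping, and the 0 to 5 scale are all assumptions made for illustration.

```python
from collections import defaultdict

# The four aspects named in the abstract.
ASPECTS = ("Food", "Service", "Ambience", "Covid Safety")

# Hypothetical mapping from sentiment label to a numeric value (an assumption,
# not taken from the paper).
SENTIMENT_VALUE = {"Positive": 1.0, "Neutral": 0.5, "Negative": 0.0}


def aspect_ratings(predictions, scale=5.0):
    """Average sentiment per (restaurant, aspect), rescaled to 0..scale.

    `predictions` is an iterable of (restaurant, aspect, sentiment) tuples,
    the kind of output an aspect/sentiment classifier might produce.
    Aspects with no classified reviews are reported as None rather than 0.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for restaurant, aspect, sentiment in predictions:
        if aspect not in ASPECTS or sentiment not in SENTIMENT_VALUE:
            continue  # skip labels outside the assumed schema
        totals[(restaurant, aspect)] += SENTIMENT_VALUE[sentiment]
        counts[(restaurant, aspect)] += 1

    restaurants = sorted({restaurant for restaurant, _ in counts})
    return {
        r: {
            a: round(scale * totals[(r, a)] / counts[(r, a)], 2)
            if counts[(r, a)]
            else None
            for a in ASPECTS
        }
        for r in restaurants
    }


# Made-up predictions for a single, hypothetical restaurant.
sample = [
    ("Warung A", "Food", "Positive"),
    ("Warung A", "Food", "Negative"),
    ("Warung A", "Service", "Positive"),
]
ratings = aspect_ratings(sample)
print(ratings["Warung A"])
```

Reporting `None` for an aspect with no reviews keeps "no data" distinct from a genuinely low score. That distinction matters given the abstract's remark that few negative reviews were gathered.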