Topic Modeling on WhatsApp User Reviews Using Latent Dirichlet Allocation

I. Kharisudin, Hera Masri'an
Scientific Journal of Informatics. Published 2022-05-31. DOI: 10.15294/sji.v9i1.34941 (https://doi.org/10.15294/sji.v9i1.34941)

Abstract

Purpose: Topic modeling is a practical approach for identifying topics in text data. This study aims to identify the issues raised in WhatsApp user reviews using Latent Dirichlet Allocation (LDA) and to describe the characteristics of each topic.

Method: We used 1710 WhatsApp user reviews written on Google Play between 7 and 13 August 2020. The research followed a qualitative method consisting of five stages: problem identification, data retrieval, preprocessing, modeling, and analysis. The modeling stage consists of building a Document-Term Matrix (DTM), determining the number of iterations and topics, and building the model. We use perplexity as the indicator for determining the number of iterations and topics; a lower perplexity value indicates better model performance. The analysis stage examines the top terms and top documents in order to label each topic and describe its characteristics.

Result: Topic modeling produces word-topic and document-topic assignments. The word-topic assignment identifies the words with the highest probability in each topic (top terms). The document-topic assignment identifies the documents with the highest probability for each topic (top documents). The most frequently discussed topics were voice and video calls (104 reviews), photo and video quality (100 reviews), call quality (86 reviews), and voice messages (75 reviews).

Novelty: This research generates a topic model for user reviews of the WhatsApp application using Latent Dirichlet Allocation. The number of iterations used in modeling was determined by observing the perplexity value rather than being assigned arbitrarily.
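The modeling stage described in the abstract (Document-Term Matrix, perplexity-guided choice of iterations and topics, then top terms) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the software or the inference method, so the sketch assumes a collapsed Gibbs sampler written in plain Python. The function names `build_dtm`, `fit_lda`, and `perplexity`, the hyperparameters `alpha` and `beta`, the candidate grid, and the toy reviews are all hypothetical. For brevity, perplexity is computed on the training documents; whether the study used held-out data is not stated.

```python
import math
import random


def build_dtm(docs):
    """Build a Document-Term Matrix: dtm[d][v] counts term v in document d."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    dtm = [[0] * len(vocab) for _ in docs]
    for d, doc in enumerate(tokenized):
        for w in doc:
            dtm[d][index[w]] += 1
    return vocab, dtm


def fit_lda(dtm, n_topics, n_iter, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    Returns theta (document-topic probabilities) and phi (topic-term
    probabilities).
    """
    rng = random.Random(seed)
    n_terms = len(dtm[0])
    # Expand each count row back into a list of term indices, one per token.
    docs = [[v for v, c in enumerate(row) for _ in range(c)] for row in dtm]
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    ndk = [[0] * n_topics for _ in docs]            # document-topic counts
    nkv = [[0] * n_terms for _ in range(n_topics)]  # topic-term counts
    nk = [0] * n_topics                             # tokens per topic
    for d, doc in enumerate(docs):
        for i, v in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1
            nkv[k][v] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, v in enumerate(doc):
                k = z[d][i]  # remove this token's current assignment
                ndk[d][k] -= 1
                nkv[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkv[t][v] + beta)
                    / (nk[t] + n_terms * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k  # record the resampled topic
                ndk[d][k] += 1
                nkv[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(nkv[k][v] + beta) / (nk[k] + n_terms * beta)
            for v in range(n_terms)] for k in range(n_topics)]
    return theta, phi


def perplexity(dtm, theta, phi):
    """exp(-log-likelihood per token); lower values indicate a better fit."""
    log_lik, n_tokens = 0.0, 0
    for d, row in enumerate(dtm):
        for v, count in enumerate(row):
            if count:
                p = sum(theta[d][k] * phi[k][v] for k in range(len(phi)))
                log_lik += count * math.log(p)
                n_tokens += count
    return math.exp(-log_lik / n_tokens)


# Toy reviews invented for illustration; they are not data from the study.
reviews = [
    "video call keeps dropping",
    "voice call quality is poor",
    "video call audio lags",
    "photo quality reduced after sending",
    "video quality blurry after upload",
    "voice message will not play",
    "voice message recording stops",
]
vocab, dtm = build_dtm(reviews)

# Score each (topics, iterations) candidate and keep the lowest perplexity.
scores = {}
for n_topics in (2, 3, 4):
    for n_iter in (50, 200):
        theta, phi = fit_lda(dtm, n_topics, n_iter)
        scores[(n_topics, n_iter)] = perplexity(dtm, theta, phi)
best = min(scores, key=scores.get)
theta, phi = fit_lda(dtm, *best)

# Top terms per topic, the starting point for labeling each topic.
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(k, [vocab[v] for v in top])
```

The rows of `theta` play the role of the document-topic assignment and the rows of `phi` the word-topic assignment mentioned in the abstract. Sorting a column of `theta` in descending order would give the top documents for a topic in the same way the loop above gives top terms. Because training-set perplexity tends to fall as topics are added, a real analysis would likely score candidates on held-out reviews instead.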