Mobile App Review Labeling Using LDA Similarity and Term Frequency-Inverse Cluster Frequency (TF-ICF)

A. Puspaningrum, D. Siahaan, C. Fatichah
2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE)
DOI: 10.1109/ICITEED.2018.8534785
Citations: 11

Abstract

User review mining has attracted many researchers to analyze reviews and develop innovative models. These models provide technical recommendations that help software developers make decisions during software maintenance and software evolution. One such recommendation is user review categorization. Several categorizations are popularly used, namely bug errors, feature requests, and non-informative reviews. Many methods have been proposed to classify user reviews; one of them is Latent Dirichlet Allocation (LDA). LDA is a topic modelling method that maps the hidden topics in a document. One technique for mapping hidden topics into categories is to calculate a term similarity value between each hidden topic and a predefined signifier term list for each category. However, the signifier term list of each category is limited, which becomes a problem. Meanwhile, Term Frequency-Inverse Corpus Frequency (TF-ICF) is able to extract the important terms of a cluster. Therefore, this paper introduces a method that combines TF-ICF with similarity-based LDA clustering (LDAS TF-ICF) to overcome this limitation. The classification results were evaluated using precision, recall, and F1-score. The results show that the method outperforms LDA. The best performance of LDAS TF-ICF occurred when the 75% expanded term list was used, with precision, recall, and F-measure values of 0.564, 0.507, and 0.491, respectively.
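The labeling idea in the abstract (score the terms of each cluster with TF-ICF, add the top-scoring ones to each category's signifier list, then assign each LDA topic to the category whose expanded list is most similar) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the exact TF-ICF formula, the term similarity measure, nor how the "75% expanded term list" is defined. The sketch assumes a weighting of tf(t, c) × log(N / n_t) over clusters, a cosine similarity between a topic's word-weight vector and a binary term-list vector, and a hypothetical `top_k` expansion parameter. All function names are hypothetical, and LDA topics are passed in as ready-made word-to-weight dicts rather than trained here.

```python
import math
from collections import Counter


def tf_icf(clusters):
    """Score every term of every cluster with an assumed TF-ICF weighting.

    clusters: dict mapping a category/cluster name to a list of tokenized reviews.
    Weight (assumption): raw count of t in cluster c * log(N / n_t), where N is the
    number of clusters and n_t is the number of clusters containing t.
    """
    counts = {name: Counter(tok for review in reviews for tok in review)
              for name, reviews in clusters.items()}
    n_clusters = len(counts)
    cluster_freq = Counter(term for c in counts.values() for term in c)
    return {name: {t: f * math.log(n_clusters / cluster_freq[t]) for t, f in c.items()}
            for name, c in counts.items()}


def expand_term_lists(signifiers, scores, top_k):
    """Add up to top_k new positive-scoring TF-ICF terms to each signifier list."""
    expanded = {}
    for name, terms in signifiers.items():
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.get(name, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        extra = [t for t, s in ranked if s > 0 and t not in terms][:top_k]
        expanded[name] = set(terms) | set(extra)
    return expanded


def cosine_to_term_list(topic, term_list):
    """Cosine between a topic's word-weight vector and a binary term-list vector."""
    dot = sum(w for word, w in topic.items() if word in term_list)
    norm_topic = math.sqrt(sum(w * w for w in topic.values()))
    norm_list = math.sqrt(len(term_list))
    if norm_topic == 0 or norm_list == 0:
        return 0.0
    return dot / (norm_topic * norm_list)


def label_topic(topic, term_lists):
    """Assign the topic to the category whose term list is most similar."""
    return max(term_lists, key=lambda name: cosine_to_term_list(topic, term_lists[name]))


# Toy usage with made-up reviews, signifiers, and one hypothetical LDA topic.
clusters = {
    "bug": [["app", "crash", "login"], ["crash", "freeze", "app"]],
    "feature": [["please", "add", "dark", "mode"], ["add", "export", "app"]],
    "other": [["love", "app"], ["great", "app"]],
}
signifiers = {"bug": ["crash"], "feature": ["add"], "other": ["love"]}
lists = expand_term_lists(signifiers, tf_icf(clusters), top_k=2)
topic = {"freeze": 0.4, "login": 0.3, "app": 0.3}
print(label_topic(topic, lists))  # prints: bug
```

In the toy data, "app" occurs in every cluster, so its assumed ICF factor is log(1) = 0 and it is never added to any list; "freeze" and "login" are added to the bug list, which lets a topic containing neither original signifier still be labeled as a bug topic. That is the kind of gap in a limited signifier list that the expansion step is meant to close.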